[I] Optimising StarTree metric configuration [pinot]

via GitHub Thu, 07 Dec 2023 06:33:49 -0800


davecromberge opened a new issue, #12111:
URL: https://github.com/apache/pinot/issues/12111


   Problem
   ---------
   
   Some aggregation functions are closely related and can produce results from 
the same underlying metric.  In the StarTree index, however, the metric data is 
stored separately for each function-column pair.
   
   For example:
   
   ```
   {
     "dimensionsSplitOrder": [
       "team"
     ],
     "functionColumnPairs": [
       "DISTINCT_COUNT_CPC_SKETCH__players_cpc",
       "DISTINCT_COUNT_RAW_CPC_SKETCH__players_cpc"
     ]
   }
   ```
   
   If the "players_cpc" metric were an array of bytes, it would be stored 
twice in the StarTree, once for each aggregation function above.  This 
increases storage and the resources used to construct and merge segments.
   
   Proposal
   ---------
   
   I would like to propose using the value aggregator name to identify 
redundant metric computation, and to de-duplicate such pairs so that the metric 
is stored once.  This would impact the StarTree construction logic and the 
query evaluation logic.
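   
   To make the idea concrete, here is a rough sketch of the de-duplication 
step.  It is not Pinot code: the class name, the `VALUE_AGGREGATOR` table and 
both helper methods are hypothetical, and the table simply assumes (as 
described above) that the two CPC sketch functions can be served by one value 
aggregator.  Each function-column pair is mapped to the pair that would 
actually be stored, and duplicates collapse.
   
   ```java
   import java.util.LinkedHashSet;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;

   // Hypothetical illustration only; none of these names exist in Pinot.
   final class StarTreeMetricDedup {
       static final String DELIMITER = "__";

       // Assumed mapping from aggregation function name to the name of the
       // value aggregator that materialises its pre-aggregated state.
       static final Map<String, String> VALUE_AGGREGATOR = Map.of(
           "DISTINCT_COUNT_CPC_SKETCH", "DISTINCT_COUNT_CPC_SKETCH",
           "DISTINCT_COUNT_RAW_CPC_SKETCH", "DISTINCT_COUNT_CPC_SKETCH");

       // Resolve a configured (or queried) pair to the pair that is stored.
       // Functions missing from the table are assumed to map to themselves.
       static String storedPairFor(String functionColumnPair) {
           int idx = functionColumnPair.indexOf(DELIMITER);
           if (idx < 0) {
               throw new IllegalArgumentException(
                   "Expected FUNCTION__column: " + functionColumnPair);
           }
           String function = functionColumnPair.substring(0, idx);
           String column = functionColumnPair.substring(idx + DELIMITER.length());
           String aggregator = VALUE_AGGREGATOR.getOrDefault(function, function);
           return aggregator + DELIMITER + column;
       }

       // Construction side: the distinct set of pairs to materialise,
       // in configuration order.
       static Set<String> storedPairs(List<String> functionColumnPairs) {
           Set<String> stored = new LinkedHashSet<>();
           for (String pair : functionColumnPairs) {
               stored.add(storedPairFor(pair));
           }
           return stored;
       }

       public static void main(String[] args) {
           List<String> configured = List.of(
               "DISTINCT_COUNT_CPC_SKETCH__players_cpc",
               "DISTINCT_COUNT_RAW_CPC_SKETCH__players_cpc");
           System.out.println(storedPairs(configured));
       }
   }
   ```
   
   In this sketch the query side would call the same `storedPairFor` lookup to 
find which stored metric can answer a given aggregation function.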
   
   It would be nice to have feedback on this idea before I explore it further.
   
   /cc @Jackie-Jiang @snleee 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

