davecromberge opened a new issue, #12111:
URL: https://github.com/apache/pinot/issues/12111
Problem
---------
Some aggregation functions are closely related and can produce their results
from the same underlying metric. In the StarTree index, however, the metric
data is replicated for each function-column pair.
For example:
```
{
  "dimensionsSplitOrder": [
    "team"
  ],
  "functionColumnPairs": [
    "DISTINCT_COUNT_CPC_SKETCH__players_cpc",
    "DISTINCT_COUNT_RAW_CPC_SKETCH__players_cpc"
  ]
}
```
If the "players_cpc" metric were an array of bytes, it would be stored in the
StarTree once for each aggregation function above. This increases storage and
the resources used to construct and merge segments.
Proposal
---------
I would like to propose using the value aggregator name to detect
function-column pairs that share the same underlying aggregation, so that the
metric is computed and stored only once. This would impact the StarTree
construction logic and the query evaluation logic.
It would be nice to have feedback on this idea before I explore it further.
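As a rough illustration of the idea (not actual Pinot code), the de-duplication could look something like the Python sketch below. The `STORED_FUNCTION` mapping and the `dedupe_pairs` helper are hypothetical names made up for this example; in Pinot the grouping would presumably be driven by the value aggregator that each function resolves to.

```python
# Hypothetical mapping from an aggregation function to the function whose
# stored metric it could reuse. This is an assumption for illustration only.
STORED_FUNCTION = {
    "DISTINCT_COUNT_RAW_CPC_SKETCH": "DISTINCT_COUNT_CPC_SKETCH",
}


def dedupe_pairs(function_column_pairs):
    """Group configured pairs by the single pair that would actually be stored."""
    stored = {}
    for pair in function_column_pairs:
        # Pairs are written as FUNCTION__column, so split on the first "__".
        function, column = pair.split("__", 1)
        canonical = STORED_FUNCTION.get(function, function)
        stored.setdefault(f"{canonical}__{column}", []).append(pair)
    return stored


pairs = [
    "DISTINCT_COUNT_CPC_SKETCH__players_cpc",
    "DISTINCT_COUNT_RAW_CPC_SKETCH__players_cpc",
]
stored = dedupe_pairs(pairs)
for stored_pair, served in stored.items():
    print(stored_pair, "serves", served)
```

With the configuration from the example above, both configured pairs collapse onto one stored metric, and query evaluation would need the reverse lookup to find the stored pair for a requested function.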
/cc @Jackie-Jiang @snleee