davecromberge opened a new pull request, #17825:
URL: https://github.com/apache/pinot/pull/17825
Reduces serialization overhead during rollup from N-1 intermediate
serializations per key to a single one, by merging all N sketches for a
key in one operation using zero-copy Memory.wrap().
Changes:
- Add aggregateBatch() and supportsBatchAggregation() to ValueAggregator
interface
- Implement batch aggregation for DistinctCountThetaSketchAggregator,
IntegerTupleSketchAggregator, and DistinctCountCPCSketchAggregator
- Modify RollupReducer to use batch aggregation when supported
- Add reducerMaxBatchSize config to SegmentProcessorConfig (default 500)
- Add JMH benchmarks comparing pairwise vs batch aggregation
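
A minimal sketch of how a reducer might use the two new methods together with a maximum batch size. Only the names aggregateBatch(), supportsBatchAggregation() and the idea of a max batch size come from this description. The signatures, the `SimpleAggregator` and `SimpleReducer` types, the `merge` method and the chunking strategy are assumptions for illustration, not the actual Pinot code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the ValueAggregator interface.
interface SimpleAggregator<T> {
  // Pairwise merge of two intermediate values (hypothetical name).
  T merge(T left, T right);

  // Aggregators opt in to batch merging; the default keeps old behaviour.
  default boolean supportsBatchAggregation() {
    return false;
  }

  // Merge many values at once; only called when supported.
  default T aggregateBatch(List<T> values) {
    throw new UnsupportedOperationException("batch aggregation not supported");
  }
}

// Hypothetical reducer: batches when possible, otherwise merges pairwise.
final class SimpleReducer {
  static <T> T reduce(SimpleAggregator<T> agg, List<T> values, int maxBatchSize) {
    if (values.isEmpty()) {
      throw new IllegalArgumentException("no values to reduce");
    }
    if (maxBatchSize < 2) {
      throw new IllegalArgumentException("maxBatchSize must be at least 2");
    }
    if (!agg.supportsBatchAggregation()) {
      // Fallback: N-1 pairwise merges.
      T result = values.get(0);
      for (int i = 1; i < values.size(); i++) {
        result = agg.merge(result, values.get(i));
      }
      return result;
    }
    // Collect values until the batch is full, collapse it into one value,
    // and carry that partial result into the next batch.
    List<T> batch = new ArrayList<>();
    for (T value : values) {
      batch.add(value);
      if (batch.size() >= maxBatchSize) {
        T partial = agg.aggregateBatch(batch);
        batch = new ArrayList<>();
        batch.add(partial);
      }
    }
    return batch.size() == 1 ? batch.get(0) : agg.aggregateBatch(batch);
  }
}

// Toy batch-capable aggregator that sums longs and counts batch calls.
final class SumAggregator implements SimpleAggregator<Long> {
  int batchCalls = 0;

  @Override
  public Long merge(Long left, Long right) {
    return left + right;
  }

  @Override
  public boolean supportsBatchAggregation() {
    return true;
  }

  @Override
  public Long aggregateBatch(List<Long> values) {
    batchCalls++;
    long sum = 0;
    for (long v : values) {
      sum += v;
    }
    return sum;
  }
}
```

Capping the batch size bounds how many values are held in memory at once, at the cost of one extra merge call per full batch.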
Batch aggregation optimizations:
- Zero-copy sketch wrapping via Sketch.wrap(Memory.wrap(bytes))
- Theta-based sorting for early termination (Theta/Tuple sketches)
- Single final serialization instead of N-1 intermediate serializations
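
To illustrate why a single final serialization matters, here is a self-contained toy model that counts serializations for the pairwise and batch paths. It is only a sketch: `ToySketchMerge` is a hypothetical class, a sorted set of longs stands in for a real sketch, and `ByteBuffer.wrap` stands in for the DataSketches Memory.wrap() call. Theta-based sorting and early termination are not modelled.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.TreeSet;

// Toy model: a "sketch" is a sorted set of longs stored as raw bytes.
final class ToySketchMerge {
  // Counts how many times a merged result is written back to bytes.
  static int serializations = 0;

  static byte[] serialize(TreeSet<Long> values) {
    serializations++;
    ByteBuffer buf = ByteBuffer.allocate(values.size() * Long.BYTES);
    for (long v : values) {
      buf.putLong(v);
    }
    return buf.array();
  }

  // Reads the serialized values through a view over the array;
  // ByteBuffer.wrap does not copy the underlying bytes.
  static void unionInto(TreeSet<Long> target, byte[] bytes) {
    ByteBuffer view = ByteBuffer.wrap(bytes);
    while (view.remaining() >= Long.BYTES) {
      target.add(view.getLong());
    }
  }

  // Pairwise path: every step re-serializes the running result,
  // giving N-1 serializations for N inputs. Assumes a non-empty list.
  static byte[] mergePairwise(List<byte[]> inputs) {
    byte[] acc = inputs.get(0);
    for (int i = 1; i < inputs.size(); i++) {
      TreeSet<Long> union = new TreeSet<>();
      unionInto(union, acc);
      unionInto(union, inputs.get(i));
      acc = serialize(union);
    }
    return acc;
  }

  // Batch path: union all inputs into one accumulator, serialize once.
  static byte[] mergeBatch(List<byte[]> inputs) {
    TreeSet<Long> union = new TreeSet<>();
    for (byte[] input : inputs) {
      unionInto(union, input);
    }
    return serialize(union);
  }
}
```

Both paths produce the same bytes in this model; only the number of intermediate serializations differs.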
Benchmark results (500 sketches per key):
- Theta sketch: 189ms → 2ms (91x faster)
- Tuple sketch: 133ms → 11ms (12x faster)
- CPC sketch: 16ms → 4ms (4x faster)
Tag:
`performance`
`release-notes`:
- New configuration options
- Signature changes to public methods/interfaces
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]