davecromberge opened a new pull request, #17825:
URL: https://github.com/apache/pinot/pull/17825

    Reduces serialization overhead from N-1 intermediate serializations
    to a single final one by merging all sketches for a key in one
    operation, using zero-copy Memory.wrap().
   
    Changes:
    - Add aggregateBatch() and supportsBatchAggregation() to the
      ValueAggregator interface
    - Implement batch aggregation for DistinctCountThetaSketchAggregator,
      IntegerTupleSketchAggregator, and DistinctCountCPCSketchAggregator
    - Modify RollupReducer to use batch aggregation when supported
    - Add reducerMaxBatchSize config to SegmentProcessorConfig (default 500)
    - Add JMH benchmarks comparing pairwise vs batch aggregation
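
To illustrate how the changes above could fit together, here is a minimal sketch of a reducer that prefers batch aggregation and caps each batch at a configured size. Only the names aggregateBatch(), supportsBatchAggregation() and the idea of a maximum batch size come from this description. The signatures, the Aggregator interface, the merge() method, SumAggregator and reduce() are hypothetical stand-ins, not Pinot's actual ValueAggregator or RollupReducer code.

```java
import java.util.ArrayList;
import java.util.List;

class BatchRollupSketch {

  // Hypothetical, simplified stand-in for an aggregator; NOT Pinot's ValueAggregator.
  interface Aggregator<T> {
    // Pairwise merge of two already-aggregated values.
    T merge(T left, T right);

    // Assumed signature: aggregators opt in to batch aggregation.
    default boolean supportsBatchAggregation() {
      return false;
    }

    // Assumed signature: merge a whole batch in one operation.
    default T aggregateBatch(List<T> values) {
      throw new UnsupportedOperationException("batch aggregation not supported");
    }
  }

  // Toy aggregator that sums longs and counts how many batch calls it served.
  static final class SumAggregator implements Aggregator<Long> {
    int batchCalls = 0;

    @Override
    public Long merge(Long left, Long right) {
      return left + right;
    }

    @Override
    public boolean supportsBatchAggregation() {
      return true;
    }

    @Override
    public Long aggregateBatch(List<Long> values) {
      batchCalls++;
      long sum = 0;
      for (long v : values) {
        sum += v;
      }
      return sum;
    }
  }

  // Reduces all values for one key, using batches of at most maxBatchSize when supported.
  static <T> T reduce(List<T> values, Aggregator<T> aggregator, int maxBatchSize) {
    if (values.isEmpty()) {
      throw new IllegalArgumentException("values must not be empty");
    }
    if (maxBatchSize < 2) {
      throw new IllegalArgumentException("maxBatchSize must be at least 2");
    }
    if (!aggregator.supportsBatchAggregation()) {
      // Fallback: classic pairwise fold.
      T result = values.get(0);
      for (int i = 1; i < values.size(); i++) {
        result = aggregator.merge(result, values.get(i));
      }
      return result;
    }
    T result = null;
    int start = 0;
    while (start < values.size()) {
      List<T> batch = new ArrayList<>();
      if (result != null) {
        batch.add(result); // carry the running result into the next batch
      }
      int end = Math.min(values.size(), start + maxBatchSize - batch.size());
      batch.addAll(values.subList(start, end));
      result = aggregator.aggregateBatch(batch);
      start = end;
    }
    return result;
  }

  public static void main(String[] args) {
    List<Long> values = new ArrayList<>();
    for (long i = 1; i <= 1200; i++) {
      values.add(i);
    }
    SumAggregator aggregator = new SumAggregator();
    Long total = reduce(values, aggregator, 500);
    System.out.println(total + " computed in " + aggregator.batchCalls + " batch calls");
  }
}
```

Capping the batch size bounds how many values are held for a single merge. Whether the real reducer carries the running result into the next batch this way is an assumption of this sketch.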
   
    Batch aggregation optimizations:
    - Zero-copy sketch wrapping via Sketch.wrap(Memory.wrap(bytes))
    - Theta-based sorting for early termination (Theta/Tuple sketches)
    - Single final serialization instead of N-1 intermediate serializations
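
The difference between N-1 intermediate serializations and a single final one can be shown with a toy model. The sketch below does not use the DataSketches API: a "sketch" is simply a sorted set of long hashes encoded as bytes, and ByteBuffer.wrap() stands in for the zero-copy wrapping mentioned above. All class and method names are hypothetical, and theta-based sorting is not modelled.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class SerializationCountSketch {

  // Counts calls to serialize() so the two strategies can be compared.
  static int serializations = 0;

  // Encodes a set of hashes as consecutive 8-byte longs (not counted).
  private static byte[] toBytes(TreeSet<Long> set) {
    ByteBuffer buf = ByteBuffer.allocate(Long.BYTES * set.size());
    for (long v : set) {
      buf.putLong(v);
    }
    return buf.array();
  }

  // Helper for building inputs; does not touch the counter.
  static byte[] encode(long... values) {
    TreeSet<Long> set = new TreeSet<>();
    for (long v : values) {
      set.add(v);
    }
    return toBytes(set);
  }

  // Counted serialization of a merged result.
  static byte[] serialize(TreeSet<Long> set) {
    serializations++;
    return toBytes(set);
  }

  // Reads hashes through a view over the existing array; the bytes are not copied.
  static void addAll(byte[] bytes, TreeSet<Long> target) {
    ByteBuffer view = ByteBuffer.wrap(bytes);
    while (view.remaining() >= Long.BYTES) {
      target.add(view.getLong());
    }
  }

  // Pairwise: every step re-serializes the running result (N-1 serializations).
  static byte[] mergePairwise(List<byte[]> inputs) {
    byte[] acc = inputs.get(0); // assumes a non-empty input list
    for (int i = 1; i < inputs.size(); i++) {
      TreeSet<Long> union = new TreeSet<>();
      addAll(acc, union);
      addAll(inputs.get(i), union);
      acc = serialize(union);
    }
    return acc;
  }

  // Batch: all inputs feed one union, which is serialized once at the end.
  static byte[] mergeBatch(List<byte[]> inputs) {
    TreeSet<Long> union = new TreeSet<>();
    for (byte[] bytes : inputs) {
      addAll(bytes, union);
    }
    return serialize(union);
  }

  public static void main(String[] args) {
    List<byte[]> inputs = new ArrayList<>();
    for (int i = 0; i < 500; i++) {
      inputs.add(encode(i, i + 1, i + 2));
    }
    serializations = 0;
    mergePairwise(inputs);
    System.out.println("pairwise serializations: " + serializations);
    serializations = 0;
    mergeBatch(inputs);
    System.out.println("batch serializations: " + serializations);
  }
}
```

Both strategies produce the same bytes in this model; only the number of serialization passes differs, which is the overhead the batch path is meant to remove.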
   
    Benchmark results (500 sketches per key):
    - Theta sketch: 189ms → 2ms (91x faster)
    - Tuple sketch: 133ms → 11ms (12x faster)
    - CPC sketch: 16ms → 4ms (4x faster)
   
   Tag:
   `performance`
   
   `release-notes`:
   - New configuration options
   - Signature changes to public methods/interfaces
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

