praveenc7 opened a new pull request, #17411: URL: https://github.com/apache/pinot/pull/17411
## Summary ISSUE=https://github.com/apache/pinot/issues/17336 For dictionary-encoded columns, DISTINCT_COUNT_SMART_HLL currently uses a RoaringBitmap to deduplicate dictionary IDs before feeding values into HLL. While efficient for low cardinality, this approach becomes CPU-intensive for high cardinality (hundreds of thousands to millions of distinct values), where RoaringBitmap insertions dominate query execution time and negate the benefits of HLL. <img width="1560" height="785" alt="Screenshot 2025-12-16 at 3 59 43 PM" src="https://github.com/user-attachments/assets/2e4ea2b2-a594-4a3b-9b16-5af726b8d953" /> ## Proposal Introduce a cardinality-aware execution path for DISTINCT_COUNT_HLL: - Low cardinality → continue using RoaringBitmap (exact deduplication, memory-efficient) - High cardinality → bypass RoaringBitmap and update HLL directly ### Observed improvements - Reduces server-side CPU time by ~4x - 10× for high-cardinality queries (observed improvements from ~8s → ~700ms in prod benchmarks). ## Testing Done Added JMH benchmark covering: This JMH benchmark isolates server-side aggregation cost for the DistinctCountHLLAggregationFunction under controlled parameters: Each variation was run for 10 minutes recordCount: {100K, 500K, 1M, 5M, 10M, 25M} cardinalityRatioPercent: {1, 10, 30, 50, 80, 100} → Creates a record with configured cardinality useRoaringBitMap/HLL -> Controls on to run the test with useRoaringBitMap or HLL DictIds are pre-generated so benchmark timing includes only aggregation, not data generation. Sample plots : <img width="628" height="391" alt="Screenshot 2025-12-16 at 3 58 39 PM" src="https://github.com/user-attachments/assets/fcb37396-1f21-4f71-8fef-fa12f0dad255" /> Flame graph after optimization : Aggregate doesn't dominate CPU <img width="1537" height="836" alt="Screenshot 2025-12-16 at 4 01 42 PM" src="https://github.com/user-attachments/assets/8edcc39c-2539-479b-a180-f16b0e2b77f5" /> ## Benchmark Results (Average Latency, ms/op) ### Record Count = 100,000 | Cardinality | RoaringBitmap | Direct HLL | |---:|---:|---:| | 1,000 | 0.6 | 0.80 | | 10,000 | 0.71 | 0.87 | | 30,000 | 0.89 | 0.90 | | 50,000 | 1.05 | 0.96 | | 80,000 | 1.79 | 1.00 | | 100,000 | 1.91 | 1.05 | ### Record Count = 500,000 | Cardinality | RoaringBitmap | Direct HLL | |---:|---:|---:| | 5,000 | 1.45 | 2.85 | | 50,000 | 2.36 | 2.92 | | 150,000 | 5.53 | 3.16 | | 250,000 | 7.26 | 3.18 | | 400,000 | 9.59 | 3.18 | | 500,000 | 10.69 | 3.17 | ### Record Count = 1,000,000 | Cardinality | RoaringBitmap | Direct HLL | |---:|---:|---:| | 10,000 | 2.53 | 5.36 | | 100,000 | 6.69 | 5.44 | | 300,000 | 13.12 | 5.80 | | 500,000 | 15.92 | 5.78 | | 800,000 | 19.84 | 5.78 | | 1,000,000 | 22.11 | 5.71 | ### Record Count = 5,000,000 | Cardinality | RoaringBitmap | Direct HLL | |---:|---:|---:| | 50,000 | 11.51 | 25.12 | | 500,000 | 53.62 | 25.29 | | 1,500,000 | 75.60 | 26.13 | | 2,500,000 | 92.91 | 25.53 | | 4,000,000 | 113.21 | 25.24 | | 5,000,000 | 129.34 | 25.79 | ### Record Count = 10,000,000 | Cardinality | RoaringBitmap | Direct HLL | |---:|---:|---:| | 100,000 | 52.98 | 50.64 | | 1,000,000 | 117.68 | 50.61 | | 3,000,000 | 161.56 | 50.08 | | 5,000,000 | 206.71 | 51.14 | | 8,000,000 | 248.77 | 50.01 | | 10,000,000 | 278.78 | 50.37 | ### Record Count = 25,000,000 | Cardinality | RoaringBitmap | Direct HLL | |---:|---:|---:| | 250,000 | 199.06 | 125.82 | | 2,500,000 | 348.39 | 126.40 | | 7,500,000 | 466.14 | 124.74 | | 12,500,000 | 555.77 | 124.35 | | 20,000,000 | 679.43 | 124.99 | ### Recommendation: Based on the micro-benchmark results across record counts and cardinalities, 100K distinct values is a good default threshold to start with for switching away from the RoaringBitmap path. At this scale, RoaringBitmap remains efficient for low-cardinality cases, while higher cardinalities already show clear benefits from using direct HLL updates. This threshold provides a safe balance between preserving deduplication benefits for low cardinality and avoiding excessive bitmap maintenance cost for high-cardinality workloads -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
