tarun11Mavani opened a new pull request, #18760:
URL: https://github.com/apache/pinot/pull/18760
## Summary
Extends `FUNNEL_COUNT` to accept multiple columns in `CORRELATE_BY(col1, col2, ...)`, enabling funnel analysis that tracks users through steps within a composite key (e.g., per user per device category), not just a single dimension.
### Design
The single-key aggregation path is preserved as a zero-overhead fast path, structurally identical to the original single-column implementation, so existing queries see no regression. Multi-key support is added as a separate code path selected once per block.
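As a rough illustration of "selected once per block", the sketch below hoists the key-count check out of the row loop so the single-key loop does no composite-key work. It is not the PR's code: `BlockDispatchSketch`, `aggregateBlock`, the `cardinalities` parameter, and the `List<Integer>` sink are hypothetical names chosen for the example; only `addSingleKey` and `addMultiKey` are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pinot's actual AggregationStrategy.
final class BlockDispatchSketch {

  // Single-key loop: one dictionary id per row, no per-row branch on key count.
  static void addSingleKey(int[] dictIds, int numDocs, List<Integer> out) {
    for (int i = 0; i < numDocs; i++) {
      out.add(dictIds[i]);
    }
  }

  // Multi-key loop: folds the per-column dictionary ids of each row into one id.
  // Assumes the product of the cardinalities fits in an int.
  static void addMultiKey(int[][] dictIds, int[] cardinalities, int numDocs, List<Integer> out) {
    for (int i = 0; i < numDocs; i++) {
      int composite = 0;
      for (int c = 0; c < dictIds.length; c++) {
        composite = composite * cardinalities[c] + dictIds[c][i];
      }
      out.add(composite);
    }
  }

  // The branch on the number of CORRELATE_BY columns runs once per block.
  static List<Integer> aggregateBlock(int[][] dictIds, int[] cardinalities, int numDocs) {
    List<Integer> out = new ArrayList<>(numDocs);
    if (dictIds.length == 1) {
      addSingleKey(dictIds[0], numDocs, out);
    } else {
      addMultiKey(dictIds, cardinalities, numDocs, out);
    }
    return out;
  }
}
```

In this sketch `dictIds` is laid out column-major (`dictIds[column][row]`), so the single-key case passes its one column straight through.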
- **`AggregationStrategy`**: Split into two abstract methods (`addSingleKey` / `addMultiKey`) with separate aggregation loops for single-key and multi-key, eliminating per-row branching on the dominant single-key path.
- **`DictIdsWrapper`**: Added composite-key mapping for multi-column `CORRELATE_BY`. Uses stride-based arithmetic when the product of dictionary sizes fits in `int`, falling back to a `HashMap<IntArrayList, Integer>` for large key spaces. Also adds `toCompositeString` for length-prefix encoded composite string keys used during result extraction.
- **`SortedAggregationResult`**: Updated to handle multi-key by tracking secondary keys via a `HashMap` within each primary-key group (data is sorted on the primary column only).
- **`BitmapAggregationStrategy`**, **`SortedAggregationStrategy`**, **`ThetaSketchAggregationStrategy`**: Implement both `addSingleKey` and `addMultiKey`.
- **`SetResultExtractionStrategy`**, **`BitmapResultExtractionStrategy`**: Updated to reverse-map composite IDs back to per-column dictionary values during result extraction.
- **`FunnelCountSortedAggregationFunction`**: Propagates multi-dictionary context through the sorted aggregation result extraction pipeline.
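The composite-key mapping described for `DictIdsWrapper` can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the class name `CompositeKeySketch` and the methods `toCompositeId` and `toDictIds` are hypothetical, a JDK `List<Integer>` stands in for fastutil's `IntArrayList` to keep the example self-contained, and the `length:value` layout in `toCompositeString` is only one plausible length-prefix encoding (the exact format used by the PR is not stated here).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of composite-key mapping; not the actual DictIdsWrapper.
final class CompositeKeySketch {
  private final int[] cardinalities;
  private final boolean strideBased;
  // Fallback state, used only when the key space does not fit in an int.
  private final Map<List<Integer>, Integer> idsByKey = new HashMap<>();
  private final List<int[]> keysById = new ArrayList<>();

  // Assumes every dictionary size is positive.
  CompositeKeySketch(int[] cardinalities) {
    this.cardinalities = cardinalities.clone();
    long product = 1;
    boolean fits = true;
    for (int c : cardinalities) {
      product *= c; // cannot overflow a long: product <= Integer.MAX_VALUE before each step
      if (product > Integer.MAX_VALUE) {
        fits = false;
        break;
      }
    }
    this.strideBased = fits;
  }

  boolean isStrideBased() {
    return strideBased;
  }

  // Maps one row's per-column dictionary ids to a single int id.
  int toCompositeId(int[] dictIds) {
    if (strideBased) {
      int id = 0;
      for (int c = 0; c < dictIds.length; c++) {
        id = id * cardinalities[c] + dictIds[c];
      }
      return id;
    }
    List<Integer> key = new ArrayList<>(dictIds.length);
    for (int d : dictIds) {
      key.add(d);
    }
    Integer existing = idsByKey.get(key);
    if (existing != null) {
      return existing;
    }
    int id = keysById.size(); // ids are assigned in first-seen order
    idsByKey.put(key, id);
    keysById.add(dictIds.clone());
    return id;
  }

  // Reverse mapping, as needed when extracting results.
  int[] toDictIds(int compositeId) {
    if (!strideBased) {
      return keysById.get(compositeId).clone();
    }
    int[] ids = new int[cardinalities.length];
    int rest = compositeId;
    for (int c = cardinalities.length - 1; c >= 0; c--) {
      ids[c] = rest % cardinalities[c];
      rest /= cardinalities[c];
    }
    return ids;
  }

  // Length-prefix encoding keeps ("a", "b:c") distinct from ("a:b", "c").
  static String toCompositeString(String... values) {
    StringBuilder sb = new StringBuilder();
    for (String v : values) {
      sb.append(v.length()).append(':').append(v);
    }
    return sb.toString();
  }
}
```

With dictionary sizes `{3, 4}`, for example, the ids `{2, 1}` map to `2 * 4 + 1 = 9`, and `toDictIds(9)` recovers `{2, 1}`. The length prefix is what makes the string form unambiguous when column values themselves contain the separator character.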
### Example Query
```sql
SELECT FUNNEL_COUNT(
  STEPS(step1_col, step2_col, step3_col),
  CORRELATE_BY(user_id, device_category),
  SETTINGS('theta_sketch')
) FROM myTable
```
### Test Plan
- Existing single-key funnel integration tests pass unchanged
- New multi-key integration tests: `testMultiKeyOverall`, `testMultiKeyGroupBy`, `testMultiKeyWithFilter`, `testMultiKeyEmptyResult`
- All strategies tested: BITMAP, SORTED, THETA_SKETCH, SET
- JMH benchmarks verify zero regression on single-key path
- Multi-key path benchmarked for throughput baseline