Dandandan opened a new pull request, #20964: URL: https://github.com/apache/datafusion/pull/20964

## Summary

- Introduces `BatchedVec<T>`, a Vec-like container that stores elements in fixed-size batches (default 8192)
- Updates `PrimitiveGroupsAccumulator` to use `BatchedVec` as a proof of concept
- Flat `usize` group index is decomposed into `(batch, offset)` internally via division/modulo

Key operations on `BatchedVec`:

- `ensure_capacity(n, default)`: grow lazily on demand
- `get_unchecked_mut(index)`: O(1) indexed access via decomposition
- `take_batch(idx)`: **O(1)** emission of one batch via `mem::take`
- `take_all()` / `take_first(n)`: for existing `EmitTo::All` / `EmitTo::First(n)`

This enables O(1) per-batch emission from hash aggregation without materializing all groups into one large batch, and without the O(n²/batch_size) shifting cost of repeated `EmitTo::First(batch_size)`.

## Test plan

- [x] All 79 aggregate unit tests pass
- [x] All 23 functions-aggregate-common tests pass (including new BatchedVec unit tests)
- [ ] Extend to other accumulators (avg, count, variance, etc.)
- [ ] Wire up `EmittingBatches` state in `row_hash.rs` to use `take_batch`
- [ ] Extend to `GroupValues` implementations

🤖 Generated with [Claude Code](https://claude.com/claude-code)
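
For readers who want a concrete picture of the data structure described in the summary, here is a minimal sketch of a batched container. It is not the code from this PR: the field names, the `new(batch_size)` constructor, the `len` method, and the bounds-checked `get_mut` (standing in for `get_unchecked_mut`) are assumptions made for illustration, and `take_first(n)` is omitted. It only shows the `(batch, offset)` decomposition and why taking one batch with `mem::take` does not touch the other batches.

```rust
use std::mem;

/// Hypothetical sketch of a batched, Vec-like container (not the PR's implementation).
#[derive(Debug)]
pub struct BatchedVec<T> {
    /// Each inner Vec holds at most `batch_size` elements.
    batches: Vec<Vec<T>>,
    batch_size: usize,
    /// Number of slots allocated so far; valid flat indices are `0..len`.
    len: usize,
}

impl<T: Clone> BatchedVec<T> {
    /// Assumed constructor; the PR mentions a default batch size of 8192.
    pub fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be non-zero");
        Self { batches: Vec::new(), batch_size, len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Grow until at least `n` slots exist, filling new slots with `default`.
    pub fn ensure_capacity(&mut self, n: usize, default: T) {
        while self.len < n {
            let offset = self.len % self.batch_size;
            if offset == 0 {
                // The previous batch is full (or there is none): start a new one.
                self.batches.push(Vec::with_capacity(self.batch_size));
            }
            let last = self.batches.last_mut().expect("a batch exists at this point");
            let add = (self.batch_size - offset).min(n - self.len);
            last.resize(offset + add, default.clone());
            self.len += add;
        }
    }

    /// Bounds-checked access: split the flat index into (batch, offset).
    pub fn get_mut(&mut self, index: usize) -> &mut T {
        let (batch, offset) = (index / self.batch_size, index % self.batch_size);
        &mut self.batches[batch][offset]
    }

    /// Move one batch out and leave an empty Vec in its slot.
    /// No elements are copied or shifted, so the cost does not depend on `len`.
    pub fn take_batch(&mut self, idx: usize) -> Vec<T> {
        mem::take(&mut self.batches[idx])
    }

    /// Concatenate whatever batches remain into one Vec and reset the container.
    pub fn take_all(&mut self) -> Vec<T> {
        self.len = 0;
        mem::take(&mut self.batches).into_iter().flatten().collect()
    }
}
```

In this sketch, flat indices that fall inside a batch already removed by `take_batch` are no longer valid, so it assumes batches are taken only during an emission phase, after accumulation has finished. How the PR itself handles that case is not stated in the description.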
