Dandandan opened a new pull request, #20964:
URL: https://github.com/apache/datafusion/pull/20964

   ## Summary
   
   - Introduces `BatchedVec<T>`, a Vec-like container that stores elements in fixed-size batches (default 8192)
   - Updates `PrimitiveGroupsAccumulator` to use `BatchedVec` as a proof of concept
   - The flat `usize` group index is decomposed internally into `(batch, offset)` via division and modulo by the batch size
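
The following is a minimal sketch of the index decomposition, assuming a batch size of 8192. `decompose` and `compose` are hypothetical helper names used for illustration; they are not functions from the PR:

```rust
/// Default batch size stated in the PR description.
const BATCH_SIZE: usize = 8192;

/// Split a flat group index into (batch, offset).
/// Hypothetical helper: the PR may inline this or name it differently.
fn decompose(index: usize, batch_size: usize) -> (usize, usize) {
    (index / batch_size, index % batch_size)
}

/// Inverse of `decompose`: rebuild the flat index from (batch, offset).
fn compose(batch: usize, offset: usize, batch_size: usize) -> usize {
    batch * batch_size + offset
}
```

The compiler can lower the division and modulo to a shift and a mask when the batch size is a compile-time power of two such as 8192. When `batch_size` is a runtime value, as in this sketch, they stay real integer divisions.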
   
   Key operations on `BatchedVec`:
   - `ensure_capacity(n, default)`: grow lazily on demand
   - `get_unchecked_mut(index)`: O(1) indexed access via the `(batch, offset)` decomposition
   - `take_batch(idx)`: **O(1)** emission of one batch via `mem::take`
   - `take_all()` / `take_first(n)`: support the existing `EmitTo::All` / `EmitTo::First(n)` emission modes
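
The operations above can be sketched as a small standalone type. This is not the PR's implementation. The field names, the `num_batches` helper and the method bodies are assumptions. The sketch also uses a bounds-checked `get_mut` in place of `get_unchecked_mut` so that it stays within safe Rust:

```rust
use std::mem;

/// Sketch of a Vec-like container that stores elements in fixed-size batches.
/// Hypothetical layout: the real `BatchedVec` in the PR may differ.
pub struct BatchedVec<T> {
    batches: Vec<Vec<T>>,
    batch_size: usize,
    len: usize,
}

impl<T: Clone> BatchedVec<T> {
    pub fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be non-zero");
        Self { batches: Vec::new(), batch_size, len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Hypothetical helper, used only to inspect the layout.
    pub fn num_batches(&self) -> usize {
        self.batches.len()
    }

    /// Grow lazily to at least `n` elements, filling new slots with `default`.
    pub fn ensure_capacity(&mut self, n: usize, default: T) {
        while self.len < n {
            // Start a new batch whenever the last one is full (or none exists yet).
            if self.len % self.batch_size == 0 {
                self.batches.push(Vec::with_capacity(self.batch_size));
            }
            self.batches.last_mut().unwrap().push(default.clone());
            self.len += 1;
        }
    }

    /// Checked stand-in for `get_unchecked_mut`: O(1) via (batch, offset).
    pub fn get_mut(&mut self, index: usize) -> &mut T {
        assert!(index < self.len, "index out of bounds");
        let (batch, offset) = (index / self.batch_size, index % self.batch_size);
        &mut self.batches[batch][offset]
    }

    /// O(1): move one batch out, leaving an empty Vec in its slot so the
    /// indices of all other batches stay valid. In this sketch, indexing
    /// into a batch that was taken panics.
    pub fn take_batch(&mut self, batch_idx: usize) -> Vec<T> {
        mem::take(&mut self.batches[batch_idx])
    }

    /// Drain everything into one flat Vec (the `EmitTo::All` case).
    pub fn take_all(&mut self) -> Vec<T> {
        self.len = 0;
        mem::take(&mut self.batches).into_iter().flatten().collect()
    }

    /// Emit the first `n` elements and re-batch the rest (the `EmitTo::First(n)`
    /// case). O(len), because the remaining elements are moved. Assumes no
    /// batch was previously removed with `take_batch`.
    pub fn take_first(&mut self, n: usize) -> Vec<T> {
        let mut head = self.take_all();
        let k = n.min(head.len());
        let tail = head.split_off(k);
        self.len = tail.len();
        for chunk in tail.chunks(self.batch_size) {
            self.batches.push(chunk.to_vec());
        }
        head
    }
}
```

The sketch shows an asymmetry. `take_batch` only swaps one `Vec` out with `mem::take`. `take_first`, by contrast, has to rebuild every remaining element, which is the shifting cost the batched layout is meant to avoid.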
   
   This enables O(1) per-batch emission from hash aggregation without materializing all groups into one large batch. It also avoids the O(n²/batch_size) shifting cost of repeated `EmitTo::First(batch_size)` calls, each of which moves all remaining groups to the front.
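
Here is a back-of-the-envelope model of that shifting cost. It counts element moves when `n` groups are emitted from a single flat `Vec` by repeatedly removing the first `batch_size` entries. `shifting_cost_flat` is a hypothetical helper, and the model ignores constant factors such as `memcpy` throughput:

```rust
/// Count element moves when `n` groups are emitted from one flat Vec by
/// repeatedly removing the first `batch_size` entries (as with
/// `drain(..batch_size)` or `split_off`). Every element left behind is
/// moved once per emission.
fn shifting_cost_flat(n: usize, batch_size: usize) -> usize {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut remaining = n;
    let mut moved = 0;
    while remaining > 0 {
        let emitted = batch_size.min(remaining);
        remaining -= emitted;
        // The remaining tail is shifted (or copied) to the front.
        moved += remaining;
    }
    moved
}
```

With `k = n / batch_size` emissions, the moves sum to `batch_size * k * (k - 1) / 2`, which is O(n²/batch_size). Emitting the same groups with `take_batch` moves no elements, only one `Vec` per batch.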
   
   ## Test plan
   - [x] All 79 aggregate unit tests pass
   - [x] All 23 functions-aggregate-common tests pass (including new BatchedVec unit tests)
   - [ ] Extend to other accumulators (avg, count, variance, etc.)
   - [ ] Wire up `EmittingBatches` state in `row_hash.rs` to use `take_batch`
   - [ ] Extend to `GroupValues` implementations
   
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


