hhhizzz opened a new pull request, #23274:
URL: https://github.com/apache/datafusion/pull/23274
## Which issue does this PR close?
- Related to #23249.
- Closes #23251.
## Rationale for this change
This PR follows up on the short-term fix already merged for #23249.
The remaining issue is that grouped hash aggregate output can still do too
much work before producing the next output batch. For final aggregate output,
we can bound each terminal emit to approximately one output batch instead of
materializing all remaining groups at once.
This PR adds a `FirstBlock` emission mode and uses it for final hash
aggregate output. Partial aggregate output intentionally keeps the behavior
currently on `main`: materialize the partial state once with `EmitTo::All`,
then slice the resulting `RecordBatch`. This avoids repeatedly asking aggregate
state implementations to emit partial state in bounded chunks, which was slower
in earlier prototypes.
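
To make the difference between the two output paths concrete, here is a minimal, self-contained sketch. It is a toy model and not the code in this PR: the `EmitTo` enum below is a local stand-in with only the two variants discussed here, and `GroupState`, `final_output`, and `partial_output` are hypothetical names that represent aggregate state as a plain `Vec<i64>`.

```rust
/// Toy stand-in for an emission request (not DataFusion's actual `EmitTo`).
#[derive(Debug, Clone, Copy)]
enum EmitTo {
    /// Emit every remaining group.
    All,
    /// Emit at most this many groups from the front.
    FirstBlock(usize),
}

/// Hypothetical group state: one aggregated value per group, in group order.
struct GroupState {
    values: Vec<i64>,
}

impl GroupState {
    /// Removes and returns the groups selected by `emit_to`.
    fn emit(&mut self, emit_to: EmitTo) -> Vec<i64> {
        match emit_to {
            EmitTo::All => std::mem::take(&mut self.values),
            EmitTo::FirstBlock(n) => {
                let n = n.min(self.values.len());
                self.values.drain(..n).collect()
            }
        }
    }
}

/// Final-style output: request one bounded block per output batch.
fn final_output(mut state: GroupState, batch_size: usize) -> Vec<Vec<i64>> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batches = Vec::new();
    while !state.values.is_empty() {
        batches.push(state.emit(EmitTo::FirstBlock(batch_size)));
    }
    batches
}

/// Partial-style output: materialize everything once, then slice.
fn partial_output(mut state: GroupState, batch_size: usize) -> Vec<Vec<i64>> {
    assert!(batch_size > 0, "batch_size must be positive");
    let materialized = state.emit(EmitTo::All);
    materialized.chunks(batch_size).map(|c| c.to_vec()).collect()
}

fn main() {
    let make = || GroupState { values: (0..10).collect() };
    let from_final = final_output(make(), 4);
    let from_partial = partial_output(make(), 4);
    // Both strategies yield the same batches; they differ in how much
    // work is done before the first batch is available.
    assert_eq!(from_final, from_partial);
    println!("{:?}", from_final); // prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
}
```

In this model both paths produce identical batches. The difference is when the work happens: the bounded path does roughly one batch of work per call, while the materialize-then-slice path pays the full emission cost up front and then hands out cheap slices.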
## What changes are included in this PR?
This PR includes:
- Add `EmitTo::FirstBlock(usize)` to describe bounded/block-oriented group
emission.
- Teach grouped accumulators and group-value implementations how to handle
`FirstBlock`.
- Use bounded `FirstBlock` emission for final grouped hash aggregate output.
- Keep partial grouped hash aggregate output materialized-once-then-sliced.
- Add `OutputtingMaterialized` state for hash aggregate output so partial
output can retain the materialized batch while emitting slices.
- Track memory used by materialized partial output and propagate reservation
failures.
- Add tests for:
  - final output emitting in output-sized chunks,
  - partial output materializing before slicing,
  - partial materialized output respecting memory reservation limits.
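
As a rough illustration of how a materialized output state and its memory accounting could fit together, here is a self-contained sketch. It is not the implementation in this PR: `Reservation` is a hypothetical toy budget, rows are plain `i64` values rather than a `RecordBatch`, and only the state name `OutputtingMaterialized` is taken from the description above.

```rust
/// Hypothetical memory budget (a toy stand-in for a real memory reservation).
struct Reservation {
    limit: usize,
    used: usize,
}

impl Reservation {
    /// Reserves `bytes` more, or fails without changing `used`.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let new_used = self.used.saturating_add(bytes);
        if new_used > self.limit {
            return Err(format!("cannot reserve {bytes} bytes (limit {})", self.limit));
        }
        self.used = new_used;
        Ok(())
    }

    /// Releases `bytes` back to the budget.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// Toy output states; rows stand in for a materialized batch.
enum OutputState {
    /// Groups are still held in aggregate state.
    Accumulated(Vec<i64>),
    /// Output was materialized; `offset` is the next row to emit.
    OutputtingMaterialized { rows: Vec<i64>, offset: usize },
    Done,
}

struct PartialOutput {
    state: OutputState,
    reservation: Reservation,
    batch_size: usize,
}

impl PartialOutput {
    fn new(groups: Vec<i64>, batch_size: usize, limit: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self {
            state: OutputState::Accumulated(groups),
            reservation: Reservation { limit, used: 0 },
            batch_size,
        }
    }

    /// Returns the next slice, `None` when finished, or a reservation error.
    /// In this toy, a failed reservation leaves the state as `Done`.
    fn next_batch(&mut self) -> Result<Option<Vec<i64>>, String> {
        loop {
            match std::mem::replace(&mut self.state, OutputState::Done) {
                OutputState::Accumulated(rows) => {
                    // Account for the materialized rows before keeping them.
                    let bytes = rows.len() * std::mem::size_of::<i64>();
                    self.reservation.try_grow(bytes)?;
                    self.state = OutputState::OutputtingMaterialized { rows, offset: 0 };
                }
                OutputState::OutputtingMaterialized { rows, offset } => {
                    if offset >= rows.len() {
                        // Everything was emitted: release the accounted memory.
                        self.reservation.shrink(rows.len() * std::mem::size_of::<i64>());
                        return Ok(None);
                    }
                    let end = (offset + self.batch_size).min(rows.len());
                    let batch = rows[offset..end].to_vec();
                    self.state = OutputState::OutputtingMaterialized { rows, offset: end };
                    return Ok(Some(batch));
                }
                OutputState::Done => return Ok(None),
            }
        }
    }
}

fn main() {
    let mut out = PartialOutput::new((0..5).collect(), 2, 1024);
    while let Some(batch) = out.next_batch().expect("within budget") {
        println!("{batch:?}");
    }
    // A budget smaller than the materialized rows surfaces as an error.
    let mut tight = PartialOutput::new((0..5).collect(), 2, 8);
    assert!(tight.next_batch().is_err());
}
```

The point of the sketch is the ordering: the reservation is attempted before the materialized rows are retained, so a failure propagates to the caller instead of being silently ignored, and the memory is released once the last slice has been emitted.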
## Are these changes tested?
Unit tests:
```bash
cargo test -p datafusion-physical-plan aggregates::hash_aggregate::tests -- --nocapture
cargo test -p datafusion-physical-plan partial_grouped_aggregate_materializes_before_slicing -- --nocapture
```
## Are there any user-facing changes?
No user-facing API or SQL behavior changes are intended. This only changes
internal grouped hash aggregate output behavior.