Nachiket-Roy opened a new pull request, #19751: URL: https://github.com/apache/datafusion/pull/19751
## Which issue does this PR close? Related to #19679 Related to #19695 ## Rationale for this change It was observed that downstream operators (such as `SortExec`) can receive extremely large `RecordBatch` outputs from `AggregateExec`, violating the implicit `batch_size` convention used throughout the physical plan. An initial hypothesis was that normalizing batch sizes at the `AggregateExec` -> `SortExec` boundary would be sufficient. However, while implementing and testing this, it became clear that: - `GroupedHashAggregateStream` allocates group keys and accumulator state before memory reservation checks are enforced - By the time emission, spilling, or streaming merge logic runs, memory pressure may already be unavoidable - Even when output batches are sliced to` batch_size`, internal state can exceed memory limits - As a result, downstream operators still observe oversized memory usage This indicates that the issue is not a simple emission bug, but a design boundary between: - aggregation state growth - memory reservation timing - `batch_size` as an output convention vs. a hard invariant This PR aims to: - reduce memory pressure in specific transitions - make failure modes more predictable - provide a basis for further discussion - ## What changes are included in this PR? - Releases hash-aggregation memory before switching to streaming merge - Ensures memory reservations are explicitly reset when transitioning execution phases - Avoids retaining aggregation state after it is no longer needed - Tightens execution-state transitions to prevent double emission and late memory checks - Improves internal invariants around spill and merge behavior Importantly, this PR does not claim to fully enforce `batch_size` at all stages, nor does it prevent all oversized intermediate states. Instead, it reduces memory pressure and clarifies current limitations. ## Are these changes tested? Yes, but edge cases revealed design limitations rather than simple bugs The PR is submitted primarily for review and discussion rather than as a final fix. ## Are there any user-facing changes? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
