pepijnve opened a new pull request, #19287: URL: https://github.com/apache/datafusion/pull/19287
## Which issue does this PR close? - Closes #19286. ## Rationale for this change GroupedHashAggregateStream currently always reports that it can spill to the memory tracking subsystem even though this is dependent on the aggregation mode and the grouping order. The optimistic logic in `group_aggregate_batch` also does not correctly take these conditions into account ## What changes are included in this PR? - Correctly set `MemoryConsumer::can_spill` to reflect actual spilling behaviour - Align behaviour of `group_aggregate_batch` and `spill_previous_if_necessary` ## Are these changes tested? Added additional test case to demonstrate problem. This may not actually be necessary since other tests started failing as well. Still working on correcting those. ## Are there any user-facing changes? Yes, memory exhaustion may be reported much earlier in the query pipeline than is currently the case. In my local tests with a per consumer memory limit of 32MiB, grouped aggregation would consume 480MiB in practice. This was then reported by ExternalSortExec which choked on trying to reserve that much memory. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
