Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]

via GitHub Fri, 19 Jun 2026 03:38:52 -0700


ariel-miculas commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4750775371


   > Yes, and actually, based on earlier experiments, I think it makes little 
difference to performance: some steps improve (no more slicing of record 
batches, no more Vec resizing), while others regress (each lookup needs two 
index operations), so the net effect is close to zero. It is simply a better 
memory management approach.
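   To make the trade-off in the quoted comment concrete, here is a minimal sketch of blocked storage for per-group values. `BlockedValues<T>` and its methods are hypothetical names used only for illustration; this is not DataFusion's actual data structure. It shows the three points being weighed: new blocks are allocated at a fixed size, so there is no resizing of one large Vec; a group lookup costs two index operations (block, then offset); and output is handed out one whole block at a time, with no slicing of a large buffer.

```rust
use std::collections::VecDeque;

/// Hypothetical blocked storage for per-group accumulator values.
/// A sketch of the idea only; not DataFusion's actual data structure.
struct BlockedValues<T> {
    block_size: usize,
    /// Blocks that have not been emitted yet, oldest first.
    blocks: VecDeque<Vec<T>>,
    /// Number of blocks already emitted from the front.
    emitted_blocks: usize,
    /// Number of values still held (not yet emitted).
    len: usize,
}

impl<T> BlockedValues<T> {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, blocks: VecDeque::new(), emitted_blocks: 0, len: 0 }
    }

    fn len(&self) -> usize {
        self.len
    }

    /// Append the value for a new group. When the last block is full, a new
    /// block with exactly `block_size` capacity is allocated, so existing
    /// values are never moved by resizing one large Vec.
    fn push(&mut self, value: T) {
        let needs_block = self.blocks.back().map_or(true, |b| b.len() == self.block_size);
        if needs_block {
            self.blocks.push_back(Vec::with_capacity(self.block_size));
        }
        self.blocks.back_mut().expect("a block was just ensured").push(value);
        self.len += 1;
    }

    /// Look up a group: two index operations (block, then offset) instead of
    /// one. Panics if the group's block has already been emitted.
    fn get_mut(&mut self, group_index: usize) -> &mut T {
        let block = (group_index / self.block_size)
            .checked_sub(self.emitted_blocks)
            .expect("group's block was already emitted");
        let offset = group_index % self.block_size;
        &mut self.blocks[block][offset]
    }

    /// Hand out the oldest block whole: no slicing of a large buffer and no
    /// copying of the remaining values. Meant for the output phase, after
    /// all input rows have been aggregated.
    fn emit_first_block(&mut self) -> Option<Vec<T>> {
        let block = self.blocks.pop_front()?;
        self.emitted_blocks += 1;
        self.len -= block.len();
        Some(block)
    }
}
```

With `block_size = 2` and five groups, the values live in blocks `[0, 1]`, `[2, 3]`, `[4]`; group 3 is found at block 1, offset 1, and after the first block is emitted it is still reachable because the sketch subtracts the number of emitted blocks.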
   
   I disagree: memory management is directly tied to performance through the 
spilling mechanism whenever memory limits are configured. See 
https://github.com/apache/datafusion/issues/22526#issuecomment-4568611822
   The memory over-accounting caused by the current design of hash 
aggregation has a real performance impact on downstream operators, which 
are either:
   * forced to spill prematurely, or
   * fail outright because they don't have enough memory; see 
https://github.com/apache/datafusion/issues/22861
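   A simplified, hypothetical model of the accounting argument, in plain Rust rather than DataFusion's memory-pool code. It assumes one `u64` accumulator per group. A single growing `Vec` allocates by capacity, and its growth policy is an implementation detail, so the allocated size can exceed what the groups need. Blocked storage wastes at most one partially filled block and, assuming each emitted block is dropped and its reservation shrunk immediately, hands memory back incrementally during output, which is the memory downstream operators are waiting for. The function names are illustrative only.

```rust
use std::mem::size_of;

// Assumption for this model: one u64 accumulator per group.
type Acc = u64;

/// Bytes a flat `Vec<Acc>` has allocated after `num_groups` pushes.
/// Measured from `capacity()`, which can exceed `num_groups` because the
/// growth policy of `Vec` over-allocates to amortise reallocation.
fn flat_allocated_bytes(num_groups: usize) -> usize {
    let mut values: Vec<Acc> = Vec::new();
    for g in 0..num_groups {
        values.push(g as Acc);
    }
    values.capacity() * size_of::<Acc>()
}

/// Bytes held by blocked storage: every block has exactly `block_size`
/// slots, so the waste is at most one partially filled block.
fn blocked_allocated_bytes(num_groups: usize, block_size: usize) -> usize {
    num_groups.div_ceil(block_size) * block_size * size_of::<Acc>()
}

/// Bytes still held after each output block is emitted, assuming each
/// emitted block is dropped right away instead of being kept alive until
/// the whole result has been produced.
fn blocked_release_profile(num_groups: usize, block_size: usize) -> Vec<usize> {
    let blocks = num_groups.div_ceil(block_size);
    (1..=blocks)
        .map(|emitted| (blocks - emitted) * block_size * size_of::<Acc>())
        .collect()
}
```

For 1000 groups and a block size of 256, the blocked layout holds four blocks (8192 bytes) and steps down through 6144, 4096 and 2048 to 0 bytes as blocks are emitted, whereas a single buffer would stay fully allocated until it is released as a whole.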
   
   So I believe the new "blocked" approach will yield significant performance 
improvements in production-like workloads.


