2010YOUY01 commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4756783946

   > > Yes, and based on earlier experiments I think it makes little 
difference to performance (some steps improve, such as removing the record 
batch slicing and the Vec resizing, while others regress, because we need to 
perform two index operations, so the net difference is close to zero); it is 
mainly a better memory management approach.
   > 
   > I disagree, since memory management is directly tied to performance 
via the spilling mechanism when running with memory limits configured. See 
[#22526 
(comment)](https://github.com/apache/datafusion/issues/22526#issuecomment-4568611822).
 The memory overaccounting issues caused by the current design of hash 
aggregation have a real performance impact on the downstream operators, which 
are either:
   > 
   > * forced to spill prematurely
   > * outright fail because they don't have enough memory, see [Accurately 
reserve memory in the build side of hash joins 
#22861](https://github.com/apache/datafusion/issues/22861)
   > 
   > So I believe the new "blocked" approach will have significant performance 
improvements in production-like workloads.
   
   I agree we could proceed first without worrying too much about the benchmark 
numbers.
   
   This is like a tradeoff between micro-optimizations and algorithmic 
improvements to memory efficiency.
   
   I think even giving up 10%-ish performance outright for an architectural 
win is already a good trade. Realistically, though, I also believe it should be 
possible to avoid the regressions entirely with some low-level optimizations; 
we'd better discuss those opportunities later.
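
   To make the tradeoff concrete, here is a minimal sketch of the "blocked" 
idea, assuming fixed-size blocks. It is not the code in this PR: the `Blocked` 
type, its methods, and the block size are hypothetical names chosen for 
illustration. Growing only ever allocates one new block, so nothing is 
reallocated or copied and the unused capacity stays below one block, but each 
lookup pays for two index operations (block, then offset).

```rust
/// Hypothetical blocked storage, for illustration only (not DataFusion's API).
/// Values live in fixed-size blocks, so growth never moves existing data.
pub struct Blocked<T> {
    blocks: Vec<Vec<T>>,
    block_size: usize,
    len: usize,
}

impl<T> Blocked<T> {
    pub fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { blocks: Vec::new(), block_size, len: 0 }
    }

    pub fn push(&mut self, value: T) {
        // All existing blocks are full exactly when len is a multiple of
        // block_size, so that is the only time a new block is allocated.
        if self.len % self.block_size == 0 {
            self.blocks.push(Vec::with_capacity(self.block_size));
        }
        self.blocks
            .last_mut()
            .expect("a block was ensured above")
            .push(value);
        self.len += 1;
    }

    pub fn get(&self, index: usize) -> Option<&T> {
        // The two index operations mentioned in the discussion:
        // first pick the block, then the offset inside it.
        let block = index / self.block_size;
        let offset = index % self.block_size;
        self.blocks.get(block)?.get(offset)
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Nominal slots reserved but not yet used; always below block_size.
    pub fn unused_slots(&self) -> usize {
        self.blocks.len() * self.block_size - self.len
    }
}
```

   A flat `Vec` typically grows geometrically instead, so right after a 
resize a large share of its capacity can be reserved but unused, which is the 
kind of overaccounting that pushes downstream operators to spill early. With 
the sketch above, pushing 10 values into `Blocked::new(4)` reserves 3 blocks 
and leaves only 2 nominal slots unused.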


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


