2010YOUY01 commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4756783946
> > Yes, and actually I think it make few difference to performance after experiment before (some steps are improved like removing slice of record batch, removing Vec resizing, and some steps are regressed like we need to perform 2 index op, and finally near to no difference will be made), and just a better memory management approach.
>
> I disagree, since the memory management is directly tied to performance via the spilling mechanism when running with memory limits configured. See [#22526 (comment)](https://github.com/apache/datafusion/issues/22526#issuecomment-4568611822)
>
> The memory overaccounting issues caused by the current design of hash aggregation have a real performance impact in the downstream operators which are either:
>
> * forced to spill prematurely
> * outright fail because they don't have enough memory, see [Accurately reserve memory in the build side of hash joins #22861](https://github.com/apache/datafusion/issues/22861)
>
> So I believe the new "blocked" approach will have significant performance improvements in production-like workloads.

I agree we can proceed without worrying too much about the benchmark numbers for now. This is a tradeoff between micro-optimizations and an algorithmic improvement to memory efficiency, and I think giving up roughly 10% of performance for the architectural win is already worthwhile.

Realistically, I also believe it should be possible to avoid the regressions entirely with some low-level optimizations, but we'd better discuss those opportunities later.
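To make the tradeoff above concrete, here is a minimal Rust sketch of the two storage layouts being compared. It is an illustration only, not the code in this PR: `BlockedStore` and `contiguous_reserved_slots` are hypothetical names, the `u64` element type and the block size are arbitrary assumptions, and `Vec`'s growth strategy is unspecified (in practice it grows geometrically, which is where the slack comes from).

```rust
/// Hypothetical illustration, not DataFusion's implementation.
/// Values live in fixed-size blocks, so reserved capacity grows one block
/// at a time and existing blocks are never reallocated or copied.
struct BlockedStore {
    block_size: usize,
    blocks: Vec<Vec<u64>>,
}

impl BlockedStore {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, blocks: Vec::new() }
    }

    fn push(&mut self, value: u64) {
        // Allocate a fresh block only when there is none or the last is full.
        let needs_block = match self.blocks.last() {
            Some(block) => block.len() == self.block_size,
            None => true,
        };
        if needs_block {
            self.blocks.push(Vec::with_capacity(self.block_size));
        }
        self.blocks.last_mut().unwrap().push(value);
    }

    /// Lookup needs two index operations (block, then offset within the
    /// block), which is the extra per-access cost mentioned in the quote.
    fn get(&self, i: usize) -> Option<u64> {
        self.blocks
            .get(i / self.block_size)?
            .get(i % self.block_size)
            .copied()
    }

    fn len(&self) -> usize {
        self.blocks.iter().map(|block| block.len()).sum()
    }

    /// Element slots actually reserved across all blocks.
    fn reserved_slots(&self) -> usize {
        self.blocks.iter().map(|block| block.capacity()).sum()
    }
}

/// Slots reserved by a single contiguous Vec after pushing `n` values.
/// The exact number depends on the allocator and Vec's growth policy.
fn contiguous_reserved_slots(n: usize) -> usize {
    let mut values: Vec<u64> = Vec::new();
    for i in 0..n {
        values.push(i as u64);
    }
    values.capacity()
}
```

Comparing `reserved_slots()` with `contiguous_reserved_slots(n)` for the same `n` shows the difference in slack: the blocked layout wastes at most about one block, while a contiguous buffer that has just grown can hold far more capacity than it uses. An operator that reports capacity to the memory pool would then reserve memory it does not need, which is the overaccounting that pushes downstream operators to spill early.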
