ariel-miculas commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4750775371
> Yes, and actually I think it make few difference to performance after experiment before (some steps are improved like removing slice of record batch, removing Vec resizing, and some steps are regressed like we need to perform 2 index op, and finally near to no difference will be made), and just a better memory management approach.

I disagree: memory management is directly tied to performance through the spilling mechanism when running with memory limits configured. See https://github.com/apache/datafusion/issues/22526#issuecomment-4568611822

The memory overaccounting caused by the current design of hash aggregation has a real performance impact on downstream operators, which either:

* are forced to spill prematurely
* fail outright because they don't have enough memory, see https://github.com/apache/datafusion/issues/22861

So I believe the new "blocked" approach will bring significant performance improvements in production-like workloads.
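To make the overaccounting argument concrete, here is a rough sketch of how reserved memory can differ between one contiguous buffer and fixed-size blocks. It is a simplified model, not DataFusion's actual accounting: the capacity-doubling policy, the 16-byte row size, the 8192-row block size, and the function names `doubling_reserved` and `blocked_reserved` are all assumptions made for illustration.

```python
def doubling_reserved(rows: int, row_bytes: int) -> int:
    """Bytes reserved by one contiguous buffer whose capacity doubles when full.

    Assumption: growth starts at capacity 1 and doubles, which only
    approximates how a real growable vector behaves.
    """
    if rows == 0:
        return 0
    capacity = 1
    while capacity < rows:
        capacity *= 2
    return capacity * row_bytes


def blocked_reserved(rows: int, row_bytes: int, block_rows: int) -> int:
    """Bytes reserved when rows are stored in fixed-size blocks.

    Only the last block can be partially filled, so the unused
    reservation is bounded by one block.
    """
    blocks = -(-rows // block_rows)  # ceiling division
    return blocks * block_rows * row_bytes


if __name__ == "__main__":
    # Hypothetical workload: 1.1 million group rows of 16 bytes each.
    rows, row_bytes, block_rows = 1_100_000, 16, 8_192
    used = rows * row_bytes
    for name, reserved in (
        ("contiguous", doubling_reserved(rows, row_bytes)),
        ("blocked", blocked_reserved(rows, row_bytes, block_rows)),
    ):
        print(f"{name}: reserved={reserved} unused={reserved - used}")
```

Under these assumptions, the contiguous buffer can reserve close to twice what it uses right after a resize, while the blocked layout never reserves more than one extra block. Under a fixed memory limit, that unused reservation is memory that downstream operators cannot claim.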
