alamb commented on PR #6155: URL: https://github.com/apache/arrow-rs/pull/6155#issuecomment-2278377857
> I think this is an interesting angle but I do wonder if there is something fishy in what DataFusion is doing here.

I would say there is a lot of room for improvement.

What is happening is that for high cardinality aggregates, the output of the hash aggregate operation is currently one giant contiguous RecordBatch which is then sliced.

There is more detail here: https://github.com/apache/datafusion/issues/9562 (and @JasonLi-cn was looking into improving it). However, it is tricky, as doing so would imply that the intermediate state of the group keys and the hash table would need to be chunked. This isn't impossible, just non-trivial.
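To illustrate why slicing one giant contiguous batch can be a problem, here is a rough sketch using only the Rust standard library. `ToyArray` and `slice_into_batches` are hypothetical stand-ins invented for this example; they are not arrow-rs or DataFusion APIs. The sketch only models the general idea of a zero-copy slice sharing its parent's buffer, and it assumes that is the relevant behavior here.

```rust
use std::sync::Arc;

/// Toy stand-in for a columnar array: a shared buffer plus an offset/length
/// window into it. Hypothetical type, not part of arrow-rs.
#[derive(Clone)]
struct ToyArray {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl ToyArray {
    fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        Self { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: the result shares the parent's backing buffer.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// The values visible through this window.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Bytes kept alive by this array: the whole backing buffer.
    fn retained_bytes(&self) -> usize {
        self.buffer.len() * std::mem::size_of::<i64>()
    }

    /// Bytes this array logically refers to.
    fn logical_bytes(&self) -> usize {
        self.len * std::mem::size_of::<i64>()
    }
}

/// Split one large array into windows of at most `batch_size` rows,
/// loosely mimicking "emit one giant batch, then slice it".
fn slice_into_batches(big: &ToyArray, batch_size: usize) -> Vec<ToyArray> {
    assert!(batch_size > 0, "batch_size must be positive");
    (0..big.len)
        .step_by(batch_size)
        .map(|start| big.slice(start, batch_size.min(big.len - start)))
        .collect()
}
```

In this toy model, holding on to any single slice keeps the entire original buffer allocated, so `retained_bytes()` stays at the size of the giant batch even when `logical_bytes()` is small. Producing chunked output directly would avoid that, but as noted above it would require chunking the intermediate group-key and hash-table state as well.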
