alamb commented on PR #6155:
URL: https://github.com/apache/arrow-rs/pull/6155#issuecomment-2278377857

   > I think this is an interesting angle but I do wonder if there is something 
fishy in what DataFusion is doing here. 
   
   I would say there is a lot of room for improvement. What is happening is 
that for high-cardinality aggregates, the output of the hash aggregate 
operation is currently one giant contiguous RecordBatch, which is then sliced 
into smaller batches.
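To make the memory consequence concrete, here is a minimal std-only sketch. It is not DataFusion or arrow-rs code. `SlicedColumn` and `emit_in_slices` are hypothetical names, and the model assumes an Arrow-like column is a shared `Arc` buffer plus an (offset, len) window. In arrow-rs, `RecordBatch::slice` is likewise zero-copy and shares the underlying buffers. The point is that every small slice keeps the *entire* giant allocation alive until the last slice is dropped.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow column: a shared immutable buffer
/// plus an (offset, len) window. Slicing clones the Arc, not the data.
#[derive(Clone, Debug)]
struct SlicedColumn {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl SlicedColumn {
    fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        Self { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice, similar in spirit to `RecordBatch::slice`.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self { buffer: Arc::clone(&self.buffer), offset: self.offset + offset, len }
    }

    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Bytes this slice actually exposes to the consumer.
    fn logical_bytes(&self) -> usize {
        self.len * std::mem::size_of::<i64>()
    }

    /// Bytes this slice keeps alive: the whole shared allocation.
    fn retained_bytes(&self) -> usize {
        self.buffer.len() * std::mem::size_of::<i64>()
    }
}

/// Emit one giant output as `batch_size`-row zero-copy slices.
fn emit_in_slices(all_groups: &SlicedColumn, batch_size: usize) -> Vec<SlicedColumn> {
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < all_groups.len {
        let len = batch_size.min(all_groups.len - offset);
        out.push(all_groups.slice(offset, len));
        offset += len;
    }
    out
}

fn main() {
    // One contiguous output for 1_000_000 groups (a high-cardinality aggregate).
    let giant = SlicedColumn::new((0..1_000_000).collect());
    let batches = emit_in_slices(&giant, 8192);
    drop(giant); // the original handle is gone, but the buffer is not freed

    let first = &batches[0];
    println!("batches: {}", batches.len());
    println!("first value: {}", first.values()[0]);
    println!("first batch logical bytes: {}", first.logical_bytes());
    println!("first batch retained bytes: {}", first.retained_bytes());
    println!("buffer still shared by: {}", Arc::strong_count(&first.buffer));
}
```

A downstream operator that buffers even one of these small batches therefore pins memory proportional to the full aggregate output, not to the batch it holds.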
   
   There is more detail in https://github.com/apache/datafusion/issues/9562 
(and @JasonLi-cn was looking into improving it). However, it is tricky: doing 
so would require chunking the intermediate state of both the group keys and 
the hash table. This isn't impossible, just non-trivial.
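For illustration only, here is a rough std-only sketch of what "chunked intermediate state" could look like. It is a hypothetical design, not DataFusion's implementation or the approach discussed in the linked issue. `ChunkedGroups`, `update`, and `emit` are made-up names, and the aggregate is assumed to be a simple `SUM(value) GROUP BY key` over `i64`. Group keys and accumulators live in fixed-capacity chunks, and the hash table stores a (chunk, row) address instead of a flat index. Each chunk can then be emitted as an independently owned batch instead of a slice of one giant buffer.

```rust
use std::collections::HashMap;

/// Hypothetical chunked group state for SUM(value) GROUP BY key.
struct ChunkedGroups {
    chunk_size: usize,
    // Group keys, split into chunks of at most `chunk_size` rows.
    keys: Vec<Vec<i64>>,
    // Running sums, chunked exactly like `keys`.
    sums: Vec<Vec<i64>>,
    // Hash table mapping a key to its (chunk index, row within chunk).
    index: HashMap<i64, (usize, usize)>,
}

impl ChunkedGroups {
    fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be positive");
        Self { chunk_size, keys: Vec::new(), sums: Vec::new(), index: HashMap::new() }
    }

    fn update(&mut self, key: i64, value: i64) {
        // Existing group: update its accumulator in place.
        if let Some(&(c, r)) = self.index.get(&key) {
            self.sums[c][r] += value;
            return;
        }
        // New group: start a fresh chunk when the current one is full.
        if self.keys.last().map_or(true, |k| k.len() == self.chunk_size) {
            self.keys.push(Vec::with_capacity(self.chunk_size));
            self.sums.push(Vec::with_capacity(self.chunk_size));
        }
        let c = self.keys.len() - 1;
        let r = self.keys[c].len();
        self.keys[c].push(key);
        self.sums[c].push(value);
        self.index.insert(key, (c, r));
    }

    /// Emit each chunk as its own (keys, sums) batch. Every batch owns its
    /// allocation, so dropping one batch frees only that batch's memory.
    fn emit(self) -> Vec<(Vec<i64>, Vec<i64>)> {
        self.keys.into_iter().zip(self.sums).collect()
    }
}

fn main() {
    let mut groups = ChunkedGroups::new(4);
    for (k, v) in [(1, 10), (2, 20), (1, 5), (3, 1), (4, 2), (5, 3), (2, 1)] {
        groups.update(k, v);
    }
    for (keys, sums) in groups.emit() {
        println!("keys={keys:?} sums={sums:?}");
    }
}
```

The extra indirection on every hash-table lookup, and the need to keep chunk addresses stable if groups are ever emitted early (e.g. under memory pressure), is where the non-trivial part shows up.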


-- 
This is an automated message from the Apache Git Service.