Re: [PR] lazily compute for null count(seems help to high cardinality aggr) [arrow-rs]

via GitHub Fri, 09 Aug 2024 10:06:13 -0700


alamb commented on PR #6155:
URL: https://github.com/apache/arrow-rs/pull/6155#issuecomment-2278377857


   > I think this is an interesting angle but I do wonder if there is something 
fishy in what DataFusion is doing here. 
   
   I would say there is a lot of room for improvement. What is happening is 
that for high-cardinality aggregates, the output of the hash aggregate 
operation is currently one giant contiguous RecordBatch, which is then sliced.
   
   There is more detail in https://github.com/apache/datafusion/issues/9562 
(@JasonLi-cn was looking into improving it). However, it is tricky, as doing 
so would imply that the intermediate state of the group keys and the hash 
table would need to be chunked. This isn't impossible, just nontrivial.
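
   To make the slicing cost discussed above concrete, here is a minimal, standard-library-only sketch of the idea in the PR title: defer counting nulls until someone asks. It is an illustration under stated assumptions, not the arrow-rs implementation. `LazyNulls`, `is_counted`, and `chunks` are hypothetical names, and a `bool` slice stands in for a packed validity bitmap.

```rust
use std::cell::OnceCell;
use std::rc::Rc;

/// Hypothetical stand-in for a validity bitmap (true = value present).
/// This is NOT arrow-rs's `NullBuffer`; it only models the idea.
pub struct LazyNulls {
    bits: Rc<[bool]>,             // shared, so slicing does not copy
    offset: usize,                // start of this view within `bits`
    len: usize,                   // number of entries in this view
    null_count: OnceCell<usize>,  // filled in on first request only
}

impl LazyNulls {
    pub fn new(valid: Vec<bool>) -> Self {
        let len = valid.len();
        Self { bits: valid.into(), offset: 0, len, null_count: OnceCell::new() }
    }

    /// Zero-copy slice: O(1), because no nulls are counted here.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            bits: Rc::clone(&self.bits),
            offset: self.offset + offset,
            len,
            null_count: OnceCell::new(),
        }
    }

    /// Counts nulls on first call (O(len)) and caches the result.
    pub fn null_count(&self) -> usize {
        *self.null_count.get_or_init(|| {
            self.bits[self.offset..self.offset + self.len]
                .iter()
                .filter(|valid| !**valid)
                .count()
        })
    }

    /// True once the count has been computed and cached.
    pub fn is_counted(&self) -> bool {
        self.null_count.get().is_some()
    }
}

/// Cuts one large view into `batch_size` pieces, the way a single giant
/// output batch is sliced before being emitted.
pub fn chunks(all: &LazyNulls, batch_size: usize) -> Vec<LazyNulls> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    (0..all.len)
        .step_by(batch_size)
        .map(|start| all.slice(start, batch_size.min(all.len - start)))
        .collect()
}
```

   With an eager count, every call to `slice` would scan its range even if no consumer ever reads the null count. In this sketch that scan happens only when `null_count` is first called. `OnceCell` is single-threaded; a version shared across threads would need something like `std::sync::OnceLock` or an atomic instead.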


