yjshen commented on issue #4973: URL: https://github.com/apache/arrow-datafusion/issues/4973#issuecomment-1608128995
High cardinality always comes with great memory consumption; how will the new design deal with the memory limit?

- **Adaptive sizing** (perhaps?): How would the hash table header and the states in each accumulator be initialized, and how would they grow afterward?
- **Spill mechanism**: Do we spill based on partitions, based on fixed-size state buffers, or something else?
- **Merging for the final aggregate**: Perhaps sort-based aggregation?

It would be great if we could have a design doc that captures all the features we need and the decisions we make for the different characteristics of the queries.
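To make the spill + sort-based-merge idea concrete, here is a minimal Rust sketch (not the actual DataFusion design; all names like `HashAgg`, `SPILL_THRESHOLD`, and the `Vec`-backed "spill file" are hypothetical stand-ins): when the in-memory group state exceeds a budget, it is drained as a key-sorted run, and the final aggregate is produced by a k-way merge over the sorted runs, combining partial sums for equal keys.

```rust
use std::collections::BTreeMap;

/// Hypothetical row budget standing in for a real memory limit.
const SPILL_THRESHOLD: usize = 4;

/// Partial aggregation state for a SUM: group key -> running sum.
struct HashAgg {
    groups: BTreeMap<u64, i64>,
    spills: Vec<Vec<(u64, i64)>>, // each spill is a run sorted by key
}

impl HashAgg {
    fn new() -> Self {
        Self { groups: BTreeMap::new(), spills: Vec::new() }
    }

    fn update(&mut self, key: u64, value: i64) {
        *self.groups.entry(key).or_insert(0) += value;
        if self.groups.len() >= SPILL_THRESHOLD {
            self.spill();
        }
    }

    /// Drain the in-memory state into a sorted run
    /// (an in-memory Vec stands in for writing to disk).
    fn spill(&mut self) {
        let run: Vec<(u64, i64)> =
            std::mem::take(&mut self.groups).into_iter().collect();
        if !run.is_empty() {
            self.spills.push(run);
        }
    }

    /// Final aggregate: sort-based merge of all sorted runs,
    /// summing the partial states of equal keys.
    fn finish(mut self) -> Vec<(u64, i64)> {
        self.spill(); // flush any remaining in-memory groups as a last run
        let mut cursors = vec![0usize; self.spills.len()];
        let mut out = Vec::new();
        loop {
            // Find the smallest key not yet consumed across all runs.
            let mut min_key: Option<u64> = None;
            for (i, run) in self.spills.iter().enumerate() {
                if let Some(&(k, _)) = run.get(cursors[i]) {
                    min_key = Some(min_key.map_or(k, |m| m.min(k)));
                }
            }
            let Some(k) = min_key else { break };
            // Combine the partial sums for this key and advance cursors.
            let mut sum = 0;
            for (i, run) in self.spills.iter().enumerate() {
                if let Some(&(rk, v)) = run.get(cursors[i]) {
                    if rk == k {
                        sum += v;
                        cursors[i] += 1;
                    }
                }
            }
            out.push((k, sum));
        }
        out
    }
}

fn main() {
    let mut agg = HashAgg::new();
    for (k, v) in [(1, 10), (2, 20), (3, 30), (1, 1), (4, 40), (2, 2), (5, 50), (3, 3)] {
        agg.update(k, v);
    }
    // prints [(1, 11), (2, 22), (3, 33), (4, 40), (5, 50)]
    println!("{:?}", agg.finish());
}
```

Because each run is key-sorted, the merge only needs one cursor per run rather than rebuilding a hash table, which is why a sort-based final aggregate pairs naturally with a spill-on-memory-pressure first phase.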
