yjshen commented on issue #4973: URL: https://github.com/apache/arrow-datafusion/issues/4973#issuecomment-1608128995
High cardinality always comes with great memory consumption; how will the new design deal with the memory limit?

- **Adaptive sizing** (perhaps?): How would the hash table header and the states in each accumulator be initialized, and how would they grow afterward?
- **Spill mechanism**: Do we spill based on partitions, based on fixed-size state buffers, or something else?
- **Merging for the final aggregate**: Perhaps sort-based aggregation?

It would be great if we could have a design doc that captures all the features we need and the decisions we make for the different characteristics of the queries.
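To make the spill + sort-based-merge idea concrete, here is a minimal Rust sketch (not the actual DataFusion design; all names like `HashAgg`, `SPILL_THRESHOLD`, and the `Vec`-backed "spill file" are hypothetical stand-ins): when the in-memory group state exceeds a budget, it is drained as a key-sorted run, and the final aggregate is produced by a k-way merge over the sorted runs, combining partial sums for equal keys.

```rust
use std::collections::BTreeMap;

/// Hypothetical row budget standing in for a real memory limit.
const SPILL_THRESHOLD: usize = 4;

/// Partial aggregation state for a SUM: group key -> running sum.
struct HashAgg {
    groups: BTreeMap<u64, i64>,
    spills: Vec<Vec<(u64, i64)>>, // each spill is a run sorted by key
}

impl HashAgg {
    fn new() -> Self {
        Self { groups: BTreeMap::new(), spills: Vec::new() }
    }

    fn update(&mut self, key: u64, value: i64) {
        *self.groups.entry(key).or_insert(0) += value;
        if self.groups.len() >= SPILL_THRESHOLD {
            self.spill();
        }
    }

    /// Drain the in-memory state into a sorted run
    /// (an in-memory Vec stands in for writing to disk).
    fn spill(&mut self) {
        let run: Vec<(u64, i64)> =
            std::mem::take(&mut self.groups).into_iter().collect();
        if !run.is_empty() {
            self.spills.push(run);
        }
    }

    /// Final aggregate: sort-based merge of all sorted runs,
    /// summing the partial states of equal keys.
    fn finish(mut self) -> Vec<(u64, i64)> {
        self.spill(); // flush any remaining in-memory groups as a last run
        let mut cursors = vec![0usize; self.spills.len()];
        let mut out = Vec::new();
        loop {
            // Find the smallest key not yet consumed across all runs.
            let mut min_key: Option<u64> = None;
            for (i, run) in self.spills.iter().enumerate() {
                if let Some(&(k, _)) = run.get(cursors[i]) {
                    min_key = Some(min_key.map_or(k, |m| m.min(k)));
                }
            }
            let Some(k) = min_key else { break };
            // Combine the partial sums for this key and advance cursors.
            let mut sum = 0;
            for (i, run) in self.spills.iter().enumerate() {
                if let Some(&(rk, v)) = run.get(cursors[i]) {
                    if rk == k {
                        sum += v;
                        cursors[i] += 1;
                    }
                }
            }
            out.push((k, sum));
        }
        out
    }
}

fn main() {
    let mut agg = HashAgg::new();
    for (k, v) in [(1, 10), (2, 20), (3, 30), (1, 1), (4, 40), (2, 2), (5, 50), (3, 3)] {
        agg.update(k, v);
    }
    // prints [(1, 11), (2, 22), (3, 33), (4, 40), (5, 50)]
    println!("{:?}", agg.finish());
}
```

Because each run is key-sorted, the merge only needs one cursor per run rather than rebuilding a hash table, which is why a sort-based final aggregate pairs naturally with a spill-on-memory-pressure first phase.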
