alamb commented on issue #11680: URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2260206257
Thank you @jayzhan211 -- those are some interesting results. I think it makes sense that reusing the hash values helps mostly for high cardinality aggregates, as in that case the number of rows that need to be repartitioned / rehashed is high.

> Alternative idea for improvement is, if we can combine partial group + repartition + final group in one operation. We could probably avoid converting to row once again in final group.

I think this is the approach taken by systems like DuckDB, as I understand it, and I think it is quite intriguing to consider.

The challenge of the approach would be the software engineering required to manage the complexity of the combined multi-stage operator. I am not sure the functionality would be easy to combine without some more refactoring 🤔

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org