alamb commented on issue #11680:
URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2260206257

   Thank you @jayzhan211 -- those are some interesting results. 
   
   I think it makes sense that reusing the hash values helps mostly for 
high-cardinality aggregates: in that case partial aggregation reduces the row 
count very little, so the number of rows that need to be repartitioned/rehashed 
is high.
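   To make the hash-reuse idea concrete, here is a minimal sketch in plain Rust. It does not use DataFusion's API. Every name (`HashedKey`, `IdentityHasher`, `partial_sum`, `repartition_and_final`) is hypothetical, and SUM over string keys stands in for a real aggregate. Each group key is hashed once during the partial stage. The cached hash is then reused both to choose the output partition and as the hash-table hash in the final stage. A counter shows that no rehashing happens after the partial stage.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times a group key is actually hashed (for illustration only).
static HASH_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical helper: hash a group key once with the std `DefaultHasher`.
fn hash_key(key: &str) -> u64 {
    HASH_CALLS.fetch_add(1, Ordering::Relaxed);
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

/// A group key that carries its precomputed hash through every stage.
#[derive(Clone, Debug)]
struct HashedKey {
    hash: u64,
    key: String,
}

impl PartialEq for HashedKey {
    fn eq(&self, other: &Self) -> bool {
        // Compare the cheap cached hash first, then the key itself.
        self.hash == other.hash && self.key == other.key
    }
}
impl Eq for HashedKey {}

impl Hash for HashedKey {
    // Feed only the cached hash to the map's hasher; the key bytes are never rehashed.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash);
    }
}

/// Pass-through hasher so the map uses the cached hash as-is.
#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        // Not used by `HashedKey`, but required by the trait.
        for b in bytes {
            self.0 = self.0.rotate_left(8) ^ u64::from(*b);
        }
    }
    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }
}

type PrehashedMap = HashMap<HashedKey, i64, BuildHasherDefault<IdentityHasher>>;

/// Partial SUM: hash each input row's key exactly once.
fn partial_sum(rows: &[(&str, i64)]) -> PrehashedMap {
    let mut groups = PrehashedMap::default();
    for &(key, value) in rows {
        let k = HashedKey { hash: hash_key(key), key: key.to_string() };
        *groups.entry(k).or_insert(0) += value;
    }
    groups
}

/// Pick a partition from the high bits of the cached hash (multiply-shift), so the
/// partition is not tied to the low bits a hash table typically uses for buckets.
fn partition_of(hash: u64, num_partitions: usize) -> usize {
    ((u128::from(hash) * num_partitions as u128) >> 64) as usize
}

/// Repartition + final SUM, reusing the cached hash instead of recomputing it.
fn repartition_and_final(partials: Vec<PrehashedMap>, num_partitions: usize) -> Vec<PrehashedMap> {
    let mut finals: Vec<PrehashedMap> =
        (0..num_partitions).map(|_| PrehashedMap::default()).collect();
    for partial in partials {
        for (k, sum) in partial {
            let p = partition_of(k.hash, num_partitions);
            *finals[p].entry(k).or_insert(0) += sum;
        }
    }
    finals
}
```

   With high-cardinality keys, the partial maps hold roughly one entry per input row. The savings from skipping the rehash in the repartition and final stages therefore scale with the input size. With low-cardinality keys, the partial maps are tiny and the savings are negligible.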
   
   > An alternative idea for improvement is to combine partial group + 
repartition + final group into one operation. We could probably avoid converting 
to rows again in the final group.
   
   As I understand it, this is the approach taken by systems like DuckDB, and I 
think it is quite intriguing to consider.
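   As a rough illustration of what "partial group + repartition + final group in one operation" could look like, here is a hypothetical sketch in plain Rust. It is not DataFusion's or DuckDB's actual design, and all function names are invented for this example. Each worker aggregates its input directly into one small table per output partition, so there is no separate repartition operator or intermediate buffer. The final phase for partition `p` then merges only the `p`-th table of every worker. The std `HashMap` still hashes the key internally, so this sketch shows only the operator fusion, not hash reuse.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash a group key once to route it to a partition.
fn hash_key(key: &str) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

/// Pick a partition from the high bits of the hash (multiply-shift).
fn partition_of(hash: u64, num_partitions: usize) -> usize {
    ((u128::from(hash) * num_partitions as u128) >> 64) as usize
}

/// One worker's partial state: one SUM table per output partition.
type PartitionedState = Vec<HashMap<String, i64>>;

/// Fused partial group + repartition: aggregate straight into per-partition tables.
fn partial_phase(rows: &[(&str, i64)], num_partitions: usize) -> PartitionedState {
    let mut state: PartitionedState = (0..num_partitions).map(|_| HashMap::new()).collect();
    for &(key, value) in rows {
        let p = partition_of(hash_key(key), num_partitions);
        *state[p].entry(key.to_string()).or_insert(0) += value;
    }
    state
}

/// Final group for partition `p`: merge the p-th table of every worker.
fn final_phase(workers: &[PartitionedState], p: usize) -> HashMap<String, i64> {
    let mut out = HashMap::new();
    for state in workers {
        for (key, &sum) in &state[p] {
            *out.entry(key.clone()).or_insert(0) += sum;
        }
    }
    out
}
```

   Because every worker routes a given key to the same partition, each final partition can be merged independently and in parallel. The cost is that each worker holds `num_partitions` tables at once, which is part of the extra complexity such a combined operator has to manage.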
   
   The challenge with this approach would be the software engineering required 
to manage the complexity of a combined multi-stage operator. I am not sure the 
functionality would be easy to combine without some more refactoring 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org