cloud-fan commented on pull request #32210:
URL: https://github.com/apache/spark/pull/32210#issuecomment-823357554


   I'm a bit worried about this solution:
   1. Sorting the stream side at runtime may lead to a slow query plan, because the sort is not covered by whole-stage codegen.
   2. Unlike sort-merge join (SMJ), the output ordering can't be preserved if we sort the stream side at runtime.
   
   I think the eventual goal is to enable shuffle hash join by default, but I'm not sure adding the fallback can achieve this goal. Do you have any real data showing the benefits?
   
   Another idea is to pick shuffle hash join in AQE, once we know the per-partition sizes after the shuffle.
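The AQE alternative can be sketched as a decision made after the shuffle map stage finishes, when the per-partition sizes are known. This is a minimal sketch that assumes the rule is "every build-side partition must fit under a size threshold". `choose_join` and `threshold_bytes` are hypothetical names, not Spark APIs.

```python
from typing import List


def choose_join(build_partition_bytes: List[int], threshold_bytes: int) -> str:
    # Hypothetical AQE-style rule: once shuffle statistics are available,
    # use shuffle hash join only if every build-side partition is small
    # enough to hold in an in-memory hash map; otherwise keep sort-merge join.
    if build_partition_bytes and max(build_partition_bytes) <= threshold_bytes:
        return "ShuffledHashJoin"
    return "SortMergeJoin"


print(choose_join([10, 20, 30], threshold_bytes=64))  # all partitions small
print(choose_join([10, 200], threshold_bytes=64))     # one partition too large
```

Because the choice is made from real shuffle statistics before the join operator runs, the plan in this sketch never has to switch strategies mid-task. That is the contrast with a runtime fallback.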


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
