c21 commented on pull request #32210:
URL: https://github.com/apache/spark/pull/32210#issuecomment-826251579


   btw if we worry about if too many sort merge join converted to shuffled hash 
join if enabling shuffled hash join by default. Please note we will only enable 
shuffled hash join if the estimated/run-time size of one side is less than 
`"spark.sql.autoBroadcastJoinThreshold" * "spark.sql.shuffle.partitions"`, and 
this side is 3x smaller than the other side 
(https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L258-L263).
 This is somewhat restricted rule, and from out side in practice, enabling 
shuffled hash join by default only incurs 25% of sort merge join getting 
converted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to