[
https://issues.apache.org/jira/browse/SPARK-37753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18021467#comment-18021467
]
Aparna Garg commented on SPARK-37753:
-------------------------------------
User 'Last-remote11' has created a pull request for this issue:
https://github.com/apache/spark/pull/52388
> Fine tune logic to demote Broadcast hash join in DynamicJoinSelection
> ----------------------------------------------------------------------
>
> Key: SPARK-37753
> URL: https://issues.apache.org/jira/browse/SPARK-37753
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.8
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Fix For: 3.3.0
>
>
> In the current implementation of {{DynamicJoinSelection}} the logic checks if
> one side of the join has high ratio of empty partitions and adds a
> NO_BROADCAST hint on that side since a shuffle join can short-circuit the
> local joins where one side is empty.
> This logic is doesn't make sense for all join type. For example, a Left
> Outer Join cannot short circuit if RHS is empty so we should not inhibit BHJ.
> On the other hand a LOJ executed as a shuffle join where the LHS has many
> empty can short circuit the local join so we should inhibit the BHJ because
> BHJ will use {{OptimizeShuffleWithLocalRead}} which will re-assemble LHS
> partitions as the were before the shuffle and thus may not have many empty
> ones any more.
> This supersedes SPARK-37193
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]