[
https://issues.apache.org/jira/browse/SPARK-37753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-37753:
-----------------------------------
Labels: pull-request-available (was: )
> Fine tune logic to demote Broadcast hash join in DynamicJoinSelection
> ----------------------------------------------------------------------
>
> Key: SPARK-37753
> URL: https://issues.apache.org/jira/browse/SPARK-37753
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.8
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In the current implementation of {{DynamicJoinSelection}} the logic checks if
> one side of the join has high ratio of empty partitions and adds a
> NO_BROADCAST hint on that side since a shuffle join can short-circuit the
> local joins where one side is empty.
> This logic is doesn't make sense for all join type. For example, a Left
> Outer Join cannot short circuit if RHS is empty so we should not inhibit BHJ.
> On the other hand a LOJ executed as a shuffle join where the LHS has many
> empty can short circuit the local join so we should inhibit the BHJ because
> BHJ will use {{OptimizeShuffleWithLocalRead}} which will re-assemble LHS
> partitions as the were before the shuffle and thus may not have many empty
> ones any more.
> This supersedes SPARK-37193
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]