[ https://issues.apache.org/jira/browse/SPARK-44307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740565#comment-17740565 ]
Nikita Awasthi commented on SPARK-44307:
----------------------------------------

User 'maheshk114' has created a pull request for this issue:
https://github.com/apache/spark/pull/41860

> Bloom filter is not added for left outer join if the left side table is
> smaller than broadcast threshold.
> -----------------------------------------------------------------------
>
>                 Key: SPARK-44307
>                 URL: https://issues.apache.org/jira/browse/SPARK-44307
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.4.1
>            Reporter: mahesh kumar behera
>            Priority: Major
>             Fix For: 3.5.0
>
>
> In a left outer join, a shuffle join is used even if the left-side table is
> small enough to be broadcast. This follows from the semantics of the left
> outer join: broadcasting the left side would produce wrong results. The
> bloom filter injection does not take this into account. When injecting the
> bloom filter, if the left side is smaller than the broadcast threshold, the
> bloom filter is not added, on the assumption that the left side will be
> broadcast and no bloom filter is needed. As a result, the bloom filter
> optimization is missed for a left outer join with a small left side and a
> huge right-side table.
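
The decision described in the issue can be illustrated with a small standalone model. This is only a sketch and not Spark's optimizer code: the `Join` class and the functions `can_broadcast_side`, `skip_bloom_filter_size_only`, and `skip_bloom_filter_join_aware` are hypothetical names introduced here, and the threshold constant simply mirrors the 10 MB default of `spark.sql.autoBroadcastJoinThreshold`. It contrasts a size-only check, which skips the bloom filter whenever the small side is under the threshold, with a check that also asks whether the join type allows that side to be broadcast at all.

```python
from dataclasses import dataclass

# Assumption: mirrors the 10 MB default of spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD = 10 * 1024 * 1024


@dataclass
class Join:
    """Hypothetical stand-in for a join node; not a Spark class."""
    join_type: str    # "inner", "left_outer" or "right_outer"
    left_bytes: int   # estimated size of the left side
    right_bytes: int  # estimated size of the right side


def can_broadcast_side(join_type: str, side: str) -> bool:
    """Whether the join type allows `side` to be the broadcast side.

    In an outer join the preserved side cannot be broadcast, so a left
    outer join can only broadcast its right side.
    """
    if join_type == "inner":
        return True
    if join_type == "left_outer":
        return side == "right"
    if join_type == "right_outer":
        return side == "left"
    return False


def side_bytes(join: Join, side: str) -> int:
    return join.left_bytes if side == "left" else join.right_bytes


def skip_bloom_filter_size_only(join: Join, small_side: str) -> bool:
    """Behavior described in the issue: look at the size alone."""
    return side_bytes(join, small_side) <= BROADCAST_THRESHOLD


def skip_bloom_filter_join_aware(join: Join, small_side: str) -> bool:
    """Skip only if the side is small AND the join type permits broadcasting it."""
    return (side_bytes(join, small_side) <= BROADCAST_THRESHOLD
            and can_broadcast_side(join.join_type, small_side))


# Small left side (1 MB), huge right side (100 GB), left outer join.
left_outer = Join("left_outer", 1 * 1024 * 1024, 100 * 1024 ** 3)
print(skip_bloom_filter_size_only(left_outer, "left"))   # bloom filter skipped
print(skip_bloom_filter_join_aware(left_outer, "left"))  # bloom filter kept
```

Under these assumptions, the size-only check drops the bloom filter for the left outer join even though the left side cannot be broadcast, while the join-aware check keeps it. For an inner join with the same sizes both checks agree that the filter can be skipped.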