[ https://issues.apache.org/jira/browse/SPARK-28268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-28268: ------------------------------------ Assignee: Apache Spark > Rewrite non-correlated Semi/Anti join as Filter > ----------------------------------------------- > > Key: SPARK-28268 > URL: https://issues.apache.org/jira/browse/SPARK-28268 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.0.0 > Reporter: Mingcong Han > Assignee: Apache Spark > Priority: Major > > When semi/anti join has a non-correlated join condition, we can convert it to > a Filter with a non-correlated Exists subquery. As the Exists subquery is > non-correlated, we can use a physical plan for it to avoid join. > Actually, this optimization is mainly for the non-correlated subqueries > (Exists/In). We currently rewrite Exists/InSubquery as semi/anti/existential > join, whether it is correlated or not. And they are mostly executed using a > BroadcastNestedLoopJoin which is really not a good choice. > Here are some examples: > 1. > {code:sql} > SELECT t1a > FROM t1 > SEMI JOIN t2 > ON t2a > 10 OR t2b = 'a' > {code} > => > {code:sql} > SELECT t1a > FROM t1 > WHERE EXISTS(SELECT 1 > FROM t2 > WHERE t2a > 10 OR t2b = 'a') > {code} > 2. > {code:sql} > SELECT t1a > FROM t1 > ANTI JOIN t2 > ON t1b > 10 AND t2b = 'b' > {code} > => > {code:sql} > SELECT t1a > FROM t1 > WHERE NOT(t1b > 10 > AND EXISTS(SELECT 1 > FROM t2 > WHERE t2b = 'b')) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org