Nattavut Sutyanyong created SPARK-19712: -------------------------------------------
Summary: EXISTS and Left Semi join do not produce the same plan Key: SPARK-19712 URL: https://issues.apache.org/jira/browse/SPARK-19712 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.1.0 Reporter: Nattavut Sutyanyong This problem is found during the development of SPARK-18874. The EXISTS form in the following query: {{sql("select * from t1 inner join t2 on t1.t1a=t2.t2a where exists (select 1 from t3 where t1.t1b=t3.t3b)")}} gives the optimized plan below: {code} == Optimized Logical Plan == Join Inner, (t1a#7 = t2a#25) :- Join LeftSemi, (t1b#8 = t3b#58) : :- Filter isnotnull(t1a#7) : : +- Relation[t1a#7,t1b#8,t1c#9] parquet : +- Project [1 AS 1#271, t3b#58] : +- Relation[t3a#57,t3b#58,t3c#59] parquet +- Filter isnotnull(t2a#25) +- Relation[t2a#25,t2b#26,t2c#27] parquet {code} whereas a semantically equivalent Left Semi join query below: {{sql("select * from t1 inner join t2 on t1.t1a=t2.t2a left semi join t3 on t1.t1b=t3.t3b")}} gives the following optimized plan: {code} == Optimized Logical Plan == Join LeftSemi, (t1b#8 = t3b#58) :- Join Inner, (t1a#7 = t2a#25) : :- Filter (isnotnull(t1b#8) && isnotnull(t1a#7)) : : +- Relation[t1a#7,t1b#8,t1c#9] parquet : +- Filter isnotnull(t2a#25) : +- Relation[t2a#25,t2b#26,t2c#27] parquet +- Project [t3b#58] +- Relation[t3a#57,t3b#58,t3c#59] parquet {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org