[jira] [Updated] (SPARK-29769) Spark SQL cannot handle "exists/not exists" condition when using "JOIN"
[ https://issues.apache.org/jira/browse/SPARK-29769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-29769: -- Description: In origin master, we can'y run sql use `EXISTS/NOT EXISTS` in Join's on condition: {code} create temporary view s1 as select * from values (1), (3), (5), (7), (9) as s1(id); create temporary view s2 as select * from values (1), (3), (4), (6), (9) as s2(id); create temporary view s3 as select * from values (3), (4), (6), (9) as s3(id); explain extended SELECT s1.id, s2.id as id2 FROM s1 LEFT OUTER JOIN s2 ON s1.id = s2.id AND EXISTS (SELECT * FROM s3 WHERE s3.id > 6) we will get == Parsed Logical Plan == 'Project ['s1.id, 's2.id AS id2#4] +- 'Join LeftOuter, (('s1.id = 's2.id) && exists#3 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation `s3` :- 'UnresolvedRelation `s1` +- 'UnresolvedRelation `s2` == Analyzed Logical Plan == org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27; 'Project ['s1.id, 's2.id AS id2#4] +- 'Join LeftOuter, ((id#0 = id#1) && exists#3 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation `s3` :- SubqueryAlias `s1` : +- Project [id#0] : +- SubqueryAlias `s1` :+- LocalRelation [id#0] +- SubqueryAlias `s2` +- Project [id#1] +- SubqueryAlias `s2` +- LocalRelation [id#1] org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27; 'Project ['s1.id, 's2.id AS id2#4] +- 'Join LeftOuter, ((id#0 = id#1) && exists#3 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation `s3` :- SubqueryAlias `s1` : +- Project [id#0] : +- SubqueryAlias `s1` :+- LocalRelation [id#0] +- SubqueryAlias `s2` +- Project [id#1] +- SubqueryAlias `s2` +- LocalRelation [id#1] == Optimized Logical Plan == org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27; 'Project ['s1.id, 's2.id AS id2#4] +- 'Join LeftOuter, ((id#0 = id#1) && exists#3 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation `s3` :- SubqueryAlias `s1` : +- Project [id#0] : +- SubqueryAlias `s1` :+- LocalRelation [id#0] +- SubqueryAlias `s2` +- Project [id#1] +- SubqueryAlias `s2` +- LocalRelation [id#1] == Physical Plan == org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27; 'Project ['s1.id, 's2.id AS id2#4] +- 'Join LeftOuter, ((id#0 = id#1) && exists#3 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation `s3` :- SubqueryAlias `s1` : +- Project [id#0] : +- SubqueryAlias `s1` :+- LocalRelation [id#0] +- SubqueryAlias `s2` +- Project [id#1] +- SubqueryAlias `s2` +- LocalRelation [id#1] Time taken: 1.455 seconds, Fetched 1 row(s) {code} Since in analyzer , it won't solve join's condition's SubQuery in *Analyzer.ResolveSubquery*, then table *s3* was unresolved. After pr https://github.com/apache/spark/pull/25854/files We will solve subqueries in join condition and it will pass analyzer level. In current master, if we run sql above, we will get {code} == Parsed Logical Plan == 'Project ['s1.id, 's2.id AS id2#291] +- 'Join LeftOuter, (('s1.id = 's2.id) AND exists#290 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation [s3] :- 'UnresolvedRelation [s1] +- 'UnresolvedRelation [s2] == Analyzed Logical Plan == id: int, id2: int Project [id#244, id#250 AS id2#291] +- Join LeftOuter, ((id#244 = id#250) AND exists#290 []) : +- Project [id#256] : +- Filter (id#256 > 6) :+- SubqueryAlias `s3` : +- Project [value#253 AS id#256] : +- LocalRelation [value#253] :- SubqueryAlias `s1` : +- Project [value#241 AS id#244] : +- LocalRelation [value#241] +- SubqueryAlias `s2` +- Project [value#247 AS id#250] +- LocalRelation [value#247] == Optimized Logical Plan == Project [id#244, id#250 AS id2#291] +- Join LeftOuter, (exists#290 [] AND (id#244 = id#250)) : +- Project [value#253 AS id#256] : +- Filter (value#253 > 6) :+- LocalRelation [value#253] :- Project [value#241 AS id#244] : +- LocalRelation [value#241] +- Project [value#247 AS id#250] +- LocalRelation [value#247] == Physical Plan == *(2) Project [id#244, id#250 AS id2#291] +- *(2) BroadcastHashJoin [id#244], [id#250], LeftOuter, BuildRight, exists#290 [] : +- Project [value#253 AS id#256] : +- Filter (value#253 > 6) :+- LocalRelation [value#253] :- *(2) Project [value#241 AS id#244] : +- *(2) LocalTableScan [value#241] +- Broadca
[jira] [Updated] (SPARK-29769) Spark SQL cannot handle "exists/not exists" condition when using "JOIN"
[ https://issues.apache.org/jira/browse/SPARK-29769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-29769: -- Description: In origin master, we can'y run sql use `EXISTS/NOT EXISTS` in Join's on condition: {code} create temporary view s1 as select * from values (1), (3), (5), (7), (9) as s1(id); create temporary view s2 as select * from values (1), (3), (4), (6), (9) as s2(id); create temporary view s3 as select * from values (3), (4), (6), (9) as s3(id); explain extended SELECT s1.id, s2.id as id2 FROM s1 LEFT OUTER JOIN s2 ON s1.id = s2.id AND EXISTS (SELECT * FROM s3 WHERE s3.id > 6) we will get == Parsed Logical Plan == 'Project ['s1.id, 's2.id AS id2#4] +- 'Join LeftOuter, (('s1.id = 's2.id) && exists#3 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation `s3` :- 'UnresolvedRelation `s1` +- 'UnresolvedRelation `s2` == Analyzed Logical Plan == org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27; 'Project ['s1.id, 's2.id AS id2#4] +- 'Join LeftOuter, ((id#0 = id#1) && exists#3 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation `s3` :- SubqueryAlias `s1` : +- Project [id#0] : +- SubqueryAlias `s1` :+- LocalRelation [id#0] +- SubqueryAlias `s2` +- Project [id#1] +- SubqueryAlias `s2` +- LocalRelation [id#1] org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27; 'Project ['s1.id, 's2.id AS id2#4] +- 'Join LeftOuter, ((id#0 = id#1) && exists#3 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation `s3` :- SubqueryAlias `s1` : +- Project [id#0] : +- SubqueryAlias `s1` :+- LocalRelation [id#0] +- SubqueryAlias `s2` +- Project [id#1] +- SubqueryAlias `s2` +- LocalRelation [id#1] == Optimized Logical Plan == org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27; 'Project ['s1.id, 's2.id AS id2#4] +- 'Join LeftOuter, ((id#0 = id#1) && exists#3 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation `s3` :- SubqueryAlias `s1` : +- Project [id#0] : +- SubqueryAlias `s1` :+- LocalRelation [id#0] +- SubqueryAlias `s2` +- Project [id#1] +- SubqueryAlias `s2` +- LocalRelation [id#1] == Physical Plan == org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27; 'Project ['s1.id, 's2.id AS id2#4] +- 'Join LeftOuter, ((id#0 = id#1) && exists#3 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation `s3` :- SubqueryAlias `s1` : +- Project [id#0] : +- SubqueryAlias `s1` :+- LocalRelation [id#0] +- SubqueryAlias `s2` +- Project [id#1] +- SubqueryAlias `s2` +- LocalRelation [id#1] Time taken: 1.455 seconds, Fetched 1 row(s) {code} Since in analyzer , it won't solve join's condition's SubQuery in *Analyzer.ResolveSubquery*, then table *s3* was unresolved. After pr https://github.com/apache/spark/pull/25854/files We will solve subqueries in join condition and it will pass analyzer level. In current master, if we run sql above, we will get {code} == Parsed Logical Plan == 'Project ['s1.id, 's2.id AS id2#291] +- 'Join LeftOuter, (('s1.id = 's2.id) AND exists#290 []) : +- 'Project [*] : +- 'Filter ('s3.id > 6) :+- 'UnresolvedRelation [s3] :- 'UnresolvedRelation [s1] +- 'UnresolvedRelation [s2] == Analyzed Logical Plan == id: int, id2: int Project [id#244, id#250 AS id2#291] +- Join LeftOuter, ((id#244 = id#250) AND exists#290 []) : +- Project [id#256] : +- Filter (id#256 > 6) :+- SubqueryAlias `s3` : +- Project [value#253 AS id#256] : +- LocalRelation [value#253] :- SubqueryAlias `s1` : +- Project [value#241 AS id#244] : +- LocalRelation [value#241] +- SubqueryAlias `s2` +- Project [value#247 AS id#250] +- LocalRelation [value#247] == Optimized Logical Plan == Project [id#244, id#250 AS id2#291] +- Join LeftOuter, (exists#290 [] AND (id#244 = id#250)) : +- Project [value#253 AS id#256] : +- Filter (value#253 > 6) :+- LocalRelation [value#253] :- Project [value#241 AS id#244] : +- LocalRelation [value#241] +- Project [value#247 AS id#250] +- LocalRelation [value#247] == Physical Plan == *(2) Project [id#244, id#250 AS id2#291] +- *(2) BroadcastHashJoin [id#244], [id#250], LeftOuter, BuildRight, exists#290 [] : +- Project [value#253 AS id#256] : +- Filter (value#253 > 6) :+- LocalRelation [value#253] :- *(2) Project [value#241 AS id#244] : +- *(2) LocalTableScan [value#241] +- Broadca