[ https://issues.apache.org/jira/browse/SPARK-18614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703185#comment-15703185 ]
Nattavut Sutyanyong commented on SPARK-18614:
---------------------------------------------

{{ExistenceJoin}} should be treated the same as {{LeftOuter}} and {{LeftAnti}}, not {{InnerLike}} and {{LeftSemi}}. This is not currently exposed because the rewrite of {{\[NOT\] EXISTS OR ...}} to {{ExistenceJoin}} happens in the rule {{RewritePredicateSubquery}}, which is in a separate rule set placed after the rule {{PushPredicateThroughJoin}}. When {{PushPredicateThroughJoin}} runs, no {{ExistenceJoin}} exists in the plan yet.

The semantics of {{ExistenceJoin}} require preserving all the rows from the left table through the join, as in a regular {{LeftOuter}} join. {{ExistenceJoin}} augments the {{LeftOuter}} operation with a new column called {{exists}}, set to true when the join condition in the ON clause is true and false otherwise. Any filtering of rows happens in the {{Filter}} operation above the {{ExistenceJoin}}.

Example:

A(c1, c2): \{ (1, 1), (1, 2) \}
// B can be any value as it is irrelevant in this example
B(c1): \{ (NULL) \}

{code:SQL}
select A.* from A where exists (select 1 from B where A.c1 = A.c2) or A.c2=2
{code}

In this example, the correct result is all the rows from A. If the pattern {{ExistenceJoin}} at line 935 in {{Optimizer.scala}}, added by the work in SPARK-18597, were active, the code would push down the predicate A.c1 = A.c2 to be a {{Filter}} on relation A, which would filter out the row (1,2) from A.

> Incorrect predicate pushdown thru ExistenceJoin
> -----------------------------------------------
>
>                 Key: SPARK-18614
>                 URL: https://issues.apache.org/jira/browse/SPARK-18614
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nattavut Sutyanyong
>            Priority: Minor
>              Labels: correctness
>
> This is a follow-up work from SPARK-18597 to close a potential incorrect
> rewrite in {{PushPredicateThroughJoin}} rule of the Optimizer phase.
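
The example in the comment can be checked by hand with a small model of the two plans. This is only a sketch in plain Python, not Spark code: the helper names ({{existence_join}}, {{correct_plan}}, {{pushed_down_plan}}) are hypothetical and do not correspond to any Spark API, and SQL NULL semantics are not modeled because the subquery predicate never reads B.

```python
# Plain-Python model of the example; none of these helpers are Spark APIs.
A = [(1, 1), (1, 2)]   # A(c1, c2)
B = [(None,)]          # B(c1); its values are irrelevant in this example


def subquery_condition(a_row, b_row):
    # The correlated predicate from the subquery: A.c1 = A.c2 (never reads B).
    return a_row[0] == a_row[1]


def existence_join(left, right, condition):
    # Keep every left row and append an `exists` flag: True if at least one
    # right row satisfies the condition, False otherwise.
    return [row + (any(condition(row, r) for r in right),) for row in left]


def correct_plan(a, b):
    # Filter(exists OR A.c2 = 2) sits above the existence join.
    joined = existence_join(a, b, subquery_condition)
    return [(c1, c2) for (c1, c2, exists) in joined if exists or c2 == 2]


def pushed_down_plan(a, b):
    # Incorrect rewrite: A.c1 = A.c2 is pushed below the join as a filter on A,
    # so rows of A are dropped before the OR in the outer filter is evaluated.
    filtered_a = [row for row in a if row[0] == row[1]]
    joined = existence_join(filtered_a, b, lambda a_row, b_row: True)
    return [(c1, c2) for (c1, c2, exists) in joined if exists or c2 == 2]


print(correct_plan(A, B))      # [(1, 1), (1, 2)]
print(pushed_down_plan(A, B))  # [(1, 1)]
```

In this model the row (1, 2) survives the correct plan only through the {{A.c2=2}} branch of the OR, which is why dropping it below the join changes the result.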