[ 
https://issues.apache.org/jira/browse/SPARK-18614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703185#comment-15703185
 ] 

Nattavut Sutyanyong commented on SPARK-18614:
---------------------------------------------

{{ExistenceJoin}} should be treated the same as {{LeftOuter}} and {{LeftAnti}}, 
not {{InnerLike}} and {{LeftSemi}}. This is not currently exposed because the 
rewrite of {{\[NOT\] EXISTS OR ...}} to {{ExistenceJoin}} happens in rule 
{{RewritePredicateSubquery}}, which is in a separate rule set and placed after 
the rule {{PushPredicateThroughJoin}}. During the transformation in the rule 
{{PushPredicateThroughJoin}}, an ExistenceJoin never exists.

The semantics of {{ExistenceJoin}} says we need to preserve all the rows from 
the left table through the join operation as if it is a regular {{LeftOuter}} 
join. The {{ExistenceJoin}} augments the {{LeftOuter}} operation with a new 
column called {{exists}}, set to true when the join condition in the ON clause 
is true and false otherwise. The filter of any rows will happen in the 
{{Filter}} operation above the {{ExistenceJoin}}.

Example:

A(c1, c2): \{ (1, 1), (1, 2) \}
// B can be any value as it is irrelevant in this example
B(c1): \{ (NULL) \}

{code:SQL}
select A.*
from   A
where  exists (select 1 from B where A.c1 = A.c2)
       or A.c2=2
{code}

In this example, the correct result is all the rows from A. If the pattern 
{{ExistenceJoin}} at line 935 in {{Optimizer.scala}} added by the work in 
SPARK-18597 is indeed active, the code will push down the predicate A.c1 = A.c2 
to be a {{Filter}} on relation A, which will filter the row (1,2) from A.


> Incorrect predicate pushdown thru ExistenceJoin
> -----------------------------------------------
>
>                 Key: SPARK-18614
>                 URL: https://issues.apache.org/jira/browse/SPARK-18614
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nattavut Sutyanyong
>            Priority: Minor
>              Labels: correctness
>
> This is a follow-up work from SPARK-18597 to close a potential incorrect 
> rewrite in {{PushPredicateThroughJoin}} rule of the Optimizer phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to