GitHub user nsyca opened a pull request:

    https://github.com/apache/spark/pull/16044

    Spark 18614

    ## What changes were proposed in this pull request?
    
    In PushPredicateThroughJoin, ExistenceJoin should be treated the same as 
LeftOuter and LeftAnti, not as InnerLike and LeftSemi. The problem is not 
currently exposed because the rewrite of [NOT] EXISTS OR ... to ExistenceJoin 
happens in the rule RewritePredicateSubquery, which belongs to a separate rule 
set that is placed after the rule PushPredicateThroughJoin. As a result, no 
ExistenceJoin is ever present in the plan when PushPredicateThroughJoin runs.
    
    The semantics of ExistenceJoin require that all rows from the left table 
be preserved through the join operation, as in a regular LeftOuter join. 
ExistenceJoin augments the LeftOuter operation with a new column called exists, 
set to true when the join condition in the ON clause is satisfied and false 
otherwise. Any filtering of rows happens in the Filter operation above the 
ExistenceJoin.
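
The semantics described above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not Spark code: the function name `existence_join`, the tuple-based rows, and the sample data are all hypothetical, and SQL NULL is represented by `None` with the comparison written so that NULL never matches.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[Optional[int], ...]


def existence_join(
    left: List[Row],
    right: List[Row],
    condition: Callable[[Row, Row], bool],
) -> List[Row]:
    """Hypothetical model of ExistenceJoin: keep every left row and
    append an `exists` flag that is True when some right row satisfies
    the join condition. No left row is ever dropped here."""
    return [l + (any(condition(l, r) for r in right),) for l in left]


def equal_not_null(l: Row, r: Row) -> bool:
    # SQL-style equality on the first column: a NULL (None) never matches.
    return l[0] is not None and r[0] is not None and l[0] == r[0]


# Made-up sample data, for illustration only.
left_rows = [(1,), (2,), (None,)]
right_rows = [(2,), (None,)]

joined = existence_join(left_rows, right_rows, equal_not_null)
print(joined)
```

The output has exactly one row per left row; deciding which rows to keep is left to whatever filter is applied to the `exists` column afterwards.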
    
    Example:
    
    A(c1, c2): { (1, 1), (1, 2) }
    // B can be any value as it is irrelevant in this example
    B(c1): { (NULL) }
    
    select A.*
    from   A
    where  exists (select 1 from B where A.c1 = A.c2)
           or A.c2=2
    
    In this example, the correct result is all the rows from A. If the 
ExistenceJoin pattern around line 935 in Optimizer.scala were ever reached, the 
code would push the predicate A.c1 = A.c2 down as a Filter on relation A, which 
would incorrectly remove the row (1,2) from A.
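
The difference between the two plans in this example can be reproduced with a small plain-Python model. This is a sketch under stated assumptions rather than Spark code: the helper names (`existence_join`, `final_filter`, `on_clause`) are hypothetical, rows are tuples, and the pushdown is simulated by hand.

```python
# Data from the example: A(c1, c2) and B(c1).
A = [(1, 1), (1, 2)]
B = [(None,)]


def existence_join(left, right, condition):
    # Hypothetical model: keep every left row, append an `exists` flag.
    return [l + (any(condition(l, r) for r in right),) for l in left]


def on_clause(a, b):
    # The correlated predicate A.c1 = A.c2; it references only A's columns.
    return a[0] == a[1]


def final_filter(rows):
    # The Filter above the join: exists OR A.c2 = 2, projected back to A.*.
    return [(c1, c2) for (c1, c2, exists) in rows if exists or c2 == 2]


# Correct plan: the predicate stays in the join condition.
correct = final_filter(existence_join(A, B, on_clause))

# Simulated incorrect plan: the predicate is pushed down as a Filter on A,
# leaving the join with no condition (always true).
pushed_A = [a for a in A if on_clause(a, None)]
incorrect = final_filter(existence_join(pushed_A, B, lambda a, b: True))

print(correct)    # [(1, 1), (1, 2)]
print(incorrect)  # [(1, 1)]
```

In the simulated pushdown, the row (1, 2) is removed before the join, so the `A.c2=2` branch of the OR never gets a chance to keep it.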
    
    ## How was this patch tested?
    
    Since this is not an exposed case, no new test case is added. The scenario 
was discovered during a code review of another PR and confirmed to be valid 
with a peer.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nsyca/spark spark-18614

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16044.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16044
    
----
commit b98865127a39bde885f9b1680cfe608629d59d51
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-07-29T21:43:56Z

    [SPARK-16804][SQL] Correlated subqueries containing LIMIT return incorrect 
results
    
    ## What changes were proposed in this pull request?
    
    This patch fixes the incorrect results in the rule ResolveSubquery in 
Catalyst's Analysis phase.
    
    ## How was this patch tested?
    ./dev/run-tests
    a new unit test on the problematic pattern.

commit 069ed8f8e5f14dca7a15701945d42fc27fe82f3c
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-07-29T21:50:02Z

    [SPARK-16804][SQL] Correlated subqueries containing LIMIT return incorrect 
results
    
    ## What changes were proposed in this pull request?
    
    This patch fixes the incorrect results in the rule ResolveSubquery in 
Catalyst's Analysis phase.
    
    ## How was this patch tested?
    ./dev/run-tests
    a new unit test on the problematic pattern.

commit edca333c081e6d4e53a91b496fba4a3ef4ee89ac
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-07-30T00:28:15Z

    New positive test cases

commit 64184fdb77c1a305bb2932e82582da28bb4c0e53
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-08-01T13:20:09Z

    Fix unit test case failure

commit 29f82b05c9e40e7934397257c674b260a8e8a996
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-08-05T17:42:01Z

    blocking TABLESAMPLE

commit ac43ab47907a1ccd6d22f920415fbb4de93d4720
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-08-05T21:10:19Z

    Fixing code styling

commit 631d396031e8bf627eb1f4872a4d3a17c144536c
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-08-07T18:39:44Z

    Correcting Scala test style

commit 7eb9b2dbba3633a1958e38e0019e3ce816300514
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-08-08T02:31:09Z

    One (last) attempt to correct the Scala style tests

commit 1387cf51541408ac20048064fa5e559836af932c
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-08-12T20:11:50Z

    Merge remote-tracking branch 'upstream/master'

commit 6d9bade4df8954987078c479274d90a7612cc772
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-04T03:51:54Z

    Merge remote-tracking branch 'upstream/master'

commit 9a1f80b12cdc9857f4b906688f8691a2db502fa5
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-04T16:14:25Z

    Merge remote-tracking branch 'upstream/master'

commit 3fe9429c009eb156ac89ef6732e9230d583ed5d0
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-05T00:52:45Z

    Merge remote-tracking branch 'upstream/master'

commit 0757b8134316f8b5c87ef1c023966304228a0eeb
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-11T16:31:36Z

    Merge remote-tracking branch 'upstream/master'

commit 35b77f0ca477bf6427e18588c4514a3f0209f426
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-12T03:36:14Z

    Merge remote-tracking branch 'upstream/master'

commit c63b8c627cb13253b3776aec57b8a73d685d7bd1
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-14T15:29:09Z

    Merge remote-tracking branch 'upstream/master'

commit f3351d5aba8b5b52f5e1b12a8e068e0d4a4ece08
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-18T17:03:23Z

    Merge remote-tracking branch 'upstream/master'

commit 9fc5c3305f7c23593b1ef93a43fd266b2d5bed5a
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-18T22:31:30Z

    Merge remote-tracking branch 'upstream/master'

commit 402e1d93fc184ea04d1822f2d185e2a50c9440ab
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-22T03:23:34Z

    Merge remote-tracking branch 'upstream/master'

commit b1172816e833d689e6e879e9e3fd16e1e9b3b178
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-23T15:37:11Z

    Merge remote-tracking branch 'upstream/master'

commit 3023399c20f696fa7c4f7396517eff6769a910dd
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-24T14:11:28Z

    Merge remote-tracking branch 'upstream/master'

commit 4b692f04f2ebfde72614af02ccef77f8a6d24fe4
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-25T18:53:47Z

    Merge remote-tracking branch 'upstream/master'

commit c8aadb5e89e59bfb23237dc702a571afa7ea6f0a
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-28T17:50:03Z

    Merge remote-tracking branch 'upstream/master'

commit 21816473b9ff2b458e568ffaaed4d83cab1c82c3
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-28T21:22:34Z

    Merge remote-tracking branch 'upstream/master'

commit 8e34c50df8eacbecb0000565dd83656f13e16ecf
Author: Nattavut Sutyanyong <nsy....@gmail.com>
Date:   2016-11-28T21:35:18Z

    ExistenceJoin should be treated as a special case of LeftOuter

----


