GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/21083

    [SPARK-21479][SPARK-23564][SQL] infer additional filters from constraints 
for join's children

    ## What changes were proposed in this pull request?
    
    The existing query constraints framework has 2 steps:
    1. propagate constraints bottom up.
    2. use constraints to infer additional filters for better data pruning.
    
    Step 2 mostly helps with Join, because we can combine the children's 
constraints with the join condition and infer filters that prune the data on 
either join side. For example, if the left side has the constraint `a = 1` and 
the join condition is `left.a = right.a`, we can infer `right.a = 1` for the 
right side and prune it substantially.
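    
    As a rough illustration of the `a = 1` example above, here is a toy model 
in plain Python. It is only a sketch, not Spark's implementation: constraints 
are reduced to (column, literal) equalities, the join condition to a list of 
equi-join column pairs, and the name `infer_right_filters` is hypothetical.

```python
# Toy model (assumption, not Spark code): a constraint is a (column, literal)
# equality and a join condition is a list of (left_column, right_column) pairs.

def infer_right_filters(left_constraints, join_keys):
    """Return equality filters on the right side implied by the left side."""
    left_values = dict(left_constraints)  # column -> literal
    return [
        (right_col, left_values[left_col])
        for left_col, right_col in join_keys
        if left_col in left_values
    ]


left_constraints = [("left.a", 1)]       # left side satisfies a = 1
join_keys = [("left.a", "right.a")]      # join condition left.a = right.a

print(infer_right_filters(left_constraints, join_keys))  # [('right.a', 1)]
```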
    
    However, the current logic for inferring filters from constraints for Join 
is weak: it infers the filters from the Join's own constraints. Some join 
types, such as left semi and left anti, exclude the right side from their 
output, so the right side's constraints are lost.
    
    This PR proposes to check the left and right constraints individually, 
expand them with the join condition, and add the inferred filters directly to 
the children of the join instead of to the join condition.
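    
    The difference between the two approaches can be sketched with another 
toy model in plain Python. It is an illustration under simplifying assumptions, 
not the code in this PR: constraints are (column, literal) equalities, the join 
condition is a list of equi-join column pairs, and both function names are 
hypothetical. Which sides may safely receive filters depends on the join type, 
and that is not modeled here; the example uses a left semi join only.

```python
# Toy model (assumption, not Spark code).

def join_output_constraints(join_type, left, right):
    """Constraints visible on the join's output.

    Left semi/anti joins output only left-side columns, so right-side
    constraints are not part of the join's own constraints.
    """
    if join_type in ("left_semi", "left_anti"):
        return set(left)
    return set(left) | set(right)


def infer_filters_per_side(left, right, join_keys):
    """Expand each side's constraints through the equi-join keys.

    Returns (new_left_filters, new_right_filters): filters that could be
    added to each child and are not already among its constraints.
    """
    left_to_right = dict(join_keys)
    right_to_left = {r: l for l, r in join_keys}
    new_right = {(left_to_right[c], v) for c, v in left if c in left_to_right}
    new_left = {(right_to_left[c], v) for c, v in right if c in right_to_left}
    return new_left - set(left), new_right - set(right)


left = set()                         # no constraints on the left child
right = {("right.a", 1)}             # right child satisfies a = 1
join_keys = [("left.a", "right.a")]  # join condition left.a = right.a

# Looking only at the join's own constraints, the right-side fact is gone.
print(join_output_constraints("left_semi", left, right))  # set()

# Checking each child individually still yields a filter for the left child.
new_left, new_right = infer_filters_per_side(left, right, join_keys)
print(new_left)   # {('left.a', 1)}
print(new_right)  # set()
```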
    
    This reverts https://github.com/apache/spark/pull/20670 , covers 
https://github.com/apache/spark/pull/20717 and 
https://github.com/apache/spark/pull/20816
    
    This is inspired by the original PRs and the tests are all from these PRs. 
Thanks to the authors @mgaido91 @maryannxue @KaiXinXiaoLei !
    
    ## How was this patch tested?
    
    new tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark join

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21083.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21083
    
----
commit 2977a5e037eb862a530a777e349f328ffbda39bb
Author: Wenchen Fan <wenchen@...>
Date:   2018-04-16T16:15:04Z

    Revert "[SPARK-23405] Generate additional constraints for Join's children"
    
    This reverts commit cdcccd7b41c43d79edff2fec7a84cd00e9524f75.

commit b967955ec2c7d33f28845dd55a1a9b70c5c2ba03
Author: Wenchen Fan <wenchen@...>
Date:   2018-04-16T19:39:50Z

    fix join filter inference from constraints

----


