GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/21083
[SPARK-21479][SPARK-23564][SQL] infer additional filters from constraints for join's children

## What changes were proposed in this pull request?

The existing query constraints framework has 2 steps:
1. propagate constraints bottom up.
2. use constraints to infer additional filters for better data pruning.

For step 2, it mostly helps with Join, because we can connect the constraints from the children to the join condition and infer powerful filters to prune the data of the join sides. E.g., if the left side has the constraint `a = 1` and the join condition is `left.a = right.a`, then we can infer `right.a = 1` for the right side and prune the right side a lot.

However, the current logic of inferring filters from constraints for Join is pretty weak. It infers the filters from the Join's constraints. Some joins, like left semi/anti, exclude the right side from their output, so the right side's constraints are lost here.

This PR proposes to check the left and right constraints individually, expand the constraints with the join condition, and add filters to the children of the join directly, instead of adding them to the join condition.

This reverts https://github.com/apache/spark/pull/20670 , covers https://github.com/apache/spark/pull/20717 and https://github.com/apache/spark/pull/20816

This is inspired by the original PRs and the tests are all from these PRs. Thanks to the authors @mgaido91 @maryannxue @KaiXinXiaoLei !

## How was this patch tested?
new tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark join

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21083.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21083

----
commit 2977a5e037eb862a530a777e349f328ffbda39bb
Author: Wenchen Fan <wenchen@...>
Date:   2018-04-16T16:15:04Z

    Revert "[SPARK-23405] Generate additional constraints for Join's children"

    This reverts commit cdcccd7b41c43d79edff2fec7a84cd00e9524f75.

commit b967955ec2c7d33f28845dd55a1a9b70c5c2ba03
Author: Wenchen Fan <wenchen@...>
Date:   2018-04-16T19:39:50Z

    fix join filter inference from constraints

----
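
The inference described in the pull request description (the left side has `a = 1`, the join condition is `left.a = right.a`, so `right.a = 1` can be pushed to the right child) can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's Catalyst code: the function name `infer_child_filters` and the tuple encoding of constraints and join predicates are hypothetical, and the sketch deliberately ignores the join type, which in a real optimizer determines which side may safely receive an inferred filter.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: a constraint "attr = literal" is an (attribute, literal) pair.
Constraint = Tuple[str, int]
# Hypothetical encoding: an equi-join predicate "left_attr = right_attr" is a pair of names.
JoinEquality = Tuple[str, str]


def infer_child_filters(
    left_constraints: Set[Constraint],
    right_constraints: Set[Constraint],
    join_equalities: List[JoinEquality],
) -> Tuple[Set[Constraint], Set[Constraint]]:
    """Return (new_left_filters, new_right_filters) implied by the join condition.

    Each side's constraints are checked individually and carried across the
    join equalities to the other side; filters a side already has are dropped.
    """
    new_left: Set[Constraint] = set()
    new_right: Set[Constraint] = set()
    for l_attr, r_attr in join_equalities:
        # A left-side constraint on l_attr implies the same constraint on r_attr.
        for attr, value in left_constraints:
            if attr == l_attr:
                new_right.add((r_attr, value))
        # A right-side constraint on r_attr implies the same constraint on l_attr.
        for attr, value in right_constraints:
            if attr == r_attr:
                new_left.add((l_attr, value))
    return new_left - left_constraints, new_right - right_constraints


if __name__ == "__main__":
    new_left, new_right = infer_child_filters(
        left_constraints={("left.a", 1)},
        right_constraints=set(),
        join_equalities=[("left.a", "right.a")],
    )
    print(new_left, new_right)
```

The returned filters are meant to be attached to the join's children rather than to the join condition, mirroring the approach the description proposes; only equality-with-literal constraints are modeled here.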