GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/15558
[SPARK-17357][SPARK-6624][SQL] Convert filter predicate to CNF in Optimizer for pushdown

## What changes were proposed in this pull request?

This PR addresses the problem that #14912 previously tried to solve. Currently, some predicates cannot be pushed down correctly through operators because they are written as a disjunction (a chain of ORs). A simple example is `(a > 10) || (b > 2 && c == 3)`. If a data source has attributes `a` and `b` (but not `c`), this predicate cannot be pushed down as a whole. If we convert it to conjunctive normal form (CNF), `(a > 10 || b > 2) && (a > 10 || c == 3)`, we can then push down the clause `(a > 10 || b > 2)`.

Converting the predicate to CNF solves this in a principled way, instead of the ad hoc approach taken in #14912. There have been previous PRs for CNF conversion, such as #8200; most of the tests added in `CNFNormalizationSuite` are copied from #8200.

## How was this patch tested?

Jenkins tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 filter-cnf

Alternatively, you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15558.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #15558

----
commit baac6327b5a9c1a234e34da538a72d8ef87a9e35
Author: Liang-Chi Hsieh <vii...@gmail.com>
Date: 2016-10-06T14:47:34Z

    Convert filter predicate to CNF in Optimizer.

commit c0637b26808aed386c4d937ebca44958e9f89c09
Author: Liang-Chi Hsieh <vii...@gmail.com>
Date: 2016-10-07T02:49:35Z

    Improve test.

commit f0872fe8b208ddda6e2cb335f9c6a58a195a0960
Author: Liang-Chi Hsieh <vii...@gmail.com>
Date: 2016-10-07T02:50:08Z

    improve test.

commit 62a23691be61f33fa079520e00b573b4ad4aaf3e
Author: Liang-Chi Hsieh <vii...@gmail.com>
Date: 2016-10-19T15:35:01Z

    Merge remote-tracking branch 'upstream/master' into filter-cnf

commit 5343947cfeb287e1f0e02e472cc2ada441c671a4
Author: Liang-Chi Hsieh <vii...@gmail.com>
Date: 2016-10-19T15:36:53Z

    Add comments.
----
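The CNF rewrite described in this PR can be illustrated with a small, self-contained Scala sketch. It is not the PR's implementation and does not use Spark's Catalyst classes: `Expr`, `Pred`, `And`, `Or` and `CnfSketch` are hypothetical toy types, and the sketch assumes predicates contain no `NOT`. It converts a predicate to CNF by distributing OR over AND, then keeps only the clauses whose referenced attributes all belong to the data source.

```scala
// Hypothetical toy expression tree, NOT Spark Catalyst's Expression classes.
sealed trait Expr
// An atomic predicate over one attribute, e.g. Pred("a", "a > 10").
final case class Pred(attr: String, text: String) extends Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class Or(left: Expr, right: Expr) extends Expr

object CnfSketch {
  // Convert to CNF, assuming the input contains no NOT nodes.
  def toCNF(e: Expr): Expr = e match {
    case p: Pred   => p
    case And(l, r) => And(toCNF(l), toCNF(r))
    case Or(l, r)  => distribute(toCNF(l), toCNF(r))
  }

  // Both inputs are already in CNF; push the OR below every AND:
  // (x && y) || r  ==>  (x || r) && (y || r), and symmetrically on the right.
  private def distribute(l: Expr, r: Expr): Expr = (l, r) match {
    case (And(x, y), _) => And(distribute(x, r), distribute(y, r))
    case (_, And(x, y)) => And(distribute(l, x), distribute(l, y))
    case _              => Or(l, r)
  }

  // Split a CNF expression into its top-level clauses.
  def conjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => Seq(other)
  }

  // All attributes referenced by an expression.
  def references(e: Expr): Set[String] = e match {
    case Pred(attr, _) => Set(attr)
    case And(l, r)     => references(l) ++ references(r)
    case Or(l, r)      => references(l) ++ references(r)
  }

  // A clause can be pushed down only if every attribute it uses is in the source.
  def pushable(cnf: Expr, sourceAttrs: Set[String]): Seq[Expr] =
    conjuncts(cnf).filter(c => references(c).subsetOf(sourceAttrs))

  // Render an expression with explicit parentheses.
  def show(e: Expr): String = e match {
    case Pred(_, t) => t
    case And(l, r)  => s"(${show(l)} && ${show(r)})"
    case Or(l, r)   => s"(${show(l)} || ${show(r)})"
  }

  def main(args: Array[String]): Unit = {
    // The example from the PR description: (a > 10) || (b > 2 && c == 3)
    val e = Or(Pred("a", "a > 10"), And(Pred("b", "b > 2"), Pred("c", "c == 3")))
    val cnf = toCNF(e)
    println(show(cnf))
    println(pushable(cnf, Set("a", "b")).map(show).mkString(", "))
  }
}
```

With the PR's example and a source exposing only `a` and `b`, `toCNF` yields `((a > 10 || b > 2) && (a > 10 || c == 3))` and `pushable` keeps only `(a > 10 || b > 2)`; the full predicate still has to be evaluated above the scan. Naive distribution can grow the expression exponentially in the worst case, so an optimizer rule built on this idea would likely need some guard on the size of the result.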