GitHub user antonoal opened a pull request: https://github.com/apache/spark/pull/11777
Added transitive closure transformation to Catalyst

## What changes were proposed in this pull request?

A relatively simple transformation is missing from Catalyst's arsenal: generation of transitive predicates. For instance, given the following query:

`select * from table1 t1 join table2 t2 on t1.a = t2.b where t1.a = 42`

it follows that t2.b also equals 42 in every joined row, so an additional predicate `t2.b = 42` can be generated. That predicate can in turn be pushed down through the join, improving the performance of the whole query by filtering the data before it is joined. Such a transformation exists in Oracle DB.

Please note that in this PR a transitive predicate is created for the following operations:

* a BinaryComparison (=, >=, etc.) to a foldable
* in (1, 2, 3) where all the values in the sequence are foldable
* Not of any of the above
* Or of any of the above

## How was this patch tested?

I've added a new TransitiveClosureSuite with a series of unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/antonoal/spark transitive-closure

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11777.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11777

----
commit 7df4117749f7afc2e5e95190cf93a961b9c6ed3a
Author: Alex Antonov <3091...@gmail.com>
Date:   2016-03-16T21:53:38Z

    Added transitive closure transformation to Catalyst

----
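To illustrate the idea of transitive predicate generation described in this PR, here is a minimal sketch in Python. It is a toy model and not the Catalyst implementation: the classes `Col`, `Cmp`, `In`, `Not`, `Or` and the function `transitive_predicates` are hypothetical names invented for this example. It also assumes inner-join equalities and only derives predicates that reference a single column.

```python
from dataclasses import dataclass
from typing import Union


# Toy expression types (hypothetical, not Catalyst classes).
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Cmp:            # binary comparison of a column to a constant
    op: str
    col: Col
    value: object


@dataclass(frozen=True)
class In:             # column IN (constant, constant, ...)
    col: Col
    values: tuple


@dataclass(frozen=True)
class Not:
    child: "Pred"


@dataclass(frozen=True)
class Or:
    left: "Pred"
    right: "Pred"


Pred = Union[Cmp, In, Not, Or]


def columns(p):
    """Return the set of columns a predicate references."""
    if isinstance(p, (Cmp, In)):
        return {p.col}
    if isinstance(p, Not):
        return columns(p.child)
    return columns(p.left) | columns(p.right)


def substitute(p, old, new):
    """Return a copy of the predicate with column `old` replaced by `new`."""
    if isinstance(p, Cmp):
        return Cmp(p.op, new, p.value) if p.col == old else p
    if isinstance(p, In):
        return In(new, p.values) if p.col == old else p
    if isinstance(p, Not):
        return Not(substitute(p.child, old, new))
    return Or(substitute(p.left, old, new), substitute(p.right, old, new))


def equivalence_classes(equalities):
    """Group columns connected by join equalities into disjoint sets."""
    classes = []
    for a, b in equalities:
        merged, rest = {a, b}, []
        for c in classes:
            if c & merged:
                merged |= c
            else:
                rest.append(c)
        classes = rest + [merged]
    return classes


def transitive_predicates(equalities, filters):
    """Derive new single-column predicates implied by the join equalities."""
    classes = equivalence_classes(equalities)
    existing, derived = set(filters), []
    for p in filters:
        cols = columns(p)
        if len(cols) != 1:      # assumption: skip multi-column predicates
            continue
        (col,) = cols
        for cls in classes:
            if col not in cls:
                continue
            for other in sorted(cls - {col}, key=lambda c: c.name):
                q = substitute(p, col, other)
                if q not in existing:
                    existing.add(q)
                    derived.append(q)
    return derived


# The query from the PR description: t1.a = t2.b and t1.a = 42.
t1a, t2b = Col("t1.a"), Col("t2.b")
print(transitive_predicates([(t1a, t2b)], [Cmp("=", t1a, 42)]))
# prints [Cmp(op='=', col=Col(name='t2.b'), value=42)]
```

In this sketch the derived predicate `t2.b = 42` is what an optimizer could then push below the join onto table2. Predicates already present in the filter list are not derived again, which keeps repeated application from producing duplicates.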