[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16998 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73316/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16998 Merged build finished. Test PASSed.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16998 **[Test build #73316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73316/testReport)** for PR 16998 at commit [`5be21b3`](https://github.com/apache/spark/commit/5be21b32d5b4e3e36e50317a385a554206967668). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16998 **[Test build #73316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73316/testReport)** for PR 16998 at commit [`5be21b3`](https://github.com/apache/spark/commit/5be21b32d5b4e3e36e50317a385a554206967668).
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16998

@sameeragarwal That's correct.

> By the way, as an aside, we should probably allow constraint inference/propagation to be turned off via a conf flag, to provide a quick workaround for these kinds of problems.

Since we use constraints in optimization, wouldn't turning off constraint inference/propagation cause us to miss optimization opportunities for query plans?
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/16998 By the way, as an aside, we should probably allow constraint inference/propagation to be turned off via a conf flag, to provide a quick workaround for these kinds of problems.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/16998 @viirya please correct me if I'm wrong, but scanning through this patch, it appears that the underlying problem is that duplicating and tracking aliased constraints using a `Set` tends to blow up quickly (causing regressions), and that this patch proposes an alternate data structure (`aliasedExpressionsInConstraints`) to keep track of aliases? For instance, in your example where `a > b`, and `a` is aliased to `c` and `d`, we currently track constraints as `Set(a > b, c > b, d > b)`, whereas you'd like them to be tracked as `Set(a > b)` and `Map(a -> Set(c, d))`? Is that correct?
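To make the two bookkeeping schemes in the comment above concrete, here is a minimal Python sketch. It is a toy model, not Spark's Catalyst code: constraints are represented as hypothetical `(lhs, op, rhs)` tuples, and the helper name `expand_constraints` is invented for illustration.

```python
from itertools import product


def expand_constraints(constraints, aliases):
    """Return the fully expanded constraint set (the current behaviour
    described in the thread), substituting every alias on either side.

    constraints: set of (lhs, op, rhs) tuples, e.g. ("a", ">", "b")
    aliases:     dict mapping an attribute name to the set of its aliases
    """
    expanded = set()
    for lhs, op, rhs in constraints:
        lhs_names = {lhs} | aliases.get(lhs, set())
        rhs_names = {rhs} | aliases.get(rhs, set())
        for left, right in product(lhs_names, rhs_names):
            expanded.add((left, op, right))
    return expanded


# Compact form suggested in the comment: one base constraint plus an alias map.
base = {("a", ">", "b")}
aliases = {"a": {"c", "d"}}

# Expanded form: one constraint per alias combination.
expanded = expand_constraints(base, aliases)
print(sorted(expanded))
```

In this model the expanded set holds three constraints for the example in the comment. If `b` also had two aliases, the same single base constraint would expand to nine, which illustrates how the fully expanded `Set` can grow quickly while the compact form stays at one constraint plus the alias map.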
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16998

> I don't get your point. What does this mean? Constraint propagation is a bottom-up mechanism for inferring the constraints. Can you elaborate on your idea in a more formal way?

We currently fully expand the constraints with aliased attributes. For example, if there is a constraint "a > b" and the current query plan aliases "a" to "c" and "d", the final constraints of this plan are "a > b", "c > b", "d > b". These constraints always have the same value: either all true or all false. So when inferring filters from the constraints we only need "a > b"; the aliased constraints "c > b" and "d > b" are not necessary.

> I did not read the code. Just wondering if we could miss the chance of plan optimization after this PR? What is the negative impact, if any?

The only optimization I think would be affected is `PruneFilters`. `PruneFilters` prunes a condition if its child's constraints already contain that condition. To elaborate with the example above: suppose there is a `Filter` above the query plan whose condition is "c > b". Since we only have "a > b" in the query plan's constraints, we can't prune the condition and the `Filter`. However, this is not a big impact and it can be easily solved: we can use a simple method to check whether a given condition such as "c > b" is contained in the fully expanded constraints of a query plan, without actually expanding them.
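The membership check described at the end of the comment above could look roughly like the following Python sketch. This is an assumption about one way to do it, not the patch's implementation: constraints are modelled as hypothetical `(lhs, op, rhs)` tuples, and `implied_by` is an invented name. The idea is to rewrite each alias in the condition back to its original attribute and then look the result up in the compact constraint set.

```python
def implied_by(condition, constraints, aliases):
    """Check whether `condition` would appear in the fully expanded
    constraints, without building the expanded set.

    condition:   (lhs, op, rhs) tuple, e.g. ("c", ">", "b")
    constraints: compact set of (lhs, op, rhs) tuples over original attributes
    aliases:     dict mapping an original attribute to the set of its aliases
    """
    # Invert the alias map: alias name -> original attribute name.
    alias_to_origin = {
        alias: origin for origin, names in aliases.items() for alias in names
    }
    lhs, op, rhs = condition
    # Names that are not aliases are left unchanged.
    canonical = (alias_to_origin.get(lhs, lhs), op, alias_to_origin.get(rhs, rhs))
    return canonical in constraints


constraints = {("a", ">", "b")}
aliases = {"a": {"c", "d"}}

print(implied_by(("c", ">", "b"), constraints, aliases))
print(implied_by(("e", ">", "b"), constraints, aliases))
```

Under these assumptions the check costs one dictionary lookup per attribute in the condition, so a rule like `PruneFilters` could still recognise "c > b" as already guaranteed while the plan stores only "a > b".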
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16998

> Another issue is that we actually don't need the additional constraints most of the time. For example, if there is a constraint "a > b", and "a" is aliased to "c" and "d". When we use this constraint in filtering, we don't need all constraints "a > b", "c > b", "d > b". We only need "a > b" because if it is false, it is guaranteed that all other constraints are false too.

I don't get your point. What does this mean? Constraint propagation is a bottom-up mechanism for inferring the constraints. Can you elaborate on your idea in a more formal way?

I did not read the code. Just wondering if we could miss the chance of plan optimization after this PR? What is the negative impact, if any?
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16998 Merged build finished. Test PASSed.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16998 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73174/ Test PASSed.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16998 **[Test build #73174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73174/testReport)** for PR 16998 at commit [`6cb896f`](https://github.com/apache/spark/commit/6cb896ff062c0a0c46f6f6ac4b88fad165eeaac0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16998 **[Test build #73174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73174/testReport)** for PR 16998 at commit [`6cb896f`](https://github.com/apache/spark/commit/6cb896ff062c0a0c46f6f6ac4b88fad165eeaac0).
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16998 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73163/ Test FAILed.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16998 Merged build finished. Test FAILed.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16998 **[Test build #73163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73163/testReport)** for PR 16998 at commit [`d691c66`](https://github.com/apache/spark/commit/d691c66dd0092ad99751964a6b079193706b953c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16998 **[Test build #73163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73163/testReport)** for PR 16998 at commit [`d691c66`](https://github.com/apache/spark/commit/d691c66dd0092ad99751964a6b079193706b953c).
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16998 **[Test build #73158 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73158/testReport)** for PR 16998 at commit [`24fb723`](https://github.com/apache/spark/commit/24fb723207d80a7c6068fd113430488b89ed9d0b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16998 @hvanhovell Yes. #16785 makes only a limited improvement. Both #16785 and this PR are non-parallel approaches.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16998 Merged build finished. Test FAILed.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16998 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73159/ Test FAILed.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16998 **[Test build #73159 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73159/testReport)** for PR 16998 at commit [`917de74`](https://github.com/apache/spark/commit/917de74db0066f015ac814125f5cb2d85b7a5b85). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16998 **[Test build #73159 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73159/testReport)** for PR 16998 at commit [`917de74`](https://github.com/apache/spark/commit/917de74db0066f015ac814125f5cb2d85b7a5b85).
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16998 @viirya does this PR supersede #16785? I do like the non-parallel approach. I will try to take a more in-depth look at the end of the week (beginning of the next sprint).
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16998 **[Test build #73158 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73158/testReport)** for PR 16998 at commit [`24fb723`](https://github.com/apache/spark/commit/24fb723207d80a7c6068fd113430488b89ed9d0b).
[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16998 cc @cloud-fan @hvanhovell @sameeragarwal