[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-195497597 sounds good, thank you. In my branch, I try to address (2) by not adding new conditions if the child node(s) already have the given constraint. For (3), please note that pushing `b = c` down isn't useless if `a` comes from the left side of the join and `b` and `c` come from the right. It is of course useless if `b` and `c` come from different sides of the join. I think we can have a slightly smarter filter inference rule for joins to identify the latter. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-195472409 A couple of issues I hit when I try to do it. 1. The first issue is the one I submitted earlier regarding the `IsNotNull` of compound expressions. 2. The second issue is the duplicate conditions could be added if we do not change the existing rules, e.g., `PushPredicateThroughJoin`. 3. Another issue is about `a=b` && `a=c`, we can infer `b=c`. However, the last one is useless and will burn extra costs. Ideally, we should select the predicates based on the cardinality. However, that is missing. Maybe more. Still trying to write more test cases. Hopefully, I can find all of them. Thank you! If you do not mind it, I will try it this weekend. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-195466611 Shouldn't the existing rule for `PushPredicateThroughJoin` automatically take care of predicate pushdown? By the way, please feel free to take over my branch if you'd like :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-195462388 @sameeragarwal Yeah, it sounds like you already started working on it. True, I also did a similar thing, but we need to add extra handling for predicate push down. How about you first delivering your current changes? I will do the remaining part? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-195254941 @gatorsmile instead of having a special rule for join, we can probably infer all possible filters based on constraints (... something along the lines of https://github.com/sameeragarwal/spark/commit/ce4c9441f32ab39eaebd7fe2081cf1c789a1ef4a). This should now subsume the predicate transitivity optimization right? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-195209412 Great! Based on this PR, it sounds like I can reopen my PR: https://github.com/apache/spark/pull/10490 Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11618 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-195133900 Thanks - merging in master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-195132692 lgtm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/11618#discussion_r55774324 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -62,6 +63,26 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT } /** + * Infers an additional set of constraints from a given set of equality constraints. + * For e.g., if an operator has constraints of the form (`a = 5`, `a = b`), this returns an + * additional constraint of the form `b = 5` + */ + private def inferAdditionalConstraints(constraints: Set[Expression]): Set[Expression] = { +var inferredConstraints = Set.empty[Expression] +constraints.foreach { + case eq @ EqualTo(l: Attribute, r: Attribute) => +inferredConstraints ++= (constraints - eq).map(_ transform { + case a: Attribute if a.semanticEquals(l) => r --- End diff -- Just to infer equality based on the expression ids of their attribute references: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala#L44-L49. Is that not necessary in this case? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11618#discussion_r55772321 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -62,6 +63,26 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT } /** + * Infers an additional set of constraints from a given set of equality constraints. + * For e.g., if an operator has constraints of the form (`a = 5`, `a = b`), this returns an + * additional constraint of the form `b = 5` + */ + private def inferAdditionalConstraints(constraints: Set[Expression]): Set[Expression] = { +var inferredConstraints = Set.empty[Expression] +constraints.foreach { + case eq @ EqualTo(l: Attribute, r: Attribute) => +inferredConstraints ++= (constraints - eq).map(_ transform { + case a: Attribute if a.semanticEquals(l) => r --- End diff -- why semantic equals instead of if a == l? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194746983 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194746990 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52819/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194746695 **[Test build #52819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52819/consoleFull)** for PR 11618 at commit [`c57db5b`](https://github.com/apache/spark/commit/c57db5b4a3950ba5e653ff3b8cdfa33e157305ef). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194717639 **[Test build #52819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52819/consoleFull)** for PR 11618 at commit [`c57db5b`](https://github.com/apache/spark/commit/c57db5b4a3950ba5e653ff3b8cdfa33e157305ef). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/11618#discussion_r55642742 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -62,6 +63,25 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT } /** + * Infers an additional set of constraints from a given set of equality constraints. + * For e.g., if an operator has constraints of the form (`a = 5`, `a = b`), this returns an + * additional constraint of the form `b = 5` + */ + private def inferAdditionalConstraints(constraints: Set[Expression]): Set[Expression] = { +constraints.map { + case eq @ EqualTo(l: Attribute, r: Attribute) => +(constraints -- Set(eq)).map(_ transform { + case a: Attribute if a.semanticEquals(l) => r +}).union( + (constraints -- Set(eq)).map(_ transform { +case a: Attribute if a.semanticEquals(r) => l + })) + case _ => +Set.empty[Expression] +}.foldLeft(Set.empty[Expression])(_ union _) -- constraints --- End diff -- Thank you, fixed! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11618#discussion_r55636107 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -62,6 +63,25 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT } /** + * Infers an additional set of constraints from a given set of equality constraints. + * For e.g., if an operator has constraints of the form (`a = 5`, `a = b`), this returns an + * additional constraint of the form `b = 5` + */ + private def inferAdditionalConstraints(constraints: Set[Expression]): Set[Expression] = { +constraints.map { + case eq @ EqualTo(l: Attribute, r: Attribute) => +(constraints -- Set(eq)).map(_ transform { + case a: Attribute if a.semanticEquals(l) => r +}).union( + (constraints -- Set(eq)).map(_ transform { +case a: Attribute if a.semanticEquals(r) => l + })) + case _ => +Set.empty[Expression] +}.foldLeft(Set.empty[Expression])(_ union _) -- constraints --- End diff -- maybe not foldLeft -- most of the time fold is used it can be rewritten into something imperative that is simpler (and a lot faster) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194618549 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52784/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194618548 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194618308 **[Test build #52784 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52784/consoleFull)** for PR 11618 at commit [`64e3320`](https://github.com/apache/spark/commit/64e3320a248a3970b2ea3b006d026375b12f6ef6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194587185 **[Test build #52784 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52784/consoleFull)** for PR 11618 at commit [`64e3320`](https://github.com/apache/spark/commit/64e3320a248a3970b2ea3b006d026375b12f6ef6). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194579131 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194579133 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52779/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194579043 **[Test build #52779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52779/consoleFull)** for PR 11618 at commit [`1149d20`](https://github.com/apache/spark/commit/1149d200a712c339dd73f9038388ddcf9e66dad4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194571800 **[Test build #52779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52779/consoleFull)** for PR 11618 at commit [`1149d20`](https://github.com/apache/spark/commit/1149d200a712c339dd73f9038388ddcf9e66dad4). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11618#issuecomment-194570939 cc @nongli --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13789] Infer additional constraints fro...
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/11618 [SPARK-13789] Infer additional constraints from attribute equality ## What changes were proposed in this pull request? This PR adds support for inferring an additional set of data constraints based on attribute equality. For e.g., if an operator has constraints of the form (`a = 5`, `a = b`), we can now automatically infer an additional constraint of the form `b = 5` ## How was this patch tested? Tested that new constraints are properly inferred for filters (by adding a new test) and equi-joins (by modifying an existing test) You can merge this pull request into a Git repository by running: $ git pull https://github.com/sameeragarwal/spark infer-isequal-constraints Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11618.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11618 commit 1149d200a712c339dd73f9038388ddcf9e66dad4 Author: Sameer Agarwal Date: 2016-03-09T23:44:11Z Infer additional constraints from attribute equality --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org