[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51068802 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -17,16 +17,31 @@ package org.apache.spark.sql.catalyst.plans -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expression, VirtualColumn} +import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.trees.TreeNode import org.apache.spark.sql.types.{DataType, StructType} -abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] { +abstract class QueryPlan[PlanType <: TreeNode[PlanType]] + extends TreeNode[PlanType] with PredicateHelper { self: PlanType => def output: Seq[Attribute] /** + * Extracts the output property from a given child. + */ + def extractConstraintsFromChild(child: QueryPlan[PlanType]): Set[Expression] = { +child.constraints.filter(_.references.subsetOf(outputSet)) + } + + /** + * An sequence of expressions that describes the data property of the output rows of this + * operator. For example, if the output of this operator is column `a`, an example `constraints` + * can be `Set(a > 10, a < 20)`. + */ + def constraints: Set[Expression] = Set.empty --- End diff -- This is probably going to be nontrivial to calculate for a large tree. We might consider having an internal method, `private def validConstraints` or something, that we expand / canonicalize into a `lazy val constraints` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51068566 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -146,6 +172,26 @@ case class Union(children: Seq[LogicalPlan]) extends LogicalPlan { val sizeInBytes = children.map(_.statistics.sizeInBytes).sum Statistics(sizeInBytes = sizeInBytes) } + + override def extractConstraintsFromChild(child: QueryPlan[LogicalPlan]): Set[Expression] = { +child.constraints.filter(_.references.subsetOf(child.outputSet)) + } + + def rewriteConstraints( + planA: LogicalPlan, + planB: LogicalPlan, + constraints: Set[Expression]): Set[Expression] = { +require(planA.output.size == planB.output.size) +val attributeRewrites = AttributeMap(planB.output.zip(planA.output)) +constraints.map(_ transform { + case a: Attribute => attributeRewrites(a) +}) + } + + override def constraints: Set[Expression] = { +children.map(child => rewriteConstraints(children.head, child, + extractConstraintsFromChild(child))).reduce(_ intersect _) --- End diff -- same style nit --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51068499 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -88,6 +88,12 @@ case class Generate( case class Filter(condition: Expression, child: LogicalPlan) extends UnaryNode { override def output: Seq[Attribute] = child.output + + override def constraints: Set[Expression] = { +val newConstraint = splitConjunctivePredicates(condition).filter( + _.references.subsetOf(outputSet)).toSet --- End diff -- style nit: we typically avoid breaking in the middle of a function call and instead prefer to break in between calls (always pick the highest syntactic level) ```scala val newConstraint = splitConjunctivePredicates(condition) .filter(_.references.subsetOf(outputSet)) .toSet ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51068376 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -17,16 +17,31 @@ package org.apache.spark.sql.catalyst.plans -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expression, VirtualColumn} +import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.trees.TreeNode import org.apache.spark.sql.types.{DataType, StructType} -abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] { +abstract class QueryPlan[PlanType <: TreeNode[PlanType]] + extends TreeNode[PlanType] with PredicateHelper { self: PlanType => def output: Seq[Attribute] /** + * Extracts the output property from a given child. + */ + def extractConstraintsFromChild(child: QueryPlan[PlanType]): Set[Expression] = { --- End diff -- `protected`? Also I'm not sure I get the scala doc. Maybe `getReleventContraints` is a better name? It is taking the constraints and removing those that don't apply anymore because we removed columns right? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-175375684 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50161/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-17537 **[Test build #50161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50161/consoleFull)** for PR 10844 at commit [`f15ef96`](https://github.com/apache/spark/commit/f15ef96657603b79a815853fba991835fe3ca50f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-175375683 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-175354244 **[Test build #50161 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50161/consoleFull)** for PR 10844 at commit [`f15ef96`](https://github.com/apache/spark/commit/f15ef96657603b79a815853fba991835fe3ca50f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-174926676 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50083/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-174926671 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-174926239 **[Test build #50083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50083/consoleFull)** for PR 10844 at commit [`7fb2f9c`](https://github.com/apache/spark/commit/7fb2f9ce01aafe45d720212cb13120eb693a6c06). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-174883150 **[Test build #50083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50083/consoleFull)** for PR 10844 at commit [`7fb2f9c`](https://github.com/apache/spark/commit/7fb2f9ce01aafe45d720212cb13120eb693a6c06). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-173796157 @sameeragarwal I just saw it. Thank you! I will do the code changes after this is merged. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/10844#issuecomment-173787321 @hvanhovell added, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org