[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51068802
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala 
---
@@ -17,16 +17,31 @@
 
 package org.apache.spark.sql.catalyst.plans
 
-import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, 
Expression, VirtualColumn}
+import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.trees.TreeNode
 import org.apache.spark.sql.types.{DataType, StructType}
 
-abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends 
TreeNode[PlanType] {
+abstract class QueryPlan[PlanType <: TreeNode[PlanType]]
+  extends TreeNode[PlanType] with PredicateHelper {
   self: PlanType =>
 
   def output: Seq[Attribute]
 
   /**
+   * Extracts the output property from a given child.
+   */
+  def extractConstraintsFromChild(child: QueryPlan[PlanType]): 
Set[Expression] = {
+child.constraints.filter(_.references.subsetOf(outputSet))
+  }
+
+  /**
+   * An sequence of expressions that describes the data property of the 
output rows of this
+   * operator. For example, if the output of this operator is column `a`, 
an example `constraints`
+   * can be `Set(a > 10, a < 20)`.
+   */
+  def constraints: Set[Expression] = Set.empty
--- End diff --

This is probably going to be nontrivial to calculate for a large tree.  We 
might consider having an internal method, `private def validConstraints` or 
something, that we expand / canonicalize into a `lazy val constraints`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51068566
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -146,6 +172,26 @@ case class Union(children: Seq[LogicalPlan]) extends 
LogicalPlan {
 val sizeInBytes = children.map(_.statistics.sizeInBytes).sum
 Statistics(sizeInBytes = sizeInBytes)
   }
+
+  override def extractConstraintsFromChild(child: QueryPlan[LogicalPlan]): 
Set[Expression] = {
+child.constraints.filter(_.references.subsetOf(child.outputSet))
+  }
+
+  def rewriteConstraints(
+  planA: LogicalPlan,
+  planB: LogicalPlan,
+  constraints: Set[Expression]): Set[Expression] = {
+require(planA.output.size == planB.output.size)
+val attributeRewrites = AttributeMap(planB.output.zip(planA.output))
+constraints.map(_ transform {
+  case a: Attribute => attributeRewrites(a)
+})
+  }
+
+  override def constraints: Set[Expression] = {
+children.map(child => rewriteConstraints(children.head, child,
+  extractConstraintsFromChild(child))).reduce(_ intersect _)
--- End diff --

same style nit


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51068499
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -88,6 +88,12 @@ case class Generate(
 
 case class Filter(condition: Expression, child: LogicalPlan) extends 
UnaryNode {
   override def output: Seq[Attribute] = child.output
+
+  override def constraints: Set[Expression] = {
+val newConstraint = splitConjunctivePredicates(condition).filter(
+  _.references.subsetOf(outputSet)).toSet
--- End diff --

style nit: we typically avoid breaking in the middle of a function call and 
instead prefer to break in between calls (always pick the highest syntactic 
level)

```scala
val newConstraint = splitConjunctivePredicates(condition)
  .filter(_.references.subsetOf(outputSet))
  .toSet
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51068376
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala 
---
@@ -17,16 +17,31 @@
 
 package org.apache.spark.sql.catalyst.plans
 
-import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, 
Expression, VirtualColumn}
+import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.trees.TreeNode
 import org.apache.spark.sql.types.{DataType, StructType}
 
-abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends 
TreeNode[PlanType] {
+abstract class QueryPlan[PlanType <: TreeNode[PlanType]]
+  extends TreeNode[PlanType] with PredicateHelper {
   self: PlanType =>
 
   def output: Seq[Attribute]
 
   /**
+   * Extracts the output property from a given child.
+   */
+  def extractConstraintsFromChild(child: QueryPlan[PlanType]): 
Set[Expression] = {
--- End diff --

`protected`?

Also I'm not sure I get the scala doc.  Maybe `getReleventContraints` is a 
better name?  It is taking the constraints and removing those that don't apply 
anymore because we removed columns right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10844#issuecomment-175375684
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50161/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10844#issuecomment-17537
  
**[Test build #50161 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50161/consoleFull)**
 for PR 10844 at commit 
[`f15ef96`](https://github.com/apache/spark/commit/f15ef96657603b79a815853fba991835fe3ca50f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10844#issuecomment-175375683
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10844#issuecomment-175354244
  
**[Test build #50161 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50161/consoleFull)**
 for PR 10844 at commit 
[`f15ef96`](https://github.com/apache/spark/commit/f15ef96657603b79a815853fba991835fe3ca50f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10844#issuecomment-174926676
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50083/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10844#issuecomment-174926671
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10844#issuecomment-174926239
  
**[Test build #50083 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50083/consoleFull)**
 for PR 10844 at commit 
[`7fb2f9c`](https://github.com/apache/spark/commit/7fb2f9ce01aafe45d720212cb13120eb693a6c06).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10844#issuecomment-174883150
  
**[Test build #50083 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50083/consoleFull)**
 for PR 10844 at commit 
[`7fb2f9c`](https://github.com/apache/spark/commit/7fb2f9ce01aafe45d720212cb13120eb693a6c06).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-21 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10844#issuecomment-173796157
  
@sameeragarwal I just saw it. Thank you! I will do the code changes after 
this is merged. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-21 Thread sameeragarwal
Github user sameeragarwal commented on the pull request:

https://github.com/apache/spark/pull/10844#issuecomment-173787321
  
@hvanhovell added, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org