[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15857


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-15 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87998688
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -428,43 +428,47 @@ object FoldablePropagation extends Rule[LogicalPlan] {
   }
   case _ => Nil
 })
+val replaceFoldable: PartialFunction[Expression, Expression] = {
+  case a: AttributeReference if foldableMap.contains(a) => 
foldableMap(a)
+}
 
 if (foldableMap.isEmpty) {
   plan
 } else {
   var stop = false
   CleanupAliases(plan.transformUp {
-case u: Union =>
-  stop = true
-  u
-case c: Command =>
-  stop = true
-  c
-// For outer join, although its output attributes are derived from 
its children, they are
-// actually different attributes: the output of outer join is not 
always picked from its
-// children, but can also be null.
+// Allow all leafnodes
+case l: LeafNode =>
+  l
+
+// Whitelist of all nodes we are allowed to apply this rule to.
+case p @ (_: Project | _: Filter | _: SubqueryAlias | _: Aggregate 
| _: Window |
+  _: Sample | _: GlobalLimit | _: LocalLimit | _: Generate 
| _: Distinct |
+  _: AppendColumns | _: AppendColumnsWithObject | _: 
BroadcastHint |
+  _: RedistributeData | _: Repartition | _: Sort | _: 
TypedFilter) if !stop =>
+  p.transformExpressions(replaceFoldable)
+
+// Allow inner joins. We do not allow outer join, although its 
output attributes are
+// derived from its children, they are actually different 
attributes: the output of outer
+// join is not always picked from its children, but can also be 
null.
 // TODO(cloud-fan): It seems more reasonable to use new attributes 
as the output attributes
 // of outer join.
-case j @ Join(_, _, LeftOuter | RightOuter | FullOuter, _) =>
+case j @ Join(_, _, Inner, _) =>
+  j.transformExpressions(replaceFoldable)
+
+// We can fold the projections an expand holds. However expand 
changes the output columns
+// and often reuses the underlying attributes; so we cannot assume 
that a column is still
+// foldable after the expand has been applied.
+case expand: Expand if !stop =>
--- End diff --

I have added the TODO. I think that we really should revisit attribute 
reuse in general (it causes a lot of subtle bugs). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-15 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87998196
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -428,43 +428,47 @@ object FoldablePropagation extends Rule[LogicalPlan] {
   }
   case _ => Nil
 })
+val replaceFoldable: PartialFunction[Expression, Expression] = {
+  case a: AttributeReference if foldableMap.contains(a) => 
foldableMap(a)
+}
 
 if (foldableMap.isEmpty) {
   plan
 } else {
   var stop = false
   CleanupAliases(plan.transformUp {
-case u: Union =>
-  stop = true
-  u
-case c: Command =>
-  stop = true
-  c
-// For outer join, although its output attributes are derived from 
its children, they are
-// actually different attributes: the output of outer join is not 
always picked from its
-// children, but can also be null.
+// Allow all leafnodes
--- End diff --

Improved the comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-15 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87983673
  
--- Diff: sql/core/src/test/resources/sql-tests/results/group-by.sql.out ---
@@ -131,3 +131,11 @@ FROM testData
 struct
 -- !query 13 output
 -0.2723801058145729-1.5069204152249134 1   3   
2.142857142857143   0.8095238095238094  0.8997354108424372  15  
7
+
+
+-- !query 14
+SELECT COUNT(DISTINCT b), COUNT(DISTINCT b, c) FROM (SELECT 1 AS a, 2 AS 
b, 3 AS c) GROUP BY a
--- End diff --

This PR fixes this bug (not really a regression, it always has been there). 
So I think I should at least add it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87934920
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala
 ---
@@ -118,14 +118,30 @@ class FoldablePropagationSuite extends PlanTest {
   Seq(
 testRelation.select(Literal(1).as('x), 'a).select('x + 'a),
 testRelation.select(Literal(2).as('x), 'a).select('x + 'a)))
-  .select('x)
 val optimized = Optimize.execute(query.analyze)
 val correctAnswer = Union(
   Seq(
 testRelation.select(Literal(1).as('x), 
'a).select((Literal(1).as('x) + 'a).as("(x + a)")),
 testRelation.select(Literal(2).as('x), 
'a).select((Literal(2).as('x) + 'a).as("(x + a)"
-  .select('x).analyze
--- End diff --

We are not checking whether it is resolvable or not. See the PR 
https://github.com/apache/spark/pull/15417


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87932854
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -428,43 +428,47 @@ object FoldablePropagation extends Rule[LogicalPlan] {
   }
   case _ => Nil
 })
+val replaceFoldable: PartialFunction[Expression, Expression] = {
+  case a: AttributeReference if foldableMap.contains(a) => 
foldableMap(a)
+}
 
 if (foldableMap.isEmpty) {
   plan
 } else {
   var stop = false
   CleanupAliases(plan.transformUp {
-case u: Union =>
-  stop = true
-  u
-case c: Command =>
-  stop = true
-  c
-// For outer join, although its output attributes are derived from 
its children, they are
-// actually different attributes: the output of outer join is not 
always picked from its
-// children, but can also be null.
+// Allow all leafnodes
--- End diff --

ah i see


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87932789
  
--- Diff: sql/core/src/test/resources/sql-tests/results/group-by.sql.out ---
@@ -131,3 +131,11 @@ FROM testData
 struct
 -- !query 13 output
 -0.2723801058145729-1.5069204152249134 1   3   
2.142857142857143   0.8095238095238094  0.8997354108424372  15  
7
+
+
+-- !query 14
+SELECT COUNT(DISTINCT b), COUNT(DISTINCT b, c) FROM (SELECT 1 AS a, 2 AS 
b, 3 AS c) GROUP BY a
--- End diff --

is it also a regression test? I think you are just fixing `Expand` in this 
PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87932636
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala
 ---
@@ -118,14 +118,30 @@ class FoldablePropagationSuite extends PlanTest {
   Seq(
 testRelation.select(Literal(1).as('x), 'a).select('x + 'a),
 testRelation.select(Literal(2).as('x), 'a).select('x + 'a)))
-  .select('x)
 val optimized = Optimize.execute(query.analyze)
 val correctAnswer = Union(
   Seq(
 testRelation.select(Literal(1).as('x), 
'a).select((Literal(1).as('x) + 'a).as("(x + a)")),
 testRelation.select(Literal(2).as('x), 
'a).select((Literal(2).as('x) + 'a).as("(x + a)"
-  .select('x).analyze
--- End diff --

how can this test pass before...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87932525
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -428,43 +428,47 @@ object FoldablePropagation extends Rule[LogicalPlan] {
   }
   case _ => Nil
 })
+val replaceFoldable: PartialFunction[Expression, Expression] = {
+  case a: AttributeReference if foldableMap.contains(a) => 
foldableMap(a)
+}
 
 if (foldableMap.isEmpty) {
   plan
 } else {
   var stop = false
   CleanupAliases(plan.transformUp {
-case u: Union =>
-  stop = true
-  u
-case c: Command =>
-  stop = true
-  c
-// For outer join, although its output attributes are derived from 
its children, they are
-// actually different attributes: the output of outer join is not 
always picked from its
-// children, but can also be null.
+// Allow all leafnodes
--- End diff --

LeafNodes should not stop the folding process. That is what I am trying to 
dat.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87932494
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -428,43 +428,47 @@ object FoldablePropagation extends Rule[LogicalPlan] {
   }
   case _ => Nil
 })
+val replaceFoldable: PartialFunction[Expression, Expression] = {
+  case a: AttributeReference if foldableMap.contains(a) => 
foldableMap(a)
+}
 
 if (foldableMap.isEmpty) {
   plan
 } else {
   var stop = false
   CleanupAliases(plan.transformUp {
-case u: Union =>
-  stop = true
-  u
-case c: Command =>
-  stop = true
-  c
-// For outer join, although its output attributes are derived from 
its children, they are
-// actually different attributes: the output of outer join is not 
always picked from its
-// children, but can also be null.
+// Allow all leafnodes
+case l: LeafNode =>
+  l
+
+// Whitelist of all nodes we are allowed to apply this rule to.
+case p @ (_: Project | _: Filter | _: SubqueryAlias | _: Aggregate 
| _: Window |
+  _: Sample | _: GlobalLimit | _: LocalLimit | _: Generate 
| _: Distinct |
+  _: AppendColumns | _: AppendColumnsWithObject | _: 
BroadcastHint |
+  _: RedistributeData | _: Repartition | _: Sort | _: 
TypedFilter) if !stop =>
+  p.transformExpressions(replaceFoldable)
+
+// Allow inner joins. We do not allow outer join, although its 
output attributes are
+// derived from its children, they are actually different 
attributes: the output of outer
+// join is not always picked from its children, but can also be 
null.
 // TODO(cloud-fan): It seems more reasonable to use new attributes 
as the output attributes
 // of outer join.
-case j @ Join(_, _, LeftOuter | RightOuter | FullOuter, _) =>
+case j @ Join(_, _, Inner, _) =>
+  j.transformExpressions(replaceFoldable)
+
+// We can fold the projections an expand holds. However expand 
changes the output columns
+// and often reuses the underlying attributes; so we cannot assume 
that a column is still
+// foldable after the expand has been applied.
+case expand: Expand if !stop =>
--- End diff --

should we add a TODO that `Expand` should always output new attributes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87865899
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -428,43 +428,47 @@ object FoldablePropagation extends Rule[LogicalPlan] {
   }
   case _ => Nil
 })
+val replaceFoldable: PartialFunction[Expression, Expression] = {
+  case a: AttributeReference if foldableMap.contains(a) => 
foldableMap(a)
+}
 
 if (foldableMap.isEmpty) {
   plan
 } else {
   var stop = false
   CleanupAliases(plan.transformUp {
-case u: Union =>
-  stop = true
-  u
-case c: Command =>
-  stop = true
-  c
-// For outer join, although its output attributes are derived from 
its children, they are
-// actually different attributes: the output of outer join is not 
always picked from its
-// children, but can also be null.
+// Allow all leafnodes
+case l: LeafNode =>
+  l
+
+// Whitelist of all nodes we are allowed to apply this rule to.
+case p @ (_: Project | _: Filter | _: SubqueryAlias | _: Aggregate 
| _: Window |
+  _: Sample | _: GlobalLimit | _: LocalLimit | _: Generate 
| _: Distinct |
--- End diff --

Let us keep it. I think your concern is right.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87865485
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala
 ---
@@ -118,14 +118,30 @@ class FoldablePropagationSuite extends PlanTest {
   Seq(
 testRelation.select(Literal(1).as('x), 'a).select('x + 'a),
 testRelation.select(Literal(2).as('x), 'a).select('x + 'a)))
-  .select('x)
--- End diff --

Yeah let me change that


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87865434
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -428,43 +428,47 @@ object FoldablePropagation extends Rule[LogicalPlan] {
   }
   case _ => Nil
 })
+val replaceFoldable: PartialFunction[Expression, Expression] = {
+  case a: AttributeReference if foldableMap.contains(a) => 
foldableMap(a)
+}
 
 if (foldableMap.isEmpty) {
   plan
 } else {
   var stop = false
   CleanupAliases(plan.transformUp {
-case u: Union =>
-  stop = true
-  u
-case c: Command =>
-  stop = true
-  c
-// For outer join, although its output attributes are derived from 
its children, they are
-// actually different attributes: the output of outer join is not 
always picked from its
-// children, but can also be null.
+// Allow all leafnodes
+case l: LeafNode =>
+  l
+
+// Whitelist of all nodes we are allowed to apply this rule to.
+case p @ (_: Project | _: Filter | _: SubqueryAlias | _: Aggregate 
| _: Window |
+  _: Sample | _: GlobalLimit | _: LocalLimit | _: Generate 
| _: Distinct |
--- End diff --

That is true. I only added the nodes of which I know they can be produced 
by analysis (like Distinct). I didn't want to rely on optimizer ordering. I can 
remove it if you want me to.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87862736
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala
 ---
@@ -118,14 +118,30 @@ class FoldablePropagationSuite extends PlanTest {
   Seq(
 testRelation.select(Literal(1).as('x), 'a).select('x + 'a),
 testRelation.select(Literal(2).as('x), 'a).select('x + 'a)))
-  .select('x)
--- End diff --

```
  test("Propagate in subqueries of Union queries") {
val query = Union(
  Seq(
testRelation.select(Literal(1).as('x), 'a).select('x, 'x + 'a),
testRelation.select(Literal(2).as('x), 'a).select('x, 'x + 'a)))
  .select('x)
val optimized = Optimize.execute(query.analyze)
val correctAnswer = Union(
  Seq(
testRelation.select(Literal(1).as('x), 'a)
  .select(Literal(1).as('x), (Literal(1).as('x) + 'a).as("(x + 
a)")),
testRelation.select(Literal(2).as('x), 'a)
  .select(Literal(2).as('x), (Literal(2).as('x) + 'a).as("(x + 
a)"
  .select('x).analyze

comparePlans(optimized, correctAnswer)
  }
```

How about this? The `x` (under `Union`) is transformed as the alias of a 
literal but the top `x` (above `Union`) is not propagated. This can validate 
the white list does not include `Union`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87861276
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -428,43 +428,47 @@ object FoldablePropagation extends Rule[LogicalPlan] {
   }
   case _ => Nil
 })
+val replaceFoldable: PartialFunction[Expression, Expression] = {
+  case a: AttributeReference if foldableMap.contains(a) => 
foldableMap(a)
+}
 
 if (foldableMap.isEmpty) {
   plan
 } else {
   var stop = false
   CleanupAliases(plan.transformUp {
-case u: Union =>
-  stop = true
-  u
-case c: Command =>
-  stop = true
-  c
-// For outer join, although its output attributes are derived from 
its children, they are
-// actually different attributes: the output of outer join is not 
always picked from its
-// children, but can also be null.
+// Allow all leafnodes
+case l: LeafNode =>
+  l
+
+// Whitelist of all nodes we are allowed to apply this rule to.
+case p @ (_: Project | _: Filter | _: SubqueryAlias | _: Aggregate 
| _: Window |
+  _: Sample | _: GlobalLimit | _: LocalLimit | _: Generate 
| _: Distinct |
--- End diff --

`Distinct` will be replaced by our `Optimizer` at the beginning, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87809078
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala
 ---
@@ -118,14 +118,30 @@ class FoldablePropagationSuite extends PlanTest {
   Seq(
 testRelation.select(Literal(1).as('x), 'a).select('x + 'a),
 testRelation.select(Literal(2).as('x), 'a).select('x + 'a)))
-  .select('x)
--- End diff --

`x` does not exist. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87809029
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -428,43 +428,47 @@ object FoldablePropagation extends Rule[LogicalPlan] {
   }
   case _ => Nil
 })
+val replaceFoldable: PartialFunction[Expression, Expression] = {
+  case a: AttributeReference if foldableMap.contains(a) => 
foldableMap(a)
+}
 
 if (foldableMap.isEmpty) {
   plan
 } else {
   var stop = false
   CleanupAliases(plan.transformUp {
-case u: Union =>
-  stop = true
-  u
-case c: Command =>
-  stop = true
-  c
-// For outer join, although its output attributes are derived from 
its children, they are
-// actually different attributes: the output of outer join is not 
always picked from its
-// children, but can also be null.
+// Allow all leafnodes
+case l: LeafNode =>
+  l
+
+// Whitelist of all nodes we are allowed to apply this rule to.
+case p @ (_: Project | _: Filter | _: SubqueryAlias | _: Aggregate 
| _: Window |
--- End diff --

@gatorsmile @dongjoon-hyun @cloud-fan I have created the following white 
list. Could you guys take a look?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87746263
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -449,7 +452,7 @@ object FoldablePropagation extends Rule[LogicalPlan] {
   stop = true
   j
 
-// These 3 operators take attributes as constructor parameters, 
and these attributes
+// These 4 operators take attributes as constructor parameters, 
and these attributes
--- End diff --

If you think so, I have no objection for that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15857#discussion_r87743162
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -449,7 +452,7 @@ object FoldablePropagation extends Rule[LogicalPlan] {
   stop = true
   j
 
-// These 3 operators take attributes as constructor parameters, 
and these attributes
+// These 4 operators take attributes as constructor parameters, 
and these attributes
--- End diff --

shall we turn to whitelist? As I remember there have been more than 3 bugs 
reported on this optimizer rule.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15857: [SPARK-18300][SQL] Do not apply foldable propagat...

2016-11-11 Thread hvanhovell
GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/15857

[SPARK-18300][SQL] Do not apply foldable propagation with expand as a child.

## What changes were proposed in this pull request?
The `FoldablePropagation` optimizer rule, pulls foldable values out from 
under an `Expand`. This breaks the `Expand` in two ways:

- It rewrites the output attributes of the `Expand`. We explicitly define 
output attributes for `Expand`, these are (unfortunately) considered as part of 
the expressions of the `Expand` and can be rewritten.
- Expand can actually change the column (it will typically re-use the 
attributes or the underlying plan). This means that we cannot safely propagate 
the expressions from under an `Expand`.

## How was this patch tested?
Added tests to `FoldablePropagationSuite` and to `SQLQueryTestSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark SPARK-18300

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15857.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15857


commit d98f8f930330b368fbeade07d435197b0cdd2228
Author: Herman van Hovell 
Date:   2016-11-11T23:39:44Z

Do not apply foldable propagation with expand as a child.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org