[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-195168179
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-195168182
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52885/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-195167992
  
**[Test build #52885 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52885/consoleFull)**
 for PR 11565 at commit 
[`bd35ee7`](https://github.com/apache/spark/commit/bd35ee7144a9b88281f189748fd044df7971dbf3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11565


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-195167265
  
nvm, that's my job:)
Thanks, merging to master!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-195165526
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-195165530
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52883/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-195165383
  
**[Test build #52883 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52883/consoleFull)**
 for PR 11565 at commit 
[`6a59b42`](https://github.com/apache/spark/commit/6a59b4240eaf1c63c6ad2dcb3ce6876ef9f2d189).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-195163951
  
Thank you for your detailed review! Actually, the problem was first found 
by you. I thought my PR can save your time, but it sounds like you still waste 
a lot of time on the review. Anyway, really thank you!  @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-195152972
  
LGTM, thanks for working on it!
will merge after tests pass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-195149566
  
**[Test build #52885 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52885/consoleFull)**
 for PR 11565 at commit 
[`bd35ee7`](https://github.com/apache/spark/commit/bd35ee7144a9b88281f189748fd044df7971dbf3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55782088
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -384,6 +381,9 @@ object ColumnPruning extends Rule[LogicalPlan] {
 // Eliminate no-op Projects
 case p @ Project(projectList, child) if sameOutput(child.output, 
p.output) => child
 
+// Eliminate no-op Window
+case w: Window if w.windowExpressions.isEmpty => w.child
--- End diff --

sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55781475
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -384,6 +381,9 @@ object ColumnPruning extends Rule[LogicalPlan] {
 // Eliminate no-op Projects
 case p @ Project(projectList, child) if sameOutput(child.output, 
p.output) => child
 
+// Eliminate no-op Window
+case w: Window if w.windowExpressions.isEmpty => w.child
--- End diff --

move this rule near the rule that filter out useless window expressions, 
which makes people eaiser to understand.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-195144043
  
**[Test build #52883 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52883/consoleFull)**
 for PR 11565 at commit 
[`6a59b42`](https://github.com/apache/spark/commit/6a59b4240eaf1c63c6ad2dcb3ce6876ef9f2d189).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55776211
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -384,9 +378,21 @@ object ColumnPruning extends Rule[LogicalPlan] {
 // Eliminate no-op Projects
 case p @ Project(projectList, child) if sameOutput(child.output, 
p.output) => child
 
+// Eliminate no-op Window
+case w: Window if w.windowExpressions.isEmpty => w.child
+
 // Can't prune the columns on LeafNode
 case p @ Project(_, l: LeafNode) => p
 
+// Prune windowExpressions and child of Window
+case p @ Project(_, w: Window) if (w.outputSet -- 
p.references).nonEmpty =>
+  val newWindowExprs = 
w.windowExpressions.filter(p.references.contains)
--- End diff --

Will do it.Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55776014
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -384,9 +378,21 @@ object ColumnPruning extends Rule[LogicalPlan] {
 // Eliminate no-op Projects
 case p @ Project(projectList, child) if sameOutput(child.output, 
p.output) => child
 
+// Eliminate no-op Window
+case w: Window if w.windowExpressions.isEmpty => w.child
+
 // Can't prune the columns on LeafNode
 case p @ Project(_, l: LeafNode) => p
 
+// Prune windowExpressions and child of Window
+case p @ Project(_, w: Window) if (w.outputSet -- 
p.references).nonEmpty =>
+  val newWindowExprs = 
w.windowExpressions.filter(p.references.contains)
--- End diff --

After rethink about it, seems we can still separate it into 2 rules.
We can add a `def windowOutputSet = 
AttributeSet(windowExpressions.map(_.toAttribute))` to `Window` operator, and 
change the first case to `case p @ Project(_, w: Window) if (w.windowOutputSet 
-- p.references).nonEmpty =>` and filter out useless window expressions


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194985436
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52831/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194985432
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194984345
  
**[Test build #52831 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52831/consoleFull)**
 for PR 11565 at commit 
[`fc96d84`](https://github.com/apache/spark/commit/fc96d84ac1ef6bec70e1af911c2cef0447f1514f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194941252
  
**[Test build #52831 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52831/consoleFull)**
 for PR 11565 at commit 
[`fc96d84`](https://github.com/apache/spark/commit/fc96d84ac1ef6bec70e1af911c2cef0447f1514f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194937923
  
Will add more DSL for `Window` in a separate PR, which is doing predicate 
push down for `Window`. Thank you for your detailed review! @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55706168
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
 ---
@@ -258,6 +259,71 @@ class ColumnPruningSuite extends PlanTest {
 comparePlans(optimized1, 
analysis.EliminateSubqueryAliases(correctAnswer1))
   }
 
+  test("Column pruning on Window with useless aggregate functions") {
+val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int)
+
+val originalQuery =
+  input.groupBy('a, 'c, 'd)('a, 'c, 'd,
+WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window)).select('a, 'c)
+
+val correctAnswer =
+  input.select('a, 'c, 'd).groupBy('a, 'c, 'd)('a, 'c).analyze
+
+val optimized = Optimize.execute(originalQuery.analyze)
+
+comparePlans(optimized, correctAnswer)
+  }
+
+  test("Column pruning on Window with selected agg expressions") {
+val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int)
+
+val originalQuery =
+  input.select('a, 'b, 'c, 'd,
+WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window)).where('window > 1).select('a, 
'c)
+
+val correctAnswer =
+  input.select('a, 'b, 'c)
+.window(WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window) :: Nil,
+  'a :: Nil, 'b.asc :: Nil)
+.select('a, 'c, 'window).select('a, 'c, 'window, 'window)
+.select('a, 'c, 'window).where('window > 1).select('a, 'c).analyze
+
+val optimized = Optimize.execute(originalQuery.analyze)
+
+comparePlans(optimized, correctAnswer)
+  }
+
+  test("Column pruning on Window in select") {
+val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int)
+
+val originalQuery =
+  input.select('a, 'b, 'c, 'd,
+WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window)).select('a, 'c)
+
+val correctAnswer =
+  input.select('a, 'c).analyze
--- End diff --

Yeah, will do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55706094
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
 ---
@@ -258,6 +259,71 @@ class ColumnPruningSuite extends PlanTest {
 comparePlans(optimized1, 
analysis.EliminateSubqueryAliases(correctAnswer1))
   }
 
+  test("Column pruning on Window with useless aggregate functions") {
+val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int)
+
+val originalQuery =
+  input.groupBy('a, 'c, 'd)('a, 'c, 'd,
+WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window)).select('a, 'c)
+
+val correctAnswer =
+  input.select('a, 'c, 'd).groupBy('a, 'c, 'd)('a, 'c).analyze
+
+val optimized = Optimize.execute(originalQuery.analyze)
+
+comparePlans(optimized, correctAnswer)
+  }
+
+  test("Column pruning on Window with selected agg expressions") {
+val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int)
+
+val originalQuery =
+  input.select('a, 'b, 'c, 'd,
+WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window)).where('window > 1).select('a, 
'c)
+
+val correctAnswer =
+  input.select('a, 'b, 'c)
+.window(WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window) :: Nil,
+  'a :: Nil, 'b.asc :: Nil)
+.select('a, 'c, 'window).select('a, 'c, 'window, 'window)
+.select('a, 'c, 'window).where('window > 1).select('a, 'c).analyze
--- End diff --

Will do. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194791853
  
LGTM exception 2 small comments. It will be better if we can add more DSL 
to build window expression, but I'm ok to do it in follow-ups.

cc @yhuai 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55663422
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
 ---
@@ -258,6 +259,71 @@ class ColumnPruningSuite extends PlanTest {
 comparePlans(optimized1, 
analysis.EliminateSubqueryAliases(correctAnswer1))
   }
 
+  test("Column pruning on Window with useless aggregate functions") {
+val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int)
+
+val originalQuery =
+  input.groupBy('a, 'c, 'd)('a, 'c, 'd,
+WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window)).select('a, 'c)
+
+val correctAnswer =
+  input.select('a, 'c, 'd).groupBy('a, 'c, 'd)('a, 'c).analyze
+
+val optimized = Optimize.execute(originalQuery.analyze)
+
+comparePlans(optimized, correctAnswer)
+  }
+
+  test("Column pruning on Window with selected agg expressions") {
+val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int)
+
+val originalQuery =
+  input.select('a, 'b, 'c, 'd,
+WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window)).where('window > 1).select('a, 
'c)
+
+val correctAnswer =
+  input.select('a, 'b, 'c)
+.window(WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window) :: Nil,
+  'a :: Nil, 'b.asc :: Nil)
+.select('a, 'c, 'window).select('a, 'c, 'window, 'window)
+.select('a, 'c, 'window).where('window > 1).select('a, 'c).analyze
+
+val optimized = Optimize.execute(originalQuery.analyze)
+
+comparePlans(optimized, correctAnswer)
+  }
+
+  test("Column pruning on Window in select") {
+val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int)
+
+val originalQuery =
+  input.select('a, 'b, 'c, 'd,
+WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window)).select('a, 'c)
+
+val correctAnswer =
+  input.select('a, 'c).analyze
--- End diff --

nit: they can fit in one line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55663370
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
 ---
@@ -258,6 +259,71 @@ class ColumnPruningSuite extends PlanTest {
 comparePlans(optimized1, 
analysis.EliminateSubqueryAliases(correctAnswer1))
   }
 
+  test("Column pruning on Window with useless aggregate functions") {
+val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int)
+
+val originalQuery =
+  input.groupBy('a, 'c, 'd)('a, 'c, 'd,
+WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window)).select('a, 'c)
+
+val correctAnswer =
+  input.select('a, 'c, 'd).groupBy('a, 'c, 'd)('a, 'c).analyze
+
+val optimized = Optimize.execute(originalQuery.analyze)
+
+comparePlans(optimized, correctAnswer)
+  }
+
+  test("Column pruning on Window with selected agg expressions") {
+val input = LocalRelation('a.int, 'b.string, 'c.double, 'd.int)
+
+val originalQuery =
+  input.select('a, 'b, 'c, 'd,
+WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window)).where('window > 1).select('a, 
'c)
+
+val correctAnswer =
+  input.select('a, 'b, 'c)
+.window(WindowExpression(
+  AggregateExpression(Count('b), Complete, isDistinct = false),
+  WindowSpecDefinition( 'a :: Nil,
+SortOrder('b, Ascending) :: Nil,
+UnspecifiedFrame)).as('window) :: Nil,
+  'a :: Nil, 'b.asc :: Nil)
+.select('a, 'c, 'window).select('a, 'c, 'window, 'window)
+.select('a, 'c, 'window).where('window > 1).select('a, 'c).analyze
--- End diff --

It's weird to see 3 selects here, we can add `CollapseProject` rule into 
the optimizer in this test suite.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194748290
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52820/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194748285
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194747990
  
**[Test build #52820 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52820/consoleFull)**
 for PR 11565 at commit 
[`44326f1`](https://github.com/apache/spark/commit/44326f11dab33d686885b7f038a1034c1723338e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194720152
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194720154
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52818/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194720120
  
**[Test build #52818 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52818/consoleFull)**
 for PR 11565 at commit 
[`6cf6f44`](https://github.com/apache/spark/commit/6cf6f444697c0bc32dbc09906fe144563a7d66df).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-194718830
  
**[Test build #52820 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52820/consoleFull)**
 for PR 11565 at commit 
[`44326f1`](https://github.com/apache/spark/commit/44326f11dab33d686885b7f038a1034c1723338e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55643000
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -384,9 +374,25 @@ object ColumnPruning extends Rule[LogicalPlan] {
 // Eliminate no-op Projects
 case p @ Project(projectList, child) if sameOutput(child.output, 
p.output) => child
 
+// Eliminate no-op Window
+case w: Window if sameOutput(w.child.output, w.output) => w.child
+
+// Convert Aggregate to Project if no aggregate function exists
+case a: Aggregate if !containsAggregates(a.expressions) =>
+  Project(a.aggregateExpressions, a.child)
--- End diff --

Anyway, I think this is a rare case. Will not do it in this PR


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13732] [SPARK-13797] [SQL] Remove proje...

2016-03-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55642840
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -384,9 +374,25 @@ object ColumnPruning extends Rule[LogicalPlan] {
 // Eliminate no-op Projects
 case p @ Project(projectList, child) if sameOutput(child.output, 
p.output) => child
 
+// Eliminate no-op Window
+case w: Window if sameOutput(w.child.output, w.output) => w.child
+
+// Convert Aggregate to Project if no aggregate function exists
+case a: Aggregate if !containsAggregates(a.expressions) =>
+  Project(a.aggregateExpressions, a.child)
--- End diff --

It only makes sense when the grouping and aggregateExpressions are exactly 
identical. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org