[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-205729777
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54975/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-205729750
  
**[Test build #54975 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54975/consoleFull)**
 for PR 11298 at commit 
[`50071e2`](https://github.com/apache/spark/commit/50071e2cf9c76fd1919c437500b82bfd287b5f95).
 * This patch **fails R style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-205729764
  
Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-205728972
  
**[Test build #54975 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54975/consoleFull)**
 for PR 11298 at commit 
[`50071e2`](https://github.com/apache/spark/commit/50071e2cf9c76fd1919c437500b82bfd287b5f95).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53902571
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -607,14 +607,21 @@ class Analyzer(
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   // Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
   case sa @ Sort(_, _, child: Aggregate) => sa
-
+  case sa @ Sort(_, _, _)
+if sa.order.exists(so => 
ResolveAggregateFunctions.containsAggregate(so.child)) => sa
   case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
 try {
   val newOrder = order.map(resolveExpressionRecursively(_, 
child).asInstanceOf[SortOrder])
   val requiredAttrs = AttributeSet(newOrder).filter(_.resolved)
   val missingAttrs = requiredAttrs -- child.outputSet
-  if (missingAttrs.nonEmpty) {
-// Add missing attributes and then project them away after the 
sort.
+  // resolve the unresolved functions for knowing if they are 
aggregate functions
+  val evaluatedSort =
+ResolveFunctions.resolveFunctions(Sort(newOrder, s.global, 
child)).asInstanceOf[Sort]
+  if (evaluatedSort.order.exists(so =>
+  ResolveAggregateFunctions.containsAggregate(so.child))) {
+// Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
+s.copy(order = evaluatedSort.order)
+  } else if (missingAttrs.nonEmpty) {
--- End diff --

nevermind. I got it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53902128
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -607,14 +607,21 @@ class Analyzer(
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   // Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
   case sa @ Sort(_, _, child: Aggregate) => sa
-
+  case sa @ Sort(_, _, _)
+if sa.order.exists(so => 
ResolveAggregateFunctions.containsAggregate(so.child)) => sa
   case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
 try {
   val newOrder = order.map(resolveExpressionRecursively(_, 
child).asInstanceOf[SortOrder])
   val requiredAttrs = AttributeSet(newOrder).filter(_.resolved)
   val missingAttrs = requiredAttrs -- child.outputSet
-  if (missingAttrs.nonEmpty) {
-// Add missing attributes and then project them away after the 
sort.
+  // resolve the unresolved functions for knowing if they are 
aggregate functions
+  val evaluatedSort =
+ResolveFunctions.resolveFunctions(Sort(newOrder, s.global, 
child)).asInstanceOf[Sort]
+  if (evaluatedSort.order.exists(so =>
+  ResolveAggregateFunctions.containsAggregate(so.child))) {
+// Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
+s.copy(order = evaluatedSort.order)
+  } else if (missingAttrs.nonEmpty) {
--- End diff --

Looks like if the sort order has non empty missingAttrs, it will be ignored 
now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53902037
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -607,14 +607,21 @@ class Analyzer(
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   // Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
   case sa @ Sort(_, _, child: Aggregate) => sa
-
+  case sa @ Sort(_, _, _)
+if sa.order.exists(so => 
ResolveAggregateFunctions.containsAggregate(so.child)) => sa
   case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
 try {
   val newOrder = order.map(resolveExpressionRecursively(_, 
child).asInstanceOf[SortOrder])
   val requiredAttrs = AttributeSet(newOrder).filter(_.resolved)
   val missingAttrs = requiredAttrs -- child.outputSet
-  if (missingAttrs.nonEmpty) {
-// Add missing attributes and then project them away after the 
sort.
+  // resolve the unresolved functions for knowing if they are 
aggregate functions
+  val evaluatedSort =
+ResolveFunctions.resolveFunctions(Sort(newOrder, s.global, 
child)).asInstanceOf[Sort]
+  if (evaluatedSort.order.exists(so =>
+  ResolveAggregateFunctions.containsAggregate(so.child))) {
+// Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
+s.copy(order = evaluatedSort.order)
+  } else if (missingAttrs.nonEmpty) {
--- End diff --

But it is only possible when the condition 
`sort.order.exists(containsAggregate) || sort.child.isInstanceOf[Aggregate]` is 
true, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53900047
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -607,14 +607,21 @@ class Analyzer(
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   // Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
   case sa @ Sort(_, _, child: Aggregate) => sa
-
+  case sa @ Sort(_, _, _)
+if sa.order.exists(so => 
ResolveAggregateFunctions.containsAggregate(so.child)) => sa
   case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
 try {
   val newOrder = order.map(resolveExpressionRecursively(_, 
child).asInstanceOf[SortOrder])
   val requiredAttrs = AttributeSet(newOrder).filter(_.resolved)
   val missingAttrs = requiredAttrs -- child.outputSet
-  if (missingAttrs.nonEmpty) {
-// Add missing attributes and then project them away after the 
sort.
+  // resolve the unresolved functions for knowing if they are 
aggregate functions
+  val evaluatedSort =
+ResolveFunctions.resolveFunctions(Sort(newOrder, s.global, 
child)).asInstanceOf[Sort]
+  if (evaluatedSort.order.exists(so =>
+  ResolveAggregateFunctions.containsAggregate(so.child))) {
+// Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
+s.copy(order = evaluatedSort.order)
+  } else if (missingAttrs.nonEmpty) {
--- End diff --

In this case, the missing attributes will be handled by 
`ResolveAggregateFunctions`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53899820
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -607,14 +607,21 @@ class Analyzer(
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   // Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
   case sa @ Sort(_, _, child: Aggregate) => sa
-
+  case sa @ Sort(_, _, _)
+if sa.order.exists(so => 
ResolveAggregateFunctions.containsAggregate(so.child)) => sa
   case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
 try {
   val newOrder = order.map(resolveExpressionRecursively(_, 
child).asInstanceOf[SortOrder])
   val requiredAttrs = AttributeSet(newOrder).filter(_.resolved)
   val missingAttrs = requiredAttrs -- child.outputSet
-  if (missingAttrs.nonEmpty) {
-// Add missing attributes and then project them away after the 
sort.
+  // resolve the unresolved functions for knowing if they are 
aggregate functions
+  val evaluatedSort =
+ResolveFunctions.resolveFunctions(Sort(newOrder, s.global, 
child)).asInstanceOf[Sort]
+  if (evaluatedSort.order.exists(so =>
+  ResolveAggregateFunctions.containsAggregate(so.child))) {
--- End diff --

If we do not evaluate the functions, we are unable to know if this function 
is aggregate functions. Another idea might be a multi-pass solution. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53761104
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -607,14 +607,21 @@ class Analyzer(
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   // Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
   case sa @ Sort(_, _, child: Aggregate) => sa
-
+  case sa @ Sort(_, _, _)
+if sa.order.exists(so => 
ResolveAggregateFunctions.containsAggregate(so.child)) => sa
   case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
 try {
   val newOrder = order.map(resolveExpressionRecursively(_, 
child).asInstanceOf[SortOrder])
   val requiredAttrs = AttributeSet(newOrder).filter(_.resolved)
   val missingAttrs = requiredAttrs -- child.outputSet
-  if (missingAttrs.nonEmpty) {
-// Add missing attributes and then project them away after the 
sort.
+  // resolve the unresolved functions for knowing if they are 
aggregate functions
+  val evaluatedSort =
+ResolveFunctions.resolveFunctions(Sort(newOrder, s.global, 
child)).asInstanceOf[Sort]
+  if (evaluatedSort.order.exists(so =>
+  ResolveAggregateFunctions.containsAggregate(so.child))) {
+// Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
+s.copy(order = evaluatedSort.order)
+  } else if (missingAttrs.nonEmpty) {
--- End diff --

Will this branch be run? If there are missing attributes but because the 
sort order contains aggregate functions so it always hit the first condition?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53759608
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -607,14 +607,21 @@ class Analyzer(
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   // Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
   case sa @ Sort(_, _, child: Aggregate) => sa
-
+  case sa @ Sort(_, _, _)
+if sa.order.exists(so => 
ResolveAggregateFunctions.containsAggregate(so.child)) => sa
   case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
 try {
   val newOrder = order.map(resolveExpressionRecursively(_, 
child).asInstanceOf[SortOrder])
   val requiredAttrs = AttributeSet(newOrder).filter(_.resolved)
   val missingAttrs = requiredAttrs -- child.outputSet
-  if (missingAttrs.nonEmpty) {
-// Add missing attributes and then project them away after the 
sort.
+  // resolve the unresolved functions for knowing if they are 
aggregate functions
+  val evaluatedSort =
+ResolveFunctions.resolveFunctions(Sort(newOrder, s.global, 
child)).asInstanceOf[Sort]
+  if (evaluatedSort.order.exists(so =>
+  ResolveAggregateFunctions.containsAggregate(so.child))) {
--- End diff --

Can we move this part of resolving functions of Sort as another pattern 
matching case? So this original pattern matching case can be untouched? I think 
it will more clear.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187031165
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51658/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187031164
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187030953
  
**[Test build #51658 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51658/consoleFull)**
 for PR 11298 at commit 
[`50071e2`](https://github.com/apache/spark/commit/50071e2cf9c76fd1919c437500b82bfd287b5f95).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-21 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187009069
  
cc @davies @cloud-fan . This PR is like an incremental patch with few lines 
of code changes. I am not sure if the whole rewrite is worthy for resolving the 
above two issues only when there exist the other operators between `Sort` and 
`Aggregate`? After all, users can manually correct their query if they hit a 
resolution issue when missing attributes in Sort happen. We already cover most 
of cases. : ) 

Let me know if you want a whole rewrite or just an incremental fix. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53585798
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -607,14 +607,21 @@ class Analyzer(
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   // Skip sort with aggregate. This will be handled in 
ResolveAggregateFunctions
   case sa @ Sort(_, _, child: Aggregate) => sa
--- End diff --

The above two issues have been covered in the rule 
`ResolveAggregateFunctions` when `Aggregate` is the child of `Sort`. Thus, the 
current PR keeps the existing logics without any change. 

This PR is to introduce a fix for aggregate functions when there exists the 
other operators between `Aggregate` and `Sort`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-21 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187005048
  
# Issue 2: mixture of aliases and real columns in order by clause
```SQL
SELECT key as k, value as v, sum(value) FROM src GROUP BY key, value ORDER 
BY k, value
```
```
== Parsed Logical Plan ==
'Sort ['k ASC,'value ASC], true
+- 'Aggregate ['key,'value], [unresolvedalias('key AS 
k#47,None),unresolvedalias('value AS 
v#48,None),unresolvedalias('sum('value),None)]
   +- 'UnresolvedRelation `src`, None
```
In the above case, we need to replace the actual column name `value` by the 
alias name `v`. Otherwise, we have to introduce `k` in all the nodes between 
`Sort` and `Aggregate`.
```
Project [k#47,v#48,_c2#54L]
+- Sort [k#47 ASC,v#48 ASC], true
   +- Aggregate [key#45,value#46], [key#45 AS k#47,value#46 AS 
v#48,(sum(cast(value#46 as bigint)),mode=Complete,isDistinct=false) AS _c2#54L]
  +- Subquery src
 +- Project [_1#43 AS key#45,_2#44 AS value#46]
+- LocalRelation [_1#43,_2#44], [[1,1],[-1,1]]
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-21 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187001189
  
# Issue 1:
```SQL
SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1
```
`key + 1` is not an aggregated function, but we still need to push it down 
into `Aggregate`, if we do not have `Project` and `Window` between `Sort` and 
`Aggregate`.
```
== Parsed Logical Plan ==
'Sort [('key + 1) ASC], true
+- 'Aggregate [('key + 1)], [unresolvedalias('MAX('value),None)]
   +- 'UnresolvedRelation `src`, None
```
When that happens, we need to treat it as an aggregated function. The plan 
will be like:
```
Project [_c0#50]
+- Sort [aggOrder#51 ASC], true
   +- Aggregate [(key#45 + 1)], 
[(max(value#46),mode=Complete,isDistinct=false) AS _c0#50,(key#45 + 1) AS 
aggOrder#51]
  +- Subquery src
 +- Project [_1#43 AS key#45,_2#44 AS value#46]
+- LocalRelation [_1#43,_2#44], [[1,1],[-1,1]]
```

Let me show another case that does not need to treat `key+1` as an 
aggregate function. 
```SQL
SELECT value FROM src ORDER BY key + 1
```
```
== Parsed Logical Plan ==
'Sort [('key + 1) ASC], true
+- 'Project [unresolvedalias('value,None)]
   +- 'UnresolvedRelation `src`, None
```
In this case, we just need to push `key` into `Project`. 
```
Project [value#46]
+- Sort [(key#45 + 1) ASC], true
   +- Project [value#46,key#45]
  +- Subquery src
 +- Project [_1#43 AS key#45,_2#44 AS value#46]
+- LocalRelation [_1#43,_2#44], [[1,1],[-1,1]]
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-186998813
  
**[Test build #51658 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51658/consoleFull)**
 for PR 11298 at commit 
[`50071e2`](https://github.com/apache/spark/commit/50071e2cf9c76fd1919c437500b82bfd287b5f95).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-21 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-186996652
  
This PR is an incremental fix based on the previous solution for resolving 
missing attributes in Sort. This is not a clean fix, I like. However, to do a 
clean fix, we need to rewrite a lot of code in `ResolveSortReferences`. I am 
not sure if we should do it. Will explain the issues I hit when resolving this 
one. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...

2016-02-21 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/11298

[SPARK-13428] [SQL] Pushing Down Aggregate Expressions in Sort into 
Aggregate

## What changes were proposed in this pull request?
When there exists the other operators between Sort and Aggregate, we are 
unable to push down the aggregate expressions in `Sort` into `Aggregate`. For 
example, in the following query, 

```SQL
select area, sum(product) over () as c from windowData
where product > 3 group by area, product
having avg(month) > 0 order by area, avg(month), product
```
The parsed logical plan is like
```
'Sort ['area ASC,'avg('month) ASC,'product ASC], true
+- 'Filter cast(('avg('month) > 0) as boolean)
   +- 'Aggregate ['area,'product], 
[unresolvedalias('area,None),unresolvedalias('sum('product) 
windowspecdefinition(UnspecifiedFrame) AS c#3,None)]
  +- 'Filter ('product > 3)
 +- 'UnresolvedRelation `windowData`, None
```
## How was the this patch tested?
Turn on a test case that `test("window function: Pushing aggregate 
Expressions in Sort to Aggregate")` exposes this issue.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark aggExprInSortBy

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11298.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11298


commit 50071e2cf9c76fd1919c437500b82bfd287b5f95
Author: gatorsmile 
Date:   2016-02-22T03:57:45Z

pushing down aggregate expressions in Sort into Aggregate Operator




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org