[GitHub] spark pull request: [SPARK-13428] [SQL] Pushing Down Aggregate Exp...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-205729777

Test FAILed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54975/

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-205729750

**[Test build #54975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54975/consoleFull)** for PR 11298 at commit [`50071e2`](https://github.com/apache/spark/commit/50071e2cf9c76fd1919c437500b82bfd287b5f95).

* This patch **fails R style tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-205729764

Build finished. Test FAILed.
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-205728972

**[Test build #54975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54975/consoleFull)** for PR 11298 at commit [`50071e2`](https://github.com/apache/spark/commit/50071e2cf9c76fd1919c437500b82bfd287b5f95).
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53902571

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```diff
@@ -607,14 +607,21 @@ class Analyzer(
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
       // Skip sort with aggregate. This will be handled in ResolveAggregateFunctions
       case sa @ Sort(_, _, child: Aggregate) => sa
-
+      case sa @ Sort(_, _, _)
+        if sa.order.exists(so => ResolveAggregateFunctions.containsAggregate(so.child)) => sa
       case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
         try {
           val newOrder = order.map(resolveExpressionRecursively(_, child).asInstanceOf[SortOrder])
           val requiredAttrs = AttributeSet(newOrder).filter(_.resolved)
           val missingAttrs = requiredAttrs -- child.outputSet
-          if (missingAttrs.nonEmpty) {
-            // Add missing attributes and then project them away after the sort.
+          // resolve the unresolved functions for knowing if they are aggregate functions
+          val evaluatedSort =
+            ResolveFunctions.resolveFunctions(Sort(newOrder, s.global, child)).asInstanceOf[Sort]
+          if (evaluatedSort.order.exists(so =>
+              ResolveAggregateFunctions.containsAggregate(so.child))) {
+            // Skip sort with aggregate. This will be handled in ResolveAggregateFunctions
+            s.copy(order = evaluatedSort.order)
+          } else if (missingAttrs.nonEmpty) {
```

Comment on the `} else if (missingAttrs.nonEmpty) {` line:

nevermind. I got it.
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53902128

(Quoting the same Analyzer.scala diff hunk; the comment is on the `} else if (missingAttrs.nonEmpty) {` line.)

Looks like if the sort order has non-empty `missingAttrs`, it will be ignored now.
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53902037

(Quoting the same Analyzer.scala diff hunk; the comment is on the `} else if (missingAttrs.nonEmpty) {` line.)

But it is only possible when the condition `sort.order.exists(containsAggregate) || sort.child.isInstanceOf[Aggregate]` is true, right?
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53900047

(Quoting the same Analyzer.scala diff hunk; the comment is on the `} else if (missingAttrs.nonEmpty) {` line.)

In this case, the missing attributes will be handled by `ResolveAggregateFunctions`.
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53899820

(Quoting the same Analyzer.scala diff hunk; the comment is on the `ResolveAggregateFunctions.containsAggregate(so.child))) {` line.)

If we do not resolve the functions, we are unable to know whether a function is an aggregate function. Another idea might be a multi-pass solution.
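The check being discussed can be sketched on a toy expression tree. This is a hypothetical stand-in for `ResolveAggregateFunctions.containsAggregate`, using simplified classes rather than Spark's real Catalyst `Expression` hierarchy; it also sidesteps the point made above that, in real Spark, a function is still an `UnresolvedFunction` until `ResolveFunctions` runs, so the check assumes resolution has already happened.

```scala
// Simplified, hypothetical expression tree -- not Spark's Catalyst classes.
sealed trait Expr { def children: Seq[Expr] }
case class Column(name: String) extends Expr { val children: Seq[Expr] = Nil }
case class Add(left: Expr, right: Expr) extends Expr { val children: Seq[Expr] = Seq(left, right) }
case class AggFunc(name: String, child: Expr) extends Expr { val children: Seq[Expr] = Seq(child) }

object ContainsAggregateDemo {
  // True if any node in the tree is an aggregate function -- what the rule
  // needs to know before deciding whether to defer to ResolveAggregateFunctions.
  def containsAggregate(e: Expr): Boolean =
    e.isInstanceOf[AggFunc] || e.children.exists(containsAggregate)

  // ORDER BY key + avg(month)  vs.  ORDER BY key + one
  val withAgg: Expr = Add(Column("key"), AggFunc("avg", Column("month")))
  val withoutAgg: Expr = Add(Column("key"), Column("one"))
}
```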
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53761104

(Quoting the same Analyzer.scala diff hunk; the comment is on the `} else if (missingAttrs.nonEmpty) {` line.)

Will this branch ever run? If there are missing attributes, but the sort order contains aggregate functions, won't it always hit the first condition?
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53759608

(Quoting the same Analyzer.scala diff hunk; the comment is on the `ResolveAggregateFunctions.containsAggregate(so.child))) {` line.)

Can we move this function-resolving logic for Sort into another pattern-matching case, so the original pattern-matching case stays untouched? I think it would be clearer.
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187031165

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51658/
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187031164

Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187030953

**[Test build #51658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51658/consoleFull)** for PR 11298 at commit [`50071e2`](https://github.com/apache/spark/commit/50071e2cf9c76fd1919c437500b82bfd287b5f95).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187009069

cc @davies @cloud-fan. This PR is an incremental patch with only a few lines of code changes. I am not sure a whole rewrite is worth it just to resolve the above two issues, which only arise when other operators exist between `Sort` and `Aggregate`. After all, users can manually correct their query if they hit a resolution issue from missing attributes in `Sort`, and we already cover most cases. : ) Let me know if you want a whole rewrite or just an incremental fix. Thanks!
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11298#discussion_r53585798

(Quoting the same Analyzer.scala diff hunk; the comment is on the `case sa @ Sort(_, _, child: Aggregate) => sa` line.)

The above two issues are already covered by the rule `ResolveAggregateFunctions` when `Aggregate` is the child of `Sort`, so the current PR keeps the existing logic unchanged. This PR introduces a fix for aggregate functions when other operators exist between `Aggregate` and `Sort`.
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187005048

# Issue 2: mixture of aliases and real columns in the order by clause

```sql
SELECT key as k, value as v, sum(value) FROM src GROUP BY key, value ORDER BY k, value
```

```
== Parsed Logical Plan ==
'Sort ['k ASC,'value ASC], true
+- 'Aggregate ['key,'value], [unresolvedalias('key AS k#47,None),unresolvedalias('value AS v#48,None),unresolvedalias('sum('value),None)]
   +- 'UnresolvedRelation `src`, None
```

In the above case, we need to replace the actual column name `value` by the alias name `v`. Otherwise, we have to introduce `value` in all the nodes between `Sort` and `Aggregate`.

```
Project [k#47,v#48,_c2#54L]
+- Sort [k#47 ASC,v#48 ASC], true
   +- Aggregate [key#45,value#46], [key#45 AS k#47,value#46 AS v#48,(sum(cast(value#46 as bigint)),mode=Complete,isDistinct=false) AS _c2#54L]
      +- Subquery src
         +- Project [_1#43 AS key#45,_2#44 AS value#46]
            +- LocalRelation [_1#43,_2#44], [[1,1],[-1,1]]
```
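The substitution Issue 2 calls for can be sketched on plain column names (a hypothetical helper, not Spark code): any sort reference naming a column that the `Aggregate` already aliases is rewritten to the alias, so `Sort` only refers to the `Aggregate`'s output.

```scala
object AliasSubstitutionDemo {
  // Aggregate output aliases: underlying column -> alias (key AS k, value AS v).
  val aliases: Map[String, String] = Map("key" -> "k", "value" -> "v")

  // Rewrite each sort reference to its alias when one exists; leave others as-is.
  def substitute(order: Seq[String]): Seq[String] =
    order.map(col => aliases.getOrElse(col, col))

  // ORDER BY k, value  becomes  ORDER BY k, v
  val rewritten: Seq[String] = substitute(Seq("k", "value"))
}
```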
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-187001189

# Issue 1: non-aggregate sort expressions over an Aggregate

```sql
SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1
```

`key + 1` is not an aggregate function, but we still need to push it down into `Aggregate` when there is no `Project` or `Window` between `Sort` and `Aggregate`.

```
== Parsed Logical Plan ==
'Sort [('key + 1) ASC], true
+- 'Aggregate [('key + 1)], [unresolvedalias('MAX('value),None)]
   +- 'UnresolvedRelation `src`, None
```

When that happens, we need to treat it like an aggregate expression. The plan will be like:

```
Project [_c0#50]
+- Sort [aggOrder#51 ASC], true
   +- Aggregate [(key#45 + 1)], [(max(value#46),mode=Complete,isDistinct=false) AS _c0#50,(key#45 + 1) AS aggOrder#51]
      +- Subquery src
         +- Project [_1#43 AS key#45,_2#44 AS value#46]
            +- LocalRelation [_1#43,_2#44], [[1,1],[-1,1]]
```

Let me show another case that does not need to treat `key + 1` as an aggregate expression:

```sql
SELECT value FROM src ORDER BY key + 1
```

```
== Parsed Logical Plan ==
'Sort [('key + 1) ASC], true
+- 'Project [unresolvedalias('value,None)]
   +- 'UnresolvedRelation `src`, None
```

In this case, we just need to push `key` into `Project`:

```
Project [value#46]
+- Sort [(key#45 + 1) ASC], true
   +- Project [value#46,key#45]
      +- Subquery src
         +- Project [_1#43 AS key#45,_2#44 AS value#46]
            +- LocalRelation [_1#43,_2#44], [[1,1],[-1,1]]
```
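The rewrite for Issue 1 can be mimicked on ordinary Scala collections (a sketch with made-up data, not Spark's implementation): the sort expression `key + 1` is computed inside the aggregation as an extra output, playing the role of `aggOrder`, used for sorting, then projected away.

```scala
object PushSortExprDemo {
  final case class Row(key: Int, value: Int)

  // Toy stand-in for the src relation.
  val src: Seq[Row] = Seq(Row(1, 10), Row(-1, 7), Row(0, 3))

  // Aggregate: GROUP BY key + 1, output (max(value) AS _c0, key + 1 AS aggOrder).
  val aggregated: Seq[(Int, Int)] =
    src.groupBy(r => r.key + 1)
      .toSeq
      .map { case (aggOrder, rows) => (rows.map(_.value).max, aggOrder) }

  // Sort on aggOrder, then project it away, mirroring
  // Project(_c0) <- Sort(aggOrder) <- Aggregate(..., aggOrder).
  val result: Seq[Int] = aggregated.sortBy(_._2).map(_._1)
}
```

With the rows above, the groups `key + 1 = 0, 1, 2` have maxima 7, 3, 10, so sorting on `aggOrder` yields `Seq(7, 3, 10)` even though `aggOrder` never appears in the final output.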
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-186998813

**[Test build #51658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51658/consoleFull)** for PR 11298 at commit [`50071e2`](https://github.com/apache/spark/commit/50071e2cf9c76fd1919c437500b82bfd287b5f95).
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11298#issuecomment-186996652

This PR is an incremental fix based on the previous solution for resolving missing attributes in `Sort`. It is not a clean fix, I admit. However, a clean fix would require rewriting a lot of code in `ResolveSortReferences`, and I am not sure we should do that. I will explain the issues I hit when resolving this one.
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/11298

[SPARK-13428] [SQL] Pushing Down Aggregate Expressions in Sort into Aggregate

## What changes were proposed in this pull request?

When other operators exist between `Sort` and `Aggregate`, we are unable to push down the aggregate expressions in `Sort` into `Aggregate`. For example, in the following query,

```sql
select area, sum(product) over () as c from windowData
where product > 3 group by area, product
having avg(month) > 0 order by area, avg(month), product
```

the parsed logical plan is like

```
'Sort ['area ASC,'avg('month) ASC,'product ASC], true
+- 'Filter cast(('avg('month) > 0) as boolean)
   +- 'Aggregate ['area,'product], [unresolvedalias('area,None),unresolvedalias('sum('product) windowspecdefinition(UnspecifiedFrame) AS c#3,None)]
      +- 'Filter ('product > 3)
         +- 'UnresolvedRelation `windowData`, None
```

## How was this patch tested?

Turned on a test case, `test("window function: Pushing aggregate Expressions in Sort to Aggregate")`, that exposes this issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark aggExprInSortBy

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11298.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11298

commit 50071e2cf9c76fd1919c437500b82bfd287b5f95
Author: gatorsmile
Date: 2016-02-22T03:57:45Z

    pushing down aggregate expressions in Sort into Aggregate Operator