[GitHub] [spark] maropu commented on a change in pull request #28869: [SPARK-32031][SQL] Fix the wrong references of the PartialMerge/Final AggregateExpression

2020-06-19 Thread GitBox


maropu commented on a change in pull request #28869:
URL: https://github.com/apache/spark/pull/28869#discussion_r442764355



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortAggregateExec.scala
##
@@ -49,7 +49,7 @@ case class SortAggregateExec(
   override def producedAttributes: AttributeSet =
     AttributeSet(aggregateAttributes) ++
       AttributeSet(resultExpressions.diff(groupingExpressions).map(_.toAttribute)) ++
-      AttributeSet(aggregateBufferAttributes)
+      AttributeSet(aggregateBufferAttributes) ++ super.producedAttributes

Review comment:
   (This is not related to this PR, though.) `producedAttributes` looks the
same across the three aggregate implementations, so could we move it into
`BaseAggregateExec`?
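
The refactoring suggested above could look roughly like the sketch below. It is only an illustration with stand-in types: `Attr`, `BaseAgg`, `HashAgg`, `SortAgg`, and `resultAttributes` are hypothetical names, not Spark's classes, and plain `Set` stands in for `AttributeSet`.

```scala
// Hypothetical stand-ins for illustration; none of these are Spark's real classes.
final case class Attr(name: String)

trait BaseAgg {
  def aggregateAttributes: Seq[Attr]
  def aggregateBufferAttributes: Seq[Attr]
  // Stand-in for the result expressions that are not grouping expressions.
  def resultAttributes: Seq[Attr]

  // Defined once in the base trait instead of being repeated in each implementation.
  def producedAttributes: Set[Attr] =
    aggregateAttributes.toSet ++ resultAttributes.toSet ++ aggregateBufferAttributes.toSet
}

// Each implementation only supplies its own fields and inherits `producedAttributes`.
final case class HashAgg(
    aggregateAttributes: Seq[Attr],
    aggregateBufferAttributes: Seq[Attr],
    resultAttributes: Seq[Attr]) extends BaseAgg

final case class SortAgg(
    aggregateAttributes: Seq[Attr],
    aggregateBufferAttributes: Seq[Attr],
    resultAttributes: Seq[Attr]) extends BaseAgg
```

With this shape, `SortAgg(Seq(Attr("a")), Seq(Attr("b")), Seq(Attr("c"))).producedAttributes` is the set of all three attributes, and no implementation needs its own override.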





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28869: [SPARK-32031][SQL] Fix the wrong references of the PartialMerge/Final AggregateExpression

2020-06-19 Thread GitBox


maropu commented on a change in pull request #28869:
URL: https://github.com/apache/spark/pull/28869#discussion_r442761506



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala
##
@@ -53,15 +53,24 @@ trait BaseAggregateExec extends UnaryExecNode {
       // can't bind the `mergeExpressions` with the output of the partial aggregate, as they use
       // the `inputAggBufferAttributes` of the original `DeclarativeAggregate` before copy. Instead,
       // we shall use `inputAggBufferAttributes` after copy to match the new `mergeExpressions`.
-      val aggAttrs = aggregateExpressions
-        // there're exactly four cases needs `inputAggBufferAttributes` from child according to the
-        // agg planning in `AggUtils`: Partial -> Final, PartialMerge -> Final,
-        // Partial -> PartialMerge, PartialMerge -> PartialMerge.
-        .filter(a => a.mode == Final || a.mode == PartialMerge).map(_.aggregateFunction)
-        .flatMap(_.inputAggBufferAttributes)
+      val aggAttrs = inputAggBufferAttributes
       child.output.dropRight(aggAttrs.length) ++ aggAttrs
     } else {
       child.output
     }
   }
+
+  protected def inputAggBufferAttributes: Seq[Attribute] = {

Review comment:
   Also, can't we use a `val` (or `lazy val`) here instead of a `def`?
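
For context on this question, here is a small sketch of how the three member kinds behave in a Scala trait in general. The names `Node`, `BodyValNode`, and `expressions` are hypothetical; whether the eager-`val` pitfall applies to `BaseAggregateExec` depends on how its implementers define `aggregateExpressions`, which this thread does not settle.

```scala
trait Node {
  def expressions: Seq[String]

  // Evaluated while the trait is initialized, before a subclass body has run,
  // so a body-level `val expressions` in the subclass is still null here.
  val eagerCount: Int = Option(expressions).map(_.length).getOrElse(-1)

  // Evaluated on first access and cached afterwards.
  lazy val lazyCount: Int = expressions.length

  // Re-evaluated on every call.
  def defCount: Int = expressions.length
}

// `expressions` is assigned in the class body, i.e. after Node's initializer.
class BodyValNode extends Node {
  val expressions: Seq[String] = Seq("sum", "count")
}
```

In this sketch `new BodyValNode().eagerCount` observes the uninitialized field and falls back to `-1`, while `lazyCount` and `defCount` both see the two expressions. A `lazy val` avoids recomputation at the cost of caching; a `def` recomputes on each call.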








[GitHub] [spark] maropu commented on a change in pull request #28869: [SPARK-32031][SQL] Fix the wrong references of the PartialMerge/Final AggregateExpression

2020-06-19 Thread GitBox


maropu commented on a change in pull request #28869:
URL: https://github.com/apache/spark/pull/28869#discussion_r442761087



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala
##
@@ -53,15 +53,24 @@ trait BaseAggregateExec extends UnaryExecNode {
       // can't bind the `mergeExpressions` with the output of the partial aggregate, as they use
       // the `inputAggBufferAttributes` of the original `DeclarativeAggregate` before copy. Instead,
       // we shall use `inputAggBufferAttributes` after copy to match the new `mergeExpressions`.
-      val aggAttrs = aggregateExpressions
-        // there're exactly four cases needs `inputAggBufferAttributes` from child according to the
-        // agg planning in `AggUtils`: Partial -> Final, PartialMerge -> Final,
-        // Partial -> PartialMerge, PartialMerge -> PartialMerge.
-        .filter(a => a.mode == Final || a.mode == PartialMerge).map(_.aggregateFunction)
-        .flatMap(_.inputAggBufferAttributes)
+      val aggAttrs = inputAggBufferAttributes
       child.output.dropRight(aggAttrs.length) ++ aggAttrs
     } else {
       child.output
     }
   }
+
+  protected def inputAggBufferAttributes: Seq[Attribute] = {

Review comment:
   `private`?








[GitHub] [spark] maropu commented on a change in pull request #28869: [SPARK-32031][SQL] Fix the wrong references of the PartialMerge/Final AggregateExpression

2020-06-19 Thread GitBox


maropu commented on a change in pull request #28869:
URL: https://github.com/apache/spark/pull/28869#discussion_r442757181



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
##
@@ -140,7 +140,7 @@ case class AggregateExpression(
   override lazy val references: AttributeSet = {
     val aggAttributes = mode match {
       case Partial | Complete => aggregateFunction.references
-      case PartialMerge | Final => AttributeSet(aggregateFunction.aggBufferAttributes)
+      case PartialMerge | Final => AttributeSet(aggregateFunction.inputAggBufferAttributes)

Review comment:
   Ah, I see. This fix looks correct. Nice catch.
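
To make the one-line change above concrete, here is a minimal model of mode-dependent references. All types (`AttrRef`, `AggFunc`, `AggExpr`, the `Mode` objects) are hypothetical stand-ins, and the sketch assumes, as the quoted code comments suggest, that merge-side aggregates bind against the input-buffer attributes rather than the function's own buffer attributes.

```scala
// Hypothetical stand-ins; these only mirror the shape of the code in the diff.
sealed trait Mode
case object Partial extends Mode
case object PartialMerge extends Mode
case object Final extends Mode
case object Complete extends Mode

// An attribute identified by name plus a unique id, so that two attributes
// with the same name can still be told apart.
final case class AttrRef(name: String, id: Int)

final case class AggFunc(
    inputs: Seq[AttrRef],                   // attributes the function reads from raw rows
    aggBufferAttributes: Seq[AttrRef],      // the function's own buffer slots
    inputAggBufferAttributes: Seq[AttrRef]) // buffer slots as seen in the incoming rows

final case class AggExpr(func: AggFunc, mode: Mode) {
  def references: Set[AttrRef] = mode match {
    // Update-side modes read the raw input attributes.
    case Partial | Complete => func.inputs.toSet
    // Merge-side modes read the incoming buffers, not the function's own buffers.
    case PartialMerge | Final => func.inputAggBufferAttributes.toSet
  }
}
```

Given `AggFunc(Seq(AttrRef("x", 1)), Seq(AttrRef("sum", 2)), Seq(AttrRef("sum", 3)))`, a `Final` expression in this model references only `AttrRef("sum", 3)`, whereas the pre-fix behaviour corresponds to referencing `AttrRef("sum", 2)`.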








[GitHub] [spark] maropu commented on a change in pull request #28869: [SPARK-32031][SQL] Fix the wrong references of the PartialMerge/Final AggregateExpression

2020-06-19 Thread GitBox


maropu commented on a change in pull request #28869:
URL: https://github.com/apache/spark/pull/28869#discussion_r442756818



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala
##
@@ -53,15 +53,24 @@ trait BaseAggregateExec extends UnaryExecNode {
       // can't bind the `mergeExpressions` with the output of the partial aggregate, as they use
       // the `inputAggBufferAttributes` of the original `DeclarativeAggregate` before copy. Instead,
       // we shall use `inputAggBufferAttributes` after copy to match the new `mergeExpressions`.
-      val aggAttrs = aggregateExpressions
-        // there're exactly four cases needs `inputAggBufferAttributes` from child according to the
-        // agg planning in `AggUtils`: Partial -> Final, PartialMerge -> Final,
-        // Partial -> PartialMerge, PartialMerge -> PartialMerge.
-        .filter(a => a.mode == Final || a.mode == PartialMerge).map(_.aggregateFunction)
-        .flatMap(_.inputAggBufferAttributes)
+      val aggAttrs = inputAggBufferAttributes
       child.output.dropRight(aggAttrs.length) ++ aggAttrs
     } else {
       child.output
     }
   }
+
+  protected def inputAggBufferAttributes: Seq[Attribute] = {
+    aggregateExpressions
+      // there're exactly four cases needs `inputAggBufferAttributes` from child according to the
+      // agg planning in `AggUtils`: Partial -> Final, PartialMerge -> Final,
+      // Partial -> PartialMerge, PartialMerge -> PartialMerge.
+      .filter(a => a.mode == Final || a.mode == PartialMerge)
+      .flatMap(_.aggregateFunction.inputAggBufferAttributes)
+  }
+
+  override def producedAttributes: AttributeSet =
+  // it's not empty when the inputAggBufferAttributes is from the child Aggregate, which contains
+  // subquery in AggregateFunction. See SPARK-31620 for more details.

Review comment:
   nit: do these comment lines need indentation?








[GitHub] [spark] maropu commented on a change in pull request #28869: [SPARK-32031][SQL] Fix the wrong references of the PartialMerge/Final AggregateExpression

2020-06-19 Thread GitBox


maropu commented on a change in pull request #28869:
URL: https://github.com/apache/spark/pull/28869#discussion_r442753564



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala
##
@@ -53,15 +53,24 @@ trait BaseAggregateExec extends UnaryExecNode {
       // can't bind the `mergeExpressions` with the output of the partial aggregate, as they use
       // the `inputAggBufferAttributes` of the original `DeclarativeAggregate` before copy. Instead,
       // we shall use `inputAggBufferAttributes` after copy to match the new `mergeExpressions`.
-      val aggAttrs = aggregateExpressions
-        // there're exactly four cases needs `inputAggBufferAttributes` from child according to the
-        // agg planning in `AggUtils`: Partial -> Final, PartialMerge -> Final,
-        // Partial -> PartialMerge, PartialMerge -> PartialMerge.
-        .filter(a => a.mode == Final || a.mode == PartialMerge).map(_.aggregateFunction)
-        .flatMap(_.inputAggBufferAttributes)
+      val aggAttrs = inputAggBufferAttributes
       child.output.dropRight(aggAttrs.length) ++ aggAttrs

Review comment:
   nit: `child.output.dropRight(inputAggBufferAttributes.length) ++ inputAggBufferAttributes`?
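
The `dropRight(n) ++ xs` idiom in this line swaps the trailing elements of a sequence for a replacement of the same length. A minimal sketch on plain sequences follows; `RebindOutput` and `rebind` are hypothetical names, and the string attributes are made up for illustration.

```scala
object RebindOutput {
  // Replace the trailing `replacement.length` elements of `output` with `replacement`.
  // `replacement` is a parameter, so it is evaluated once even though it is used twice.
  def rebind[A](output: Seq[A], replacement: Seq[A]): Seq[A] =
    output.dropRight(replacement.length) ++ replacement
}
```

For example, `RebindOutput.rebind(Seq("k", "sum#1", "count#2"), Seq("sum#7", "count#8"))` keeps the grouping column `k` and swaps in the two new buffer attributes. Note that if `inputAggBufferAttributes` stays a `def`, the inlined form suggested above evaluates it twice, whereas the local `val aggAttrs` evaluates it once.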




