GitHub user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19082#discussion_r143326742

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -797,26 +904,44 @@ case class HashAggregateExec(
     def updateRowInFastHashMap(isVectorized: Boolean): Option[String] = {
-      ctx.INPUT_ROW = fastRowBuffer
+      // We need to copy the aggregation row buffer to a local row first because each aggregate
+      // function directly updates the buffer when it finishes.
+      val localRowBuffer = ctx.freshName("localFastRowBuffer")
+      val initLocalRowBuffer = s"InternalRow $localRowBuffer = $fastRowBuffer.copy();"
--- End diff --

Why do we need to copy the row buffer? You bind `updateExpr` to the local copy of the row buffer, but the evaluation happens in split functions. Isn't it possible that `updateExpr` can't find the local variable holding the copied row buffer inside those functions?