cloud-fan commented on a change in pull request #32242: URL: https://github.com/apache/spark/pull/32242#discussion_r618036534
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
##########
@@ -128,6 +128,16 @@ case class HashAggregateExec(
   // all the mode of aggregate expressions
   private val modes = aggregateExpressions.map(_.mode).distinct
 
+  // This is for testing final aggregate with number-of-rows-based fall back as specified in
+  // `testFallbackStartsAt`. In this scenario, there might be same keys exist in both fast and
+  // regular hash map. So the aggregation buffers from both maps need to be merged together
+  // to avoid correctness issue.
+  //
+  // This scenario only happens in unit test with number-of-rows-based fall back.
+  // There should not be same keys in both maps with size-based fall back in production.
+  private val isTestFinalAggregateWithFallback: Boolean = testFallbackStartsAt.isDefined &&

Review comment:
   My idea is to simulate the size-based fallback: "no space" -> "reach the capacity/limit"
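
To make the difference between the two fallback modes concrete, here is a minimal standalone sketch. It is not Spark code: `FallbackSketch`, `rowCountFallback`, `capacityFallback`, and `merge` are hypothetical names, and plain mutable maps holding partial sums stand in for the fast and regular hash maps. It assumes the row-count-based fallback sends every row past the threshold to the regular map, while the capacity-based variant only turns away keys the fast map has not seen once it is full.

```scala
import scala.collection.mutable

// Hypothetical toy model, not the HashAggregateExec implementation.
object FallbackSketch {
  // Each map holds key -> partial sum (a stand-in for an aggregation buffer).
  type Buffers = mutable.Map[String, Long]

  private def add(m: Buffers, k: String, v: Long): Unit =
    m(k) = m.getOrElse(k, 0L) + v

  // Row-count-based fallback: once `startsAt` rows have been consumed,
  // every later row goes to the regular map, even if its key is already
  // present in the fast map.
  def rowCountFallback(rows: Seq[(String, Long)], startsAt: Int): (Buffers, Buffers) = {
    val fast = mutable.Map.empty[String, Long]
    val regular = mutable.Map.empty[String, Long]
    rows.zipWithIndex.foreach { case ((k, v), i) =>
      add(if (i < startsAt) fast else regular, k, v)
    }
    (fast, regular)
  }

  // Capacity-based fallback: the fast map keeps serving keys it already
  // holds and only rejects new keys once it has reached `capacity`.
  def capacityFallback(rows: Seq[(String, Long)], capacity: Int): (Buffers, Buffers) = {
    val fast = mutable.Map.empty[String, Long]
    val regular = mutable.Map.empty[String, Long]
    rows.foreach { case (k, v) =>
      val useFast = fast.contains(k) || fast.size < capacity
      add(if (useFast) fast else regular, k, v)
    }
    (fast, regular)
  }

  // Merge the buffers of both maps so a key present in both yields one result.
  def merge(fast: Buffers, regular: Buffers): Map[String, Long] =
    (fast.toSeq ++ regular.toSeq).foldLeft(Map.empty[String, Long]) {
      case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0L) + v)
    }
}
```

In this model the row-count variant can leave the same key in both maps, so the final step has to merge them, whereas the capacity variant never does because a key that made it into the fast map is always found there again.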