Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22630#discussion_r222733405

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -159,6 +159,10 @@ case class HashAggregateExec(
   // don't need a stop check before aggregating.
   override def needStopCheck: Boolean = false

+  // Aggregate operator always consumes all the input rows before outputting any result, so its
+  // upstream operators can keep producing data, even if there is a limit after Aggregate.
--- End diff --

Let's say the query is `range -> limit -> agg -> limit`. The `agg` does consume all of its input, which comes from the first `limit`. The `range` will have a stop check w.r.t. the first `limit`, not the second `limit`. If there is no `limit` before `agg`, then `range` will not have a stop check.
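The rule being discussed can be sketched with a toy model (this is not Spark's actual implementation; `RangeOp`, `Limit`, and `Agg` are hypothetical stand-ins for the corresponding physical operators). Walking the plan top-down, a limit turns stop checks on for the operators below it, and an aggregate turns them back off, because an aggregate consumes all of its input before emitting anything, so a limit above it can never stop the producers below it:

```scala
// Toy model of a physical plan: the outermost node is the top of the plan.
// These are hypothetical stand-ins, not Spark's real operator classes.
sealed trait Op
case object RangeOp extends Op
case class Limit(child: Op) extends Op
case class Agg(child: Op) extends Op

// Walk top-down, carrying "has a limit been seen since the last aggregate?".
// A Limit sets the flag; an Agg clears it, since an aggregate consumes all
// input rows before outputting any result.
def rangeHasStopCheck(plan: Op, limitSinceAgg: Boolean = false): Boolean =
  plan match {
    case RangeOp      => limitSinceAgg
    case Limit(child) => rangeHasStopCheck(child, limitSinceAgg = true)
    case Agg(child)   => rangeHasStopCheck(child, limitSinceAgg = false)
  }

// `range -> limit -> agg -> limit`, read bottom-up, is Limit(Agg(Limit(RangeOp))):
val withInnerLimit = rangeHasStopCheck(Limit(Agg(Limit(RangeOp))))  // true: stop check w.r.t. the first limit
val noInnerLimit   = rangeHasStopCheck(Limit(Agg(RangeOp)))         // false: no limit below the aggregate
```

Under this model, the second (outer) `limit` never influences `range`: the aggregate in between resets the flag, matching the explanation in the comment above.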