Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22630#discussion_r222733405

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -159,6 +159,10 @@ case class HashAggregateExec(
   // don't need a stop check before aggregating.
   override def needStopCheck: Boolean = false

+  // Aggregate operator always consumes all the input rows before outputting any result, so its
+  // upstream operators can keep producing data, even if there is a limit after Aggregate.
--- End diff --

Let's say the query is `range -> limit -> agg -> limit`. The `agg` does consume all of its input, which comes from the first `limit`. The `range` will have a stop check w.r.t. the first `limit`, not the second `limit`. If there is no `limit` before `agg`, then `range` will not have a stop check.
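The rule being discussed can be sketched with a toy model (this is not Spark's actual implementation; `RangeOp`, `Limit`, and `Agg` are hypothetical stand-ins for the corresponding physical operators). Walking the plan top-down, a limit turns stop checks on for the operators below it, and an aggregate turns them back off, because an aggregate consumes all of its input before emitting anything, so a limit above it can never stop the producers below it:

```scala
// Toy model of a physical plan: the outermost node is the top of the plan.
// These are hypothetical stand-ins, not Spark's real operator classes.
sealed trait Op
case object RangeOp extends Op
case class Limit(child: Op) extends Op
case class Agg(child: Op) extends Op

// Walk top-down, carrying "has a limit been seen since the last aggregate?".
// A Limit sets the flag; an Agg clears it, since an aggregate consumes all
// input rows before outputting any result.
def rangeHasStopCheck(plan: Op, limitSinceAgg: Boolean = false): Boolean =
  plan match {
    case RangeOp      => limitSinceAgg
    case Limit(child) => rangeHasStopCheck(child, limitSinceAgg = true)
    case Agg(child)   => rangeHasStopCheck(child, limitSinceAgg = false)
  }

// `range -> limit -> agg -> limit`, read bottom-up, is Limit(Agg(Limit(RangeOp))):
val withInnerLimit = rangeHasStopCheck(Limit(Agg(Limit(RangeOp))))  // true: stop check w.r.t. the first limit
val noInnerLimit   = rangeHasStopCheck(Limit(Agg(RangeOp)))         // false: no limit below the aggregate
```

Under this model, the second (outer) `limit` never influences `range`: the aggregate in between resets the flag, matching the explanation in the comment above.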