Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22630#discussion_r223022650
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -345,6 +345,27 @@ trait CodegenSupport extends SparkPlan {
         * don't require shouldStop() in the loop of producing rows.
         */
        def needStopCheck: Boolean = parent.needStopCheck
    +
    +  /**
    +   * A sequence of checks that evaluate to true while the downstream Limit operators have not yet
    +   * received enough records to reach the limit. If the current node is a data-producing node, it
    +   * can leverage this information to stop producing data and complete the data flow early. Common
    +   * data-producing nodes are leaf nodes like Range and Scan, and blocking nodes like Sort and
    +   * Aggregate. These checks should be put into the loop condition of the data-producing loop.
    +   */
    +  def limitNotReachedChecks: Seq[String] = parent.limitNotReachedChecks
    +
    +  /**
    +   * A helper method to generate the data-producing loop condition according to the
    +   * limit-not-reached checks.
    +   */
    +  final def limitNotReachedCond: String = {
    +    if (parent.limitNotReachedChecks.isEmpty) {
    +      ""
    +    } else {
    +      parent.limitNotReachedChecks.mkString(" && ", " && ", "")
    --- End diff --
    
    Then we would have a lot of places generating the initial `&&`. If we do need a different context in the future, we can use `limitNotReachedChecks` directly.
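    As a quick illustration of the point being discussed (a minimal standalone sketch, not Spark's actual class; the check strings below are hypothetical generated variable names), the `mkString(" && ", " && ", "")` call prepends a leading `" && "` so the result can be spliced directly onto an existing loop condition in the generated code:
    
    ```scala
    // Sketch of the limit-not-reached condition builder, assuming `checks`
    // stands in for parent.limitNotReachedChecks.
    object LimitCondDemo {
      def limitNotReachedCond(checks: Seq[String]): String =
        if (checks.isEmpty) "" else checks.mkString(" && ", " && ", "")
    
      def main(args: Array[String]): Unit = {
        // Hypothetical per-limit counter checks, not real Spark identifiers.
        val checks = Seq("count_0 < 100L", "count_1 < 10L")
        // The leading " && " lets a caller splice the condition in as:
        //   while (hasNext && count_0 < 100L && count_1 < 10L) { ... }
        println(s"while (hasNext${limitNotReachedCond(checks)}) { ... }")
      }
    }
    ```
    
    With no checks the method returns an empty string, so the caller's loop condition is left untouched.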


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
