Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22524#discussion_r220044149
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 ---
    @@ -465,13 +465,18 @@ case class RangeExec(range: 
org.apache.spark.sql.catalyst.plans.logical.Range)
           |   $initRangeFuncName(partitionIndex);
           | }
           |
    -      | while (true) {
    +      | while (true && !stopEarly()) {
           |   long $range = $batchEnd - $number;
           |   if ($range != 0L) {
           |     int $localEnd = (int)($range / ${step}L);
           |     for (int $localIdx = 0; $localIdx < $localEnd; $localIdx++) {
           |       long $value = ((long)$localIdx * ${step}L) + $number;
    +      |       $numOutput.add(1);
    --- End diff --
    
    This is very likely to hit a perf regression, since the inner loop is 
no longer tight.
    
    We want the range operator to stop early for better performance, but that 
doesn't mean it must return exactly `limit` records. Since the range operator 
already returns data in batches, I think we can stop early at batch 
granularity.
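The trade-off can be sketched outside of codegen. The following is a hypothetical illustration (the `stopEarly` helper, batch size, and limit values are made up here, and this is not the actual generated code): checking the stop flag per row adds a branch to the inner loop, while checking it only between batches keeps the inner loop tight at the cost of overshooting the limit by at most one batch.

```java
// Hypothetical sketch of stop-early checks at row vs. batch granularity.
// Names and constants are illustrative, not Spark's generated code.
public class StopEarlySketch {
    static final int BATCH_SIZE = 1000;
    static final int LIMIT = 2500; // consumer wants to stop after 2500 rows
    static int produced;

    // Stand-in for the codegen stopEarly() helper.
    static boolean stopEarly() { return produced >= LIMIT; }

    // Per-row check: the extra branch means the inner loop is no longer
    // tight, which is the perf concern raised in the review.
    static int perRow(int total) {
        produced = 0;
        for (int i = 0; i < total && !stopEarly(); i++) produced++;
        return produced;
    }

    // Batch-granularity check: only test stopEarly() between batches.
    // The inner loop stays branch-free, but we may overshoot the limit
    // by up to BATCH_SIZE - 1 rows.
    static int perBatch(int total) {
        produced = 0;
        int start = 0;
        while (start < total && !stopEarly()) {
            int end = Math.min(start + BATCH_SIZE, total);
            for (int i = start; i < end; i++) produced++; // tight inner loop
            start = end;
        }
        return produced;
    }

    public static void main(String[] args) {
        System.out.println(perRow(10000));   // stops exactly at the limit
        System.out.println(perBatch(10000)); // overshoots by less than a batch
    }
}
```

With a limit of 2500 and batches of 1000, the per-row variant produces exactly 2500 rows, while the per-batch variant produces 3000, stopping after the batch in which the limit was crossed.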


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
