Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22219#discussion_r212856186
  
    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ---
    @@ -289,6 +289,14 @@ private[hive] class SparkExecuteStatementOperation(
           sqlContext.sparkContext.cancelJobGroup(statementId)
         }
       }
    +
    +  private def getResultIterator(): Iterator[SparkRow] = {
    +    val (totalRowCount, iterResult) = result.collectCountAndIterator()
    +    val batchCollectLimit =
    +      sqlContext.getConf(SQLConf.THRIFTSERVER_BATCH_COLLECTION_LIMIT.key).toLong
    +    resultList = if (totalRowCount < batchCollectLimit) Some(iterResult.toArray) else None
    +    if (resultList.isDefined) resultList.get.iterator else iterResult
    --- End diff --
    
    When incremental collect is disabled and users issue `FETCH_FIRST`, we 
expect the returned rows to be cached so that an iterator can be obtained 
again. With this change, `FETCH_FIRST` triggers re-execution regardless of 
whether incremental collect is enabled. I think this may be a performance 
regression in some cases.
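    To illustrate the concern, here is a minimal standalone sketch (with hypothetical names; `runQuery` and the execution counter are not from the PR) of why caching the collected rows matters for `FETCH_FIRST`: if the rows are kept in an array, a second iterator can be served from the cache, whereas handing out the raw iterator forces a re-execution once it is consumed.

    ```scala
    object FetchFirstCacheSketch {
      // Stand-in for query execution; the counter shows how many times it runs.
      var executions = 0
      def runQuery(): Iterator[Int] = {
        executions += 1
        (1 to 5).iterator
      }

      // Cached rows, populated on first collection (mirrors `resultList` above).
      private var cached: Option[Array[Int]] = None

      def getResultIterator(): Iterator[Int] = cached match {
        case Some(rows) => rows.iterator // FETCH_FIRST path: no re-execution
        case None =>
          val rows = runQuery().toArray
          cached = Some(rows)
          rows.iterator
      }

      def main(args: Array[String]): Unit = {
        val first = getResultIterator().toList
        val again = getResultIterator().toList // simulates a FETCH_FIRST request
        assert(first == again)
        assert(executions == 1) // cached: the query ran only once
      }
    }
    ```

    With the cache, both fetches see the same rows and the query runs once; without it, the second fetch would bump the execution counter.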


---
