Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22219#discussion_r212856186

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ---
    @@ -289,6 +289,14 @@ private[hive] class SparkExecuteStatementOperation(
           sqlContext.sparkContext.cancelJobGroup(statementId)
         }
       }
    +
    +  private def getResultIterator(): Iterator[SparkRow] = {
    +    val (totalRowCount, iterResult) = result.collectCountAndIterator()
    +    val batchCollectLimit =
    +      sqlContext.getConf(SQLConf.THRIFTSERVER_BATCH_COLLECTION_LIMIT.key).toLong
    +    resultList = if (totalRowCount < batchCollectLimit) Some(iterResult.toArray) else None
    +    if (resultList.isDefined) resultList.get.iterator else iterResult
    --- End diff --

    When incremental collect is disabled and users issue `FETCH_FIRST`, we expect the returned rows to be cached so that a fresh iterator can be obtained from them again. With this change, `FETCH_FIRST` triggers re-execution regardless of whether incremental collect is enabled. I think this may be a performance regression in some cases.
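The caching decision the diff makes can be sketched as a small standalone Scala snippet. This is a simplified illustration, not the actual Spark code: `collectCountAndIterator` and the Thrift server internals are stubbed out, and the names and types here are hypothetical stand-ins.

```scala
// Simplified sketch of the "cache small results, stream large ones" decision.
// All names are illustrative; the real code works on SparkRow and SQLConf.
object ResultCachingSketch {

  // Stand-in for result.collectCountAndIterator(): a known row count
  // plus a one-shot iterator over the rows.
  def collectCountAndIterator(rows: Seq[Int]): (Long, Iterator[Int]) =
    (rows.size.toLong, rows.iterator)

  // Returns the optional cache and an iterator over the result.
  // A cached array can produce fresh iterators, so FETCH_FIRST would
  // not need re-execution; the bare iterator can be consumed only once.
  def getResultIterator(
      rows: Seq[Int],
      batchCollectLimit: Long): (Option[Array[Int]], Iterator[Int]) = {
    val (totalRowCount, iterResult) = collectCountAndIterator(rows)
    val cached =
      if (totalRowCount < batchCollectLimit) Some(iterResult.toArray) else None
    (cached, cached.map(_.iterator).getOrElse(iterResult))
  }
}
```

A small result (count below the limit) is cached and re-iterable; a large one falls back to the one-shot iterator, which is exactly why `FETCH_FIRST` would then have to re-execute the query.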