juliuszsompolski commented on a change in pull request #26014: [SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request
URL: https://github.com/apache/spark/pull/26014#discussion_r333412752
##########
File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
##########
@@ -684,6 +685,92 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
       assert(e.getMessage.contains("org.apache.spark.sql.catalyst.parser.ParseException"))
     }
   }
+
+  test("ThriftCLIService FetchResults FETCH_FIRST, FETCH_NEXT, FETCH_PRIOR") {
+    def checkResult(rows: RowSet, start: Long, end: Long): Unit = {
+      assert(rows.getStartOffset() == start)
+      assert(rows.numRows() == end - start)
+      rows.iterator.asScala.zip((start until end).iterator).foreach { case (row, v) =>
+        assert(row(0).asInstanceOf[Long] === v)
+      }
+    }
+
+    withCLIServiceClient { client =>
+      val user = System.getProperty("user.name")
+      val sessionHandle = client.openSession(user, "")
+
+      val confOverlay = new java.util.HashMap[java.lang.String, java.lang.String]
+      val operationHandle = client.executeStatement(
+        sessionHandle,
+        "SELECT * FROM range(10)",
+        confOverlay) // 10 rows result with sequence 0, 1, 2, ..., 9
+      var rows: RowSet = null
+
+      // Fetch 5 rows with FETCH_NEXT
+      rows = client.fetchResults(
+        operationHandle, FetchOrientation.FETCH_NEXT, 5, FetchType.QUERY_OUTPUT)
+      checkResult(rows, 0, 5) // fetched [0, 5)
+
+      // Fetch another 2 rows with FETCH_NEXT
+      rows = client.fetchResults(
+        operationHandle, FetchOrientation.FETCH_NEXT, 2, FetchType.QUERY_OUTPUT)
+      checkResult(rows, 5, 7) // fetched [5, 7)
+
+      // FETCH_PRIOR 3 rows

Review comment:
@wangyum this is expected. `FETCH_PRIOR` of the Thriftserver is not the same as FETCH PRIOR in the cursor of the client.
Fetch in Thriftserver operates in batches of rows, and the cursor in the client caches these batches and returns results row by row.
Let's say it's batching by maxRows=100, and we returned row 99. The client has rows [0, 100) from the first batch and is at row 99.
The next FETCH NEXT on the cursor will have to call FETCH_NEXT on the Thriftserver to get a batch of rows [100, 200) and return row 100 to the client.
Another FETCH NEXT will return row 101 from the batch without having to call FETCH_NEXT on the Thriftserver.
Another FETCH NEXT on the cursor will return row 102.
Then FETCH PRIOR will return row 101 again.
Then FETCH PRIOR will return row 100.
Only then, another FETCH PRIOR should return row 99, but the cursor doesn't have that row in its current batch. So it has to call FETCH_PRIOR on the Thriftserver to get rows [0, 100) again.
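
To make the walkthrough above concrete, here is a minimal sketch of the two layers. It is a simplified model and not the Thriftserver or JDBC driver code: `Batch`, `BatchServer`, and `ClientCursor` are hypothetical names, and the rule that the server-side FETCH_PRIOR moves the batch start back by maxRows from the previous batch start is an assumption chosen to match the example. End-of-result handling is left out.

```scala
// Simplified model; all names here are hypothetical, not Spark or Hive classes.
final case class Batch(start: Long, rows: Vector[Long]) {
  def end: Long = start + rows.size
  def contains(i: Long): Boolean = i >= start && i < end
  def apply(i: Long): Long = rows((i - start).toInt)
}

// Stand-in for the server side: row i holds value i, for i in [0, total).
final class BatchServer(total: Long) {
  private var fetchStart = 0L
  private var fetchEnd = 0L
  var calls: Vector[String] = Vector.empty // log of server round trips

  private def serve(from: Long, maxRows: Int): Batch = {
    fetchStart = math.max(0L, math.min(from, total))
    fetchEnd = math.min(total, fetchStart + maxRows)
    Batch(fetchStart, (fetchStart until fetchEnd).toVector)
  }

  // Next batch starts where the previous batch ended.
  def fetchNext(maxRows: Int): Batch = {
    calls :+= "FETCH_NEXT"
    serve(fetchEnd, maxRows)
  }

  // Assumed rule: prior batch starts maxRows before the previous batch start.
  def fetchPrior(maxRows: Int): Batch = {
    calls :+= "FETCH_PRIOR"
    serve(fetchStart - maxRows, maxRows)
  }
}

// Stand-in for the client cursor: caches one batch and moves row by row.
final class ClientCursor(server: BatchServer, maxRows: Int) {
  private var batch = Batch(0L, Vector.empty)
  private var pos = -1L // index of the row most recently returned

  def next(): Option[Long] = move(pos + 1, server.fetchNext(maxRows))
  def prior(): Option[Long] = move(pos - 1, server.fetchPrior(maxRows))

  // The by-name refill runs only when the target row is not in the cached batch.
  private def move(target: Long, refill: => Batch): Option[Long] =
    if (target < 0) None
    else {
      if (!batch.contains(target)) batch = refill
      if (batch.contains(target)) { pos = target; Some(batch(target)) } else None
    }
}

object Demo extends App {
  val server = new BatchServer(total = 300L)
  val cursor = new ClientCursor(server, maxRows = 100)
  (0 until 100).foreach(_ => cursor.next()) // rows 0..99 from one server call
  val seen = Seq(
    cursor.next(), cursor.next(), cursor.next(),
    cursor.prior(), cursor.prior(), cursor.prior()).flatten
  println(seen.mkString(", "))         // 100, 101, 102, 101, 100, 99
  println(server.calls.mkString(", ")) // FETCH_NEXT, FETCH_NEXT, FETCH_PRIOR
}
```

In this model, six cursor moves around the batch boundary cost only two extra server calls: one FETCH_NEXT when stepping from row 99 to row 100, and one FETCH_PRIOR when stepping back from row 100 to row 99.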