hvanhovell commented on code in PR #40610:
URL: https://github.com/apache/spark/pull/40610#discussion_r1153980266
########## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ##########
@@ -134,24 +134,41 @@ private[sql] class SparkResult[T](
   /**
    * Returns an iterator over the contents of the result.
    */
-  def iterator: java.util.Iterator[T] with AutoCloseable = {
+  def iterator: java.util.Iterator[T] with AutoCloseable =
+    buildIterator(destructive = false)
+
+  /**
+   * Returns an destructive iterator over the contents of the result.
+   */
+  def destructiveIterator: java.util.Iterator[T] with AutoCloseable =
+    buildIterator(destructive = true)
+
+  private def buildIterator(destructive: Boolean): java.util.Iterator[T] with AutoCloseable = {
     new java.util.Iterator[T] with AutoCloseable {
-      private[this] var batchIndex: Int = -1
       private[this] var iterator: java.util.Iterator[InternalRow] = Collections.emptyIterator()
       private[this] var deserializer: Deserializer[T] = _
+      private[this] var currentBatch: ColumnarBatch = _
+      private[this] val _destructive: Boolean = destructive
+
       override def hasNext: Boolean = {
         if (iterator.hasNext) {
           return true
         }
-        val nextBatchIndex = batchIndex + 1
+        val batchIndex = batches.indexOf(currentBatch)

Review Comment:
I have been looking at this for a bit now, and I am not sure I like it. There are two issues:
- In destructive mode you know the location of the current batch: it should be at index 0. In non-destructive mode the index should be `batchIndex`. We are not doing anything with that information; instead we look the batch up again with `batches.indexOf(currentBatch)`, which is a linear scan.
- The removal can be pretty expensive since we are removing from the head.

I am wondering if we can use a better-suited data structure here. You could use a map, since that will give you cheap removals and fairly fast lookups. Alternatively, we could implement something akin to a linked list (I don't think you can use a stock linked list, since those don't like updates during iteration).
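A minimal sketch of the map-based alternative, assuming batches can be keyed by their arrival order. `BatchStore`, `add`, and `iterator(destructive)` are hypothetical names, not Spark's API; a generic `B` stands in for `ColumnarBatch`, and a plain Scala `Iterator` stands in for `java.util.Iterator[T] with AutoCloseable`. The iterator tracks the key of the next batch itself, so it never has to search for the current batch, and a destructive read is a hash-map removal rather than a shift from the head of a buffer.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SparkResult's batch storage (not Spark's API).
// Batches are keyed by arrival order, so lookup and removal are hash-map
// operations instead of an indexOf scan or a removal from the buffer head.
final class BatchStore[B] {
  private val batches = mutable.HashMap.empty[Int, B]
  private var numBatches = 0

  def add(batch: B): Unit = {
    batches.put(numBatches, batch)
    numBatches += 1
  }

  // Number of batches still held by the store.
  def size: Int = batches.size

  // The iterator remembers the key of the next batch, so no search is needed.
  // In destructive mode each batch is removed from the map once handed out.
  def iterator(destructive: Boolean): Iterator[B] = new Iterator[B] {
    private var nextKey = 0

    override def hasNext: Boolean = {
      // Skip keys that a destructive iterator has already removed.
      while (nextKey < numBatches && !batches.contains(nextKey)) nextKey += 1
      nextKey < numBatches
    }

    override def next(): B = {
      if (!hasNext) throw new NoSuchElementException("no more batches")
      val batch =
        if (destructive) batches.remove(nextKey).get
        else batches(nextKey)
      nextKey += 1
      batch
    }
  }
}
```

The trade-off is some per-entry memory overhead for the map in exchange for cheap removals. A hand-rolled singly linked list, where the iterator holds a reference to its current node, would be the other option mentioned in the comment.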