anishshri-db commented on code in PR #47641:
URL: https://github.com/apache/spark/pull/47641#discussion_r1706379388

##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala:
##########
@@ -198,7 +204,37 @@ case class TransformWithStateExec(
       getOutputRow(obj)
     }
     ImplicitGroupingKeyTracker.removeImplicitKey()
-    mappedIterator
+
+    // Wrapper to ensure that the implicit key is set when the methods on the iterator
+    // are called. Inside of processNewData, we use a GroupedIterator, so handleInputRows
+    // is only called once per key. As such, we only have to set the implicit key when
+    // the first call to hasNext is made, and we have to remove it when hasNext returns
+    // false.
+    //
+    // Note: if we ever start to interleave the processing of the iterators we get back
+    // from handleInputRows (i.e. we don't process each iterator all at once), then this
+    // iterator will need to set/unset the implicit key every time hasNext/next is called,
+    // not just at the first and last calls to hasNext.
+    new Iterator[InternalRow] {
+      var hasStarted = false
+
+      override def hasNext: Boolean = {
+        if (!hasStarted) {
+          hasStarted = true
+          ImplicitGroupingKeyTracker.setImplicitKey(keyObj)
+        }
+
+        val hasNext = mappedIterator.hasNext
+        if (!hasNext) {
+          ImplicitGroupingKeyTracker.removeImplicitKey()
+        }
+        hasNext
+      }
+
+      override def next(): InternalRow = {
+        mappedIterator.next()

Review Comment:
   I was thinking we also add this block here:

   ```
   if (!hasStarted) {
     hasStarted = true
     ImplicitGroupingKeyTracker.setImplicitKey(keyObj)
   }
   mappedIterator.next()
   ```
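
   For illustration, a minimal standalone sketch of the same guard applied to both `hasNext` and `next()`, so the key is set even if a caller goes straight to `next()` without calling `hasNext` first. The names `KeyTracker`, `KeyScopedIterator`, and `ensureStarted` are hypothetical stand-ins and not part of Spark; the tracker here is a plain single-threaded variable rather than whatever `ImplicitGroupingKeyTracker` actually uses.

   ```scala
   // Hypothetical stand-in for ImplicitGroupingKeyTracker (assumption: a simple
   // single-threaded holder is enough to show the idea).
   object KeyTracker {
     private var current: Option[Any] = None
     def set(key: Any): Unit = { current = Some(key) }
     def remove(): Unit = { current = None }
     def get: Option[Any] = current
   }

   // Wraps `underlying` so that the key is set before the first access,
   // whether that access is hasNext or next(), and removed once hasNext
   // reports that the iterator is exhausted.
   final class KeyScopedIterator[A](key: Any, underlying: Iterator[A])
       extends Iterator[A] {
     private var hasStarted = false

     // Shared guard used by both hasNext and next().
     private def ensureStarted(): Unit = {
       if (!hasStarted) {
         hasStarted = true
         KeyTracker.set(key)
       }
     }

     override def hasNext: Boolean = {
       ensureStarted()
       val more = underlying.hasNext
       if (!more) {
         KeyTracker.remove()
       }
       more
     }

     override def next(): A = {
       ensureStarted()
       underlying.next()
     }
   }
   ```

   With a lazily mapped `underlying` that reads `KeyTracker.get` per element, calling `next()` first still sees the key, because the guard runs before the element is produced.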