Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

via GitHub Wed, 24 Jul 2024 08:49:15 -0700


ericm-db commented on code in PR #47445:
URL: https://github.com/apache/spark/pull/47445#discussion_r1690054569



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/metadata/StateMetadataSource.scala:
##########
@@ -188,29 +191,54 @@ class StateMetadataPartitionReader(
     } else Array.empty
   }
 
-  private def allOperatorStateMetadata: Array[OperatorStateMetadata] = {
+  private[sql] def allOperatorStateMetadata: Array[OperatorStateMetadata] = {
     val stateDir = new Path(checkpointLocation, "state")
     val opIds = fileManager
       .list(stateDir, pathNameCanBeParsedAsLongFilter).map(f => pathToLong(f.getPath)).sorted
-    opIds.map { opId =>
-      new OperatorStateMetadataReader(new Path(stateDir, opId.toString), hadoopConf).read()
+    opIds.flatMap { opId =>

Review Comment:
   I see. Is this the case where an operator was added after the first batch
   of the query? If so, what do we expect to happen?
   I'm not sure what the behavior should be in this case. I think simply
   dropping the operator makes sense, but I can also conform to the previous
   interface and fail the query if the operator does not have any metadata.
   What do you think, @anishshri-db?
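
   To make the two options concrete, here is a minimal sketch. It does not use
   the real Spark classes: `OperatorMeta`, `readMetadata`, and the in-memory
   `store` are hypothetical stand-ins for `OperatorStateMetadata` and the
   metadata reader, and the sketch assumes the reader can signal "no metadata"
   by returning `None`.

```scala
// Hypothetical stand-in for OperatorStateMetadata; not the real Spark class.
final case class OperatorMeta(operatorId: Long, operatorName: String)

object MetadataLookupSketch {
  // Hypothetical reader: returns None when an operator directory has no metadata.
  // The in-memory Map stands in for the checkpoint's state directory.
  def readMetadata(store: Map[Long, OperatorMeta], opId: Long): Option[OperatorMeta] =
    store.get(opId)

  // Option A (flatMap): operators without metadata are silently dropped.
  def lenient(store: Map[Long, OperatorMeta], opIds: Seq[Long]): Seq[OperatorMeta] =
    opIds.flatMap(opId => readMetadata(store, opId))

  // Option B (map): a missing metadata entry fails the whole lookup.
  def strict(store: Map[Long, OperatorMeta], opIds: Seq[Long]): Seq[OperatorMeta] =
    opIds.map { opId =>
      readMetadata(store, opId).getOrElse(
        throw new IllegalStateException(s"No metadata found for operator $opId"))
    }
}
```

   With metadata present only for operator 0, `lenient` returns a single entry
   for ids `Seq(0L, 1L)`, while `strict` throws for operator 1.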



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


