Re: [PR] [SPARK-51904][SS] Removing async metadata purging for StateSchemaV3 and ignoring non-batch files when listing OperatorMetadata files [spark]

via GitHub Thu, 24 Apr 2025 19:06:40 -0700


HeartSaVioR commented on code in PR #50700:
URL: https://github.com/apache/spark/pull/50700#discussion_r2059455296



##########
sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala:
##########
@@ -1830,6 +1831,19 @@ abstract class TransformWithStateSuite extends StateStoreMetricsTest
           CheckNewAnswer(("a", "1")),
           StopStream
         )
+
+        val hadoopConf = spark.sessionState.newHadoopConf()

Review Comment:
   I understand this checks whether we filter the files that aren't bound to
   the offset log, schema log, etc. Shall we leave a code comment explaining
   that? Also, shall we check whether we do not filter them out during pruning?



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadata.scala:
##########
@@ -419,10 +426,15 @@ class OperatorStateMetadataV2FileManager(
   }
 
   private def deleteSchemaFiles(thresholdBatchId: Long): Unit = {
+    if (thresholdBatchId <= 0) {
+      return
+    }
+    // filter for numeric filenames (StateSchemaV3 files) and ignore non-numeric ones

Review Comment:
   > filter for numeric filenames (StateSchemaV3 files)
   
   Filter "for", or filter "out"? Do we remove StateSchemaV3 files here?
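
   To make the two readings concrete, here is a minimal sketch of the "filter
   for" (keep) interpretation. It is an illustration only, not the code in this
   PR: the object name, both helpers, the sample file names, and the
   strictly-below-threshold rule are all assumptions.

```scala
import scala.util.Try

// Hypothetical illustration; none of these names come from the PR.
object NumericFileNameFilter {
  // Returns the batch id when the file name consists only of digits.
  def batchIdOf(fileName: String): Option[Long] =
    if (fileName.nonEmpty && fileName.forall(_.isDigit)) Try(fileName.toLong).toOption
    else None

  // "Filter for": keep numeric names below the threshold (assumed rule),
  // and ignore every non-numeric name. Nothing is selected when the
  // threshold is not positive, mirroring the early return in the diff.
  def deletable(names: Seq[String], thresholdBatchId: Long): Seq[String] =
    if (thresholdBatchId <= 0) Seq.empty
    else names.filter(n => batchIdOf(n).exists(_ < thresholdBatchId))
}
```

   Under the "filter out" reading the predicate would be negated, so the
   numeric names would be the ones left untouched.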



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadata.scala:
##########


Review Comment:
   Here we are now just adjusting the file-filtering criteria. Do we exclude
   StateSchemaV3 files in this PR, and if so, what is the file name pattern?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

