HeartSaVioR commented on code in PR #50700:
URL: https://github.com/apache/spark/pull/50700#discussion_r2059455296
##########
sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala:
##########
@@ -1830,6 +1831,19 @@ abstract class TransformWithStateSuite extends StateStoreMetricsTest
CheckNewAnswer(("a", "1")),
StopStream
)
+
+ val hadoopConf = spark.sessionState.newHadoopConf()
Review Comment:
I understand this is to check that we filter the files which aren't bound
to the offset log / schema log / etc. Shall we leave a code comment explaining
this? Also, shall we check that we do not filter them out during pruning?
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadata.scala:
##########
@@ -419,10 +426,15 @@ class OperatorStateMetadataV2FileManager(
}
private def deleteSchemaFiles(thresholdBatchId: Long): Unit = {
+ if (thresholdBatchId <= 0) {
+ return
+ }
+ // filter for numeric filenames (StateSchemaV3 files) and ignore non-numeric ones
Review Comment:
> filter for numeric filenames (StateSchemaV3 files)
Is this "filter" or "filter out"? Do we remove StateSchemaV3 files here?
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadata.scala:
##########
Review Comment:
Here we now just adjust the criteria for filtering the files. Do we exclude
StateSchemaV3 files in this PR, and if so, what is the file name pattern?
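
To make the question concrete, here is a minimal sketch of one reading of "filter for numeric filenames": keep only names that parse as a batch ID, then select those below the threshold for deletion. This is not the code in the PR. `SchemaFilePruningSketch`, `batchIdOf`, and `filesToDelete` are hypothetical names, the strict `<` comparison is an assumption, and the sample file names are made up. Only the early return for `thresholdBatchId <= 0` comes from the diff above.

```scala
import scala.util.Try

// Hypothetical sketch, not the PR's implementation.
object SchemaFilePruningSketch {

  // Treat a file name as a batch ID only if it consists solely of ASCII digits.
  // Try guards against names that are numeric but overflow Long.
  def batchIdOf(fileName: String): Option[Long] =
    if (fileName.nonEmpty && fileName.forall(c => c >= '0' && c <= '9')) {
      Try(fileName.toLong).toOption
    } else {
      None
    }

  // Names selected for deletion: numeric names whose batch ID is below the
  // threshold. Non-numeric names are never selected, so they are left alone.
  // ASSUMPTION: strict "<" comparison; the real criteria may differ.
  def filesToDelete(fileNames: Seq[String], thresholdBatchId: Long): Seq[String] =
    if (thresholdBatchId <= 0) {
      Seq.empty // mirrors the early return added in the diff
    } else {
      fileNames.filter(name => batchIdOf(name).exists(_ < thresholdBatchId))
    }
}

object SchemaFilePruningSketchDemo extends App {
  // Made-up names for illustration only.
  val names = Seq("0", "5", "12", "abc", "7.tmp")
  println(SchemaFilePruningSketch.filesToDelete(names, thresholdBatchId = 10L))
  // prints List(0, 5)
}
```

Under this reading, "filter" means "select numeric names as deletion candidates", and non-numeric names survive pruning. If the intent is the opposite, the comment wording should say "filter out".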
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]