Yuchen Liu created SPARK-48542:
----------------------------------

             Summary: Give snapshotStartBatchId and snapshotPartitionId to the state data source
                 Key: SPARK-48542
                 URL: https://issues.apache.org/jira/browse/SPARK-48542
             Project: Spark
          Issue Type: New Feature
          Components: SQL, Structured Streaming
    Affects Versions: 4.0.0
         Environment: This should work for both the HDFS state store and the RocksDB state store.
            Reporter: Yuchen Liu


Right now, to read a given version of the state data, the state data source
finds the closest snapshot file at or before that version and reconstructs the
state by replaying the delta files committed after it. In some debugging
scenarios, users need more granular control over how the state is
reconstructed, for example to start from a specific snapshot instead of the
closest one. One use case is checking whether a snapshot was corrupted after it
was committed.

This task introduces two options, {{snapshotStartBatchId}} and
{{snapshotPartitionId}}, to the state data source. By specifying them, users can
choose the batch ID of the snapshot from which reconstruction starts and the
partition ID of the state to read.
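
A minimal Python sketch of the reconstruction logic, assuming a simplified in-memory model of a state store (not Spark's actual implementation or API): each partition holds full snapshots and per-batch delta files keyed directly by batch ID, with None marking a removed key. The hypothetical read_state function mirrors the two options as snapshot_start_batch_id and snapshot_partition_id; when both are omitted it falls back to the closest snapshot at or before batch_id, as described above. The rules that the two options must be given together, that the start batch must have a snapshot, and that it must not exceed batch_id are assumptions of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of one partition's files, keyed by batch ID:
# snapshots[b] is the full state after batch b; deltas[b] holds only the
# changes committed in batch b (a value of None marks a removed key).
Snapshots = Dict[int, Dict[str, int]]
Deltas = Dict[int, Dict[str, Optional[int]]]
Store = Dict[int, Tuple[Snapshots, Deltas]]  # partition ID -> files


def read_state(store: Store, batch_id: int,
               snapshot_start_batch_id: Optional[int] = None,
               snapshot_partition_id: Optional[int] = None) -> Dict[int, Dict[str, int]]:
    """Return {partition_id: state as of batch_id}."""
    # Assumption: the two options are only meaningful together.
    if (snapshot_start_batch_id is None) != (snapshot_partition_id is None):
        raise ValueError("snapshotStartBatchId and snapshotPartitionId go together")
    if snapshot_start_batch_id is not None and snapshot_start_batch_id > batch_id:
        raise ValueError("snapshotStartBatchId must not exceed batchId")

    partitions: List[int] = (sorted(store) if snapshot_partition_id is None
                             else [snapshot_partition_id])
    result: Dict[int, Dict[str, int]] = {}
    for pid in partitions:
        snapshots, deltas = store[pid]
        if snapshot_start_batch_id is None:
            # Default: the closest snapshot at or before batch_id,
            # or an empty state at batch 0 if there is none.
            earlier = [b for b in snapshots if b <= batch_id]
            start = max(earlier) if earlier else 0
        else:
            # Forced start: use exactly this snapshot, which must exist.
            start = snapshot_start_batch_id
            if start not in snapshots:
                raise ValueError(f"no snapshot for batch {start} in partition {pid}")
        state = dict(snapshots.get(start, {}))
        # Replay every delta after the starting snapshot; a missing delta
        # raises KeyError rather than being silently skipped.
        for b in range(start + 1, batch_id + 1):
            for key, value in deltas[b].items():
                if value is None:
                    state.pop(key, None)
                else:
                    state[key] = value
        result[pid] = state
    return result
```

For example, with a corrupted snapshot at batch 2, store = {0: ({0: {}, 2: {"a": 99}}, {1: {"a": 1}, 2: {"a": 2}, 3: {"b": 3}, 4: {"a": None}})}, the default read_state(store, 3) returns {0: {"a": 99, "b": 3}}, while read_state(store, 3, snapshot_start_batch_id=0, snapshot_partition_id=0) returns {0: {"a": 2, "b": 3}}. The mismatch points at the snapshot for batch 2, which is the debugging scenario described above.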



