[jira] [Created] (SPARK-48542) Give snapshotStartBatchId and snapshotPartitionId to the state data source

Yuchen Liu (Jira) Wed, 05 Jun 2024 09:59:04 -0700

Yuchen Liu created SPARK-48542:
----------------------------------

             Summary: Give snapshotStartBatchId and snapshotPartitionId to the 
state data source
                 Key: SPARK-48542
                 URL: https://issues.apache.org/jira/browse/SPARK-48542
             Project: Spark
          Issue Type: New Feature
          Components: SQL, Structured Streaming
    Affects Versions: 4.0.0
         Environment: This should work for both HDFS state store and RocksDB 
state store.
            Reporter: Yuchen Liu



Currently, to read a given version of the state data, the state data source 
finds the closest snapshot file before that version and reconstructs the state 
by applying the delta files on top of it. In some debugging scenarios, users 
need finer control over how the state is reconstructed; for example, they may 
want to start from a specific snapshot instead of the closest one. One use 
case is checking whether a snapshot was corrupted after it was committed.
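
The reconstruction described above can be illustrated with a small toy model. 
This is only a sketch under stated assumptions, not Spark's implementation: 
state is a plain dict, each delta is a dict of upserts (deletions are not 
modeled), a snapshot at the requested version itself is allowed, and the names 
reconstruct, snapshots, deltas and snapshot_start are hypothetical.

```python
from typing import Dict, Optional

State = Dict[str, int]


def reconstruct(
    snapshots: Dict[int, State],
    deltas: Dict[int, State],
    version: int,
    snapshot_start: Optional[int] = None,
) -> State:
    """Rebuild the state at `version` from a snapshot plus the later deltas.

    Toy model: if `snapshot_start` is None, the closest snapshot at or before
    `version` is used; otherwise the caller forces a specific snapshot.
    """
    if snapshot_start is None:
        candidates = [v for v in snapshots if v <= version]
        if not candidates:
            raise ValueError(f"no snapshot at or before version {version}")
        start = max(candidates)
    else:
        if snapshot_start not in snapshots:
            raise ValueError(f"no snapshot for version {snapshot_start}")
        if snapshot_start > version:
            raise ValueError("snapshot_start must not exceed version")
        start = snapshot_start

    # Copy the snapshot, then replay every delta after it up to `version`.
    state = dict(snapshots[start])
    for v in range(start + 1, version + 1):
        state.update(deltas[v])
    return state


# Hypothetical data: the snapshot at version 4 was corrupted ("b" should be 3).
snapshots = {2: {"a": 1, "b": 2}, 4: {"a": 1, "b": 99}}
deltas = {1: {"a": 1}, 2: {"b": 2}, 3: {"b": 3}, 4: {}, 5: {"c": 5}}

from_closest = reconstruct(snapshots, deltas, version=5)
from_older = reconstruct(snapshots, deltas, version=5, snapshot_start=2)
snapshot_is_suspect = from_closest != from_older
```

In this toy data the two reconstructions disagree on "b", which is the kind of 
mismatch that would point to a corrupted snapshot.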

This task introduces two options, {{snapshotStartBatchId}} and 
{{snapshotPartitionId}}, to the state data source. By specifying them, users can 
control the batch id of the snapshot from which reconstruction starts and the 
partition id of the state to read.
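
As a rough sketch of how the two options might be passed to a reader, the 
helper below builds an option map. Only the option names 
{{snapshotStartBatchId}} and {{snapshotPartitionId}} come from this issue. The 
helper name, the "path" and "batchId" keys, the "statestore" format name in 
the comment, the checkpoint path, and the validation rules (both options given 
together, start batch not later than the target batch) are assumptions made 
for illustration.

```python
from typing import Dict, Optional


def state_source_options(
    checkpoint_path: str,
    batch_id: Optional[int] = None,
    snapshot_start_batch_id: Optional[int] = None,
    snapshot_partition_id: Optional[int] = None,
) -> Dict[str, str]:
    """Build reader options for the state data source (hypothetical helper)."""
    opts = {"path": checkpoint_path}
    if batch_id is not None:
        opts["batchId"] = str(batch_id)

    # Assumption: the two new options are only meaningful as a pair.
    if (snapshot_start_batch_id is None) != (snapshot_partition_id is None):
        raise ValueError(
            "snapshotStartBatchId and snapshotPartitionId must be set together"
        )
    if snapshot_start_batch_id is not None:
        if snapshot_start_batch_id < 0 or snapshot_partition_id < 0:
            raise ValueError("ids must be non-negative")
        # Assumption: the starting snapshot cannot be later than the target batch.
        if batch_id is not None and snapshot_start_batch_id > batch_id:
            raise ValueError("snapshotStartBatchId must not exceed batchId")
        opts["snapshotStartBatchId"] = str(snapshot_start_batch_id)
        opts["snapshotPartitionId"] = str(snapshot_partition_id)
    return opts


# Hypothetical checkpoint location and ids.
opts = state_source_options(
    "/tmp/checkpoint",
    batch_id=10,
    snapshot_start_batch_id=4,
    snapshot_partition_id=0,
)

# With an active SparkSession, the map could then be used along these lines
# (format name assumed, not confirmed by this issue):
#   spark.read.format("statestore").options(**opts).load()
```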



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

