[jira] [Commented] (SPARK-43421) Implement changelog checkpointing for RocksDB state store

Hudson (Jira) Tue, 09 May 2023 13:29:09 -0700


    [ https://issues.apache.org/jira/browse/SPARK-43421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721085#comment-17721085 ]


Hudson commented on SPARK-43421:
--------------------------------

User 'chaoqin-li1123' has created a pull request for this issue:
https://github.com/apache/spark/pull/41099

> Implement changelog checkpointing for RocksDB state store
> ---------------------------------------------------------
>
>                 Key: SPARK-43421
>                 URL: https://issues.apache.org/jira/browse/SPARK-43421
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.4.0
>            Reporter: Chaoqin Li
>            Priority: Major
>
> We have identified state checkpointing latency as one of the major 
> performance bottlenecks for stateful streaming queries. Currently, the RocksDB 
> state store pauses the RocksDB instances to upload a snapshot to the cloud 
> when committing a batch, which is heavyweight and has unpredictable 
> performance.
> To reduce the checkpoint duration and end-to-end latency, we propose 
> to:
> 1. During state commit, make the state of a microbatch durable by syncing the 
> changelog instead of the state snapshot to the checkpoint directory.
> 2. Upload snapshot in the background to enable changelog purging and faster 
> failure recovery.
> In this way, we allow the RocksDB instance to run without interruption, which 
> improves RocksDB operation performance. This also dramatically reduces the 
> commit time and batch duration because we are uploading a smaller amount of 
> data during state commit. With this change, stateful queries using the RocksDB 
> state store will have lower and more predictable latency.
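
For illustration only, the two proposed steps can be sketched as a toy in-memory model. This is not the code in the pull request and does not use Spark or RocksDB; the class `ToyChangelogStore` and all of its method names are hypothetical. It assumes a commit persists only the changes made during the microbatch, a separate step (which would run in the background in a real system) writes a full snapshot and purges the changelogs it covers, and recovery loads the latest snapshot and replays the changelogs written after it.

```python
class ToyChangelogStore:
    """Hypothetical toy model of changelog checkpointing (not Spark's implementation)."""

    def __init__(self):
        self.state = {}           # live key-value state
        self.pending = []         # changes made in the current microbatch
        self.version = 0          # last committed version
        self.changelogs = {}      # version -> list of (op, key, value); stands in for the checkpoint directory
        self.snapshots = {0: {}}  # version -> full copy of the committed state

    def put(self, key, value):
        self.state[key] = value
        self.pending.append(("put", key, value))

    def remove(self, key):
        self.state.pop(key, None)
        self.pending.append(("remove", key, None))

    def commit(self):
        # Step 1: make the batch durable by writing only its changelog,
        # which is small compared with a full snapshot.
        self.version += 1
        self.changelogs[self.version] = self.pending
        self.pending = []
        return self.version

    def recover(self, version):
        # Load the newest snapshot at or before `version`, then replay changelogs.
        base = max(v for v in self.snapshots if v <= version)
        state = dict(self.snapshots[base])
        for v in range(base + 1, version + 1):
            for op, key, value in self.changelogs[v]:
                if op == "put":
                    state[key] = value
                else:
                    state.pop(key, None)
        return state

    def upload_snapshot(self):
        # Step 2: would run in the background in a real system.
        # Snapshot the last committed state, then purge the changelogs it covers.
        # After purging, versions older than this snapshot may no longer be recoverable.
        self.snapshots[self.version] = self.recover(self.version)
        for v in [v for v in self.changelogs if v <= self.version]:
            del self.changelogs[v]


store = ToyChangelogStore()
store.put("a", 1)
store.commit()            # writes only the changelog for version 1
store.upload_snapshot()   # snapshot at version 1, changelog 1 purged
store.put("b", 2)
store.commit()            # writes only the changelog for version 2
recovered = store.recover(2)  # snapshot 1 plus replay of changelog 2
```

In this sketch the cost of `commit` grows with the number of changes in the batch rather than with the total state size, which is the property the proposal relies on; the snapshot step exists only to bound how many changelogs must be replayed on recovery.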



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
