[ https://issues.apache.org/jira/browse/SPARK-43421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721085#comment-17721085 ]
Hudson commented on SPARK-43421:
--------------------------------

User 'chaoqin-li1123' has created a pull request for this issue:
https://github.com/apache/spark/pull/41099

> Implement changelog checkpointing for RocksDB state store
> ---------------------------------------------------------
>
>                 Key: SPARK-43421
>                 URL: https://issues.apache.org/jira/browse/SPARK-43421
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.4.0
>            Reporter: Chaoqin Li
>            Priority: Major
>
> We have identified state checkpointing latency as one of the major
> performance bottlenecks for stateful streaming queries. Currently, the
> RocksDB state store pauses the RocksDB instance to upload a snapshot to the
> cloud when committing a batch, which is heavyweight and has unpredictable
> performance.
> To reduce the checkpoint duration and end-to-end latency, we propose to:
> 1. During state commit, make the state of a microbatch durable by syncing the
> changelog, instead of the state snapshot, to the checkpoint directory.
> 2. Upload snapshots in the background to enable changelog purging and faster
> failure recovery.
> In this way, the RocksDB instance can run without interruption, which
> improves RocksDB operation performance. This also dramatically reduces the
> commit time and batch duration, because a smaller amount of data is uploaded
> during state commit. With this change, stateful queries using the RocksDB
> state store will have lower and more predictable latency.
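
The two proposed steps (sync only the changelog at commit, upload snapshots separately so that older changelogs can be purged) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in the pull request: the class ToyChangelogStore and all of its methods are hypothetical names, in-memory dicts stand in for files in the checkpoint directory, and the snapshot upload is called inline rather than from a background thread.

```python
from typing import Dict, List, Optional, Tuple

# One change record: (key, value). A value of None marks a delete.
Change = Tuple[str, Optional[str]]


class ToyChangelogStore:
    """Toy model of changelog checkpointing (hypothetical, not Spark's code)."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}      # live state (stands in for RocksDB)
        self.version = 0                     # last committed version
        self.pending: List[Change] = []      # changes made since the last commit
        # Stand-ins for files in the checkpoint directory, keyed by version.
        self.changelogs: Dict[int, List[Change]] = {}
        self.snapshots: Dict[int, Dict[str, str]] = {}

    def put(self, key: str, value: str) -> None:
        self.state[key] = value
        self.pending.append((key, value))

    def remove(self, key: str) -> None:
        self.state.pop(key, None)
        self.pending.append((key, None))

    def commit(self) -> int:
        """Step 1: persist only this batch's changes, not the full state."""
        self.version += 1
        self.changelogs[self.version] = self.pending
        self.pending = []
        return self.version

    def upload_snapshot(self) -> None:
        """Step 2: snapshot the last committed version, then purge changelogs.

        The proposal runs this in the background; it is called inline here.
        """
        version = self.version
        self.snapshots[version] = self.recover(version)
        # Changelogs at or below the snapshot version are no longer needed
        # to recover the snapshot version or anything after it.
        for v in [v for v in self.changelogs if v <= version]:
            del self.changelogs[v]

    def recover(self, version: int) -> Dict[str, str]:
        """Rebuild state: newest snapshot at or below `version`, plus replay."""
        base = max((v for v in self.snapshots if v <= version), default=0)
        state = dict(self.snapshots.get(base, {}))
        for v in range(base + 1, version + 1):
            for key, value in self.changelogs[v]:
                if value is None:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state


store = ToyChangelogStore()
store.put("a", "1")
store.commit()              # version 1: changelog only
store.put("b", "2")
store.remove("a")
store.commit()              # version 2: changelog only
store.upload_snapshot()     # snapshot of version 2, changelogs 1 and 2 purged
store.put("c", "3")
store.commit()              # version 3: changelog only
recovered = store.recover(3)  # snapshot 2 plus replay of changelog 3
```

In this toy model the commit path writes only the small per-batch change list, while recovery cost is bounded by how many changelogs have accumulated since the last snapshot, which is the trade-off the issue description points at.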