[ https://issues.apache.org/jira/browse/SPARK-48586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Riya Verma updated SPARK-48586:
-------------------------------
    Description:
Currently, when changelog checkpointing is enabled, the lock of the *RocksDB* state store is acquired while uploading the snapshot inside maintenance tasks, which causes lock contention between query processing tasks and the state maintenance thread. To eliminate this contention and reduce the latency of streaming queries, lock acquisition inside maintenance tasks should be avoided. Removing the lock, however, can introduce race conditions between task and maintenance threads. By making a deep copy of {{versionToRocksDBFiles}} and {{localFilesToDfsFiles}} in *RocksDBFileManager*, we can ensure that the file manager state used by a background snapshot upload is not updated by the task thread while the upload is in progress.

  (was: Currently the lock of the *RocksDB* state store is acquired when uploading the snapshot inside maintenance tasks when change log checkpointing is enabled, which causes lock contention between query processing tasks and state maintenance thread. To eliminate the lock contention, lock acquisition inside maintenance tasks should be avoided. To prevent race conditions between task and maintenance threads, we can ensure that *RocksDBFileManager* has a linear history by ensuring a deep copy of *RocksDBFileManager* every time a previous version is loaded. The original file manager is not affected by future state update. The new file manager is not affected by background snapshot uploading tasks that attempt to upload a snapshot.)
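The deep-copy idea in the description can be pictured with a small, language-neutral sketch. It is written in Python for brevity and is not Spark's implementation: the class names {{FileManagerSketch}} and {{StateStoreSketch}}, the shapes of the two mappings, and the file names are all assumptions made for illustration. It only shows why a reference captured before a load no longer sees later updates once the mappings are deep-copied.

```python
import copy


class FileManagerSketch:
    """Hypothetical stand-in for RocksDBFileManager (not Spark's real class)."""

    def __init__(self):
        # Assumed shapes: version -> list of file names, local file -> DFS file.
        self.version_to_rocksdb_files = {}
        self.local_files_to_dfs_files = {}

    def deep_copy(self):
        # Copy both mappings, including the lists stored inside them.
        clone = FileManagerSketch()
        clone.version_to_rocksdb_files = copy.deepcopy(self.version_to_rocksdb_files)
        clone.local_files_to_dfs_files = copy.deepcopy(self.local_files_to_dfs_files)
        return clone


class StateStoreSketch:
    """Hypothetical stand-in for the state store that owns the file manager."""

    def __init__(self):
        self.file_manager = FileManagerSketch()

    def load(self):
        # Swap in a deep copy, so references captured before the load stay frozen.
        self.file_manager = self.file_manager.deep_copy()


def demo():
    store = StateStoreSketch()
    store.file_manager.version_to_rocksdb_files[1] = ["001.sst"]
    store.file_manager.local_files_to_dfs_files["001.sst"] = "dfs-001.sst"

    # The background upload keeps a reference to the manager as of version 1.
    upload_view = store.file_manager

    # The task thread loads and keeps updating; it only touches the new copy.
    store.load()
    store.file_manager.version_to_rocksdb_files[1].append("002.sst")
    store.file_manager.version_to_rocksdb_files[2] = ["001.sst", "002.sst"]
    store.file_manager.local_files_to_dfs_files["002.sst"] = "dfs-002.sst"

    return upload_view, store.file_manager
```

In this sketch a shallow copy of the outer dictionaries would not be enough: the list stored under version 1 would still be shared, so the {{append}} made after the load would leak into the view held by the upload. That is the reason the sketch copies the nested values as well.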
> Remove lock acquisition in doMaintenance() by making a deep copy of file mappings in RocksDBFileManager in load()
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-48586
>                 URL: https://issues.apache.org/jira/browse/SPARK-48586
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.4.3
>            Reporter: Riya Verma
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, when changelog checkpointing is enabled, the lock of the *RocksDB*
> state store is acquired while uploading the snapshot inside maintenance tasks,
> which causes lock contention between query processing tasks and the state
> maintenance thread. To eliminate this contention and reduce the latency of
> streaming queries, lock acquisition inside maintenance tasks should be avoided.
> Removing the lock, however, can introduce race conditions between task and
> maintenance threads. By making a deep copy of {{versionToRocksDBFiles}} and
> {{localFilesToDfsFiles}} in *RocksDBFileManager*, we can ensure that the file
> manager state used by a background snapshot upload is not updated by the task
> thread while the upload is in progress.