[ https://issues.apache.org/jira/browse/SPARK-48931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Riya Verma updated SPARK-48931:
-------------------------------
    Description: 
Currently, during the state store maintenance process, we find which old version files of the RocksDB state store to delete by listing all existing snapshotted version files in the checkpoint directory every 1 minute by default. The frequent list calls in the cloud can result in high costs. To address this concern and reduce the cost associated with state store maintenance, we should aim to minimize the frequency of listing object stores inside the maintenance task. To minimize the frequency, we will try to accumulate versions to delete and only call list when the number of versions to delete reaches a configured threshold.

  (was: 
Currently, during the state store maintenance process, we find which old version files of the RocksDB state store to delete by listing all existing snapshotted version files in the checkpoint directory every 1 minute by default. The frequent list calls in the cloud can result in high costs. To address this concern and reduce the cost associated with state store maintenance, we should aim to minimize the frequency of listing object stores inside the maintenance task. To minimize the frequency, we will try to accumulate versions to delete and only call list inside *deleteOldVersions* when the number of versions to delete reaches a configured threshold.
)

> Reduce Cloud Store List API cost for state store maintenance task
> -----------------------------------------------------------------
>
>                 Key: SPARK-48931
>                 URL: https://issues.apache.org/jira/browse/SPARK-48931
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.4.3
>            Reporter: Riya Verma
>            Priority: Major
>
> Currently, during the state store maintenance process, we find which old
> version files of the RocksDB state store to delete by listing all existing
> snapshotted version files in the checkpoint directory every 1 minute by
> default. The frequent list calls in the cloud can result in high costs. To
> address this concern and reduce the cost associated with state store
> maintenance, we should aim to minimize the frequency of listing object stores
> inside the maintenance task. To minimize the frequency, we will try to
> accumulate versions to delete and only call list when the number of versions
> to delete reaches a configured threshold.
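
For illustration only, the accumulate-then-list idea in the description could be sketched roughly as follows. This is not Spark code and not the proposed patch: the class `ThresholdedCleaner`, the parameters `min_versions_to_delete` and `versions_to_retain`, and the `list_versions` / `delete_version` callbacks are all hypothetical stand-ins. The sketch assumes versions are committed as consecutive integers, so the number of stale versions can be estimated from the latest committed version without listing the store.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class ThresholdedCleaner:
    """Hypothetical helper: defers the costly list call until enough versions are stale."""
    list_versions: Callable[[], List[int]]   # stand-in for the object-store list call
    delete_version: Callable[[int], None]    # stand-in for deleting one version's files
    min_versions_to_delete: int              # hypothetical threshold setting
    versions_to_retain: int                  # how many recent versions must be kept
    list_calls: int = field(default=0, init=False)
    _oldest_known: Optional[int] = field(default=None, init=False)

    def maintain(self, latest_version: int) -> List[int]:
        """Run one maintenance tick; return the versions deleted (possibly none)."""
        min_to_keep = latest_version - self.versions_to_retain + 1
        if self._oldest_known is not None:
            # Estimate the stale count without listing the store.
            accumulated = min_to_keep - self._oldest_known
            if accumulated < self.min_versions_to_delete:
                return []  # below threshold: skip the list call entirely
        self.list_calls += 1
        existing = sorted(self.list_versions())
        stale = [v for v in existing if v < min_to_keep]
        for v in stale:
            self.delete_version(v)
        remaining = [v for v in existing if v >= min_to_keep]
        self._oldest_known = remaining[0] if remaining else min_to_keep
        return stale


# Usage: an in-memory set stands in for the checkpoint directory.
store: Set[int] = set()
cleaner = ThresholdedCleaner(
    list_versions=lambda: list(store),
    delete_version=store.discard,
    min_versions_to_delete=3,
    versions_to_retain=2,
)
for version in range(1, 11):       # ten commits, one maintenance tick each
    store.add(version)
    cleaner.maintain(version)

print(cleaner.list_calls)          # 3 list calls instead of 10
print(sorted(store))
```

The trade-off in this sketch is that up to `min_versions_to_delete - 1` extra stale versions can linger between cleanups, in exchange for fewer list calls.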