[ 
https://issues.apache.org/jira/browse/SPARK-48931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riya Verma updated SPARK-48931:
-------------------------------
    Description: Currently, during the state store maintenance process, we find 
which old version files of the RocksDB state store to delete by listing all 
existing snapshotted version files in the checkpoint directory every 1 minute 
by default. The frequent list calls in the cloud can result in high costs. To 
address this concern and reduce the cost associated with state store 
maintenance, we should aim to minimize the frequency of listing object stores 
inside the maintenance task. To minimize the frequency, we will try to 
accumulate versions to delete and only call list when the number of versions to 
delete reaches a configured threshold.   (was: Currently, during the state 
store maintenance process, we find which old version files of the RocksDB state 
store to delete by listing all existing snapshotted version files in the 
checkpoint directory every 1 minute by default. The frequent list calls in the 
cloud can result in high costs. To address this concern and reduce the cost 
associated with state store maintenance, we should aim to minimize the 
frequency of listing object stores inside the maintenance task. To minimize the 
frequency, we will try to accumulate versions to delete and only call list 
inside *deleteOldVersions* when the number of versions to delete reaches a 
configured threshold. )

> Reduce Cloud Store List API cost for state store maintenance task
> -----------------------------------------------------------------
>
>                 Key: SPARK-48931
>                 URL: https://issues.apache.org/jira/browse/SPARK-48931
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.4.3
>            Reporter: Riya Verma
>            Priority: Major
>
> Currently, during the state store maintenance process, we find which old 
> version files of the RocksDB state store to delete by listing all existing 
> snapshotted version files in the checkpoint directory every 1 minute by 
> default. The frequent list calls in the cloud can result in high costs. To 
> address this concern and reduce the cost associated with state store 
> maintenance, we should aim to minimize the frequency of listing object stores 
> inside the maintenance task. To minimize the frequency, we will try to 
> accumulate versions to delete and only call list when the number of versions 
> to delete reaches a configured threshold. 
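
The accumulate-then-list idea described above can be sketched as follows. This is a minimal illustration in Python, not Spark's implementation. The class `DeferredVersionCleaner`, its parameters (`min_versions_to_retain`, `min_versions_to_delete`, `oldest_version`) and the `simulate` helper are all hypothetical names. It also assumes that versions are contiguous integers starting at 1, so the number of stale versions can be estimated locally without listing the store.

```python
from typing import Callable, Iterable, List, Set, Tuple


class DeferredVersionCleaner:
    """Hypothetical sketch: list the store only once enough stale versions pile up."""

    def __init__(
        self,
        list_versions: Callable[[], Iterable[int]],  # the expensive list call
        delete_version: Callable[[int], None],
        min_versions_to_retain: int,
        min_versions_to_delete: int,  # assumed threshold setting
        oldest_version: int = 1,  # assumption: versions start at 1
    ) -> None:
        self._list_versions = list_versions
        self._delete_version = delete_version
        self._retain = min_versions_to_retain
        self._threshold = min_versions_to_delete
        self._oldest = oldest_version  # oldest version believed to still exist
        self.list_calls = 0

    def maintain(self, latest_version: int) -> List[int]:
        """Run one maintenance tick and return the versions deleted in it."""
        min_version_to_keep = latest_version - self._retain + 1
        # Estimate how many versions are stale without touching the store.
        stale_estimate = max(0, min_version_to_keep - self._oldest)
        if stale_estimate < self._threshold:
            return []  # keep accumulating; no list call on this tick

        self.list_calls += 1
        to_delete = sorted(
            v for v in self._list_versions() if v < min_version_to_keep
        )
        for version in to_delete:
            self._delete_version(version)
        self._oldest = min_version_to_keep
        return to_delete


def simulate(
    num_commits: int, retain: int = 2, threshold: int = 3
) -> Tuple[int, List[int]]:
    """Commit versions 1..num_commits against an in-memory stand-in for the store."""
    store: Set[int] = set()
    cleaner = DeferredVersionCleaner(
        lambda: list(store), store.discard, retain, threshold
    )
    for version in range(1, num_commits + 1):
        store.add(version)
        cleaner.maintain(version)
    return cleaner.list_calls, sorted(store)
```

With these assumed settings, `simulate(10)` issues 2 list calls instead of one per maintenance tick. The trade-off is that up to `threshold - 1` extra stale versions can remain in the store between cleanups.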



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
