Jungtaek Lim created SPARK-24441:
------------------------------------

             Summary: Expose total size of states in HDFSBackedStateStoreProvider
                 Key: SPARK-24441
                 URL: https://issues.apache.org/jira/browse/SPARK-24441
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 2.3.0
            Reporter: Jungtaek Lim


While Spark exposes state metrics for a single state, it still doesn't expose
the overall memory usage of state (loadedMaps) in HDFSBackedStateStoreProvider.

Since HDFSBackedStateStoreProvider caches multiple versions of the entire state
in a hash map, the cache can occupy much more memory than a single version of
the state. Based on the default value of minVersionsToRetain, the size of the
cache map can grow to more than 100 times the size of a single state. It would
be better to expose this as well, so that end users can determine the actual
memory usage of state.
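
To make the scale concrete, here is a rough back-of-the-envelope sketch of the
worst case described above. It is not Spark code: the function name and the
50 MiB state size are hypothetical, the factor of 100 is taken from the figure
quoted in this issue, and it assumes every cached version is a full, unshared
copy of the state, so it gives an upper bound rather than a measurement.

```python
def estimate_cache_upper_bound(single_state_bytes: int, versions_retained: int) -> int:
    """Worst-case bytes held by the version cache.

    Assumption: each retained version is a complete copy of the state and
    nothing is shared between versions. Real usage may be lower.
    """
    if single_state_bytes < 0 or versions_retained < 0:
        raise ValueError("sizes and version counts must be non-negative")
    return single_state_bytes * versions_retained


# Hypothetical example: one version of state is 50 MiB, 100 versions cached.
single_state = 50 * 1024 * 1024
bound = estimate_cache_upper_bound(single_state, 100)
print(f"{bound / 1024 ** 3:.2f} GiB")  # prints "4.88 GiB"
```

A per-version metric alone would report about 50 MiB in this example, which is
why a metric covering the whole cache would be more useful for sizing
executors.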



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
