[ https://issues.apache.org/jira/browse/SPARK-24441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tathagata Das resolved SPARK-24441.
-----------------------------------
    Resolution: Fixed
 Fix Version/s: 3.0.0

Issue resolved by pull request 21469
[https://github.com/apache/spark/pull/21469]

> Expose total estimated size of states in HDFSBackedStateStoreProvider
> ---------------------------------------------------------------------
>
>                 Key: SPARK-24441
>                 URL: https://issues.apache.org/jira/browse/SPARK-24441
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.3.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 3.0.0
>
>
> While Spark exposes state metrics for a single state, it still doesn't
> expose the overall memory usage of state (loadedMaps) in
> HDFSBackedStateStoreProvider.
> The rationale for the patch is that state backed by
> HDFSBackedStateStoreProvider consumes more memory than the number we
> can get from the query status, because multiple versions of state are
> cached. The memory footprint can be much larger than the query status
> reports when the state store receives a lot of updates: shallow-copying
> the map adds only a small amount of memory for the map entries and
> references, since row objects are still shared across versions. But if
> there are lots of updates between batches, fewer row objects are shared
> and more distinct row objects stay in memory, consuming much more
> memory than we expect.
> It would be better to expose this as well, so that end users can
> determine the actual memory usage of state.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
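The sharing behavior described in the issue can be sketched roughly as follows. This is a hypothetical illustration, not the actual HDFSBackedStateStoreProvider code: a plain `HashMap` stands in for the provider's loadedMaps entries and `byte[]` stands in for Spark's UnsafeRow. Each batch shallow-copies the previous version's map, so unchanged row objects are shared between versions while every updated key pins a new object in memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of versioned state maps: a shallow copy per batch
// shares unchanged row objects across versions, while updated keys hold
// fresh objects, so memory grows with the update rate between batches.
public class StateVersionSketch {
    // byte[] stands in for a serialized row (UnsafeRow in Spark)
    static Map<String, byte[]> newVersion(Map<String, byte[]> prev,
                                          Map<String, byte[]> updates) {
        Map<String, byte[]> next = new HashMap<>(prev); // shallow copy
        next.putAll(updates);                           // replace updated keys
        return next;
    }

    public static void main(String[] args) {
        Map<String, byte[]> v1 = new HashMap<>();
        v1.put("a", new byte[]{1});
        v1.put("b", new byte[]{2});

        Map<String, byte[]> updates = new HashMap<>();
        updates.put("b", new byte[]{3});
        Map<String, byte[]> v2 = newVersion(v1, updates);

        // Unchanged key: both versions reference the same row object,
        // so caching the extra version costs little memory.
        System.out.println("a shared: " + (v1.get("a") == v2.get("a")));
        // Updated key: a distinct object exists per version, so many
        // updates between batches mean many unshared rows in memory.
        System.out.println("b shared: " + (v1.get("b") == v2.get("b")));
    }
}
```

Per-row metrics from a single version cannot see this duplication across cached versions, which is why the patch exposes the total estimated size instead.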