[ https://issues.apache.org/jira/browse/SPARK-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vikram Agrawal resolved SPARK-28120. ------------------------------------ Resolution: Later The implementation will be submitted to https://spark-packages.org. > RocksDB state storage > --------------------- > > Key: SPARK-28120 > URL: https://issues.apache.org/jira/browse/SPARK-28120 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming > Affects Versions: 3.0.0 > Reporter: Vikram Agrawal > Priority: Major > > SPARK-13809 introduced a framework for state management for computing > Streaming Aggregates. The default implementation was in-memory hashmap which > was backed up in HDFS complaint file system at the end of every micro-batch. > Current implementation suffers from Performance and Latency Issues. It uses > Executor JVM memory to store the states. State store size is limited by the > size of the executor memory. Also > Executor JVM memory is shared by state storage and other tasks operations. > State storage size will impact the performance of task execution > Moreover, GC pauses, executor failures, OOM issues are common when the size > of state storage increases which increases overall latency of a micro-batch > RocksDb is an embedded DB which can provide major performance improvements. > Other major streaming frameworks have rocksdb as default state storage. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org