[ 
https://issues.apache.org/jira/browse/SPARK-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Agrawal resolved SPARK-28120.
------------------------------------
    Resolution: Later

The implementation will be submitted to https://spark-packages.org. 

> RocksDB state storage
> ---------------------
>
>                 Key: SPARK-28120
>                 URL: https://issues.apache.org/jira/browse/SPARK-28120
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Vikram Agrawal
>            Priority: Major
>
> SPARK-13809 introduced a framework for state management for computing 
> Streaming Aggregates. The default implementation was in-memory hashmap which 
> was backed up in HDFS complaint file system at the end of every micro-batch. 
> Current implementation suffers from Performance and Latency Issues. It uses 
> Executor JVM memory to store the states. State store size is limited by the 
> size of the executor memory. Also
> Executor JVM memory is shared by state storage and other tasks operations. 
> State storage size will impact the performance of task execution
> Moreover, GC pauses, executor failures, OOM issues are common when the size 
> of state storage increases which increases overall latency of a micro-batch
> RocksDb is an embedded DB which can provide major performance improvements. 
> Other major streaming frameworks have rocksdb as default state storage.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to