[ https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326427#comment-17326427 ]
Yuanjian Li commented on SPARK-34198: ------------------------------------- [~viirya] Since the RocksDBStateStore can solve the major drawbacks for the current HDFS based one. I think it's a better choice to directly add it as a build-in RocksDBStateStoreProvider. It's also convenient for the end-users to choose it directly. If you agree, I will change the description and the title for this ticket. > Add RocksDB StateStore as external module > ----------------------------------------- > > Key: SPARK-34198 > URL: https://issues.apache.org/jira/browse/SPARK-34198 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming > Affects Versions: 3.2.0 > Reporter: L. C. Hsieh > Priority: Major > > Currently Spark SS only has one built-in StateStore implementation > HDFSBackedStateStore. Actually it uses in-memory map to store state rows. As > there are more and more streaming applications, some of them requires to use > large state in stateful operations such as streaming aggregation and join. > Several other major streaming frameworks already use RocksDB for state > management. So it is proven to be good choice for large state usage. But > Spark SS still lacks of a built-in state store for the requirement. > We would like to explore the possibility to add RocksDB-based StateStore into > Spark SS. For the concern about adding RocksDB as a direct dependency, our > plan is to add this StateStore as an external module first. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org