+1 adding it any way. On Mon, 8 Feb 2021, 21:54 Holden Karau, <hol...@pigscanfly.ca> wrote:
> +1 for an external module. > > On Mon, Feb 8, 2021 at 11:51 AM Cheng Su <chen...@fb.com.invalid> wrote: > >> +1 for (2) adding to external module. >> >> I think this feature is useful and popular in practice, and option 2 is >> not conflict with previous concern for dependency. >> >> >> >> Thanks, >> >> Cheng Su >> >> >> >> *From: *Dongjoon Hyun <dongjoon.h...@gmail.com> >> *Date: *Monday, February 8, 2021 at 10:39 AM >> *To: *Jacek Laskowski <ja...@japila.pl> >> *Cc: *Liang-Chi Hsieh <vii...@gmail.com>, dev <dev@spark.apache.org> >> *Subject: *Re: [DISCUSS] Add RocksDB StateStore >> >> >> >> Thank you, Liang-chi and all. >> >> >> >> +1 for (2) external module design because it can deliver the new feature >> in a safe way. >> >> >> >> Bests, >> >> Dongjoon >> >> >> >> On Mon, Feb 8, 2021 at 9:00 AM Jacek Laskowski <ja...@japila.pl> wrote: >> >> Hi, >> >> >> >> I'm "okay to add RocksDB StateStore as external module". See no reason >> not to. >> >> >> Pozdrawiam, >> >> Jacek Laskowski >> >> ---- >> >> https://about.me/JacekLaskowski >> >> "The Internals Of" Online Books <https://books.japila.pl/> >> >> Follow me on https://twitter.com/jaceklaskowski >> >> >> <https://twitter.com/jaceklaskowski> >> >> >> >> >> >> On Tue, Feb 2, 2021 at 9:32 AM Liang-Chi Hsieh <vii...@gmail.com> wrote: >> >> Hi devs, >> >> In Spark structured streaming, we need state store for state management >> for >> stateful operators such streaming aggregates, joins, etc. We have one and >> only one state store implementation now. It is in-memory hashmap which was >> backed up in HDFS complaint file system at the end of every micro-batch. >> >> As it basically uses in-memory map to store states, memory consumption is >> a >> serious issue and state store size is limited by the size of the executor >> memory. Moreover, state store using more memory means it may impact the >> performance of task execution that requires memory too. >> >> Internally we see more streaming applications that requires large state in >> stateful operations. For such requirements, we need a StateStore not rely >> on >> memory to store states. >> >> This seems to be also true externally as several other major streaming >> frameworks already use RocksDB for state management. RocksDB is an >> embedded >> DB and streaming engines can use it to store state instead of memory >> storage. >> >> So seems to me, it is proven to be good choice for large state usage. But >> Spark SS still lacks of a built-in state store for the requirement. >> >> Previously there was one attempt SPARK-28120 to add RocksDB StateStore >> into >> Spark SS. IIUC, it was pushed back due to two concerns: extra code >> maintenance cost and it introduces RocksDB dependency. >> >> For the first concern, as more users require to use the feature, it should >> be highly used code in SS and more developers will look at it. For second >> one, we propose (SPARK-34198) to add it as an external module to relieve >> the >> dependency concern. >> >> Because it was pushed back previously, I'm going to raise this discussion >> to >> know what people think about it now, in advance of submitting any code. >> >> I think there might be some possible opinions: >> >> 1. okay to add RocksDB StateStore into sql core module >> 2. not okay for 1, but okay to add RocksDB StateStore as external module >> 3. either 1 or 2 is okay >> 4. not okay to add RocksDB StateStore, no matter into sql core or as >> external module >> >> Please let us know if you have some thoughts. >> >> Thank you. >> >> Liang-Chi Hsieh >> >> >> >> >> -- >> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ >> >> --------------------------------------------------------------------- >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >> >> > > -- > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, etc.): > https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau >