I'm good with this too.

On Tue, Feb 9, 2021 at 4:16 PM, DB Tsai <dbt...@dbtsai.com> wrote:
> +1 to add it as an external module so people can test it out and give
> feedback more easily.
>
> On Mon, Feb 8, 2021 at 10:22 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> >
> > +1 adding it any way.
> >
> > On Mon, 8 Feb 2021, 21:54 Holden Karau, <hol...@pigscanfly.ca> wrote:
> >>
> >> +1 for an external module.
> >>
> >> On Mon, Feb 8, 2021 at 11:51 AM Cheng Su <chen...@fb.com.invalid> wrote:
> >>>
> >>> +1 for (2) adding to external module.
> >>>
> >>> I think this feature is useful and popular in practice, and option 2
> >>> does not conflict with the previous concern about the dependency.
> >>>
> >>> Thanks,
> >>>
> >>> Cheng Su
> >>>
> >>> From: Dongjoon Hyun <dongjoon.h...@gmail.com>
> >>> Date: Monday, February 8, 2021 at 10:39 AM
> >>> To: Jacek Laskowski <ja...@japila.pl>
> >>> Cc: Liang-Chi Hsieh <vii...@gmail.com>, dev <dev@spark.apache.org>
> >>> Subject: Re: [DISCUSS] Add RocksDB StateStore
> >>>
> >>> Thank you, Liang-chi and all.
> >>>
> >>> +1 for (2) external module design because it can deliver the new
> >>> feature in a safe way.
> >>>
> >>> Bests,
> >>>
> >>> Dongjoon
> >>>
> >>> On Mon, Feb 8, 2021 at 9:00 AM Jacek Laskowski <ja...@japila.pl> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I'm "okay to add RocksDB StateStore as external module". See no reason
> >>> not to.
> >>>
> >>> Pozdrawiam,
> >>>
> >>> Jacek Laskowski
> >>>
> >>> ----
> >>>
> >>> https://about.me/JacekLaskowski
> >>>
> >>> "The Internals Of" Online Books
> >>>
> >>> Follow me on https://twitter.com/jaceklaskowski
> >>>
> >>> On Tue, Feb 2, 2021 at 9:32 AM Liang-Chi Hsieh <vii...@gmail.com> wrote:
> >>>
> >>> Hi devs,
> >>>
> >>> In Spark Structured Streaming, we need a state store for state
> >>> management in stateful operators such as streaming aggregates, joins,
> >>> etc. We have one and only one state store implementation now.
> >>> It is an in-memory hash map that is backed up to an HDFS-compliant
> >>> file system at the end of every micro-batch.
> >>>
> >>> As it basically uses an in-memory map to store states, memory
> >>> consumption is a serious issue and the state store size is limited by
> >>> the size of the executor memory. Moreover, a state store using more
> >>> memory may impact the performance of task execution that requires
> >>> memory too.
> >>>
> >>> Internally we see more streaming applications that require large state
> >>> in stateful operations. For such requirements, we need a StateStore
> >>> that does not rely on memory to store states.
> >>>
> >>> This seems to be true externally as well, since several other major
> >>> streaming frameworks already use RocksDB for state management. RocksDB
> >>> is an embedded DB and streaming engines can use it to store state
> >>> instead of memory storage.
> >>>
> >>> So it seems to me that it is proven to be a good choice for large state
> >>> usage. But Spark SS still lacks a built-in state store for this
> >>> requirement.
> >>>
> >>> Previously there was one attempt, SPARK-28120, to add a RocksDB
> >>> StateStore into Spark SS. IIUC, it was pushed back due to two concerns:
> >>> the extra code maintenance cost and the RocksDB dependency it
> >>> introduces.
> >>>
> >>> For the first concern, as more users require the feature, it should
> >>> become highly used code in SS and more developers will look at it. For
> >>> the second one, we propose (SPARK-34198) to add it as an external
> >>> module to relieve the dependency concern.
> >>>
> >>> Because it was pushed back previously, I'm raising this discussion to
> >>> learn what people think about it now, in advance of submitting any
> >>> code.
> >>>
> >>> I think there might be some possible opinions:
> >>>
> >>> 1. okay to add RocksDB StateStore into sql core module
> >>> 2. not okay for 1, but okay to add RocksDB StateStore as external
> >>>    module
> >>> 3. either 1 or 2 is okay
> >>> 4. not okay to add RocksDB StateStore, no matter into sql core or as
> >>>    external module
> >>>
> >>> Please let us know if you have some thoughts.
> >>>
> >>> Thank you.
> >>>
> >>> Liang-Chi Hsieh
> >>>
> >>> --
> >>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
> >> --
> >> Twitter: https://twitter.com/holdenkarau
> >> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >
> --
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Web: https://www.dbtsai.com
> PGP Key ID: 42E5B25A8F7A82C1