Thanks all for the reviews and comments! bq. From the implementation plan, it looks like this exists purely in a new module and does not require any changes in other parts of Flink's code. Can you confirm that? Confirmed, thanks!
Best Regards, Yu On Thu, 15 Aug 2019 at 18:04, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: > +1 to start a VOTE for this FLIP. > > Given the properties of this new state backend and that it will exist as a > new module without touching the original heap backend, I don't see a harm > in including this. > Regarding design of the feature, I've already mentioned my comments in the > original discussion thread. > > Cheers, > Gordon > > On Thu, Aug 15, 2019 at 5:53 PM Yun Tang <myas...@live.com> wrote: > > > Big +1 for this feature. > > > > Our customers including me, have ever met dilemma where we have to use > > window to aggregate events in applications like real-time monitoring. The > > larger of timer and window state, the poor performance of RocksDB. > However, > > switching to use FsStateBackend would always make me feel fear about the > > OOM errors. > > > > Look forward for more powerful enrichment to state-backend, and help > Flink > > to achieve better performance together. > > > > Best > > Yun Tang > > ________________________________ > > From: Stephan Ewen <se...@apache.org> > > Sent: Thursday, August 15, 2019 23:07 > > To: dev <dev@flink.apache.org> > > Subject: Re: [DISCUSS] FLIP-50: Spill-able Heap Keyed State Backend > > > > +1 for this feature. I think this will be appreciated by users, as a way > to > > use the HeapStateBackend with a safety-net against OOM errors. > > And having had major production exposure is great. > > > > From the implementation plan, it looks like this exists purely in a new > > module and does not require any changes in other parts of Flink's code. > Can > > you confirm that? > > > > Other that that, I have no further questions and we could proceed to vote > > on this FLIP, from my side. > > > > Best, > > Stephan > > > > > > On Tue, Aug 13, 2019 at 10:00 PM Yu Li <car...@gmail.com> wrote: > > > > > Sorry for forgetting to give the link of the FLIP, here it is: > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-50%3A+Spill-able+Heap+Keyed+State+Backend > > > > > > Thanks! > > > > > > Best Regards, > > > Yu > > > > > > > > > On Tue, 13 Aug 2019 at 18:06, Yu Li <car...@gmail.com> wrote: > > > > > > > Hi All, > > > > > > > > We ever held a discussion about this feature before [1] but now > opening > > > > another thread because after a second thought introducing a new > backend > > > > instead of modifying the existing heap backend is a better option to > > > > prevent causing any regression or surprise to existing in-production > > > usage. > > > > And since introducing a new backend is relatively big change, we > regard > > > it > > > > as a FLIP and need another discussion and voting process according to > > our > > > > newly drafted bylaw [2]. > > > > > > > > Please allow me to quote the brief description from the old thread > [1] > > > for > > > > the convenience of those who noticed this feature for the first time: > > > > > > > > > > > > *HeapKeyedStateBackend is one of the two KeyedStateBackends in Flink, > > > > since state lives as Java objects on the heap in > HeapKeyedStateBackend > > > and > > > > the de/serialization only happens during state snapshot and restore, > it > > > > outperforms RocksDBKeyeStateBackend when all data could reside in > > > memory.**However, > > > > along with the advantage, HeapKeyedStateBackend also has its > > > shortcomings, > > > > and the most painful one is the difficulty to estimate the maximum > heap > > > > size (Xmx) to set, and we will suffer from GC impact once the heap > > memory > > > > is not enough to hold all state data. There’re several (inevitable) > > > causes > > > > for such scenario, including (but not limited to):* > > > > > > > > > > > > > > > > ** Memory overhead of Java object representation (tens of times of > the > > > > serialized data size).* Data flood caused by burst traffic.* Data > > > > accumulation caused by source malfunction.**To resolve this problem, > we > > > > proposed a solution to support spilling state data to disk before > heap > > > > memory is exhausted. We will monitor the heap usage and choose the > > > coldest > > > > data to spill, and reload them when heap memory is regained after > data > > > > removing or TTL expiration, automatically. Furthermore, *to prevent > > > > causing unexpected regression to existing usage of > > HeapKeyedStateBackend, > > > > we plan to introduce a new SpillableHeapKeyedStateBackend and change > it > > > to > > > > default in future if proven to be stable. > > > > > > > > Please let us know your point of the feature and any comment is > > > > welcomed/appreciated. Thanks. > > > > > > > > [1] https://s.apache.org/pxeif > > > > [2] > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026 > > > > > > > > Best Regards, > > > > Yu > > > > > > > > > >