Sorry for forgetting to give the link of the FLIP, here it is:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-50%3A+Spill-able+Heap+Keyed+State+Backend

Thanks!

Best Regards,
Yu


On Tue, 13 Aug 2019 at 18:06, Yu Li <car...@gmail.com> wrote:

> Hi All,
>
> We ever held a discussion about this feature before [1] but now opening
> another thread because after a second thought introducing a new backend
> instead of modifying the existing heap backend is a better option to
> prevent causing any regression or surprise to existing in-production usage.
> And since introducing a new backend is relatively big change, we regard it
> as a FLIP and need another discussion and voting process according to our
> newly drafted bylaw [2].
>
> Please allow me to quote the brief description from the old thread [1] for
> the convenience of those who noticed this feature for the first time:
>
>
> *HeapKeyedStateBackend is one of the two KeyedStateBackends in Flink,
> since state lives as Java objects on the heap in HeapKeyedStateBackend and
> the de/serialization only happens during state snapshot and restore, it
> outperforms RocksDBKeyeStateBackend when all data could reside in 
> memory.**However,
> along with the advantage, HeapKeyedStateBackend also has its shortcomings,
> and the most painful one is the difficulty to estimate the maximum heap
> size (Xmx) to set, and we will suffer from GC impact once the heap memory
> is not enough to hold all state data. There’re several (inevitable) causes
> for such scenario, including (but not limited to):*
>
>
>
> ** Memory overhead of Java object representation (tens of times of the
> serialized data size).* Data flood caused by burst traffic.* Data
> accumulation caused by source malfunction.**To resolve this problem, we
> proposed a solution to support spilling state data to disk before heap
> memory is exhausted. We will monitor the heap usage and choose the coldest
> data to spill, and reload them when heap memory is regained after data
> removing or TTL expiration, automatically. Furthermore, *to prevent
> causing unexpected regression to existing usage of HeapKeyedStateBackend,
> we plan to introduce a new SpillableHeapKeyedStateBackend and change it to
> default in future if proven to be stable.
>
> Please let us know your point of the feature and any comment is
> welcomed/appreciated. Thanks.
>
> [1] https://s.apache.org/pxeif
> [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
>
> Best Regards,
> Yu
>

Reply via email to