Hi Aljoscha,

I think we may need to divide `DATAPROC` into `OPERATOR` and `STATE_BACKEND`, because they have different scopes (slot vs. operator). But @Xintong Song <tonysong...@gmail.com> may have more insight into this.
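
To make the idea concrete, here is a rough sketch of how the option could look before and after such a split. The `OPERATOR` and `STATE_BACKEND` keys are only the names proposed above and are not accepted values today, and all weights are illustrative assumptions rather than recommended settings.

```yaml
# Today: a single DATAPROC consumer covers both state backends (streaming)
# and batch algorithms; PYTHON covers Python UDFs. Weights are illustrative.
taskmanager.memory.managed.consumer-weights: DATAPROC:70,PYTHON:30

# Hypothetical after the split (proposed names, not an existing setting):
# taskmanager.memory.managed.consumer-weights: OPERATOR:70,STATE_BACKEND:70,PYTHON:30
```

With separate consumers, a job could weight operator memory (e.g. for mini-batch buffering) independently of the state backend.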

Best,
Jark

On Mon, 4 Jan 2021 at 20:44, Aljoscha Krettek <aljos...@apache.org> wrote:

> I agree, we should allow streaming operators to use managed memory for
> other use cases.
>
> Do you think we need an additional "consumer" setting or that they would
> just use `DATAPROC` and decide by themselves what to use the memory for?
>
> Best,
> Aljoscha
>
> On 2020/12/22 17:14, Jark Wu wrote:
> >Hi all,
> >
> >I found that currently the managed memory can only be used in 3 workloads
> >[1]:
> >- state backends for streaming jobs
> >- sorting, hash tables for batch jobs
> >- python UDFs
> >
> >And the configuration option `taskmanager.memory.managed.consumer-weights`
> >only allows values: PYTHON and DATAPROC (state in streaming or algorithms
> >in batch).
> >I'm confused why it doesn't allow streaming operators to use managed memory
> >for purposes other than state backends.
> >
> >The background is that we are planning to use some batch algorithms
> >(sorting & bytes hash table) to improve the performance of streaming SQL
> >operators, especially for the mini-batch operators.
> >Currently, the mini-batch operators are buffering input records and
> >accumulators in heap (i.e. Java HashMap) which is not efficient and there
> >are potential risks of full GC and OOM.
> >With the managed memory, we can fully use the memory to buffer more data
> >without worrying about OOM and improve the performance a lot.
> >
> >What do you think about allowing streaming operators to use managed memory
> >and exposing it in configuration.
> >
> >Best,
> >Jark
> >
> >[1]:
> >https://ci.apache.org/projects/flink/flink-docs-master/deployment/memory/mem_setup_tm.html#managed-memory