Hi Aljoscha,

I think we may need to divide `DATAPROC` into `OPERATOR` and
`STATE_BACKEND`, because they have different scopes (slot vs. operator).
But @Xintong Song <tonysong...@gmail.com> may have more insights on it.
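
As a rough illustration of how the weights might behave after such a split,
here is a small sketch. It is not Flink's actual code: the function name
`split_managed_memory`, the weight values, and the slot size are made up for
the example, and `OPERATOR` / `STATE_BACKEND` are only the names proposed
above. The assumption is that managed memory is shared proportionally among
the consumer types a job actually uses.

```python
def split_managed_memory(total_bytes, weights, used):
    """Split total_bytes among the consumers in `used`, proportionally to their weights.

    Consumers that have a weight but are not used by the job get no share.
    """
    active = {name: w for name, w in weights.items() if name in used}
    total_weight = sum(active.values())
    if total_weight == 0:
        return {}
    # Integer division: any rounding remainder is simply left unassigned here.
    return {name: total_bytes * w // total_weight for name, w in active.items()}


# Hypothetical weights, with DATAPROC divided into the two proposed consumers.
weights = {"OPERATOR": 70, "STATE_BACKEND": 70, "PYTHON": 30}
slot_memory = 1000 * 1024 * 1024  # assumed managed memory of one slot, in bytes

# A streaming job that uses both a state backend and memory-hungry operators,
# but no Python UDFs.
shares = split_managed_memory(slot_memory, weights, {"OPERATOR", "STATE_BACKEND"})
```

With equal weights the two consumers each get half of the slot's managed
memory, and `PYTHON` gets nothing because the job does not use it.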

Best,
Jark


On Mon, 4 Jan 2021 at 20:44, Aljoscha Krettek <aljos...@apache.org> wrote:

> I agree, we should allow streaming operators to use managed memory for
> other use cases.
>
> Do you think we need an additional "consumer" setting or that they would
> just use `DATAPROC` and decide by themselves what to use the memory for?
>
> Best,
> Aljoscha
>
> On 2020/12/22 17:14, Jark Wu wrote:
> >Hi all,
> >
> >I found that currently the managed memory can only be used in 3 workloads
> >[1]:
> >- state backends for streaming jobs
> >- sorting, hash tables for batch jobs
> >- python UDFs
> >
> >And the configuration option `taskmanager.memory.managed.consumer-weights`
> >only allows values: PYTHON and DATAPROC (state in streaming or algorithms
> >in batch).
> >I'm confused why it doesn't allow streaming operators to use managed memory
> >for purposes other than state backends.
> >
> >The background is that we are planning to use some batch algorithms
> >(sorting & bytes hash table) to improve the performance of streaming SQL
> >operators, especially for the mini-batch operators.
> >Currently, the mini-batch operators are buffering input records and
> >accumulators in heap (i.e. Java HashMap) which is not efficient and there
> >are potential risks of full GC and OOM.
> >With the managed memory, we can fully use the memory to buffer more data
> >without worrying about OOM and improve the performance a lot.
> >
> >What do you think about allowing streaming operators to use managed memory
> >and exposing it in configuration?
> >
> >Best,
> >Jark
> >
> >[1]:
> >https://ci.apache.org/projects/flink/flink-docs-master/deployment/memory/mem_setup_tm.html#managed-memory
>
