+1 for Jark's and Xintong's proposal.

Would the default weight for OPERATOR and STATE_BACKEND be the same value?
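
To make the question concrete, here is a minimal sketch (not Flink code). It assumes managed memory in a slot is split among the consumer types present in that slot in proportion to their weights, and that `OPERATOR` and `STATE_BACKEND` fall back to the legacy `DATAPROC` weight when not specified, as proposed below. The helper names and the 70/70/30 values are hypothetical, chosen only for illustration.

```python
def parse_weights(spec):
    """Parse a 'NAME:weight,NAME:weight' string into a dict."""
    weights = {}
    for item in spec.split(","):
        name, value = item.strip().split(":")
        weights[name.strip()] = int(value)
    return weights


def resolve_weight(weights, consumer):
    """Return the weight for a consumer, applying the proposed legacy fallback."""
    if consumer in weights:
        return weights[consumer]
    # Assumed fallback: OPERATOR / STATE_BACKEND reuse the legacy DATAPROC weight.
    if consumer in ("OPERATOR", "STATE_BACKEND"):
        return weights.get("DATAPROC", 0)
    return 0


def fractions(weights, active):
    """Share of managed memory per consumer type actually present in the slot."""
    resolved = {c: resolve_weight(weights, c) for c in active}
    total = sum(resolved.values())
    if total == 0:
        return {c: 0.0 for c in active}
    return {c: w / total for c, w in resolved.items()}


# Hypothetical weights: equal defaults for OPERATOR and STATE_BACKEND.
equal = parse_weights("OPERATOR:70,STATE_BACKEND:70,PYTHON:30")
print(fractions(equal, ["OPERATOR", "STATE_BACKEND"]))

# A legacy-style setting where only DATAPROC and PYTHON are specified.
legacy = parse_weights("DATAPROC:70,PYTHON:30")
print(fractions(legacy, ["STATE_BACKEND", "PYTHON"]))
```

With equal weights, a slot that has both a state backend and memory-consuming operators would give each half of the managed memory under these assumptions; with the legacy-style setting, the state backend simply inherits the `DATAPROC` share.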

Cheers,
Till

On Tue, Jan 5, 2021 at 6:39 AM Jingsong Li <jingsongl...@gmail.com> wrote:

> +1 for allowing streaming operators to use managed memory.
>
> Memory use in streaming jobs needs some hierarchy, and the bottom layer is
> undoubtedly the current StateBackend.
> Letting stream operators use managed memory freely will unify the memory
> management model and give operators more freedom.
>
> Xintong's proposal looks good to me. +1 to split `DATAPROC` into
> `STATE_BACKEND` and `OPERATOR`.
>
> Best,
> Jingsong
>
> On Tue, Jan 5, 2021 at 12:33 PM Jark Wu <imj...@gmail.com> wrote:
>
> > +1 to Xintong's proposal!
> >
> > Best,
> > Jark
> >
> > On Tue, 5 Jan 2021 at 12:13, Xintong Song <tonysong...@gmail.com> wrote:
> >
> > > +1 for allowing streaming operators to use managed memory.
> > >
> > > > As for the consumer names, I'm afraid using `DATAPROC` for both streaming
> > > > ops and state backends will not work. Currently, RocksDB state backend uses
> > > > a shared piece of memory for all the states within that slot. It's not the
> > > > operator's decision how much memory it uses for the states.
> > >
> > > I would suggest the following. (IIUC, the same as what Jark proposed)
> > > > * `OPERATOR` for both streaming and batch operators
> > > * `STATE_BACKEND` for state backends
> > > * `PYTHON` for python processes
> > > * `DATAPROC` as a legacy key for state backend or batch operators if
> > > `STATE_BACKEND` or `OPERATOR` are not specified.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Jan 5, 2021 at 11:23 AM Jark Wu <imj...@gmail.com> wrote:
> > >
> > > > Hi Aljoscha,
> > > >
> > > > > I think we may need to divide `DATAPROC` into `OPERATOR` and
> > > > > `STATE_BACKEND`, because they have different scope (slot vs. operator).
> > > > > But @Xintong Song <tonysong...@gmail.com> may have more insights on it.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > > > On Mon, 4 Jan 2021 at 20:44, Aljoscha Krettek <aljos...@apache.org> wrote:
> > > >
> > > > >> I agree, we should allow streaming operators to use managed memory for
> > > > >> other use cases.
> > > > >>
> > > > >> Do you think we need an additional "consumer" setting or that they would
> > > > >> just use `DATAPROC` and decide by themselves what to use the memory for?
> > > >>
> > > >> Best,
> > > >> Aljoscha
> > > >>
> > > >> On 2020/12/22 17:14, Jark Wu wrote:
> > > >> >Hi all,
> > > >> >
> > > > >> >I found that currently the managed memory can only be used in 3 workloads
> > > > >> >[1]:
> > > >> >- state backends for streaming jobs
> > > >> >- sorting, hash tables for batch jobs
> > > >> >- python UDFs
> > > >> >
> > > > >> >And the configuration option `taskmanager.memory.managed.consumer-weights`
> > > > >> >only allows values: PYTHON and DATAPROC (state in streaming or algorithms
> > > > >> >in batch).
> > > > >> >I'm confused why it doesn't allow streaming operators to use managed memory
> > > > >> >for purposes other than state backends.
> > > >> >
> > > > >> >The background is that we are planning to use some batch algorithms
> > > > >> >(sorting & bytes hash table) to improve the performance of streaming SQL
> > > > >> >operators, especially for the mini-batch operators.
> > > > >> >Currently, the mini-batch operators are buffering input records and
> > > > >> >accumulators in heap (i.e. Java HashMap) which is not efficient and there
> > > > >> >are potential risks of full GC and OOM.
> > > > >> >With the managed memory, we can fully use the memory to buffer more data
> > > > >> >without worrying about OOM and improve the performance a lot.
> > > >> >
> > > > >> >What do you think about allowing streaming operators to use managed memory
> > > > >> >and exposing it in configuration?
> > > >> >
> > > >> >Best,
> > > >> >Jark
> > > >> >
> > > >> >[1]:
> > > >> >
> > > >>
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/memory/mem_setup_tm.html#managed-memory
> > > >>
> > > >
> > >
> >
>
>
> --
> Best, Jingsong Lee
>
