Thanks for compiling the FLIP, Xintong, and +1 for the updated doc.

Just one supplement for the RocksDB state backend part:

It's true that we are currently using managed memory at the slot scope.
However, IMHO, we may want to support setting weights for different stateful
operators (for advanced usage) in the future. For example, users may choose
to set a higher weight for a join operator than for an aggregation operator,
to give more memory to the operator with the bigger state. In that case, we
may also use managed memory at the operator scope for state backends. And if
I understand correctly, the current design could cover this case well.
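
To make this concrete, here is a minimal sketch (assuming a hypothetical
per-operator weight setter; the names below are made up for illustration and
are not part of the FLIP):

    // Hypothetical sketch: per-operator managed memory weights.
    // joinedStream and aggregatedStream are placeholder DataStreams.
    joinedStream.getTransformation().setManagedMemoryWeight(3);     // join: bigger state
    aggregatedStream.getTransformation().setManagedMemoryWeight(1); // agg: smaller state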

Best Regards,
Yu


On Wed, 2 Sep 2020 at 15:39, Xintong Song <tonysong...@gmail.com> wrote:

> Thanks all for the feedback and discussion.
>
> I have updated the FLIP, with the following changes.
>
>    - Choose the main proposal over the alternative approach
>    - Combine weights of RocksDB and batch operators
>    - Expose weights through configuration options, rather than via
>    ExecutionConfig.
>    - Add implementation plan.
>
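> For illustration only (the exact option keys are not final and may change
> during implementation), setting the weights would then look like any other
> configuration option, e.g. programmatically:
>
>     // Hypothetical option keys -- for illustration only, not final.
>     Configuration conf = new Configuration();
>     conf.setInteger("taskmanager.memory.managed.consumer-weight.batch-rocksdb", 70);
>     conf.setInteger("taskmanager.memory.managed.consumer-weight.python", 30);
>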
> Please help take another look.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Sep 2, 2020 at 2:41 PM Xintong Song <tonysong...@gmail.com> wrote:
>
> > Thanks for the inputs, Aljoscha & Till.
> >
> >
> > # Weight Configuration
> >
> >
> > I think exposing the knobs incrementally is a good idea. However, I'm
> > not sure about making them non-configurable as the first step.
> >
> >
> > Currently, users can tune the memory for RocksDB
> > ('taskmanager.memory.managed.size') and Python
> > ('python.fn-execution.[framework|buffer].memory.size') separately, which
> > practically means any combination of RocksDB and Python memory sizes is
> > possible. If we switch to non-configurable weights, that would be a
> > regression compared to 1.11.
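> >
> > To make that concrete, this is how the two are sized independently today
> > (a 1.11-style example; the sizes are arbitrary):
> >
> >     // Flink 1.11: RocksDB/batch and Python memory are tuned separately.
> >     Configuration conf = new Configuration();
> >     conf.setString("taskmanager.memory.managed.size", "512m");
> >     conf.setString("python.fn-execution.framework.memory.size", "64m");
> >     conf.setString("python.fn-execution.buffer.memory.size", "15m");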
> >
> >
> > Therefore, I think exposing the weights via configuration options might
> > be a good first step. We can then discuss exposing them via the
> > ExecutionConfig if we later see that requirement.
> >
> >
> > # Naming of Weights
> >
> >
> > I'm neutral on "Flink/Internal memory".
> >
> >
> > I think the reason we can combine the weights for batch algorithms and
> > state backends is that the two are never mixed together. My only concern
> > with "Flink/Internal memory", which might not be a problem at the moment,
> > is: what if new memory use cases appear in the future that can also be
> > described as "Flink/Internal memory", but are not guaranteed to never be
> > mixed with batch algorithms or state backends?
> >
> >
> > Anyway, I think the naming should not block this FLIP, as long as we
> > have consensus on combining the two weights for RocksDB and batch
> > algorithms. We can keep the naming discussion open until the
> > implementation phase.
> >
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Tue, Sep 1, 2020 at 10:19 PM Till Rohrmann <trohrm...@apache.org>
> > wrote:
> >
> >> Thanks for creating this FLIP, Xintong.
> >>
> >> I agree with the previous comments that the memory configuration should
> >> be as easy as possible. Every new knob has the potential to confuse
> >> users and/or to let them shoot themselves in the foot. Consequently, I
> >> am +1 for the first proposal in the FLIP, since it is simpler.
> >>
> >> Also +1 for Stephan's proposal to combine the batch operators' and
> >> RocksDB's memory usage into one weight.
> >>
> >> Concerning the names for the two weights, I fear that we are facing one
> >> of the two hard things in computer science. To add another idea, we
> >> could name them "Flink memory"/"Internal memory" and "Python memory".
> >>
> >> For the sake of keeping the scope of the FLIP as small as possible and
> >> developing the feature incrementally, I think that Aljoscha's proposal
> >> to make the weights non-configurable in the first step sounds like a
> >> good idea. As a next step (and also if we see the need), we can make the
> >> memory weights configurable via the configuration. And last, we could
> >> expose them via the ExecutionConfig if it is required.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Tue, Sep 1, 2020 at 2:24 PM Aljoscha Krettek <aljos...@apache.org>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > playing devil's advocate here: should we even make the memory weights
> >> > configurable? We could go with weights that should make sense for most
> >> > cases in the first version and only introduce configurable weights
> >> > when (if) users need them.
> >> >
> >> > Regarding where/how things are configured, I think that most things
> >> > should be a ConfigOption first (thanks for cc'ing me, Stephan!). This
> >> > makes them configurable via flink-conf.yaml and via command line
> >> > parameters, for example "bin/flink run -D memory.foo=bla ...". We can
> >> > think about offering a programmatic API for cases where it makes
> >> > sense, of course.
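> >> >
> >> > For reference, declaring such an option would follow the usual
> >> > ConfigOptions builder pattern (the key and default below are made up,
> >> > for illustration only):
> >> >
> >> >     import org.apache.flink.configuration.ConfigOption;
> >> >     import org.apache.flink.configuration.ConfigOptions;
> >> >
> >> >     // Hypothetical weight option -- the key and default are illustrative.
> >> >     public static final ConfigOption<Integer> PYTHON_MEMORY_WEIGHT =
> >> >         ConfigOptions.key("taskmanager.memory.managed.consumer-weight.python")
> >> >             .intType()
> >> >             .defaultValue(30)
> >> >             .withDescription("Hypothetical managed memory weight for Python.");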
> >> >
> >> > Regarding naming one of the configurable weights
> >> > "StateBackend-BatchAlgorithm": I think it's not a good idea to be that
> >> > specific, because the option will not age well. For example, when we
> >> > want to change which group of memory consumers is configured together,
> >> > or when we add something new.
> >> >
> >> > Best,
> >> > Aljoscha
> >> >
> >> > On 31.08.20 08:13, Xintong Song wrote:
> >> > > Thanks for the feedback, @Stephan
> >> > >
> >> > >
> >> > >>    - There is a push to make as much as possible configurable via
> >> > >> the main configuration, and not only in code. Specifically values
> >> > >> for operations and tuning.
> >> > >>      I think it would be more important to have such memory weights
> >> > >> in the config, compared to in the program API. /cc Aljoscha
> >> > >
> >> > >
> >> > > I can see the benefit that having the memory weights in the main
> >> > > configuration makes tuning easier, which makes great sense to me. On
> >> > > the other hand, what we lose is the flexibility to have different
> >> > > weights for jobs running in the same Flink cluster. It seems to me
> >> > > the problem is that we don't have an easy way to overwrite
> >> > > job-specific configurations without touching the code.
> >> > >
> >> > >
> >> > > Given the current status, what if we make the memory weights
> >> > > configurable through both the main configuration and the programming
> >> > > API? The main configuration would take effect iff the weights are not
> >> > > explicitly specified through the programming API. In this way, job
> >> > > cluster users can easily tune the weights through the main
> >> > > configuration, while session cluster users, if they want different
> >> > > weights for their jobs, can still overwrite them through the
> >> > > execution configs.
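> >> > >
> >> > > A minimal sketch of that precedence (the option and names below are
> >> > > illustrative only):
> >> > >
> >> > >     // Illustrative fallback: a weight set via the programming API
> >> > >     // wins; otherwise the main configuration's value is used.
> >> > >     int resolveWeight(Integer apiWeight, Configuration conf) {
> >> > >         return apiWeight != null ? apiWeight : conf.getInteger(WEIGHT_OPTION);
> >> > >     }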
> >> > >
> >> > >
> >> > >>    - My recommendation would be to keep this as simple as possible.
> >> > >> Otherwise, this will make a lot of the configuration code harder,
> >> > >> and make it harder for users to understand Flink's memory model.
> >> > >>      Making things as easy for users to understand is very important
> >> > >> in my opinion. In that regard, the main proposal in the FLIP seems
> >> > >> better than the alternative proposal listed at the end of the FLIP
> >> > >> page.
> >> > >
> >> > > +1 from my side.
> >> > >
> >> > >
> >> > >>    - For simplicity, we could go even further and simply have two
> >> > >> memory users at the moment: the operator algorithms/data structures
> >> > >> and the external language process (Python for now).
> >> > >>      We never have batch algos and RocksDB mixed; having these as
> >> > >> separate options is confusing, as it suggests they can be combined
> >> > >> arbitrarily. I also think that the slim possibility that we may ever
> >> > >> combine them in the future is not enough reason to make this more
> >> > >> complex/confusing.
> >> > >
> >> > >
> >> > > Good point. +1 for combining the batch/RocksDB weights, since they
> >> > > are never mixed together. We can even just name it
> >> > > "StateBackend-BatchAlgorithm" to be explicit.
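> >> > >
> >> > > To illustrate with made-up numbers: with 1000 MB of managed memory in
> >> > > a slot and weights 70 ("StateBackend-BatchAlgorithm") vs. 30 (Python),
> >> > > the combined consumer would get 1000 * 70 / (70 + 30) = 700 MB and
> >> > > Python 300 MB. Since batch algorithms and RocksDB are never mixed,
> >> > > the 700 MB would go entirely to whichever of the two is present in
> >> > > the slot.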
> >> > >
> >> > >
> >> > > For "external language process", I'm not entirely sure. Future
> >> external
> >> > > languages are possibly mixed with python processes. To avoid later
> >> > > considering how to share external language memory across different
> >> > > languages, I would suggest to present the concept as "python memory"
> >> > rather
> >> > > than "external language process memory".
> >> > >
> >> > >
> >> > > Thank you~
> >> > >
> >> > > Xintong Song
> >> > >
> >> > >
> >> > >
> >> > > On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org>
> >> > > wrote:
> >> > >
> >> > >> Thanks for driving this proposal. A few thoughts on the current
> >> > >> design:
> >> > >>
> >> > >>    - There is a push to make as much as possible configurable via
> >> > >> the main configuration, and not only in code. Specifically values
> >> > >> for operations and tuning.
> >> > >>      I think it would be more important to have such memory weights
> >> > >> in the config, compared to in the program API. /cc Aljoscha
> >> > >>
> >> > >>    - My recommendation would be to keep this as simple as possible.
> >> > >> Otherwise, this will make a lot of the configuration code harder,
> >> > >> and make it harder for users to understand Flink's memory model.
> >> > >>      Making things as easy for users to understand is very important
> >> > >> in my opinion. In that regard, the main proposal in the FLIP seems
> >> > >> better than the alternative proposal listed at the end of the FLIP
> >> > >> page.
> >> > >>
> >> > >>    - For simplicity, we could go even further and simply have two
> >> > >> memory users at the moment: the operator algorithms/data structures
> >> > >> and the external language process (Python for now).
> >> > >>      We never have batch algos and RocksDB mixed; having these as
> >> > >> separate options is confusing, as it suggests they can be combined
> >> > >> arbitrarily. I also think that the slim possibility that we may ever
> >> > >> combine them in the future is not enough reason to make this more
> >> > >> complex/confusing.
> >> > >>
> >> > >>    - I am also not aware of any plans to combine the network and
> >> > >> operator memory. Not that it would be infeasible to do this, but I
> >> > >> think it would also be orthogonal to this change, and I am not sure
> >> > >> it would be solved with static weights. So trying to get network
> >> > >> memory into this proposal seems premature to me.
> >> > >>
> >> > >> Best,
> >> > >> Stephan
> >> > >>
> >> > >>
> >> > >> On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong...@gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >>>>
> >> > >>>> A quick question: is network memory treated as managed memory
> >> > >>>> now? Or will it be in the future?
> >> > >>>>
> >> > >>> No, network memory is independent of managed memory ATM. And I'm
> >> > >>> not aware of any plan to combine the two.
> >> > >>>
> >> > >>> Any insights there?
> >> > >>>
> >> > >>> Thank you~
> >> > >>>
> >> > >>> Xintong Song
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <ykt...@gmail.com>
> >> > >>> wrote:
> >> > >>>
> >> > >>>> A quick question: is network memory treated as managed memory
> >> > >>>> now? Or will it be in the future?
> >> > >>>>
> >> > >>>> Best,
> >> > >>>> Kurt
> >> > >>>>
> >> > >>>>
> >> > >>>> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong...@gmail.com>
> >> > >>>> wrote:
> >> > >>>>
> >> > >>>>> Hi devs,
> >> > >>>>>
> >> > >>>>> I'd like to bring up the discussion of FLIP-141[1], which
> >> > >>>>> proposes how managed memory should be shared by various use cases
> >> > >>>>> within a slot. This is an extension of FLIP-53[2], where we
> >> > >>>>> assumed that the RocksDB state backend and batch operators are the
> >> > >>>>> only use cases of managed memory for streaming and batch jobs
> >> > >>>>> respectively, which is no longer true with the introduction of
> >> > >>>>> Python UDFs.
> >> > >>>>>
> >> > >>>>> Please note that we have not reached consensus between the two
> >> > >>>>> different designs. The major part of this FLIP describes one of
> >> > >>>>> the candidates, while the alternative is discussed in the section
> >> > >>>>> "Rejected Alternatives". We are hoping to draw on the community's
> >> > >>>>> insights to help us resolve the disagreement.
> >> > >>>>>
> >> > >>>>> Any feedback would be appreciated.
> >> > >>>>>
> >> > >>>>> Thank you~
> >> > >>>>>
> >> > >>>>> Xintong Song
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> [1]
> >> > >>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> >> > >>>>>
> >> > >>>>> [2]
> >> > >>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
