Hi,

Playing devil's advocate here: should we even make the memory weights
configurable? We could go with weights that make sense for most cases in
the first version and only introduce configurable weights when (if) users
need them.

Regarding where/how things are configured, I think that most things
should be a ConfigOption first (thanks for cc'ing me, Stephan!). This
makes them configurable via flink-conf.yaml and via command line
parameters, for example "bin/flink run -D memory.foo=bla ...". We can
think about offering a programmatic API for cases where it makes sense,
of course.
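
For illustration, a minimal sketch of what such an option could look like
with Flink's ConfigOptions builder (the key name, type, and default here
are made up for the example, not actual FLIP-141 values):

    import org.apache.flink.configuration.ConfigOption;
    import org.apache.flink.configuration.ConfigOptions;

    public class MemoryWeightOptions {
        // Hypothetical option: key, type, and default are placeholders.
        public static final ConfigOption<Integer> STATE_BATCH_WEIGHT =
                ConfigOptions.key("taskmanager.memory.managed.state-batch-weight")
                        .intType()
                        .defaultValue(70)
                        .withDescription(
                                "Relative weight of the state backend / batch "
                                        + "algorithms when sharing managed memory.");
    }

An option defined this way could then be set in flink-conf.yaml or passed
as "bin/flink run -D taskmanager.memory.managed.state-batch-weight=80 ...".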

Regarding naming one of the configurable weights
"StateBackend-BatchAlgorithm": I think it's not a good idea to be that
specific, because the option will not age well, for example when we want
to change which group of memory consumers is configured together or when
we add something new.

Best,
Aljoscha

On 31.08.20 08:13, Xintong Song wrote:
Thanks for the feedback, @Stephan


   - There is a push to make as much as possible configurable via the main
configuration, and not only in code. Specifically, values for operations
and tuning.
     I think it would be more important to have such memory weights in the
config than in the program API. /cc Aljoscha


I can see the benefit that having memory weights in the main configuration
makes tuning easier, which makes great sense to me. On the other hand, what
we lose is the flexibility to have different weights for jobs running in
the same Flink cluster. It seems to me the problem is that we don't have an
easy way to override configurations for specific jobs without touching the
code.


Given the current status, what if we make the memory weights configurable
through both the main configuration and the programming API? The main
configuration would take effect only if the weights are not explicitly
specified through the programming API. In this way, job cluster users can
easily tune the weights through the main configuration, while session
cluster users, if they want different weights for different jobs, can
still override them through execution configs.
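
As a sketch of that precedence (the method and key name here are
illustrative, not actual Flink API):

    import org.apache.flink.configuration.Configuration;
    import java.util.Optional;

    // The weight set via the programming API, if present, wins;
    // the main configuration (flink-conf.yaml) is only the fallback.
    static int resolveWeight(Optional<Integer> apiWeight, Configuration clusterConf) {
        return apiWeight.orElseGet(() ->
                clusterConf.getInteger("taskmanager.memory.managed.state-batch-weight", 70));
    }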


   - My recommendation would be to keep this as simple as possible.
Otherwise, this will make a lot of the configuration code harder, and make
it harder for users to understand Flink's memory model.
     Making things as easy as possible for users to understand is very
important in my opinion. In that regard, the main proposal in the FLIP
seems better than the alternative proposal listed at the end of the FLIP
page.

+1 from my side.


   - For simplicity, we could go even further and simply have two memory
users at the moment: the operator algorithm/data-structure and the
external language process (Python for now).
     We never have batch algos and RocksDB mixed; having these as separate
options is confusing, as it suggests they can be combined arbitrarily. I
also think that the slim possibility that we may ever combine them in the
future is not enough reason to make this more complex/confusing.


Good point. +1 for combining the batch/rocksdb weights, since they're
never mixed together. We can even just name it "StateBackend-BatchAlgorithm"
to be explicit.
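
Just to make the sharing concrete, here is an illustrative sketch of how
such weights could translate into shares of a slot's managed memory
(simple proportional arithmetic, not the actual FLIP-141 implementation):

    import java.util.HashMap;
    import java.util.Map;

    // Each consumer gets managed memory proportional to its weight.
    static Map<String, Long> splitManagedMemory(long totalBytes, Map<String, Integer> weights) {
        long weightSum = weights.values().stream().mapToLong(Integer::longValue).sum();
        Map<String, Long> shares = new HashMap<>();
        weights.forEach((consumer, weight) ->
                shares.put(consumer, totalBytes * weight / weightSum));
        return shares;
    }

For example, with weights {"StateBackend-BatchAlgorithm": 70, "Python": 30}
and 1 GiB of managed memory in a slot, the state backend / batch algorithms
would get roughly 717 MiB and Python roughly 307 MiB.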


For "external language process", I'm not entirely sure. Future external
languages are possibly mixed with python processes. To avoid later
considering how to share external language memory across different
languages, I would suggest to present the concept as "python memory" rather
than "external language process memory".


Thank you~

Xintong Song



On Sun, Aug 30, 2020 at 10:19 PM Stephan Ewen <se...@apache.org> wrote:

Thanks for driving this proposal. A few thoughts on the current design:

   - There is a push to make as much as possible configurable via the main
configuration, and not only in code. Specifically, values for operations
and tuning.
     I think it would be more important to have such memory weights in the
config than in the program API. /cc Aljoscha

   - My recommendation would be to keep this as simple as possible.
Otherwise, this will make a lot of the configuration code harder, and make
it harder for users to understand Flink's memory model.
     Making things as easy as possible for users to understand is very
important in my opinion. In that regard, the main proposal in the FLIP
seems better than the alternative proposal listed at the end of the FLIP
page.

   - For simplicity, we could go even further and simply have two memory
users at the moment: the operator algorithm/data-structure and the
external language process (Python for now).
     We never have batch algos and RocksDB mixed; having these as separate
options is confusing, as it suggests they can be combined arbitrarily. I
also think that the slim possibility that we may ever combine them in the
future is not enough reason to make this more complex/confusing.

   - I am also not aware of any plans to combine the network and operator
memory. Not that it would be infeasible to do, but I think it would be
orthogonal to this change, and I am not sure it would be solved with
static weights. So trying to get network memory into this proposal seems
premature to me.

Best,
Stephan


On Fri, Aug 28, 2020 at 10:48 AM Xintong Song <tonysong...@gmail.com>
wrote:


A quick question: is network memory treated as managed memory now, or
will it be in the future?

No, network memory is independent of managed memory ATM. And I'm not
aware of any plan to combine the two.

Any insights there?

Thank you~

Xintong Song



On Fri, Aug 28, 2020 at 4:35 PM Kurt Young <ykt...@gmail.com> wrote:

A quick question: is network memory treated as managed memory now, or
will it be in the future?

Best,
Kurt


On Wed, Aug 26, 2020 at 5:32 PM Xintong Song <tonysong...@gmail.com>
wrote:

Hi devs,

I'd like to bring up the discussion on FLIP-141[1], which proposes how
managed memory should be shared by various use cases within a slot. This
is an extension to FLIP-53[2], where we assumed that the RocksDB state
backend and batch operators are the only use cases of managed memory for
streaming and batch jobs respectively, which is no longer true with the
introduction of Python UDFs.

Please note that we have not yet reached consensus between two different
designs. The major part of this FLIP describes one of the candidates,
while the alternative is discussed in the section "Rejected Alternatives".
We are hoping for input from the community to help us resolve the
disagreement.

Any feedback would be appreciated.

Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility

[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management