Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Xintong Song Wed, 20 Jan 2021 18:00:52 -0800

I think this makes sense.

The semantics of an SSG are that operators in the group *can* be scheduled
together in a slot, not that they *must* be. Specifying resources for SSGs
should not change those semantics. If the need arises to schedule the
operators into different slots, it makes sense for the runtime to derive
the finer-grained resource requirements when they are not provided.
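
To make this concrete, here is a rough sketch (Python, purely illustrative)
of the two derivation strategies Till mentions in the quoted message: giving
every task the resources of the whole group, or splitting the group's
resources equally. `ResourceSpec` and both function names are hypothetical;
they are not part of FLIP-156 or of any Flink API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ResourceSpec:
    """Hypothetical stand-in for a resource profile (not a Flink class)."""
    cpu_cores: float
    memory_mb: int


def derive_same_as_group(group: ResourceSpec, tasks: List[str]) -> Dict[str, ResourceSpec]:
    # Strategy 1: every task gets the resources of the whole group.
    return {task: group for task in tasks}


def derive_equal_split(group: ResourceSpec, tasks: List[str]) -> Dict[str, ResourceSpec]:
    # Strategy 2: distribute the group's resources equally among the tasks.
    # Memory is rounded up so the shares never sum to less than the group total.
    n = len(tasks)
    if n == 0:
        return {}
    share = ResourceSpec(
        cpu_cores=group.cpu_cores / n,
        memory_mb=-(-group.memory_mb // n),  # ceiling division
    )
    return {task: share for task in tasks}


ssg = ResourceSpec(cpu_cores=2.0, memory_mb=200)
per_task = derive_equal_split(ssg, ["op_1", "op_2"])
```

With the 200 MB group from Till's earlier example, the equal split gives
op_1 and op_2 100 MB each, so they would fit into two 100 MB slots.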


We may not need to implement this at the moment since SSGs are currently
always respected, but we should make those semantics explicit in the JavaDocs
for the interfaces and in the user documentation when the user APIs are
exposed.

Thank you~

Xintong Song



On Thu, Jan 21, 2021 at 1:55 AM Till Rohrmann <[email protected]> wrote:

> Maybe a different minor idea: Would it be possible to treat the SSG
> resource requirements as a hint for the runtime similar to how slot sharing
> groups are designed at the moment? Meaning that we don't give the guarantee
> that Flink will always deploy this set of tasks together no matter what
> comes. If, for example, the runtime can derive by some means the resource
> requirements for each task based on the requirements for the SSG, this
> could be possible. One easy strategy would be to give every task the same
> resources as the whole slot sharing group. Another one could be
> distributing the resources equally among the tasks. This does not even have
> to be implemented but we would give ourselves the freedom to change
> scheduling if need should arise.
>
> Cheers,
> Till
>
> On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <[email protected]> wrote:
>
> > Thanks for the responses, Till and Xintong.
> >
> > I second Xintong's comment that the SSG-based runtime interface will
> > give us the flexibility to achieve the op/task-based approach. That's
> > one of the most important reasons for our design choice.
> >
> > Some cents regarding the default operator resource:
> > - It might be good for the scenario of DataStream jobs.
> >    ** For light-weight operators, the accumulated configuration error
> > will not be significant. The resource a task uses is then
> > proportional to the number of operators it contains.
> >    ** For heavy operators like join and window, or operators using
> > external resources, users will turn to the fine-grained resource
> > configuration.
> > - It can increase stability on standalone clusters where the registered
> > task executors are heterogeneous (with different default slot
> > resources).
> > - It might not be good for SQL users. The operators that a SQL query is
> > translated into are a black box to the user. We also do not guarantee
> > cross-version consistency of that translation so far.
> >
> > I think it can be treated as follow-up work once fine-grained
> > resource management is ready end-to-end.
> >
> > Best,
> > Yangze Guo
> >
> >
> > On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <[email protected]>
> > wrote:
> > >
> > > Thanks for the feedback, Till.
> > >
> > > ## I feel that what you proposed (operator-based + default value) might
> > > be subsumed by the SSG-based approach.
> > > Thinking of op_1 -> op_2, there are the following 4 cases, categorized
> > > by whether the resource requirements are known to the users.
> > >
> > >    1. *Both known.* As previously mentioned, there's no reason to put
> > >    multiple operators whose individual resource requirements are
> > >    already known into the same group in fine-grained resource
> > >    management. And if op_1 and op_2 are in different groups, there
> > >    should be no problem switching data exchange mode from pipelined to
> > >    blocking. This is equivalent to specifying operator resource
> > >    requirements in your proposal.
> > >    2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is in
> > >    an SSG whose resource is not specified and thus would have the
> > >    default slot resource. This is equivalent to having default operator
> > >    resources in your proposal.
> > >    3. *Both unknown.* The user can either set op_1 and op_2 to the same
> > >    SSG or separate SSGs.
> > >       - If op_1 and op_2 are in the same SSG, it will be equivalent to
> > >       the coarse-grained resource management, where op_1 and op_2 share
> > >       a default size slot no matter which data exchange mode is used.
> > >       - If op_1 and op_2 are in different SSGs, then each of them will
> > >       use a default size slot. This is equivalent to setting them with
> > >       default operator resources in your proposal.
> > >    4. *Total (pipeline) or max (blocking) of op_1 and op_2 is known.*
> > >       - It is possible that the user learns the total / max resource
> > >       requirement from executing and monitoring the job, while not
> > >       being aware of individual operator requirements.
> > >       - I believe this is the case your proposal does not cover. And
> > >       TBH, this is probably how most users learn the resource
> > >       requirements, according to my experiences.
> > >       - In this case, the user might need to specify different
> > >       resources if he wants to switch the execution mode, which should
> > >       not be worse than not being able to use fine-grained resource
> > >       management.
> > >
> > >
> > > ## An additional idea inspired by your proposal.
> > > We may provide multiple options for deciding resources for SSGs whose
> > > requirement is not specified, if needed.
> > >
> > >    - Default slot resource (current design)
> > >    - Default operator resource times number of operators (equivalent to
> > >    your proposal)
> > >
> > >
> > > ## Exposing internal runtime strategies
> > > Theoretically, yes. Being tied to the SSGs, the resource requirements
> > > might be affected if how SSGs are internally handled changes in future.
> > > Practically, I do not concretely see at the moment what kind of changes
> > > we may want in future that might conflict with this FLIP proposal, as
> > > the question of switching data exchange mode is answered above. I'd
> > > suggest not giving up the user friendliness we may gain now for future
> > > problems that may or may not exist.
> > >
> > > Moreover, the SSG-based approach has the flexibility to achieve
> > > behavior equivalent to the operator-based approach, if we set each
> > > operator (or task) to a separate SSG. We can even provide a shortcut
> > > option to automatically do that for users, if needed.
> > >
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <[email protected]>
> > wrote:
> > >
> > > > Thanks for the responses Xintong and Stephan,
> > > >
> > > > I agree that being able to define the resource requirements for a
> > > > group of operators is more user friendly. However, my concern is that
> > > > we are exposing thereby internal runtime strategies which might limit
> > > > our flexibility to execute a given job. Moreover, the semantics of
> > > > configuring resource requirements for SSGs could break if switching
> > > > from streaming to batch execution. If one defines the resource
> > > > requirements for op_1 -> op_2 which run in pipelined mode when using
> > > > the streaming execution, then how do we interpret these requirements
> > > > when op_1 -> op_2 are executed with a blocking data exchange in batch
> > > > execution mode? Consequently, I am still leaning towards Stephan's
> > > > proposal to set the resource requirements per operator.
> > > >
> > > > Maybe the following proposal makes the configuration easier: If the
> > > > user wants to use fine-grained resource requirements, then she needs
> > > > to specify the default size which is used for operators which have no
> > > > explicit resource annotation. If this holds true, then every operator
> > > > would have a resource requirement and the system can try to execute
> > > > the operators in the best possible manner w/o being constrained by how
> > > > the user set the SSG requirements.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <[email protected]>
> > > > wrote:
> > > >
> > > > > Thanks for the feedback, Stephan.
> > > > >
> > > > > Actually, your proposal has also come to my mind at some point. And
> > > > > I have some concerns about it.
> > > > >
> > > > >
> > > > > 1. It does not give users the same control as the SSG-based
> approach.
> > > > >
> > > > >
> > > > > While neither approach requires specifying resources for each
> > > > > operator, the SSG-based approach supports the semantic that "some
> > > > > operators together use this much resource" while the operator-based
> > > > > approach doesn't.
> > > > >
> > > > >
> > > > > Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and
> > > > > at some point there's an agg o_n (1 < n < m) which significantly
> > > > > reduces the data amount. One can separate the pipeline into 2 groups
> > > > > SSG_1 (o_1, ..., o_n) and SSG_2 (o_n+1, ..., o_m), so that
> > > > > configuring much higher parallelisms for operators in SSG_1 than for
> > > > > operators in SSG_2 won't lead to too much wasting of resources. If
> > > > > the two SSGs end up needing different resources, with the SSG-based
> > > > > approach one can directly specify resources for the two groups.
> > > > > However, with the operator-based approach, the user will have to
> > > > > specify resources for each operator in one of the two groups, and
> > > > > tune the default slot resource via configurations to fit the other
> > > > > group.
> > > > >
> > > > >
> > > > > 2. It increases the chance of breaking operator chains.
> > > > >
> > > > >
> > > > > Setting chainable operators into different slot sharing groups will
> > > > > prevent them from being chained. In the current implementation,
> > > > > downstream operators, if their SSG is not explicitly specified, will
> > > > > be set to the same group as the chainable upstream operators (unless
> > > > > there are multiple upstream operators in different groups), to
> > > > > reduce the chance of breaking chains.
> > > > >
> > > > >
> > > > > Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding
> > > > > SSGs based on whether resource is specified we will easily get groups
> > > > > like (o_1, o_3) & (o_2, o_4), where none of the operators can be
> > > > > chained. This is also possible for the SSG-based approach, but I
> > > > > believe the chance is much smaller because there's no strong reason
> > > > > for users to specify the groups with alternate operators like that.
> > > > > We are more likely to get groups like (o_1, o_2) & (o_3, o_4), where
> > > > > the chain breaks only between o_2 and o_3.
> > > > >
> > > > >
> > > > > 3. It complicates the system by having two different mechanisms for
> > > > > sharing managed memory in a slot.
> > > > >
> > > > >
> > > > > - In FLIP-141, we introduced the intra-slot managed memory sharing
> > > > > mechanism, where managed memory is first distributed according to the
> > > > > consumer type, then further distributed across operators of that
> > > > > consumer type.
> > > > >
> > > > > - With the operator-based approach, managed memory size specified for
> > > > > an operator should account for all the consumer types of that
> > > > > operator. That means the managed memory is first distributed across
> > > > > operators, then distributed to different consumer types of each
> > > > > operator.
> > > > >
> > > > >
> > > > > Unfortunately, the different order of the two calculation steps can
> > > > > lead to different results. To be specific, the semantic of the
> > > > > configuration option `consumer-weights` changes (within a slot vs.
> > > > > within an operator).
> > > > >
> > > > >
> > > > >
> > > > > To sum up things:
> > > > >
> > > > > While (3) might be a bit more implementation related, I think (1) and
> > > > > (2) somehow suggest that the price for the proposed approach to avoid
> > > > > specifying resource for every operator is that it's not as
> > > > > independent from operator chaining and slot sharing as the
> > > > > operator-based approach discussed in the FLIP.
> > > > >
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <[email protected]>
> > wrote:
> > > > >
> > > > > > Thanks a lot, Yangze and Xintong for this FLIP.
> > > > > >
> > > > > > I want to say, first of all, that this is super well written. And
> > > > > > the points that the FLIP makes about how to expose the configuration
> > > > > > to users is exactly the right thing to figure out first.
> > > > > > So good job here!
> > > > > >
> > > > > > About how to let users specify the resource profiles. If I can sum
> > > > > > the FLIP and previous discussion up in my own words, the problem is
> > > > > > the following:
> > > > > >
> > > > > > > Operator-level specification is the simplest and cleanest approach,
> > > > > > > because it avoids mixing operator configuration (resource) and
> > > > > > > scheduling. No matter what other parameters change (chaining, slot
> > > > > > > sharing, switching pipelined and blocking shuffles), the resource
> > > > > > > profiles stay the same.
> > > > > > > But it would require that a user specifies resources on all
> > > > > > > operators, which makes it hard to use. That's why the FLIP suggests
> > > > > > > going with specifying resources on a Sharing-Group.
> > > > > >
> > > > > >
> > > > > > I think both thoughts are important, so can we find a solution where
> > > > > > the Resource Profiles are specified on an Operator, but we still
> > > > > > avoid that we need to specify a resource profile on every operator?
> > > > > >
> > > > > > What do you think about something like the following:
> > > > > >   - Resource Profiles are specified on an operator level.
> > > > > >   - Not all operators need profiles
> > > > > >   - All Operators without a Resource Profile end up in the default
> > > > > > slot sharing group with a default profile (will get a default slot).
> > > > > >   - All Operators with a Resource Profile will go into another slot
> > > > > > sharing group (the resource-specified-group).
> > > > > >   - Users can define different slot sharing groups for operators like
> > > > > > they do now, with the exception that you cannot mix operators that
> > > > > > have a resource profile and operators that have no resource profile.
> > > > > >   - The default case where no operator has a resource profile is just
> > > > > > a special case of this model
> > > > > >   - The chaining logic sums up the profiles per operator, like it does
> > > > > > now, and the scheduler sums up the profiles of the tasks that it
> > > > > > schedules together.
> > > > > >
> > > > > >
> > > > > > There is another question about reactive scaling raised in the FLIP.
> > > > > > I need to think a bit about that. That is indeed a bit more tricky
> > > > > > once we have slots of different sizes.
> > > > > > It is not clear then which of the different slot requests the
> > > > > > ResourceManager should fulfill when new resources (TMs) show up, or
> > > > > > how the JobManager redistributes the slot resources when resources
> > > > > > (TMs) disappear.
> > > > > > This question is pretty orthogonal, though, to the "how to specify
> > > > > > the resources".
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Stephan
> > > > > >
> > > > > > On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for drafting the FLIP and driving the discussion, Yangze.
> > > > > > > And Thanks for the feedback, Till and Chesnay.
> > > > > > >
> > > > > > > @Till,
> > > > > > >
> > > > > > > I agree that specifying requirements for SSGs means that SSGs need
> > > > > > > to be supported in fine-grained resource management, otherwise each
> > > > > > > operator might use as many resources as the whole group. However, I
> > > > > > > cannot think of a strong reason for not supporting SSGs in
> > > > > > > fine-grained resource management.
> > > > > > >
> > > > > > >
> > > > > > > > Interestingly, if all operators have their resources properly
> > > > > > > > specified, then slot sharing is no longer needed because Flink
> > > > > > > > could slice off the appropriately sized slots for every Task
> > > > > > > > individually.
> > > > > > > >
> > > > > > > > So for example, if we have a job consisting of two operators op_1
> > > > > > > > and op_2 where each op needs 100 MB of memory, we would then say
> > > > > > > > that the slot sharing group needs 200 MB of memory to run. If we
> > > > > > > > have a cluster with 2 TMs with one slot of 100 MB each, then the
> > > > > > > > system cannot run this job. If the resources were specified on an
> > > > > > > > operator level, then the system could still make the decision to
> > > > > > > > deploy op_1 to TM_1 and op_2 to TM_2.
> > > > > > >
> > > > > > >
> > > > > > > Couldn't agree more that if all operators' requirements are properly
> > > > > > > specified, slot sharing should be no longer needed. I think this
> > > > > > > exactly disproves the example. If we already know op_1 and op_2 each
> > > > > > > need 100 MB of memory, why would we put them in the same group? If
> > > > > > > they are in separate groups, with the proposed approach the system
> > > > > > > can freely deploy them to either a 200 MB TM or two 100 MB TMs.
> > > > > > >
> > > > > > > Moreover, the precondition for not needing slot sharing is having
> > > > > > > resource requirements properly specified for all operators. This is
> > > > > > > not always possible, and usually requires tremendous effort. One of
> > > > > > > the benefits of SSG-based requirements is that they allow the user
> > > > > > > to freely decide the granularity, and thus the effort they want to
> > > > > > > invest. I would consider an SSG in fine-grained resource management
> > > > > > > as a group of operators that the user would like to specify the
> > > > > > > total resource for. There can be only one group in the job, 2~3
> > > > > > > groups dividing the job into a few major parts, or as many groups as
> > > > > > > the number of tasks/operators, depending on how fine-grained the
> > > > > > > user is able to specify the resources.
> > > > > > >
> > > > > > > Having to support SSGs might be a constraint. But given that all the
> > > > > > > current scheduler implementations already support SSGs, I tend to
> > > > > > > think of that as an acceptable price for the above discussed
> > > > > > > usability and flexibility.
> > > > > > >
> > > > > > > @Chesnay
> > > > > > >
> > > > > > > > Will declaring them on slot sharing groups not also waste
> > > > > > > > resources if the parallelism of operators within that group is
> > > > > > > > different?
> > > > > > > >
> > > > > > > Yes. It's a trade-off between usability and resource utilization. To
> > > > > > > avoid such wasting, the user can define more groups, so that each
> > > > > > > group contains fewer operators and the chance of having operators
> > > > > > > with different parallelism will be reduced. The price is to have
> > > > > > > more resource requirements to specify.
> > > > > > >
> > > > > > > > It also seems like quite a hassle for users having to recalculate
> > > > > > > > the resource requirements if they change the slot sharing.
> > > > > > > > I'd think that it's not really workable for users that create a
> > > > > > > > set of re-usable operators which are mixed and matched in their
> > > > > > > > applications; managing the resources requirements in such a
> > > > > > > > setting would be a nightmare, and in the end would require
> > > > > > > > operator-level requirements any way.
> > > > > > > > In that sense, I'm not even sure whether it really increases
> > > > > > > > usability.
> > > > > > > >
> > > > > > >
> > > > > > >    - As mentioned in my reply to Till's comment, there's no reason to
> > > > > > >    put multiple operators whose individual resource requirements are
> > > > > > >    already known into the same group in fine-grained resource
> > > > > > >    management.
> > > > > > >    - Even if an operator implementation is reused for multiple
> > > > > > >    applications, it does not guarantee the same resource
> > > > > > >    requirements. During our years of practice at Alibaba, with
> > > > > > >    per-operator requirements specified for Blink's fine-grained
> > > > > > >    resource management, very few users (including our specialists
> > > > > > >    who are dedicated to supporting Blink users) are experienced
> > > > > > >    enough to accurately predict/estimate the operator resource
> > > > > > >    requirements. Most people rely on the execution-time metrics
> > > > > > >    (throughput, delay, cpu load, memory usage, GC pressure, etc.) to
> > > > > > >    improve the specification.
> > > > > > >
> > > > > > > To sum up:
> > > > > > > If the user is capable of providing proper resource requirements for
> > > > > > > every operator, that's definitely a good thing and we would not need
> > > > > > > to rely on the SSGs. However, that shouldn't be a *must* for the
> > > > > > > fine-grained resource management to work. For those users who are
> > > > > > > capable and do not like having to set each operator to a separate
> > > > > > > SSG, I would be ok to have both SSG-based and operator-based runtime
> > > > > > > interfaces and to only fall back to the SSG requirements when the
> > > > > > > operator requirements are not specified. However, as the first step,
> > > > > > > I think we should prioritise the use cases where users are not that
> > > > > > > experienced.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > > On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Will declaring them on slot sharing groups not also waste
> > > > > > > > resources if the parallelism of operators within that group is
> > > > > > > > different?
> > > > > > > >
> > > > > > > > It also seems like quite a hassle for users having to recalculate
> > > > > > > > the resource requirements if they change the slot sharing.
> > > > > > > > I'd think that it's not really workable for users that create a
> > > > > > > > set of re-usable operators which are mixed and matched in their
> > > > > > > > applications; managing the resources requirements in such a
> > > > > > > > setting would be a nightmare, and in the end would require
> > > > > > > > operator-level requirements any way.
> > > > > > > > In that sense, I'm not even sure whether it really increases
> > > > > > > > usability.
> > > > > > > >
> > > > > > > > My main worry is that if we wire the runtime to work on SSGs it's
> > > > > > > > gonna be difficult to implement more fine-grained approaches,
> > > > > > > > which would not be the case if, for the runtime, they are always
> > > > > > > > defined on an operator-level.
> > > > > > > >
> > > > > > > > On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > > > > > Thanks for drafting this FLIP and starting this discussion
> > > > > > > > > Yangze.
> > > > > > > > >
> > > > > > > > > I like that defining resource requirements on a slot sharing
> > > > > > > > > group makes the overall setup easier and improves usability of
> > > > > > > > > resource requirements.
> > > > > > > > >
> > > > > > > > > What I do not like about it is that it changes slot sharing
> > > > > > > > > groups from being a scheduling hint to something which needs to
> > > > > > > > > be supported in order to support fine grained resource
> > > > > > > > > requirements. So far, the idea of slot sharing groups was that
> > > > > > > > > it tells the system that a set of operators can be deployed in
> > > > > > > > > the same slot. But the system still had the freedom to say that
> > > > > > > > > it would rather place these tasks in different slots if it
> > > > > > > > > wanted. If we now specify resource requirements on a per slot
> > > > > > > > > sharing group basis, then the only option for a scheduler which
> > > > > > > > > does not support slot sharing groups is to say that every
> > > > > > > > > operator in this slot sharing group needs a slot with the same
> > > > > > > > > resources as the whole group.
> > > > > > > > >
> > > > > > > > > So for example, if we have a job consisting of two operators
> > > > > > > > > op_1 and op_2 where each op needs 100 MB of memory, we would
> > > > > > > > > then say that the slot sharing group needs 200 MB of memory to
> > > > > > > > > run. If we have a cluster with 2 TMs with one slot of 100 MB
> > > > > > > > > each, then the system cannot run this job. If the resources were
> > > > > > > > > specified on an operator level, then the system could still make
> > > > > > > > > the decision to deploy op_1 to TM_1 and op_2 to TM_2.
> > > > > > > > >
> > > > > > > > > Originally, one of the primary goals of slot sharing groups was
> > > > > > > > > to make it easier for the user to reason about how many slots a
> > > > > > > > > job needs independent of the actual number of operators in the
> > > > > > > > > job. Interestingly, if all operators have their resources
> > > > > > > > > properly specified, then slot sharing is no longer needed
> > > > > > > > > because Flink could slice off the appropriately sized slots for
> > > > > > > > > every Task individually. What matters is whether the whole
> > > > > > > > > cluster has enough resources to run all tasks or not.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <[email protected]>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hi, there,
> > > > > > > > >>
> > > > > > > > > >> We would like to start a discussion thread on "FLIP-156: Runtime
> > > > > > > > > >> Interfaces for Fine-Grained Resource Requirements"[1], where we
> > > > > > > > > >> propose Slot Sharing Group (SSG) based runtime interfaces for
> > > > > > > > > >> specifying fine-grained resource requirements.
> > > > > > > > > >>
> > > > > > > > > >> In this FLIP:
> > > > > > > > > >> - Expound the user story of fine-grained resource management.
> > > > > > > > > >> - Propose runtime interfaces for specifying SSG-based resource
> > > > > > > > > >> requirements.
> > > > > > > > > >> - Discuss the pros and cons of the three potential granularities
> > > > > > > > > >> for specifying the resource requirements (op, task and slot
> > > > > > > > > >> sharing group) and explain why we choose the slot sharing group.
> > > > > > > > > >>
> > > > > > > > > >> Please find more details in the FLIP wiki document [1]. Looking
> > > > > > > > > >> forward to your feedback.
> > > > > > > > > >>
> > > > > > > > > >> [1]
> > > > > > > > > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > > > > >>
> > > > > > > > >> Best,
> > > > > > > > >> Yangze Guo
> > > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>
