Thanks for the responses Xintong and Stephan,

I agree that being able to define the resource requirements for a group of
operators is more user friendly. However, my concern is that we would
thereby expose internal runtime strategies, which might limit our
flexibility to execute a given job. Moreover, the semantics of configuring
resource requirements for SSGs could break when switching from streaming to
batch execution. If one defines the resource requirements for op_1 -> op_2,
which run in pipelined mode under streaming execution, then how do we
interpret these requirements when op_1 -> op_2 are executed with a blocking
data exchange in batch execution mode? Consequently, I am still leaning
towards Stephan's proposal to set the resource requirements per operator.

Maybe the following proposal makes the configuration easier: If the user
wants to use fine-grained resource requirements, then she needs to specify
a default size which is used for all operators that have no explicit
resource annotation. That way, every operator would have a resource
requirement and the system could try to execute the operators in the best
possible manner without being constrained by how the user set the SSG
requirements.
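
To make this concrete, here is a minimal sketch of how it could look on the
DataStream API. The setDefaultOperatorResources and setResources methods and
the ResourceProfile builder are purely hypothetical names used to illustrate
the idea (not existing API), the user functions are placeholders, and the
numbers are arbitrary:

// Hypothetical API sketch -- method and class names are illustrative only.
StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();

// Default profile applied to every operator without an explicit annotation.
env.setDefaultOperatorResources(
        ResourceProfile.newBuilder()
                .setCpuCores(0.5)
                .setTaskHeapMemoryMB(128)
                .build());

env.addSource(new MySource())                      // falls back to the default profile
        .map(new HeavyMapFunction())
        // Explicit annotation only where the default does not fit.
        .setResources(ResourceProfile.newBuilder()
                .setCpuCores(2.0)
                .setTaskHeapMemoryMB(1024)
                .build())
        .addSink(new MySink());                    // falls back to the default profile

With such a default in place, every operator carries a complete requirement,
and the scheduler stays free to chain, share, or split slots as it sees fit.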

Cheers,
Till

On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <tonysong...@gmail.com> wrote:

> Thanks for the feedback, Stephan.
>
> Actually, your proposal had also crossed my mind at some point, and I have
> some concerns about it.
>
>
> 1. It does not give users the same control as the SSG-based approach.
>
>
> While neither approach requires specifying resources for each operator, the
> SSG-based approach supports the semantic that "some operators together use
> this much resource", while the operator-based approach doesn't.
>
>
> Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and at some
> point there's an aggregation o_n (1 < n < m) which significantly reduces the
> data volume. One can separate the pipeline into two groups, SSG_1 (o_1, ...,
> o_n) and SSG_2 (o_n+1, ..., o_m), so that configuring much higher
> parallelisms for operators in SSG_1 than for operators in SSG_2 won't waste
> too many resources. If the two SSGs end up needing different resources, with
> the SSG-based approach one can directly specify resources for the two
> groups. However, with the operator-based approach, the user will have to
> specify resources for each operator in one of the two groups, and tune the
> default slot resource via configurations to fit the other group.
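>
> As a rough sketch, assuming the SSG-based interface proposed in the FLIP
> ends up looking roughly like a resource builder combined with the existing
> slotSharingGroup(String) calls (the SlotSharingGroup builder and the
> registration call below are illustrative, not final API; the user functions
> are placeholders, and `env` / `stream` stand for the execution environment
> and an input DataStream), the whole pipeline needs only two resource
> specifications:
>
> // Illustrative sketch only -- builder and registration names are not final API.
> SlotSharingGroup ssg1 = SlotSharingGroup.newBuilder("ssg_1")
>         .setCpuCores(4.0)
>         .setTaskHeapMemoryMB(2048)
>         .build();
> SlotSharingGroup ssg2 = SlotSharingGroup.newBuilder("ssg_2")
>         .setCpuCores(1.0)
>         .setTaskHeapMemoryMB(512)
>         .build();
> env.registerSlotSharingGroup(ssg1);
> env.registerSlotSharingGroup(ssg2);
>
> // o_1 ... o_n (high parallelism) go into ssg_1,
> // o_n+1 ... o_m (low parallelism) go into ssg_2.
> stream.map(new ParseFunction()).slotSharingGroup("ssg_1")
>         .keyBy(r -> r.key)
>         .reduce(new AggFunction()).slotSharingGroup("ssg_1")  // o_n: reduces data volume
>         .map(new EnrichFunction()).slotSharingGroup("ssg_2")
>         .addSink(new MySink()).slotSharingGroup("ssg_2");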
>
>
> 2. It increases the chance of breaking operator chains.
>
>
> Putting chainable operators into different slot sharing groups will
> prevent them from being chained. In the current implementation, a downstream
> operator whose SSG is not explicitly specified will be put into the same
> group as its chainable upstream operator (unless there are multiple upstream
> operators in different groups), to reduce the chance of breaking chains.
>
>
> Think of chainable operators o_1 -> o_2 -> o_3 -> o_4. If we decide SSGs
> based on whether resources are specified, we can easily end up with groups
> like (o_1, o_3) & (o_2, o_4), where none of the operators can be chained.
> This is also possible with the SSG-based approach, but I believe the chance
> is much smaller, because there's no strong reason for users to specify
> groups with alternating operators like that. We are more likely to get
> groups like (o_1, o_2) & (o_3, o_4), where the chain breaks only between
> o_2 and o_3.
>
>
> 3. It complicates the system by having two different mechanisms for sharing
> managed memory in a slot.
>
>
> - In FLIP-141, we introduced the intra-slot managed memory sharing
> mechanism, where managed memory is first distributed according to the
> consumer type, then further distributed across operators of that consumer
> type.
>
> - With the operator-based approach, the managed memory size specified for an
> operator should account for all the consumer types of that operator. That
> means the managed memory is first distributed across operators, then
> distributed to the different consumer types of each operator.
>
>
> Unfortunately, the different order of the two calculation steps can lead to
> different results. To be specific, the semantics of the configuration option
> `consumer-weights` would change (weights applied within a slot vs. within an
> operator).
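>
> To illustrate with made-up numbers (the weights, sizes, and even split below
> are purely illustrative simplifications, not the actual distribution logic):
> take a 100 MB slot containing op_A, which uses managed memory only for the
> state backend, and op_B, which uses it for both the state backend and
> Python, with consumer weights 70/30:
>
> // Simplified arithmetic only -- not the actual Flink distribution code.
> public class ManagedMemoryOrderExample {
>     public static void main(String[] args) {
>         double slot = 100.0, wState = 0.7, wPython = 0.3;
>
>         // Slot-first (FLIP-141): split the slot by consumer type, then by operator.
>         double stateShare = slot * wState;                   // 70
>         double pythonShare = slot * wPython;                 // 30
>         double opASlotFirst = stateShare / 2;                // 35 (state only)
>         double opBSlotFirst = stateShare / 2 + pythonShare;  // 65 (35 state + 30 python)
>
>         // Operator-first: split the slot by operator, then by consumer type.
>         double perOp = slot / 2;                             // 50 per operator
>         double opAOpFirst = perOp;                           // 50, all state backend
>         double opBOpFirstState = perOp * wState;             // 35 state backend
>         double opBOpFirstPython = perOp * wPython;           // 15 python
>
>         // State backend total: 70 MB (slot-first) vs. 85 MB (operator-first).
>         System.out.printf("slot-first: %.0f MB state, operator-first: %.0f MB state%n",
>                 opASlotFirst + stateShare / 2,
>                 opAOpFirst + opBOpFirstState);
>     }
> }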
>
>
>
> To sum things up:
>
> While (3) might be a bit more implementation-related, I think (1) and (2)
> suggest that the price the proposed approach pays for avoiding specifying
> resources for every operator is that it's not as independent from operator
> chaining and slot sharing as the operator-based approach discussed in the
> FLIP.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org> wrote:
>
> > Thanks a lot, Yangze and Xintong for this FLIP.
> >
> > I want to say, first of all, that this is super well written. And the
> > points that the FLIP makes about how to expose the configuration to users
> > are exactly the right things to figure out first.
> > So good job here!
> >
> > About how to let users specify the resource profiles: if I can sum up the
> > FLIP and previous discussion in my own words, the problem is the following:
> >
> > > Operator-level specification is the simplest and cleanest approach,
> > > because it avoids mixing operator configuration (resource) and
> > > scheduling. No matter what other parameters change (chaining, slot
> > > sharing, switching pipelined and blocking shuffles), the resource
> > > profiles stay the same. But it would require that a user specifies
> > > resources on all operators, which makes it hard to use. That's why the
> > > FLIP suggests going with specifying resources on a slot sharing group.
> >
> >
> > I think both thoughts are important, so can we find a solution where the
> > Resource Profiles are specified on an Operator, but we still avoid having
> > to specify a resource profile on every operator?
> >
> > What do you think about something like the following:
> >   - Resource Profiles are specified on an operator level.
> >   - Not all operators need profiles
> >   - All Operators without a Resource Profile end up in the default slot
> > sharing group with a default profile (will get a default slot).
> >   - All Operators with a Resource Profile will go into another slot
> > sharing group (the resource-specified-group).
> >   - Users can define different slot sharing groups for operators like
> > they do now, with the exception that you cannot mix operators that have a
> > resource profile and operators that have no resource profile.
> >   - The default case where no operator has a resource profile is just a
> > special case of this model
> >   - The chaining logic sums up the profiles per operator, like it does
> > now, and the scheduler sums up the profiles of the tasks that it schedules
> > together.
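> >
> > A quick sketch of how this could read in user code. The setResources
> > method and the ResourceProfile builder are hypothetical names (not
> > existing API), the user functions are placeholders, `env` stands for the
> > execution environment, and the numbers are made up; the comments describe
> > the grouping the proposal above implies:
> >
> > env.addSource(new MySource())                    // no profile -> default group
> >         .map(new ParseFunction())                // no profile -> default group
> >         .keyBy(r -> r.key)
> >         .reduce(new AggFunction())
> >         .setResources(ResourceProfile.newBuilder()   // profile -> resource-specified group
> >                 .setCpuCores(1.0)
> >                 .setTaskHeapMemoryMB(200)
> >                 .build())
> >         .map(new EnrichFunction())
> >         .setResources(ResourceProfile.newBuilder()   // profile -> resource-specified group
> >                 .setCpuCores(0.5)
> >                 .setTaskHeapMemoryMB(300)
> >                 .build())
> >         .addSink(new MySink());                  // no profile -> default group
> >
> > // If the reduce and the second map end up chained, their profiles are
> > // summed: 1.5 CPU cores and 500 MB task heap for the chained task.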
> >
> >
> > There is another question about reactive scaling raised in the FLIP. I
> > need to think a bit about that. That is indeed a bit more tricky once we
> > have slots of different sizes.
> > It is not clear then which of the different slot requests the
> > ResourceManager should fulfill when new resources (TMs) show up, or how
> > the JobManager redistributes the slot resources when resources (TMs)
> > disappear. This question is pretty orthogonal, though, to the "how to
> > specify the resources" question.
> >
> >
> > Best,
> > Stephan
> >
> > On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <tonysong...@gmail.com>
> wrote:
> >
> > > Thanks for drafting the FLIP and driving the discussion, Yangze.
> > > And thanks for the feedback, Till and Chesnay.
> > >
> > > @Till,
> > >
> > > I agree that specifying requirements for SSGs means that SSGs need to
> > > be supported in fine-grained resource management, otherwise each
> > > operator might use as many resources as the whole group. However, I
> > > cannot think of a strong reason for not supporting SSGs in fine-grained
> > > resource management.
> > >
> > >
> > > > Interestingly, if all operators have their resources properly
> > > > specified, then slot sharing is no longer needed because Flink could
> > > > slice off the appropriately sized slots for every Task individually.
> > > >
> > > > So for example, if we have a job consisting of two operators op_1 and
> > > > op_2 where each op needs 100 MB of memory, we would then say that the
> > > > slot sharing group needs 200 MB of memory to run. If we have a cluster
> > > > with 2 TMs with one slot of 100 MB each, then the system cannot run
> > > > this job. If the resources were specified on an operator level, then
> > > > the system could still make the decision to deploy op_1 to TM_1 and
> > > > op_2 to TM_2.
> > >
> > >
> > > Couldn't agree more that if all operators' requirements are properly
> > > specified, slot sharing should no longer be needed. I think this exactly
> > > disproves the example. If we already know op_1 and op_2 each need 100 MB
> > > of memory, why would we put them in the same group? If they are in
> > > separate groups, then with the proposed approach the system can freely
> > > deploy them to either a 200 MB TM or two 100 MB TMs.
> > >
> > > Moreover, the precondition for not needing slot sharing is having
> > > resource requirements properly specified for all operators. This is not
> > > always possible, and usually requires tremendous effort. One of the
> > > benefits of SSG-based requirements is that it allows the user to freely
> > > decide the granularity, and thus how much effort they want to invest. I
> > > would consider an SSG in fine-grained resource management as a group of
> > > operators for which the user would like to specify the total resources.
> > > There can be only one group in the job, 2~3 groups dividing the job into
> > > a few major parts, or as many groups as the number of tasks/operators,
> > > depending on how finely the user is able to specify the resources.
> > >
> > > Having to support SSGs might be a constraint. But given that all the
> > > current scheduler implementations already support SSGs, I tend to see
> > > that as an acceptable price for the usability and flexibility discussed
> > > above.
> > >
> > > @Chesnay
> > >
> > > > Will declaring them on slot sharing groups not also waste resources
> > > > if the parallelisms of operators within that group are different?
> > > >
> > > Yes. It's a trade-off between usability and resource utilization. To
> > > avoid such waste, the user can define more groups, so that each group
> > > contains fewer operators and the chance of having operators with
> > > different parallelisms is reduced. The price is having more resource
> > > requirements to specify.
> > >
> > > > It also seems like quite a hassle for users having to recalculate the
> > > > resource requirements if they change the slot sharing.
> > > > I'd think that it's not really workable for users that create a set of
> > > > re-usable operators which are mixed and matched in their applications;
> > > > managing the resource requirements in such a setting would be a
> > > > nightmare, and in the end would require operator-level requirements
> > > > anyway.
> > > > In that sense, I'm not even sure whether it really increases usability.
> > > >
> > >
> > >    - As mentioned in my reply to Till's comment, there's no reason to
> > >    put multiple operators whose individual resource requirements are
> > >    already known into the same group in fine-grained resource management.
> > >    - Even if an operator implementation is reused across multiple
> > >    applications, that does not guarantee the same resource requirements.
> > >    During our years of practice at Alibaba, with per-operator
> > >    requirements specified for Blink's fine-grained resource management,
> > >    very few users (including our specialists dedicated to supporting
> > >    Blink users) were experienced enough to accurately predict/estimate
> > >    the operator resource requirements. Most people rely on execution-time
> > >    metrics (throughput, delay, CPU load, memory usage, GC pressure, etc.)
> > >    to improve the specification.
> > >
> > > To sum up:
> > > If the user is capable of providing proper resource requirements for
> > > every operator, that's definitely a good thing and we would not need to
> > > rely on the SSGs. However, that shouldn't be a *must* for fine-grained
> > > resource management to work. For those users who are capable and do not
> > > like having to set each operator to a separate SSG, I would be ok with
> > > having both SSG-based and operator-based runtime interfaces and only
> > > falling back to the SSG requirements when the operator requirements are
> > > not specified. However, as the first step, I think we should prioritise
> > > the use cases where users are not that experienced.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > > On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <ches...@apache.org>
> > > wrote:
> > >
> > > > Will declaring them on slot sharing groups not also waste resources
> > > > if the parallelisms of operators within that group are different?
> > > >
> > > > It also seems like quite a hassle for users having to recalculate the
> > > > resource requirements if they change the slot sharing.
> > > > I'd think that it's not really workable for users that create a set of
> > > > re-usable operators which are mixed and matched in their applications;
> > > > managing the resource requirements in such a setting would be a
> > > > nightmare, and in the end would require operator-level requirements
> > > > anyway.
> > > > In that sense, I'm not even sure whether it really increases usability.
> > > >
> > > > My main worry is that if we wire the runtime to work on SSGs it's
> > > > gonna be difficult to implement more fine-grained approaches, which
> > > > would not be the case if, for the runtime, they are always defined on
> > > > an operator level.
> > > >
> > > > On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > Thanks for drafting this FLIP and starting this discussion, Yangze.
> > > > >
> > > > > I like that defining resource requirements on a slot sharing group
> > > > > makes the overall setup easier and improves the usability of resource
> > > > > requirements.
> > > > >
> > > > > What I do not like about it is that it changes slot sharing groups
> > > > > from being a scheduling hint to something which needs to be
> > > > > supported in order to support fine-grained resource requirements. So
> > > > > far, the idea of slot sharing groups was that it tells the system
> > > > > that a set of operators can be deployed in the same slot. But the
> > > > > system still had the freedom to say that it would rather place these
> > > > > tasks in different slots if it wanted. If we now specify resource
> > > > > requirements per slot sharing group, then the only option for a
> > > > > scheduler which does not support slot sharing groups is to say that
> > > > > every operator in this slot sharing group needs a slot with the same
> > > > > resources as the whole group.
> > > > >
> > > > > So for example, if we have a job consisting of two operators op_1
> > > > > and op_2 where each op needs 100 MB of memory, we would then say
> > > > > that the slot sharing group needs 200 MB of memory to run. If we
> > > > > have a cluster with 2 TMs with one slot of 100 MB each, then the
> > > > > system cannot run this job. If the resources were specified on an
> > > > > operator level, then the system could still make the decision to
> > > > > deploy op_1 to TM_1 and op_2 to TM_2.
> > > > >
> > > > > Originally, one of the primary goals of slot sharing groups was to
> > > > > make it easier for the user to reason about how many slots a job
> > > > > needs independent of the actual number of operators in the job.
> > > > > Interestingly, if all operators have their resources properly
> > > > > specified, then slot sharing is no longer needed because Flink could
> > > > > slice off the appropriately sized slots for every Task individually.
> > > > > What matters is whether the whole cluster has enough resources to
> > > > > run all tasks or not.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <karma...@gmail.com>
> > wrote:
> > > > >
> > > > >> Hi, there,
> > > > >>
> > > > >> We would like to start a discussion thread on "FLIP-156: Runtime
> > > > >> Interfaces for Fine-Grained Resource Requirements"[1], where we
> > > > >> propose Slot Sharing Group (SSG) based runtime interfaces for
> > > > >> specifying fine-grained resource requirements.
> > > > >>
> > > > >> In this FLIP:
> > > > >> - Expound the user story of fine-grained resource management.
> > > > >> - Propose runtime interfaces for specifying SSG-based resource
> > > > >> requirements.
> > > > >> - Discuss the pros and cons of the three potential granularities
> > > > >> for specifying the resource requirements (operator, task, and slot
> > > > >> sharing group) and explain why we choose the slot sharing group.
> > > > >>
> > > > >> Please find more details in the FLIP wiki document [1]. Looking
> > > > >> forward to your feedback.
> > > > >>
> > > > >> [1]
> > > > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > >>
> > > > >> Best,
> > > > >> Yangze Guo
> > > > >>
> > > >
> > > >
> > >
> >
>
