Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Xintong Song Thu, 21 Jan 2021 19:30:34 -0800

 FGRuntimeInterface.png
<https://drive.google.com/file/d/13nYCLBd1HjdfYVWUjxzdNa8wo5n2GKbi/view?usp=drive_web>


Thank you~

Xintong Song



On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <tonysong...@gmail.com> wrote:

> I think Chesnay's proposal could actually work. IIUC, the key point is to
> derive operator requirements from SSG requirements on the API side, so that
> the runtime only deals with operator requirements. It's debatable how the
> deriving should be done though. E.g., an alternative could be to evenly
> divide the SSG requirement into requirements of operators in the group.
>
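To make the two derivation options concrete, here is a minimal Java sketch. It assumes a requirement can be reduced to a single memory figure in MB. The class and method names are hypothetical and not part of Flink or the FLIP. The first method mirrors Chesnay's mockup quoted further down (the first operator carries everything), the second is the even split mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derive per-operator requirements from one SSG requirement. */
final class DeriveOperatorRequirements {

    /** The first operator carries the whole group requirement; the rest get zero. */
    static List<Double> firstTakesAll(double groupMb, int numOperators) {
        List<Double> result = new ArrayList<>();
        for (int i = 0; i < numOperators; i++) {
            result.add(i == 0 ? groupMb : 0.0);
        }
        return result;
    }

    /** The group requirement is divided evenly among the operators. */
    static List<Double> evenSplit(double groupMb, int numOperators) {
        List<Double> result = new ArrayList<>();
        for (int i = 0; i < numOperators; i++) {
            result.add(groupMb / numOperators);
        }
        return result;
    }
}
```

For a 300 MB group of three operators, the first variant yields 300/0/0 and the second 100/100/100. Both add up to the group requirement, so a scheduler that respects SSGs would aggregate them back to the same slot size.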
>
> However, I'm not entirely sure which option is more desirable. The
> following figure illustrates my understanding: Chesnay's proposal is on
> the top and the SSG-based proposal in this FLIP is on the bottom.
>
>
> [image: FGRuntimeInterface.png]
>
>
> I think the major difference between the two approaches is where deriving
> operator requirements from SSG requirements happens.
>
> - Chesnay's proposal simplifies the runtime logic and the interface to
> expose, at the price of moving more complexity (i.e. the deriving) to the
> API side. The question is, where do we prefer to keep the complexity? I'm
> slightly leaning towards having a thin API and keeping the complexity in
> the runtime if possible.
>
> - Notice that the dashed arrows represent optional steps that are
> needed only for schedulers that do not respect SSGs, which we don't have at
> the moment. If we only look at the solid arrows, then the SSG-based
> approach is much simpler, without needing to derive and aggregate the
> requirements back and forth. I'm not sure about complicating the current
> design only for the potential future needs.
>
>
> Thank you~
>
> Xintong Song
>
>
>
>
> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> You're raising a good point, but I think I can rectify that with a minor
>> adjustment.
>>
>> Default requirements are whatever the default requirements are, setting
>> the requirements for one operator has no effect on other operators.
>>
>> With these rules, and some API enhancements, the following mockup would
>> replicate the SSG-based behavior:
>>
>> Map<SlotSharingGroupId, Requirements> requirements = ...
>> for slotSharingGroup in env.getSlotSharingGroups() {
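>>      vertices = slotSharingGroup.getVertices()
>>      // the first vertex carries the whole group's requirements
>>      vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
>>      // all remaining vertices require nothing
>>      vertices.remaining().setRequirements(ZERO)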
>> }
>>
>> We could even allow setting requirements on slot sharing groups or
>> colocation groups and internally translate them accordingly.
>> I can't help but feel this is a plain API issue.
>>
>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
>> > If I understand you correctly Chesnay, then you want to decouple the
>> > resource requirement specification from the slot sharing group
>> > assignment. Hence, per default all operators would be in the same slot
>> > sharing group. If there is no operator with a resource specification,
>> > then the system would allocate a default slot for it. If there is at
>> > least one operator, then the system would sum up all the specified
>> > resources and allocate a slot of this size. This effectively means
>> > that all unspecified operators will implicitly have a zero resource
>> > requirement. Did I understand your idea correctly?
>> >
>> > I am wondering whether this wouldn't lead to a surprising behaviour
>> > for the user. If the user specifies the resource requirements for a
>> > single operator, then he probably will assume that the other operators
>> > will get the default share of resources and not nothing.
>> >
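The rule Till describes can be written down in a few lines. This is only a sketch under the assumption that a requirement is a single memory figure in MB; `slotMemoryMb` and its parameters are hypothetical names, not Flink APIs.

```java
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical sketch of the "sum the specified, treat the rest as zero" rule. */
final class SlotSizeSketch {

    /**
     * Returns the default slot size if no operator has a requirement;
     * otherwise the sum of the specified requirements (unspecified count as zero).
     */
    static long slotMemoryMb(List<OptionalLong> operatorMemoryMb, long defaultSlotMb) {
        boolean anySpecified = false;
        long sum = 0;
        for (OptionalLong memory : operatorMemoryMb) {
            if (memory.isPresent()) {
                anySpecified = true;
                sum += memory.getAsLong();
            }
        }
        return anySpecified ? sum : defaultSlotMb;
    }
}
```

With a 1024 MB default slot, two unspecified operators share 1024 MB. As soon as one of them is annotated with 200 MB, the slot shrinks to 200 MB for both, which is the surprising behaviour raised above.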
>> > Cheers,
>> > Till
>> >
>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <ches...@apache.org
>> > <mailto:ches...@apache.org>> wrote:
>> >
>> >     Is there even a functional difference between specifying the
>> >     requirements for an SSG vs specifying the same requirements on a
>> >     single
>> >     operator within that group (ideally a colocation group to avoid this
>> >     whole hint business)?
>> >
>> >     Wouldn't we get the best of both worlds in the latter case?
>> >
>> >     Users can take shortcuts to define shared requirements,
>> >     but refine them further as needed on a per-operator basis,
>> >     without changing semantics of slotsharing groups
>> >     nor the runtime being locked into SSG-based requirements.
>> >
>> >     (And before anyone argues what happens if slotsharing groups
>> >     change or
>> >     whatnot, that's a plain API issue that we could surely solve. (A
>> >     plain
>> >     iteration over slotsharing groups and therein contained operators
>> >     would
>> >     suffice)).
>> >
>> >     On 1/20/2021 6:48 PM, Till Rohrmann wrote:
>> >     > Maybe a different minor idea: Would it be possible to treat the
>> SSG
>> >     > resource requirements as a hint for the runtime similar to how
>> >     slot sharing
>> >     > groups are designed at the moment? Meaning that we don't give
>> >     the guarantee
>> >     > that Flink will always deploy this set of tasks together no
>> >     matter what
>> >     > comes. If, for example, the runtime can derive by some means the
>> >     resource
>> >     > requirements for each task based on the requirements for the
>> >     SSG, this
>> >     > could be possible. One easy strategy would be to give every task
>> >     the same
>> >     > resources as the whole slot sharing group. Another one could be
>> >     > distributing the resources equally among the tasks. This does
>> >     not even have
>> >     > to be implemented but we would give ourselves the freedom to
>> change
>> >     > scheduling if need should arise.
>> >     >
>> >     > Cheers,
>> >     > Till
>> >     >
>> >     > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karma...@gmail.com
>> >     <mailto:karma...@gmail.com>> wrote:
>> >     >
>> >     >> Thanks for the responses, Till and Xintong.
>> >     >>
>> >     >> I second Xintong's comment that SSG-based runtime interface
>> >     will give
>> >     >> us the flexibility to achieve op/task-based approach. That's one
>> of
>> >     >> the most important reasons for our design choice.
>> >     >>
>> >     >> Some cents regarding the default operator resource:
>> >     >> - It might be good for the scenario of DataStream jobs.
>> >     >>     ** For light-weight operators, the accumulated configuration
>> >     >> error will not be significant. Then, the resources a task uses are
>> >     >> proportional to the number of operators it contains.
>> >     >>     ** For heavy operators like join and window or operators
>> >     using the
>> >     >> external resources, user will turn to the fine-grained resource
>> >     >> configuration.
>> >     >> - It can increase the stability for the standalone cluster
>> >     where task
>> >     >> executors registered are heterogeneous(with different default
>> slot
>> >     >> resources).
>> >     >> - It might not be good for SQL users. The operators that SQL is
>> >     >> translated into are a black box to the user. We also do not
>> >     >> guarantee cross-version consistency of the translation so far.
>> >     >>
>> >     >> I think it can be treated as a follow-up work when the
>> fine-grained
>> >     >> resource management is end-to-end ready.
>> >     >>
>> >     >> Best,
>> >     >> Yangze Guo
>> >     >>
>> >     >>
>> >     >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
>> >     <tonysong...@gmail.com <mailto:tonysong...@gmail.com>>
>> >     >> wrote:
>> >     >>> Thanks for the feedback, Till.
>> >     >>>
>> >     >>> ## I feel that what you proposed (operator-based + default
>> >     value) might
>> >     >> be
>> >     >>> subsumed by the SSG-based approach.
>> >     >>> Thinking of op_1 -> op_2, there are the following 4 cases,
>> >     categorized by
>> >     >>> whether the resource requirements are known to the users.
>> >     >>>
>> >     >>>     1. *Both known.* As previously mentioned, there's no
>> >     reason to put
>> >     >>>     multiple operators whose individual resource requirements
>> >     are already
>> >     >> known
>> >     >>>     into the same group in fine-grained resource management.
>> >     And if op_1
>> >     >> and
>> >     >>>     op_2 are in different groups, there should be no problem
>> >     switching
>> >     >> data
>> >     >>>     exchange mode from pipelined to blocking. This is
>> >     equivalent to
>> >     >> specifying
>> >     >>>     operator resource requirements in your proposal.
>> >     >>>     2. *op_1 known, op_2 unknown.* Similar to 1), except that
>> >     op_2 is in a
>> >     >>>     SSG whose resource is not specified thus would have the
>> >     default slot
>> >     >>>     resource. This is equivalent to having default operator
>> >     resources in
>> >     >> your
>> >     >>>     proposal.
>> >     >>>     3. *Both unknown*. The user can either set op_1 and op_2
>> >     to the same
>> >     >> SSG
>> >     >>>     or separate SSGs.
>> >     >>>        - If op_1 and op_2 are in the same SSG, it will be
>> >     equivalent to
>> >     >> the
>> >     >>>        coarse-grained resource management, where op_1 and op_2
>> >     share a
>> >     >> default
>> >     >>>        size slot no matter which data exchange mode is used.
>> >     >>>        - If op_1 and op_2 are in different SSGs, then each of
>> >     them will
>> >     >> use
>> >     >>>        a default size slot. This is equivalent to setting them
>> >     with
>> >     >> default
>> >     >>>        operator resources in your proposal.
>> >     >>>     4. *Total (pipeline) or max (blocking) of op_1 and op_2 is
>> >     known.*
>> >     >>>        - It is possible that the user learns the total / max
>> >     resource
>> >     >>>        requirement from executing and monitoring the job,
>> >     while not
>> >     >>> being aware of
>> >     >>>        individual operator requirements.
>> >     >>>        - I believe this is the case your proposal does not
>> >     cover. And TBH,
>> >     >>>        this is probably how most users learn the resource
>> >     requirements,
>> >     >>> according
>> >     >>>        to my experiences.
>> >     >>>        - In this case, the user might need to specify
>> >     different resources
>> >     >> if
>> >     >>>        he wants to switch the execution mode, which should not
>> >     be worse
>> >     >> than not
>> >     >>>        being able to use fine-grained resource management.
>> >     >>>
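Case 4 can be illustrated with a tiny sketch. It assumes memory in MB as the only resource dimension and a hypothetical helper name; it only restates the "total for pipelined, max for blocking" observation from the list above.

```java
/** Hypothetical sketch: what a group of op_1 -> op_2 needs under each data exchange mode. */
final class GroupRequirementSketch {

    /**
     * Pipelined: both operators run at the same time, so their needs add up.
     * Blocking: they run one after the other, so the larger need is enough.
     */
    static long groupMemoryMb(long op1Mb, long op2Mb, boolean pipelined) {
        return pipelined ? op1Mb + op2Mb : Math.max(op1Mb, op2Mb);
    }
}
```

For op_1 = 100 MB and op_2 = 300 MB this gives 400 MB in pipelined mode and 300 MB in blocking mode, which is why a user who only knows the group-level number may have to adjust it when switching modes.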
>> >     >>>
>> >     >>> ## An additional idea inspired by your proposal.
>> >     >>> We may provide multiple options for deciding resources for
>> >     SSGs whose
>> >     >>> requirement is not specified, if needed.
>> >     >>>
>> >     >>>     - Default slot resource (current design)
>> >     >>>     - Default operator resource times number of operators
>> >     (equivalent to
>> >     >>>     your proposal)
>> >     >>>
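The two options could be sketched as a strategy switch. All names here are hypothetical, and memory in MB stands in for a full resource profile.

```java
/** Hypothetical sketch: how to size an SSG whose requirement was not specified. */
final class UnspecifiedGroupSketch {

    enum Strategy { DEFAULT_SLOT, DEFAULT_OPERATOR_TIMES_COUNT }

    static long memoryMb(Strategy strategy, long defaultSlotMb,
                         long defaultOperatorMb, int numOperators) {
        switch (strategy) {
            case DEFAULT_SLOT:
                // current design: one default-sized slot for the whole group
                return defaultSlotMb;
            case DEFAULT_OPERATOR_TIMES_COUNT:
                // operator-based default: scales with the number of operators
                return defaultOperatorMb * numOperators;
            default:
                throw new IllegalArgumentException("unknown strategy: " + strategy);
        }
    }
}
```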
>> >     >>>
>> >     >>> ## Exposing internal runtime strategies
>> >     >>> Theoretically, yes. Tying to the SSGs, the resource
>> >     requirements might be
>> >     >>> affected if how SSGs are internally handled changes in future.
>> >     >> Practically,
>> >     >>> I do not concretely see at the moment what kind of changes we
>> >     may want in
>> >     >>> future that might conflict with this FLIP proposal, as the
>> >     question of
>> >     >>> switching data exchange mode answered above. I'd suggest to
>> >     not give up
>> >     >> the
>> >     >>> user friendliness we may gain now for the future problems that
>> >     may or may
>> >     >>> not exist.
>> >     >>>
>> >     >>> Moreover, the SSG-based approach has the flexibility to
>> >     achieve the
>> >     >>> equivalent behavior as the operator-based approach, if we set
>> each
>> >     >> operator
>> >     >>> (or task) to a separate SSG. We can even provide a shortcut
>> >     option to
>> >     >>> automatically do that for users, if needed.
>> >     >>>
>> >     >>>
>> >     >>> Thank you~
>> >     >>>
>> >     >>> Xintong Song
>> >     >>>
>> >     >>>
>> >     >>>
>> >     >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
>> >     <trohrm...@apache.org <mailto:trohrm...@apache.org>>
>> >     >> wrote:
>> >     >>>> Thanks for the responses Xintong and Stephan,
>> >     >>>>
>> >     >>>> I agree that being able to define the resource requirements
>> for a
>> >     >> group of
>> >     >>>> operators is more user friendly. However, my concern is that
>> >     we are
>> >     >>>> exposing thereby internal runtime strategies which might
>> >     limit our
>> >     >>>> flexibility to execute a given job. Moreover, the semantics of
>> >     >> configuring
>> >     >>>> resource requirements for SSGs could break if switching from
>> >     streaming
>> >     >> to
>> >     >>>> batch execution. If one defines the resource requirements for
>> >     op_1 ->
>> >     >> op_2
>> >     >>>> which run in pipelined mode when using the streaming
>> >     execution, then
>> >     >> how do
>> >     >>>> we interpret these requirements when op_1 -> op_2 are
>> >     executed with a
>> >     >>>> blocking data exchange in batch execution mode? Consequently,
>> >     I am
>> >     >> still
>> >     >>>> leaning towards Stephan's proposal to set the resource
>> >     requirements per
>> >     >>>> operator.
>> >     >>>>
>> >     >>>> Maybe the following proposal makes the configuration easier:
>> >     If the
>> >     >> user
>> >     >>>> wants to use fine-grained resource requirements, then she
>> >     needs to
>> >     >> specify
>> >     >>>> the default size which is used for operators which have no
>> >     explicit
>> >     >>>> resource annotation. If this holds true, then every operator
>> >     would
>> >     >> have a
>> >     >>>> resource requirement and the system can try to execute the
>> >     operators
>> >     >> in the
>> >     >>>> best possible manner w/o being constrained by how the user
>> >     set the SSG
>> >     >>>> requirements.
>> >     >>>>
>> >     >>>> Cheers,
>> >     >>>> Till
>> >     >>>>
>> >     >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
>> >     <tonysong...@gmail.com <mailto:tonysong...@gmail.com>>
>> >     >>>> wrote:
>> >     >>>>
>> >     >>>>> Thanks for the feedback, Stephan.
>> >     >>>>>
>> >     >>>>> Actually, your proposal has also come to my mind at some
>> >     point. And I
>> >     >>>> have
>> >     >>>>> some concerns about it.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> 1. It does not give users the same control as the SSG-based
>> >     approach.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> While both approaches do not require specifying for each
>> >     operator,
>> >     >>>>> SSG-based approach supports the semantic that "some operators
>> >     >> together
>> >     >>>> use
>> >     >>>>> this much resource" while the operator-based approach doesn't.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
>> >     o_m), and
>> >     >> at
>> >     >>>> some
>> >     >>>>> point there's an agg o_n (1 < n < m) which significantly
>> >     reduces the
>> >     >> data
>> >     >>>>> amount. One can separate the pipeline into 2 groups SSG_1
>> >     (o_1, ...,
>> >     >> o_n)
>> >     >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher
>> >     >> parallelisms
>> >     >>>>> for operators in SSG_1 than for operators in SSG_2 won't
>> >     lead to too
>> >     >> much
>> >     >>>>> wasting of resources. If the two SSGs end up needing different
>> >     >> resources,
>> >     >>>>> with the SSG-based approach one can directly specify
>> >     resources for
>> >     >> the
>> >     >>>> two
>> >     >>>>> groups. However, with the operator-based approach, the user
>> will
>> >     >> have to
>> >     >>>>> specify resources for each operator in one of the two
>> >     groups, and
>> >     >> tune
>> >     >>>> the
>> >     >>>>> default slot resource via configurations to fit the other
>> group.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> 2. It increases the chance of breaking operator chains.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> Setting chainable operators into different slot sharing
>> >     groups will
>> >     >>>>> prevent them from being chained. In the current
>> implementation,
>> >     >>>> downstream
>> >     >>>>> operators, if SSG not explicitly specified, will be set to
>> >     the same
>> >     >> group
>> >     >>>>> as the chainable upstream operators (unless multiple upstream
>> >     >> operators
>> >     >>>> in
>> >     >>>>> different groups), to reduce the chance of breaking chains.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
>> >     deciding
>> >     >> SSGs
>> >     >>>>> based on whether resource is specified we will easily get
>> >     groups like
>> >     >>>> (o_1,
>> >     >>>>> o_3) & (o_2, o_4), where none of the operators can be
>> >     chained. This
>> >     >> is
>> >     >>>> also
>> >     >>>>> possible for the SSG-based approach, but I believe the
>> >     chance is much
>> >     >>>>> smaller because there's no strong reason for users to
>> >     specify the
>> >     >> groups
>> >     >>>>> with alternate operators like that. We are more likely to
>> >     get groups
>> >     >> like
>> >     >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only between
>> >     o_2 and
>> >     >> o_3.
>> >     >>>>>
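The effect on chaining can be counted with a small sketch. It assumes a linear pipeline of chainable operators in which two neighbours can only be chained when they share a slot sharing group; the helper name is hypothetical.

```java
/** Hypothetical sketch: count chain breaks in a linear pipeline of chainable operators. */
final class ChainBreakSketch {

    /** groupOf[i] is the slot sharing group of the i-th operator in the pipeline. */
    static int chainBreaks(int[] groupOf) {
        int breaks = 0;
        for (int i = 1; i < groupOf.length; i++) {
            // neighbours in different groups cannot be chained
            if (groupOf[i] != groupOf[i - 1]) {
                breaks++;
            }
        }
        return breaks;
    }
}
```

The alternating grouping (o_1, o_3) & (o_2, o_4) corresponds to `{1, 2, 1, 2}` and breaks the chain at all three edges, while (o_1, o_2) & (o_3, o_4) corresponds to `{1, 1, 2, 2}` and breaks it only once.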
>> >     >>>>> 3. It complicates the system by having two different
>> >     mechanisms for
>> >     >>>> sharing
>> >     >>>>> managed memory in a slot.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> - In FLIP-141, we introduced the intra-slot managed memory
>> >     sharing
>> >     >>>>> mechanism, where managed memory is first distributed
>> >     according to the
>> >     >>>>> consumer type, then further distributed across operators of
>> that
>> >     >> consumer
>> >     >>>>> type.
>> >     >>>>>
>> >     >>>>> - With the operator-based approach, managed memory size
>> >     specified
>> >     >> for an
>> >     >>>>> operator should account for all the consumer types of that
>> >     operator.
>> >     >> That
>> >     >>>>> means the managed memory is first distributed across
>> >     operators, then
>> >     >>>>> distributed to different consumer types of each operator.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> Unfortunately, the different order of the two calculation
>> >     steps can
>> >     >> lead
>> >     >>>> to
>> >     >>>>> different results. To be specific, the semantic of the
>> >     configuration
>> >     >>>> option
>> >     >>>>> `consumer-weights` changed (within a slot vs. within an
>> >     operator).
>> >     >>>>>
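A simplified numeric sketch of why the order matters. The weights (DATAPROC 70, PYTHON 30) are illustrative only, the even split across operators of one consumer type is an assumption made for brevity, and none of the names below are Flink APIs.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: the two orders of distributing managed memory. */
final class ManagedMemoryOrderSketch {

    /** Illustrative consumer weights; not taken from any real configuration. */
    static final Map<String, Integer> WEIGHTS = Map.of("DATAPROC", 70, "PYTHON", 30);

    /** Portion of totalMb that goes to one consumer type among the types present. */
    static double shareOf(String type, Set<String> presentTypes, double totalMb) {
        int weightSum = 0;
        for (String t : presentTypes) {
            weightSum += WEIGHTS.get(t);
        }
        return totalMb * WEIGHTS.get(type) / weightSum;
    }

    /** Slot first: split the slot by consumer type, then evenly across operators of that type. */
    static double slotFirst(double slotMb, Map<String, Set<String>> typesByOperator, String type) {
        Set<String> present = new TreeSet<>();
        long users = 0;
        for (Set<String> types : typesByOperator.values()) {
            present.addAll(types);
            if (types.contains(type)) {
                users++;
            }
        }
        return shareOf(type, present, slotMb) / users;
    }

    /** Operator first: the operator's own budget is split across its own consumer types. */
    static double operatorFirst(double operatorMb, Set<String> operatorTypes, String type) {
        return shareOf(type, operatorTypes, operatorMb);
    }
}
```

Take a 100 MB slot with op_1 using only DATAPROC and op_2 using DATAPROC and PYTHON. Slot first gives op_2's Python consumer 30 MB and each operator 35 MB of DATAPROC. If the same 100 MB is instead specified as 50 MB per operator, operator first gives op_2's Python consumer 15 MB and op_1's DATAPROC consumer 50 MB, although the weights have not changed.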
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> To sum up things:
>> >     >>>>>
>> >     >>>>> While (3) might be a bit more implementation related, I
>> >     think (1)
>> >     >> and (2)
>> >     >>>>> somehow suggest that, the price for the proposed approach to
>> >     avoid
>> >     >>>>> specifying resource for every operator is that it's not as
>> >     >> independent
>> >     >>>> from
>> >     >>>>> operator chaining and slot sharing as the operator-based
>> >     approach
>> >     >>>> discussed
>> >     >>>>> in the FLIP.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> Thank you~
>> >     >>>>>
>> >     >>>>> Xintong Song
>> >     >>>>>
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
>> >     <se...@apache.org <mailto:se...@apache.org>>
>> >     >> wrote:
>> >     >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
>> >     >>>>>>
>> >     >>>>>> I want to say, first of all, that this is super well
>> >     written. And
>> >     >> the
>> >     >>>>>> points that the FLIP makes about how to expose the
>> >     configuration to
>> >     >>>> users
>> >     >>>>>> is exactly the right thing to figure out first.
>> >     >>>>>> So good job here!
>> >     >>>>>>
>> >     >>>>>> About how to let users specify the resource profiles. If I
>> >     can sum
>> >     >> the
>> >     >>>>> FLIP
>> >     >>>>>> and previous discussion up in my own words, the problem is
>> the
>> >     >>>> following:
>> >     >>>>>> Operator-level specification is the simplest and cleanest
>> >     approach,
>> >     >>>>> because
>> >     >>>>>>> it avoids mixing operator configuration (resource) and
>> >     >> scheduling. No
>> >     >>>>>>> matter what other parameters change (chaining, slot sharing,
>> >     >>>> switching
>> >     >>>>>>> pipelined and blocking shuffles), the resource profiles
>> >     stay the
>> >     >>>> same.
>> >     >>>>>>> But it would require that a user specifies resources on all
>> >     >>>> operators,
>> >     >>>>>>> which makes it hard to use. That's why the FLIP suggests
>> going
>> >     >> with
>> >     >>>>>>> specifying resources on a Sharing-Group.
>> >     >>>>>>
>> >     >>>>>> I think both thoughts are important, so can we find a
>> solution
>> >     >> where
>> >     >>>> the
>> >     >>>>>> Resource Profiles are specified on an Operator, but we
>> >     still avoid
>> >     >> that
>> >     >>>>> we
>> >     >>>>>> need to specify a resource profile on every operator?
>> >     >>>>>>
>> >     >>>>>> What do you think about something like the following:
>> >     >>>>>>    - Resource Profiles are specified on an operator level.
>> >     >>>>>>    - Not all operators need profiles
>> >     >>>>>>    - All Operators without a Resource Profile end up in the
>> >     >> default
>> >     >>>> slot
>> >     >>>>>> sharing group with a default profile (will get a default
>> slot).
>> >     >>>>>>    - All Operators with a Resource Profile will go into
>> >     another slot
>> >     >>>>> sharing
>> >     >>>>>> group (the resource-specified-group).
>> >     >>>>>>    - Users can define different slot sharing groups for
>> >     operators
>> >     >> like
>> >     >>>>> they
>> >     >>>>>> do now, with the exception that you cannot mix operators
>> >     that have
>> >     >> a
>> >     >>>>>> resource profile and operators that have no resource profile.
>> >     >>>>>>    - The default case where no operator has a resource
>> >     profile is
>> >     >> just a
>> >     >>>>>> special case of this model
>> >     >>>>>>    - The chaining logic sums up the profiles per operator,
>> >     like it
>> >     >> does
>> >     >>>>> now,
>> >     >>>>>> and the scheduler sums up the profiles of the tasks that it
>> >     >> schedules
>> >     >>>>>> together.
>> >     >>>>>>
>> >     >>>>>>
>> >     >>>>>> There is another question about reactive scaling raised in
>> the
>> >     >> FLIP. I
>> >     >>>>> need
>> >     >>>>>> to think a bit about that. That is indeed a bit more tricky
>> >     once we
>> >     >>>> have
>> >     >>>>>> slots of different sizes.
>> >     >>>>>> It is not clear then which of the different slot requests the
>> >     >>>>>> ResourceManager should fulfill when new resources (TMs)
>> >     show up,
>> >     >> or how
>> >     >>>>> the
>> >     >>>>>> JobManager redistributes the slots resources when resources
>> >     (TMs)
>> >     >>>>> disappear.
>> >     >>>>>> This question is pretty orthogonal, though, to the "how to
>> >     specify
>> >     >> the
>> >     >>>>>> resources".
>> >     >>>>>>
>> >     >>>>>>
>> >     >>>>>> Best,
>> >     >>>>>> Stephan
>> >     >>>>>>
>> >     >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
>> >     <tonysong...@gmail.com <mailto:tonysong...@gmail.com>
>> >     >>>>> wrote:
>> >     >>>>>>> Thanks for drafting the FLIP and driving the discussion,
>> >     Yangze.
>> >     >>>>>>> And Thanks for the feedback, Till and Chesnay.
>> >     >>>>>>>
>> >     >>>>>>> @Till,
>> >     >>>>>>>
>> >     >>>>>>> I agree that specifying requirements for SSGs means that
>> SSGs
>> >     >> need to
>> >     >>>>> be
>> >     >>>>>>> supported in fine-grained resource management, otherwise
>> each
>> >     >>>> operator
>> >     >>>>>>> might use as many resources as the whole group. However, I
>> >     cannot
>> >     >>>> think
>> >     >>>>>> of
>> >     >>>>>>> a strong reason for not supporting SSGs in fine-grained
>> >     resource
>> >     >>>>>>> management.
>> >     >>>>>>>
>> >     >>>>>>>
>> >     >>>>>>>> Interestingly, if all operators have their resources
>> properly
>> >     >>>>>> specified,
>> >     >>>>>>>> then slot sharing is no longer needed because Flink could
>> >     >> slice off
>> >     >>>>> the
>> >     >>>>>>>> appropriately sized slots for every Task individually.
>> >     >>>>>>>>
>> >     >>>>>>> So for example, if we have a job consisting of two
>> >     operators op_1
>> >     >> and
>> >     >>>>> op_2
>> >     >>>>>>>> where each op needs 100 MB of memory, we would then say
>> that
>> >     >> the
>> >     >>>> slot
>> >     >>>>>>>> sharing group needs 200 MB of memory to run. If we have a
>> >     >> cluster
>> >     >>>>> with
>> >     >>>>>> 2
>> >     >>>>>>>> TMs with one slot of 100 MB each, then the system cannot
>> run
>> >     >> this
>> >     >>>>> job.
>> >     >>>>>> If
>> >     >>>>>>>> the resources were specified on an operator level, then the
>> >     >> system
>> >     >>>>>> could
>> >     >>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
>> >     >> TM_2.
>> >     >>>>>>>
>> >     >>>>>>> Couldn't agree more that if all operators' requirements are
>> >     >> properly
>> >     >>>>>>> specified, slot sharing should be no longer needed. I
>> >     think this
>> >     >>>>> exactly
>> >     >>>>>>> disproves the example. If we already know op_1 and op_2 each
>> >     >> needs
>> >     >>>> 100
>> >     >>>>> MB
>> >     >>>>>>> of memory, why would we put them in the same group? If
>> >     they are
>> >     >> in
>> >     >>>>>> separate
>> >     >>>>>>> groups, with the proposed approach the system can freely
>> >     deploy
>> >     >> them
>> >     >>>> to
>> >     >>>>>>> either a 200 MB TM or two 100 MB TMs.
>> >     >>>>>>>
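The 100 MB / 200 MB example can be checked with a small placement sketch. It assumes memory is the only resource, that a TM's free memory can be sliced into slots of any requested size, and it uses a first-fit-decreasing heuristic, which is not how Flink's slot matching is implemented. The helper name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: can a set of slot requests be placed on the given TMs? */
final class PlacementSketch {

    /** First-fit decreasing: place the largest request first on the first TM with room. */
    static boolean canPlace(long[] requestsMb, long[] tmFreeMb) {
        long[] free = tmFreeMb.clone();
        long[] sorted = requestsMb.clone();
        Arrays.sort(sorted);
        for (int i = sorted.length - 1; i >= 0; i--) {
            boolean placed = false;
            for (int t = 0; t < free.length && !placed; t++) {
                if (free[t] >= sorted[i]) {
                    free[t] -= sorted[i];
                    placed = true;
                }
            }
            if (!placed) {
                return false;
            }
        }
        return true;
    }
}
```

One 200 MB request (op_1 and op_2 in the same group) does not fit on two 100 MB TMs, whereas two 100 MB requests (separate groups) fit both on two 100 MB TMs and on a single 200 MB TM.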
>> >     >>>>>>> Moreover, the precondition for not needing slot sharing is
>> >     having
>> >     >>>>>> resource
>> >     >>>>>>> requirements properly specified for all operators. This is
>> not
>> >     >> always
>> >     >>>>>>> possible, and usually requires tremendous efforts. One of
>> the
>> >     >>>> benefits
>> >     >>>>>> for
>> >     >>>>>>> SSG-based requirements is that it allows the user to freely
>> >     >> decide
>> >     >>>> the
>> >     >>>>>>> granularity, thus efforts they want to pay. I would
>> >     consider SSG
>> >     >> in
>> >     >>>>>>> fine-grained resource management as a group of operators
>> >     that the
>> >     >>>> user
>> >     >>>>>>> would like to specify the total resource for. There can be
>> >     only
>> >     >> one
>> >     >>>>> group
>> >     >>>>>>> in the job, 2~3 groups dividing the job into a few major
>> >     parts,
>> >     >> or as
>> >     >>>>>> many
>> >     >>>>>>> groups as the number of tasks/operators, depending on how
>> >     >>>> fine-grained
>> >     >>>>>> the
>> >     >>>>>>> user is able to specify the resources.
>> >     >>>>>>>
>> >     >>>>>>> Having to support SSGs might be a constraint. But given
>> >     that all
>> >     >> the
>> >     >>>>>>> current scheduler implementations already support SSGs, I
>> >     tend to
>> >     >>>> think
>> >     >>>>>>> that as an acceptable price for the above discussed
>> >     usability and
>> >     >>>>>>> flexibility.
>> >     >>>>>>>
>> >     >>>>>>> @Chesnay
>> >     >>>>>>>
>> >     >>>>>>> Will declaring them on slot sharing groups not also waste
>> >     >> resources
>> >     >>>> if
>> >     >>>>>> the
>> >     >>>>>>>> parallelism of operators within that group are different?
>> >     >>>>>>>>
>> >     >>>>>>> Yes. It's a trade-off between usability and resource
>> >     >> utilization. To
>> >     >>>>>> avoid
>> >     >>>>>>> such wasting, the user can define more groups, so that
>> >     each group
>> >     >>>>>> contains
>> >     >>>>>>> fewer operators and the chance of having operators with
>> >     different
>> >     >>>>>>> parallelism will be reduced. The price is to have more
>> >     resource
>> >     >>>>>>> requirements to specify.
>> >     >>>>>>>
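A rough way to quantify this trade-off, as a sketch only. It assumes the true per-operator memory is known (which in practice it often is not), that a group needs as many slots as its highest operator parallelism, and that every slot is sized for the whole group. All names are hypothetical.

```java
/** Hypothetical sketch: memory requested but unused when parallelisms differ within a group. */
final class ParallelismWasteSketch {

    /** parallelism[i] and operatorMb[i] describe the i-th operator of one slot sharing group. */
    static long wastedMb(int[] parallelism, long[] operatorMb) {
        int maxParallelism = 0;
        long groupMb = 0;
        long usedMb = 0;
        for (int i = 0; i < parallelism.length; i++) {
            maxParallelism = Math.max(maxParallelism, parallelism[i]);
            groupMb += operatorMb[i];
            usedMb += parallelism[i] * operatorMb[i];
        }
        // every one of the maxParallelism slots is sized for the whole group
        return maxParallelism * groupMb - usedMb;
    }
}
```

Two 100 MB operators with parallelism 4 and 1 in one group request 4 x 200 MB but use only 500 MB, leaving 300 MB idle. Putting each operator in its own group removes the waste at the cost of one more requirement to specify.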
>> >     >>>>>>> It also seems like quite a hassle for users having to
>> >     >> recalculate the
>> >     >>>>>>>> resource requirements if they change the slot sharing.
>> >     >>>>>>>> I'd think that it's not really workable for users that
>> create
>> >     >> a set
>> >     >>>>> of
>> >     >>>>>>>> re-usable operators which are mixed and matched in their
>> >     >>>>> applications;
>> >     >>>>>>>> managing the resources requirements in such a setting
>> >     would be
>> >     >> a
>> >     >>>>>>>> nightmare, and in the end would require operator-level
>> >     >> requirements
>> >     >>>>> any
>> >     >>>>>>>> way.
>> >     >>>>>>>> In that sense, I'm not even sure whether it really
>> increases
>> >     >>>>> usability.
>> >     >>>>>>>     - As mentioned in my reply to Till's comment, there's no
>> >     >> reason to
>> >     >>>>> put
>> >     >>>>>>>     multiple operators whose individual resource
>> >     requirements are
>> >     >>>>> already
>> >     >>>>>>> known
>> >     >>>>>>>     into the same group in fine-grained resource management.
>> >     >>>>>>>     - Even if an operator implementation is reused for multiple
>> >     >>>>> applications,
>> >     >>>>>>>     it does not guarantee the same resource requirements.
>> >     During
>> >     >> our
>> >     >>>>> years
>> >     >>>>>>> of
>> >     >>>>>>>     practices in Alibaba, with per-operator requirements
>> >     >> specified for
>> >     >>>>>>> Blink's
>> >     >>>>>>>     fine-grained resource management, very few users
>> >     (including
>> >     >> our
>> >     >>>>>>> specialists
>> >     >>>>>>>     who are dedicated to supporting Blink users) are as
>> >     >> experienced as
>> >     >>>>> to
>> >     >>>>>>>     accurately predict/estimate the operator resource
>> >     >> requirements.
>> >     >>>> Most
>> >     >>>>>>> people
>> >     >>>>>>>     rely on the execution-time metrics (throughput, delay,
>> cpu
>> >     >> load,
>> >     >>>>>> memory
>> >     >>>>>>>     usage, GC pressure, etc.) to improve the specification.
>> >     >>>>>>>
>> >     >>>>>>> To sum up:
>> >     >>>>>>> If the user is capable of providing proper resource
>> >     requirements
>> >     >> for
>> >     >>>>>> every
>> >     >>>>>>> operator, that's definitely a good thing and we would not
>> >     need to
>> >     >>>> rely
>> >     >>>>> on
>> >     >>>>>>> the SSGs. However, that shouldn't be a *must* for the
>> >     >> fine-grained
>> >     >>>>>> resource
>> >     >>>>>>> management to work. For those users who are capable and do
>> not
>> >     >> like
>> >     >>>>>> having
>> >     >>>>>>> to set each operator to a separate SSG, I would be ok to
>> have
>> >     >> both
>> >     >>>>>>> SSG-based and operator-based runtime interfaces and to only
>> >     >> fallback
>> >     >>>> to
>> >     >>>>>> the
>> >     >>>>>>> SSG requirements when the operator requirements are not
>> >     >> specified.
>> >     >>>>>> However,
>> >     >>>>>>> as the first step, I think we should prioritise the use
>> cases
>> >     >> where
>> >     >>>>> users
>> >     >>>>>>> are not that experienced.
>> >     >>>>>>>
>> >     >>>>>>> Thank you~
>> >     >>>>>>>
>> >     >>>>>>> Xintong Song
>> >     >>>>>>>
>> >     >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
>> >     >> ches...@apache.org <mailto:ches...@apache.org>>
>> >     >>>>>>> wrote:
>> >     >>>>>>>
>> >     >>>>>>>> Will declaring them on slot sharing groups not also waste
>> >     >> resources
>> >     >>>>> if
>> >     >>>>>>>> the parallelism of operators within that group are
>> different?
>> >     >>>>>>>>
>> >     >>>>>>>> It also seems like quite a hassle for users having to
>> >     >> recalculate
>> >     >>>> the
>> >     >>>>>>>> resource requirements if they change the slot sharing.
>> >     >>>>>>>> I'd think that it's not really workable for users that
>> create
>> >     >> a set
>> >     >>>>> of
>> >     >>>>>>>> re-usable operators which are mixed and matched in their
>> >     >>>>> applications;
>> >     >>>>>>>> managing the resources requirements in such a setting
>> >     would be
>> >     >> a
>> >     >>>>>>>> nightmare, and in the end would require operator-level
>> >     >> requirements
>> >     >>>>> any
>> >     >>>>>>>> way.
>> >     >>>>>>>> In that sense, I'm not even sure whether it really
>> increases
>> >     >>>>> usability.
>> >     >>>>>>>> My main worry is that if we wire the runtime to work
>> >     on SSGs
>> >     >>>> it's
>> >     >>>>>>>> gonna be difficult to implement more fine-grained
>> approaches,
>> >     >> which
>> >     >>>>>>>> would not be the case if, for the runtime, they are always
>> >     >> defined
>> >     >>>> on
>> >     >>>>>> an
>> >     >>>>>>>> operator-level.
>> >     >>>>>>>>
>> >     >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
>> >     >>>>>>>>> Thanks for drafting this FLIP and starting this discussion
>> >     >>>> Yangze.
>> >     >>>>>>>>> I like that defining resource requirements on a slot
>> sharing
>> >     >>>> group
>> >     >>>>>>> makes
>> >     >>>>>>>>> the overall setup easier and improves usability of
>> resource
>> >     >>>>>>> requirements.
>> >     >>>>>>>>> What I do not like about it is that it changes slot
>> sharing
>> >     >>>> groups
>> >     >>>>>> from
>> >     >>>>>>>>> being a scheduling hint to something which needs to be
>> >     >> supported
>> >     >>>> in
>> >     >>>>>>> order
>> >     >>>>>>>>> to support fine grained resource requirements. So far, the
>> >     >> idea
>> >     >>>> of
>> >     >>>>>> slot
>> >     >>>>>>>>> sharing groups was that it tells the system that a set of
>> >     >>>> operators
>> >     >>>>>> can
>> >     >>>>>>>> be
>> >     >>>>>>>>> deployed in the same slot. But the system still had the
>> >     >> freedom
>> >     >>>> to
>> >     >>>>>> say
>> >     >>>>>>>> that
>> >     >>>>>>>>> it would rather place these tasks in different slots if it
>> >     >>>> wanted.
>> >     >>>>> If
>> >     >>>>>>> we
>> >     >>>>>>>>> now specify resource requirements on a per slot sharing
>> >     >> group,
>> >     >>>> then
>> >     >>>>>> the
>> >     >>>>>>>>> only option for a scheduler which does not support slot
>> >     >> sharing
>> >     >>>>>> groups
>> >     >>>>>>> is
>> >     >>>>>>>>> to say that every operator in this slot sharing group
>> >     needs a
>> >     >>>> slot
>> >     >>>>>> with
>> >     >>>>>>>> the
>> >     >>>>>>>>> same resources as the whole group.
>> >     >>>>>>>>>
>> >     >>>>>>>>> So for example, if we have a job consisting of two
>> operators
>> >     >> op_1
>> >     >>>>> and
>> >     >>>>>>> op_2
>> >     >>>>>>>>> where each op needs 100 MB of memory, we would then say
>> that
>> >     >> the
>> >     >>>>> slot
>> >     >>>>>>>>> sharing group needs 200 MB of memory to run. If we have a
>> >     >> cluster
>> >     >>>>>> with
>> >     >>>>>>> 2
>> >     >>>>>>>>> TMs with one slot of 100 MB each, then the system cannot
>> run
>> >     >> this
>> >     >>>>>> job.
>> >     >>>>>>> If
>> >     >>>>>>>>> the resources were specified on an operator level, then
>> the
>> >     >>>> system
>> >     >>>>>>> could
>> >     >>>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
>> >     >> TM_2.
>> >     >>>>>>>>> Originally, one of the primary goals of slot sharing
>> groups
>> >     >> was
>> >     >>>> to
>> >     >>>>>> make
>> >     >>>>>>>> it
>> >     >>>>>>>>> easier for the user to reason about how many slots a job
>> >     >> needs
>> >     >>>>>>>> independent
>> >     >>>>>>>>> of the actual number of operators in the job.
>> Interestingly,
>> >     >> if
>> >     >>>> all
>> >     >>>>>>>>> operators have their resources properly specified, then
>> slot
>> >     >>>>> sharing
>> >     >>>>>> is
>> >     >>>>>>>> no
>> >     >>>>>>>>> longer needed because Flink could slice off the
>> >     appropriately
>> >     >>>> sized
>> >     >>>>>>> slots
>> >     >>>>>>>>> for every Task individually. What matters is whether the
>> >     >> whole
>> >     >>>>>> cluster
>> >     >>>>>>>> has
>> >     >>>>>>>>> enough resources to run all tasks or not.
>> >     >>>>>>>>>
>> >     >>>>>>>>> Cheers,
>> >     >>>>>>>>> Till
>> >     >>>>>>>>>
>> >     >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
>> >     >> karma...@gmail.com>
>> >     >>>>>> wrote:
>> >     >>>>>>>>>> Hi, there,
>> >     >>>>>>>>>>
>> >     >>>>>>>>>> We would like to start a discussion thread on "FLIP-156:
>> >     >> Runtime
>> >     >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1],
>> >     >> where we
>> >     >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime interfaces
>> >     >> for
>> >     >>>>>>>>>> specifying fine-grained resource requirements.
>> >     >>>>>>>>>>
>> >     >>>>>>>>>> In this FLIP:
>> >     >>>>>>>>>> - Expound the user story of fine-grained resource
>> >     >> management.
>> >     >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based
>> >     >> resource
>> >     >>>>>>>>>> requirements.
>> >     >>>>>>>>>> - Discuss the pros and cons of the three potential
>> >     >> granularities
>> >     >>>>> for
>> >     >>>>>>>>>> specifying the resource requirements (op, task and slot
>> >     >> sharing
>> >     >>>>>> group)
>> >     >>>>>>>>>> and explain why we choose the slot sharing group.
>> >     >>>>>>>>>>
>> >     >>>>>>>>>> Please find more details in the FLIP wiki document [1].
>> >     >> Looking
>> >     >>>>>>>>>> forward to your feedback.
>> >     >>>>>>>>>>
>> >     >>>>>>>>>> [1]
>> >     >>>>>>>>>>
>> >     >>
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
>> >     >>>>>>>>>> Best,
>> >     >>>>>>>>>> Yangze Guo
>> >     >>>>>>>>>>
>> >     >>>>>>>>
>> >
>>
>>
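
The fallback idea quoted above (use operator-level requirements when they are given, and fall back to the SSG requirement otherwise) could be sketched roughly as follows. This is only an illustration under assumptions: the function name resolve_slot_memory_mb and the rule "sum the operator values only if every operator in the group specifies one" are hypothetical and are not taken from the FLIP or from Flink's code.

```python
from typing import Dict, Optional


def resolve_slot_memory_mb(
    operator_memory_mb: Dict[str, Optional[int]],
    group_memory_mb: Optional[int],
) -> Optional[int]:
    """Pick the memory requirement for one slot sharing group.

    Hypothetical rule: if every operator in the group declares a value,
    use the sum of those values; otherwise fall back to the group-level
    (SSG) value, which may itself be unspecified (None).
    """
    values = list(operator_memory_mb.values())
    if values and all(v is not None for v in values):
        return sum(values)
    return group_memory_mb


# All operators specified: the operator-level values win.
fully_specified = resolve_slot_memory_mb({"op_1": 100, "op_2": 100}, 300)

# One operator unspecified: fall back to the SSG requirement.
partially_specified = resolve_slot_memory_mb({"op_1": 100, "op_2": None}, 300)
```

Other fallback rules are possible (for example, treating a partially specified group as an error); the sketch only shows that the two interfaces can coexist behind one resolution step.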

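Till's two-operator example quoted above (op_1 and op_2 at 100 MB each, two TMs with one 100 MB slot each) can be checked with a small sketch. The Slot class and the two placement functions below are hypothetical helpers written for this illustration; they are not Flink's scheduler, and the first-fit strategy is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Slot:
    task_manager: str
    memory_mb: int


def place_group(group_memory_mb: int, slots: List[Slot]) -> Optional[str]:
    """SSG-level requirement: the whole group must fit into a single slot."""
    for slot in slots:
        if slot.memory_mb >= group_memory_mb:
            return slot.task_manager
    return None


def place_operators(
    operator_memory_mb: Dict[str, int], slots: List[Slot]
) -> Optional[Dict[str, str]]:
    """Operator-level requirements: first-fit, one slot per operator."""
    free = list(slots)
    placement: Dict[str, str] = {}
    for op, need in operator_memory_mb.items():
        chosen = next((s for s in free if s.memory_mb >= need), None)
        if chosen is None:
            return None  # at least one operator cannot be placed
        free.remove(chosen)
        placement[op] = chosen.task_manager
    return placement


slots = [Slot("TM_1", 100), Slot("TM_2", 100)]
operators = {"op_1": 100, "op_2": 100}

# The group needs 200 MB in one slot, which neither TM offers.
group_result = place_group(sum(operators.values()), slots)

# With per-operator requirements, each operator fits into its own slot.
operator_result = place_operators(operators, slots)
```

Under these assumptions the group-level placement fails, while the operator-level placement puts op_1 on TM_1 and op_2 on TM_2, which matches the outcome described in the quoted message.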