Thanks everyone for the lively discussion. I'd like to try to
summarize where the discussion has converged so far. Please let me
know if I got anything wrong or missed something crucial.
Change of this FLIP:
- Treat the SSG resource requirements as a hint instead of a
restriction for the runtime. That should be explicitly explained in
the JavaDocs.
Potential follow-up issues if needed:
- Provide operator-level resource configuration interface.
- Provide multiple options for deciding resources for SSGs whose
requirement is not specified:
** Default slot resource.
** Default operator resource times number of operators.
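To make the two options above concrete, here is a minimal Java sketch. It is
only an illustration under my own assumptions, not the FLIP's design:
`UnspecifiedSsgResources`, `Strategy`, and `memoryMbFor` are hypothetical
names, and resources are reduced to a single memory figure in MB.

```java
/** Hypothetical sketch: resources for an SSG whose requirement is not specified. */
final class UnspecifiedSsgResources {

    /** The two options listed above (names are illustrative only). */
    enum Strategy { DEFAULT_SLOT_RESOURCE, DEFAULT_OPERATOR_RESOURCE_TIMES_OPERATORS }

    /**
     * Returns the memory (in MB) assumed for a slot sharing group without an
     * explicit requirement. All parameters are assumptions for illustration.
     */
    static long memoryMbFor(Strategy strategy, long defaultSlotMb,
                            long defaultOperatorMb, int numOperators) {
        switch (strategy) {
            case DEFAULT_SLOT_RESOURCE:
                // The group simply gets one default-sized slot.
                return defaultSlotMb;
            case DEFAULT_OPERATOR_RESOURCE_TIMES_OPERATORS:
                // The group grows with the number of operators it contains.
                return defaultOperatorMb * numOperators;
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        System.out.println(memoryMbFor(Strategy.DEFAULT_SLOT_RESOURCE, 1024, 128, 5));
        System.out.println(
                memoryMbFor(Strategy.DEFAULT_OPERATOR_RESOURCE_TIMES_OPERATORS, 1024, 128, 5));
    }
}
```

With a 1024 MB default slot and a 128 MB default operator resource, a group
of five operators would get 1024 MB under the first option and 640 MB under
the second.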
If there are no other issues, I'll update the FLIP accordingly and
start a vote thread. Thanks all for the valuable feedback again.
Best,
Yangze Guo
On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <[email protected]> wrote:
>
>
> FGRuntimeInterface.png
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <[email protected]> wrote:
>>
>> I think Chesnay's proposal could actually work. IIUC, the key point is to
>> derive operator requirements from SSG requirements on the API side, so that
>> the runtime only deals with operator requirements. It's debatable how the
>> deriving should be done though. E.g., an alternative could be to evenly
>> divide the SSG requirement into requirements of operators in the group.
>>
>>
>> However, I'm not entirely sure which option is more desired. Illustrating my
>> understanding in the following figure, in which on the top is Chesnay's
>> proposal and on the bottom is the SSG-based proposal in this FLIP.
>>
>>
>>
>> I think the major difference between the two approaches is where deriving
>> operator requirements from SSG requirements happens.
>>
>> - Chesnay's proposal simplifies the runtime logic and the interface to
>> expose, at the price of moving more complexity (i.e. the deriving) to the
>> API side. The question is, where do we prefer to keep the complexity? I'm
>> slightly leaning towards having a thin API and keeping the complexity in
>> runtime if possible.
>>
>> - Notice that the dashed arrows represent optional steps that are needed
>> only for schedulers that do not respect SSGs, which we don't have at the
>> moment. If we only look at the solid arrows, then the SSG-based
>> approach is much simpler, without needing to derive and aggregate the
>> requirements back and forth. I'm not sure about complicating the current
>> design only for the potential future needs.
>>
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>>
>> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <[email protected]> wrote:
>>>
>>> You're raising a good point, but I think I can rectify that with a minor
>>> adjustment.
>>>
>>> Default requirements are whatever the default requirements are; setting
>>> the requirements for one operator has no effect on other operators.
>>>
>>> With these rules, and some API enhancements, the following mockup would
>>> replicate the SSG-based behavior:
>>>
>>> Map<SlotSharingGroupId, Requirements> requirements = ...
>>> for (slotSharingGroup : env.getSlotSharingGroups()) {
>>>     vertices = slotSharingGroup.getVertices()
>>>     // the first vertex carries the requirement of the whole group
>>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
>>>     // all remaining vertices require nothing on top of that
>>>     vertices.remaining().setRequirements(ZERO)
>>> }
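A self-contained version of the mockup above could look like the following
Java sketch. Everything in it is an assumption for illustration:
`SsgToOperatorRequirements` and its methods are hypothetical, operators are
plain strings, and a requirement is just a memory figure in MB. `evenSplit`
shows the alternative of dividing the group requirement evenly that was
mentioned elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: turning a group-level requirement into per-operator ones. */
final class SsgToOperatorRequirements {

    /** The first operator carries the whole group requirement, all others get zero. */
    static Map<String, Long> firstTakesAll(List<String> operators, long groupMemoryMb) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (int i = 0; i < operators.size(); i++) {
            result.put(operators.get(i), i == 0 ? groupMemoryMb : 0L);
        }
        return result;
    }

    /** Alternative: divide evenly; the remainder goes to the first operators. */
    static Map<String, Long> evenSplit(List<String> operators, long groupMemoryMb) {
        Map<String, Long> result = new LinkedHashMap<>();
        if (operators.isEmpty()) {
            return result;
        }
        long share = groupMemoryMb / operators.size();
        long remainder = groupMemoryMb % operators.size();
        for (int i = 0; i < operators.size(); i++) {
            // Hand out one extra MB to the first `remainder` operators so the sum is kept.
            result.put(operators.get(i), share + (i < remainder ? 1 : 0));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("op_1", "op_2", "op_3");
        System.out.println(firstTakesAll(ops, 300));
        System.out.println(evenSplit(ops, 100));
    }
}
```

In both variants the per-operator values add up to the group requirement, so
a scheduler that still respects the group would allocate the same slot as
before.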
>>>
>>> We could even allow setting requirements on slotsharing-groups or
>>> colocation-groups and internally translate them accordingly.
>>> I can't help but feel this is a plain API issue.
>>>
>>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
>>> > If I understand you correctly Chesnay, then you want to decouple the
>>> > resource requirement specification from the slot sharing group
>>> > assignment. Hence, per default all operators would be in the same slot
>>> > sharing group. If there is no operator with a resource specification,
>>> > then the system would allocate a default slot for it. If there is at
>>> > least one operator, then the system would sum up all the specified
>>> > resources and allocate a slot of this size. This effectively means
>>> > that all unspecified operators will implicitly have a zero resource
>>> > requirement. Did I understand your idea correctly?
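The rule described above (sum up the specified resources, treat unspecified
operators as zero, fall back to a default slot when nothing is specified)
could be sketched in Java as follows. `SlotSizeFromOperatorSpecs` and
`slotMemoryMb` are hypothetical names, and only memory in MB is modelled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical sketch: slot size derived from optional per-operator specs. */
final class SlotSizeFromOperatorSpecs {

    /**
     * Returns the default slot size if no operator has a spec; otherwise the
     * sum of all specified values, so unspecified operators count as zero.
     */
    static long slotMemoryMb(List<OptionalLong> operatorSpecsMb, long defaultSlotMb) {
        long sum = 0;
        boolean anySpecified = false;
        for (OptionalLong spec : operatorSpecsMb) {
            if (spec.isPresent()) {
                anySpecified = true;
                sum += spec.getAsLong();
            }
        }
        return anySpecified ? sum : defaultSlotMb;
    }

    public static void main(String[] args) {
        List<OptionalLong> oneSpecified =
                Arrays.asList(OptionalLong.of(200), OptionalLong.empty(), OptionalLong.empty());
        List<OptionalLong> noneSpecified =
                Arrays.asList(OptionalLong.empty(), OptionalLong.empty());
        System.out.println(slotMemoryMb(oneSpecified, 1024));
        System.out.println(slotMemoryMb(noneSpecified, 1024));
    }
}
```

The first call shows the potentially surprising case: specifying 200 MB for
a single operator shrinks the slot for all three operators from the 1024 MB
default to 200 MB.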
>>> >
>>> > I am wondering whether this wouldn't lead to a surprising behaviour
>>> > for the user. If the user specifies the resource requirements for a
>>> > single operator, then he probably will assume that the other operators
>>> > will get the default share of resources and not nothing.
>>> >
>>> > Cheers,
>>> > Till
>>> >
>>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <[email protected]> wrote:
>>> >
>>> > Is there even a functional difference between specifying the
>>> > requirements for an SSG vs specifying the same requirements on a single
>>> > operator within that group (ideally a colocation group to avoid this
>>> > whole hint business)?
>>> >
>>> > Wouldn't we get the best of both worlds in the latter case?
>>> >
>>> > Users can take shortcuts to define shared requirements,
>>> > but refine them further as needed on a per-operator basis,
>>> > without changing semantics of slotsharing groups
>>> > nor the runtime being locked into SSG-based requirements.
>>> >
>>> > (And before anyone argues what happens if slotsharing groups change or
>>> > whatnot, that's a plain API issue that we could surely solve. (A plain
>>> > iteration over slotsharing groups and therein contained operators would
>>> > suffice)).
>>> >
>>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
>>> > > Maybe a different minor idea: Would it be possible to treat the SSG
>>> > > resource requirements as a hint for the runtime similar to how slot sharing
>>> > > groups are designed at the moment? Meaning that we don't give the guarantee
>>> > > that Flink will always deploy this set of tasks together no matter what
>>> > > comes. If, for example, the runtime can derive by some means the resource
>>> > > requirements for each task based on the requirements for the SSG, this
>>> > > could be possible. One easy strategy would be to give every task the same
>>> > > resources as the whole slot sharing group. Another one could be
>>> > > distributing the resources equally among the tasks. This does not even have
>>> > > to be implemented but we would give ourselves the freedom to change
>>> > > scheduling if need should arise.
>>> > >
>>> > > Cheers,
>>> > > Till
>>> > >
>>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <[email protected]> wrote:
>>> > >
>>> > >> Thanks for the responses, Till and Xintong.
>>> > >>
>>> > >> I second Xintong's comment that an SSG-based runtime interface will give
>>> > >> us the flexibility to achieve an op/task-based approach. That's one of
>>> > >> the most important reasons for our design choice.
>>> > >>
>>> > >> My two cents regarding the default operator resource:
>>> > >> - It might be good for the scenario of DataStream jobs.
>>> > >> ** For light-weight operators, the accumulated configuration error
>>> > >> will not be significant. Then, the resource a task uses is
>>> > >> proportional to the number of operators it contains.
>>> > >> ** For heavy operators like join and window, or operators using
>>> > >> external resources, users will turn to the fine-grained resource
>>> > >> configuration.
>>> > >> - It can increase the stability of standalone clusters where the
>>> > >> registered task executors are heterogeneous (with different default slot
>>> > >> resources).
>>> > >> - It might not be good for SQL users. The operators that SQL will be
>>> > >> transformed into are a black box to the user. We also do not guarantee
>>> > >> cross-version consistency of the transformation so far.
>>> > >>
>>> > >> I think it can be treated as follow-up work once fine-grained
>>> > >> resource management is end-to-end ready.
>>> > >>
>>> > >> Best,
>>> > >> Yangze Guo
>>> > >>
>>> > >>
>>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <[email protected]> wrote:
>>> > >>> Thanks for the feedback, Till.
>>> > >>>
>>> > >>> ## I feel that what you proposed (operator-based + default value) might be
>>> > >>> subsumed by the SSG-based approach.
>>> > >>> Thinking of op_1 -> op_2, there are the following 4 cases, categorized by
>>> > >>> whether the resource requirements are known to the users.
>>> > >>>
>>> > >>> 1. *Both known.* As previously mentioned, there's no reason to put
>>> > >>>    multiple operators whose individual resource requirements are already
>>> > >>>    known into the same group in fine-grained resource management. And if
>>> > >>>    op_1 and op_2 are in different groups, there should be no problem
>>> > >>>    switching data exchange mode from pipelined to blocking. This is
>>> > >>>    equivalent to specifying operator resource requirements in your
>>> > >>>    proposal.
>>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is in a
>>> > >>>    SSG whose resource is not specified thus would have the default slot
>>> > >>>    resource. This is equivalent to having default operator resources in
>>> > >>>    your proposal.
>>> > >>> 3. *Both unknown*. The user can either set op_1 and op_2 to the same SSG
>>> > >>>    or separate SSGs.
>>> > >>>    - If op_1 and op_2 are in the same SSG, it will be equivalent to the
>>> > >>>      coarse-grained resource management, where op_1 and op_2 share a
>>> > >>>      default size slot no matter which data exchange mode is used.
>>> > >>>    - If op_1 and op_2 are in different SSGs, then each of them will use
>>> > >>>      a default size slot. This is equivalent to setting them with default
>>> > >>>      operator resources in your proposal.
>>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and op_2 is known.*
>>> > >>>    - It is possible that the user learns the total / max resource
>>> > >>>      requirement from executing and monitoring the job, while not being
>>> > >>>      aware of individual operator requirements.
>>> > >>>    - I believe this is the case your proposal does not cover. And TBH,
>>> > >>>      this is probably how most users learn the resource requirements,
>>> > >>>      according to my experiences.
>>> > >>>    - In this case, the user might need to specify different resources if
>>> > >>>      he wants to switch the execution mode, which should not be worse than
>>> > >>>      not being able to use fine-grained resource management.
>>> > >>>
>>> > >>>
>>> > >>> ## An additional idea inspired by your proposal.
>>> > >>> We may provide multiple options for deciding resources for SSGs whose
>>> > >>> requirement is not specified, if needed.
>>> > >>>
>>> > >>> - Default slot resource (current design)
>>> > >>> - Default operator resource times number of operators (equivalent to
>>> > >>>   your proposal)
>>> > >>>
>>> > >>>
>>> > >>> ## Exposing internal runtime strategies
>>> > >>> Theoretically, yes. Tying to the SSGs, the resource requirements might be
>>> > >>> affected if how SSGs are internally handled changes in future. Practically,
>>> > >>> I do not concretely see at the moment what kind of changes we may want in
>>> > >>> future that might conflict with this FLIP proposal, as the question of
>>> > >>> switching data exchange mode answered above. I'd suggest to not give up the
>>> > >>> user friendliness we may gain now for the future problems that may or may
>>> > >>> not exist.
>>> > >>>
>>> > >>> Moreover, the SSG-based approach has the flexibility to achieve the
>>> > >>> equivalent behavior as the operator-based approach, if we set each operator
>>> > >>> (or task) to a separate SSG. We can even provide a shortcut option to
>>> > >>> automatically do that for users, if needed.
>>> > >>>
>>> > >>>
>>> > >>> Thank you~
>>> > >>>
>>> > >>> Xintong Song
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <[email protected]> wrote:
>>> > >>>> Thanks for the responses Xintong and Stephan,
>>> > >>>>
>>> > >>>> I agree that being able to define the resource requirements for a group of
>>> > >>>> operators is more user friendly. However, my concern is that we are
>>> > >>>> exposing thereby internal runtime strategies which might limit our
>>> > >>>> flexibility to execute a given job. Moreover, the semantics of configuring
>>> > >>>> resource requirements for SSGs could break if switching from streaming to
>>> > >>>> batch execution. If one defines the resource requirements for op_1 -> op_2
>>> > >>>> which run in pipelined mode when using the streaming execution, then how do
>>> > >>>> we interpret these requirements when op_1 -> op_2 are executed with a
>>> > >>>> blocking data exchange in batch execution mode? Consequently, I am still
>>> > >>>> leaning towards Stephan's proposal to set the resource requirements per
>>> > >>>> operator.
>>> > >>>>
>>> > >>>> Maybe the following proposal makes the configuration easier: If the user
>>> > >>>> wants to use fine-grained resource requirements, then she needs to specify
>>> > >>>> the default size which is used for operators which have no explicit
>>> > >>>> resource annotation. If this holds true, then every operator would have a
>>> > >>>> resource requirement and the system can try to execute the operators in the
>>> > >>>> best possible manner w/o being constrained by how the user set the SSG
>>> > >>>> requirements.
>>> > >>>>
>>> > >>>> Cheers,
>>> > >>>> Till
>>> > >>>>
>>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <[email protected]> wrote:
>>> > >>>>
>>> > >>>>> Thanks for the feedback, Stephan.
>>> > >>>>>
>>> > >>>>> Actually, your proposal has also come to my mind at some point. And I have
>>> > >>>>> some concerns about it.
>>> > >>>>>
>>> > >>>>> 1. It does not give users the same control as the SSG-based approach.
>>> > >>>>>
>>> > >>>>> While both approaches do not require specifying for each operator,
>>> > >>>>> SSG-based approach supports the semantic that "some operators together use
>>> > >>>>> this much resource" while the operator-based approach doesn't.
>>> > >>>>>
>>> > >>>>> Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and at some
>>> > >>>>> point there's an agg o_n (1 < n < m) which significantly reduces the data
>>> > >>>>> amount. One can separate the pipeline into 2 groups SSG_1 (o_1, ..., o_n)
>>> > >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher parallelisms
>>> > >>>>> for operators in SSG_1 than for operators in SSG_2 won't lead to too much
>>> > >>>>> wasting of resources. If the two SSGs end up needing different resources,
>>> > >>>>> with the SSG-based approach one can directly specify resources for the two
>>> > >>>>> groups. However, with the operator-based approach, the user will have to
>>> > >>>>> specify resources for each operator in one of the two groups, and tune the
>>> > >>>>> default slot resource via configurations to fit the other group.
>>> > >>>>>
>>> > >>>>>
>>> > >>>>> 2. It increases the chance of breaking operator chains.
>>> > >>>>>
>>> > >>>>> Setting chainable operators into different slot sharing groups will
>>> > >>>>> prevent them from being chained. In the current implementation, downstream
>>> > >>>>> operators, if SSG not explicitly specified, will be set to the same group
>>> > >>>>> as the chainable upstream operators (unless multiple upstream operators in
>>> > >>>>> different groups), to reduce the chance of breaking chains.
>>> > >>>>>
>>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding SSGs
>>> > >>>>> based on whether resource is specified we will easily get groups like (o_1,
>>> > >>>>> o_3) & (o_2, o_4), where none of the operators can be chained. This is also
>>> > >>>>> possible for the SSG-based approach, but I believe the chance is much
>>> > >>>>> smaller because there's no strong reason for users to specify the groups
>>> > >>>>> with alternate operators like that. We are more likely to get groups like
>>> > >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only between o_2 and o_3.
>>> > >>>>>
>>> > >>>>> 3. It complicates the system by having two different mechanisms for sharing
>>> > >>>>> managed memory in a slot.
>>> > >>>>>
>>> > >>>>> - In FLIP-141, we introduced the intra-slot managed memory sharing
>>> > >>>>> mechanism, where managed memory is first distributed according to the
>>> > >>>>> consumer type, then further distributed across operators of that consumer
>>> > >>>>> type.
>>> > >>>>>
>>> > >>>>> - With the operator-based approach, managed memory size specified for an
>>> > >>>>> operator should account for all the consumer types of that operator. That
>>> > >>>>> means the managed memory is first distributed across operators, then
>>> > >>>>> distributed to different consumer types of each operator.
>>> > >>>>>
>>> > >>>>> Unfortunately, the different order of the two calculation steps can lead to
>>> > >>>>> different results. To be specific, the semantic of the configuration option
>>> > >>>>> `consumer-weights` changed (within a slot vs. within an operator).
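A toy calculation may help to see why the order matters. The Java sketch
below is a simplification under stated assumptions, not Flink's actual code:
the class and method names are made up, the consumer types and the 70/30
weights are only example values, and memory is split purely in proportion to
the weights of the consumer types that are present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: per-consumer-type totals under the two distribution orders. */
final class ManagedMemoryOrder {

    /** Slot level: split the slot's memory across all consumer types present in the slot. */
    static Map<String, Double> slotFirst(double slotMb, Map<String, Integer> weights,
                                         List<Set<String>> operatorConsumers) {
        Set<String> present = new TreeSet<>();
        for (Set<String> consumers : operatorConsumers) {
            present.addAll(consumers);
        }
        return splitByWeight(slotMb, weights, present);
    }

    /** Operator level: each operator splits its own memory; totals per type are summed. */
    static Map<String, Double> operatorFirst(List<Double> operatorMb, Map<String, Integer> weights,
                                             List<Set<String>> operatorConsumers) {
        Map<String, Double> totals = new TreeMap<>();
        for (int i = 0; i < operatorConsumers.size(); i++) {
            splitByWeight(operatorMb.get(i), weights, operatorConsumers.get(i))
                    .forEach((type, mb) -> totals.merge(type, mb, Double::sum));
        }
        return totals;
    }

    /** Splits totalMb across the given types in proportion to their weights. */
    private static Map<String, Double> splitByWeight(double totalMb, Map<String, Integer> weights,
                                                     Set<String> types) {
        int weightSum = 0;
        for (String type : types) {
            weightSum += weights.get(type);
        }
        Map<String, Double> result = new TreeMap<>();
        for (String type : types) {
            result.put(type, totalMb * weights.get(type) / weightSum);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new HashMap<>();
        weights.put("DATAPROC", 70);
        weights.put("PYTHON", 30);
        List<Set<String>> consumers = new ArrayList<>();
        consumers.add(new HashSet<>(Arrays.asList("DATAPROC")));           // op_1
        consumers.add(new HashSet<>(Arrays.asList("DATAPROC", "PYTHON"))); // op_2
        System.out.println("slot-first:     " + slotFirst(100.0, weights, consumers));
        System.out.println("operator-first: "
                + operatorFirst(Arrays.asList(50.0, 50.0), weights, consumers));
    }
}
```

With a 100 MB slot, one operator using only DATAPROC and another using both
DATAPROC and PYTHON, `slotFirst` assigns 30 MB to PYTHON, while
`operatorFirst` with 50 MB per operator assigns it only 15 MB, although the
weights and the total are the same.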
>>> > >>>>>
>>> > >>>>>
>>> > >>>>>
>>> > >>>>> To sum up things:
>>> > >>>>>
>>> > >>>>> While (3) might be a bit more implementation related, I think (1) and (2)
>>> > >>>>> somehow suggest that, the price for the proposed approach to avoid
>>> > >>>>> specifying resource for every operator is that it's not as independent from
>>> > >>>>> operator chaining and slot sharing as the operator-based approach discussed
>>> > >>>>> in the FLIP.
>>> > >>>>>
>>> > >>>>>
>>> > >>>>> Thank you~
>>> > >>>>>
>>> > >>>>> Xintong Song
>>> > >>>>>
>>> > >>>>>
>>> > >>>>>
>>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <[email protected]> wrote:
>>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
>>> > >>>>>>
>>> > >>>>>> I want to say, first of all, that this is super well written. And the
>>> > >>>>>> points that the FLIP makes about how to expose the configuration to users
>>> > >>>>>> is exactly the right thing to figure out first.
>>> > >>>>>> So good job here!
>>> > >>>>>>
>>> > >>>>>> About how to let users specify the resource profiles. If I can sum the FLIP
>>> > >>>>>> and previous discussion up in my own words, the problem is the following:
>>> > >>>>>>
>>> > >>>>>>> Operator-level specification is the simplest and cleanest approach, because
>>> > >>>>>>> it avoids mixing operator configuration (resource) and scheduling. No
>>> > >>>>>>> matter what other parameters change (chaining, slot sharing, switching
>>> > >>>>>>> pipelined and blocking shuffles), the resource profiles stay the same.
>>> > >>>>>>> But it would require that a user specifies resources on all operators,
>>> > >>>>>>> which makes it hard to use. That's why the FLIP suggests going with
>>> > >>>>>>> specifying resources on a Sharing-Group.
>>> > >>>>>>
>>> > >>>>>> I think both thoughts are important, so can we find a solution where the
>>> > >>>>>> Resource Profiles are specified on an Operator, but we still avoid that we
>>> > >>>>>> need to specify a resource profile on every operator?
>>> > >>>>>>
>>> > >>>>>> What do you think about something like the following:
>>> > >>>>>> - Resource Profiles are specified on an operator level.
>>> > >>>>>> - Not all operators need profiles
>>> > >>>>>> - All Operators without a Resource Profile end up in the default slot
>>> > >>>>>> sharing group with a default profile (will get a default slot).
>>> > >>>>>> - All Operators with a Resource Profile will go into another slot sharing
>>> > >>>>>> group (the resource-specified-group).
>>> > >>>>>> - Users can define different slot sharing groups for operators like they
>>> > >>>>>> do now, with the exception that you cannot mix operators that have a
>>> > >>>>>> resource profile and operators that have no resource profile.
>>> > >>>>>> - The default case where no operator has a resource profile is just a
>>> > >>>>>> special case of this model
>>> > >>>>>> - The chaining logic sums up the profiles per operator, like it does now,
>>> > >>>>>> and the scheduler sums up the profiles of the tasks that it schedules
>>> > >>>>>> together.
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>> There is another question about reactive scaling raised in the FLIP. I need
>>> > >>>>>> to think a bit about that. That is indeed a bit more tricky once we have
>>> > >>>>>> slots of different sizes.
>>> > >>>>>> It is not clear then which of the different slot requests the
>>> > >>>>>> ResourceManager should fulfill when new resources (TMs) show up, or how the
>>> > >>>>>> JobManager redistributes the slot resources when resources (TMs) disappear.
>>> > >>>>>> This question is pretty orthogonal, though, to the "how to specify the
>>> > >>>>>> resources".
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>> Best,
>>> > >>>>>> Stephan
>>> > >>>>>>
>>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <[email protected]> wrote:
>>> > >>>>>>> Thanks for drafting the FLIP and driving the discussion, Yangze.
>>> > >>>>>>> And thanks for the feedback, Till and Chesnay.
>>> > >>>>>>>
>>> > >>>>>>> @Till,
>>> > >>>>>>>
>>> > >>>>>>> I agree that specifying requirements for SSGs means that SSGs need to be
>>> > >>>>>>> supported in fine-grained resource management, otherwise each operator
>>> > >>>>>>> might use as many resources as the whole group. However, I cannot think of
>>> > >>>>>>> a strong reason for not supporting SSGs in fine-grained resource
>>> > >>>>>>> management.
>>> > >>>>>>>
>>> > >>>>>>>
>>> > >>>>>>>> Interestingly, if all operators have their resources properly specified,
>>> > >>>>>>>> then slot sharing is no longer needed because Flink could slice off the
>>> > >>>>>>>> appropriately sized slots for every Task individually.
>>> > >>>>>>>>
>>> > >>>>>>>> So for example, if we have a job consisting of two operators op_1 and op_2
>>> > >>>>>>>> where each op needs 100 MB of memory, we would then say that the slot
>>> > >>>>>>>> sharing group needs 200 MB of memory to run. If we have a cluster with 2
>>> > >>>>>>>> TMs with one slot of 100 MB each, then the system cannot run this job. If
>>> > >>>>>>>> the resources were specified on an operator level, then the system could
>>> > >>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
>>> > >>>>>>>
>>> > >>>>>>> Couldn't agree more that if all operators' requirements are properly
>>> > >>>>>>> specified, slot sharing should be no longer needed. I think this exactly
>>> > >>>>>>> disproves the example. If we already know op_1 and op_2 each needs 100 MB
>>> > >>>>>>> of memory, why would we put them in the same group? If they are in separate
>>> > >>>>>>> groups, with the proposed approach the system can freely deploy them to
>>> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
>>> > >>>>>>>
>>> > >>>>>>> Moreover, the precondition for not needing slot sharing is having resource
>>> > >>>>>>> requirements properly specified for all operators. This is not always
>>> > >>>>>>> possible, and usually requires tremendous efforts. One of the benefits of
>>> > >>>>>>> SSG-based requirements is that it allows the user to freely decide the
>>> > >>>>>>> granularity, and thus the effort they want to spend. I would consider SSG in
>>> > >>>>>>> fine-grained resource management as a group of operators that the user
>>> > >>>>>>> would like to specify the total resource for. There can be only one group
>>> > >>>>>>> in the job, 2~3 groups dividing the job into a few major parts, or as many
>>> > >>>>>>> groups as the number of tasks/operators, depending on how fine-grained the
>>> > >>>>>>> user is able to specify the resources.
>>> > >>>>>>>
>>> > >>>>>>> Having to support SSGs might be a constraint. But given that all the
>>> > >>>>>>> current scheduler implementations already support SSGs, I tend to think
>>> > >>>>>>> of that as an acceptable price for the above discussed usability and
>>> > >>>>>>> flexibility.
>>> > >>>>>>>
>>> > >>>>>>> @Chesnay
>>> > >>>>>>>
>>> > >>>>>>>> Will declaring them on slot sharing groups not also waste resources if the
>>> > >>>>>>>> parallelism of operators within that group are different?
>>> > >>>>>>>>
>>> > >>>>>>> Yes. It's a trade-off between usability and resource utilization. To avoid
>>> > >>>>>>> such wasting, the user can define more groups, so that each group contains
>>> > >>>>>>> fewer operators and the chance of having operators with different
>>> > >>>>>>> parallelism will be reduced. The price is to have more resource
>>> > >>>>>>> requirements to specify.
>>> > >>>>>>>
>>> > >>>>>>>> It also seems like quite a hassle for users having to recalculate the
>>> > >>>>>>>> resource requirements if they change the slot sharing.
>>> > >>>>>>>> I'd think that it's not really workable for users that create a set of
>>> > >>>>>>>> re-usable operators which are mixed and matched in their applications;
>>> > >>>>>>>> managing the resources requirements in such a setting would be a
>>> > >>>>>>>> nightmare, and in the end would require operator-level requirements any
>>> > >>>>>>>> way.
>>> > >>>>>>>> In that sense, I'm not even sure whether it really increases usability.
>>> > >>>>>>>>
>>> > >>>>>>> - As mentioned in my reply to Till's comment, there's no reason to put
>>> > >>>>>>> multiple operators whose individual resource requirements are already
>>> > >>>>>>> known into the same group in fine-grained resource management.
>>> > >>>>>>> - Even if an operator implementation is reused for multiple applications,
>>> > >>>>>>> it does not guarantee the same resource requirements. During our years of
>>> > >>>>>>> practices in Alibaba, with per-operator requirements specified for Blink's
>>> > >>>>>>> fine-grained resource management, very few users (including our
>>> > >>>>>>> specialists who are dedicated to supporting Blink users) are as
>>> > >>>>>>> experienced as to accurately predict/estimate the operator resource
>>> > >>>>>>> requirements. Most people rely on the execution-time metrics (throughput,
>>> > >>>>>>> delay, cpu load, memory usage, GC pressure, etc.) to improve the
>>> > >>>>>>> specification.
>>> > >>>>>>>
>>> > >>>>>>> To sum up:
>>> > >>>>>>> If the user is capable of providing proper resource requirements for every
>>> > >>>>>>> operator, that's definitely a good thing and we would not need to rely on
>>> > >>>>>>> the SSGs. However, that shouldn't be a *must* for the fine-grained resource
>>> > >>>>>>> management to work. For those users who are capable and do not like having
>>> > >>>>>>> to set each operator to a separate SSG, I would be ok to have both
>>> > >>>>>>> SSG-based and operator-based runtime interfaces and to only fall back to the
>>> > >>>>>>> SSG requirements when the operator requirements are not specified. However,
>>> > >>>>>>> as the first step, I think we should prioritise the use cases where users
>>> > >>>>>>> are not that experienced.
>>> > >>>>>>>
>>> > >>>>>>> Thank you~
>>> > >>>>>>>
>>> > >>>>>>> Xintong Song
>>> > >>>>>>>
>>> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <[email protected]> wrote:
>>> > >>>>>>>
>>> > >>>>>>>> Will declaring them on slot sharing groups not also waste resources if
>>> > >>>>>>>> the parallelism of operators within that group are different?
>>> > >>>>>>>>
>>> > >>>>>>>> It also seems like quite a hassle for users having to recalculate the
>>> > >>>>>>>> resource requirements if they change the slot sharing.
>>> > >>>>>>>> I'd think that it's not really workable for users that create a set of
>>> > >>>>>>>> re-usable operators which are mixed and matched in their applications;
>>> > >>>>>>>> managing the resources requirements in such a setting would be a
>>> > >>>>>>>> nightmare, and in the end would require operator-level requirements any
>>> > >>>>>>>> way.
>>> > >>>>>>>> In that sense, I'm not even sure whether it really increases usability.
>>> > >>>>>>>> My main worry is that if we wire the runtime to work on SSGs it's
>>> > >>>>>>>> gonna be difficult to implement more fine-grained approaches, which
>>> > >>>>>>>> would not be the case if, for the runtime, they are always defined on an
>>> > >>>>>>>> operator-level.
>>> > >>>>>>>>
>>> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
>>> > >>>>>>>>> Thanks for drafting this FLIP and starting this discussion
>>> > >>>> Yangze.
>>> > >>>>>>>>> I like that defining resource requirements on a slot sharing
>>> > >>>> group
>>> > >>>>>>> makes
>>> > >>>>>>>>> the overall setup easier and improves usability of resource
>>> > >>>>>>> requirements.
>>> > >>>>>>>>> What I do not like about it is that it changes slot sharing
>>> > >>>> groups
>>> > >>>>>> from
>>> > >>>>>>>>> being a scheduling hint to something which needs to be
>>> > >> supported
>>> > >>>> in
>>> > >>>>>>> order
>>> > >>>>>>>>> to support fine grained resource requirements. So far, the
>>> > >> idea
>>> > >>>> of
>>> > >>>>>> slot
>>> > >>>>>>>>> sharing groups was that it tells the system that a set of
>>> > >>>> operators
>>> > >>>>>> can
>>> > >>>>>>>> be
>>> > >>>>>>>>> deployed in the same slot. But the system still had the
>>> > >> freedom
>>> > >>>> to
>>> > >>>>>> say
>>> > >>>>>>>> that
>>> > >>>>>>>>> it would rather place these tasks in different slots if it
>>> > >>>> wanted.
>>> > >>>>> If
>>> > >>>>>>> we
>>> > >>>>>>>>> now specify resource requirements on a per slot sharing
>>> > >> group,
>>> > >>>> then
>>> > >>>>>> the
>>> > >>>>>>>>> only option for a scheduler which does not support slot
>>> > >> sharing
>>> > >>>>>> groups
>>> > >>>>>>> is
>>> > >>>>>>>>> to say that every operator in this slot sharing group
>>> > needs a
>>> > >>>> slot
>>> > >>>>>> with
>>> > >>>>>>>> the
>>> > >>>>>>>>> same resources as the whole group.
>>> > >>>>>>>>>
>>> > >>>>>>>>> So for example, if we have a job consisting of two operators
>>> > >> op_1
>>> > >>>>> and
>>> > >>>>>>> op_2
>>> > >>>>>>>>> where each op needs 100 MB of memory, we would then say that
>>> > >> the
>>> > >>>>> slot
>>> > >>>>>>>>> sharing group needs 200 MB of memory to run. If we have a
>>> > >> cluster
>>> > >>>>>> with
>>> > >>>>>>> 2
>>> > >>>>>>>>> TMs with one slot of 100 MB each, then the system cannot run
>>> > >> this
>>> > >>>>>> job.
>>> > >>>>>>> If
>>> > >>>>>>>>> the resources were specified on an operator level, then the
>>> > >>>> system
>>> > >>>>>>> could
>>> > >>>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
>>> > >> TM_2.
>>> > >>>>>>>>> Originally, one of the primary goals of slot sharing groups
>>> > >> was
>>> > >>>> to
>>> > >>>>>> make
>>> > >>>>>>>> it
>>> > >>>>>>>>> easier for the user to reason about how many slots a job
>>> > >> needs
>>> > >>>>>>>> independent
>>> > >>>>>>>>> of the actual number of operators in the job. Interestingly,
>>> > >> if
>>> > >>>> all
>>> > >>>>>>>>> operators have their resources properly specified, then slot
>>> > >>>>> sharing
>>> > >>>>>> is
>>> > >>>>>>>> no
>>> > >>>>>>>>> longer needed because Flink could slice off the
>>> > appropriately
>>> > >>>> sized
>>> > >>>>>>> slots
>>> > >>>>>>>>> for every Task individually. What matters is whether the
>>> > >> whole
>>> > >>>>>> cluster
>>> > >>>>>>>> has
>>> > >>>>>>>>> enough resources to run all tasks or not.
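
The two-operator example in the paragraph above can be checked with a small model. This is a minimal sketch, not Flink's scheduler or any Flink API: the helper names are hypothetical, the numbers are the ones from the example (op_1 and op_2 at 100 MB each, two TMs with one 100 MB slot each), and first-fit placement is a simplification chosen for brevity.

```python
# Hypothetical model of the example; not Flink's scheduler or API.
OP_MEMORY_MB = {"op_1": 100, "op_2": 100}  # per-operator needs from the example
FREE_SLOTS_MB = [100, 100]                 # TM_1 and TM_2, one 100 MB slot each


def fits_as_group(op_memory, slots):
    """Group-level requirement: a single slot must hold the sum of all operators."""
    group_mb = sum(op_memory.values())
    return any(slot >= group_mb for slot in slots)


def place_per_operator(op_memory, slots):
    """Operator-level requirements: first-fit each operator into its own slot.

    Returns {operator: slot index}, or None if some operator cannot be placed.
    First-fit is a simplification and is not optimal in general.
    """
    free = list(enumerate(slots))
    placement = {}
    for op, need in op_memory.items():
        match = next((entry for entry in free if entry[1] >= need), None)
        if match is None:
            return None
        free.remove(match)
        placement[op] = match[0]
    return placement


print(fits_as_group(OP_MEMORY_MB, FREE_SLOTS_MB))       # group needs 200 MB in one slot
print(place_per_operator(OP_MEMORY_MB, FREE_SLOTS_MB))  # one operator per slot
```

With a group-level requirement no single 100 MB slot can hold the 200 MB group, while the operator-level view still finds a placement with op_1 on the first TM and op_2 on the second.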
>>> > >>>>>>>>>
>>> > >>>>>>>>> Cheers,
>>> > >>>>>>>>> Till
>>> > >>>>>>>>>
>>> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
>>> > >> [email protected]>
>>> > >>>>>> wrote:
>>> > >>>>>>>>>> Hi, there,
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> We would like to start a discussion thread on "FLIP-156:
>>> > >> Runtime
>>> > >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1],
>>> > >> where we
>>> > >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime interfaces
>>> > >> for
>>> > >>>>>>>>>> specifying fine-grained resource requirements.
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> In this FLIP:
>>> > >>>>>>>>>> - Expound the user story of fine-grained resource
>>> > >> management.
>>> > >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based
>>> > >> resource
>>> > >>>>>>>>>> requirements.
>>> > >>>>>>>>>> - Discuss the pros and cons of the three potential
>>> > >> granularities
>>> > >>>>> for
>>> > >>>>>>>>>> specifying the resource requirements (op, task and slot
>>> > >> sharing
>>> > >>>>>> group)
>>> > >>>>>>>>>> and explain why we choose the slot sharing group.
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> Please find more details in the FLIP wiki document [1].
>>> > >> Looking
>>> > >>>>>>>>>> forward to your feedback.
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> [1]
>>> > >>>>>>>>>>
>>> > >>
>>> >
>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
>>> > >>>>>>>>>> Best,
>>> > >>>>>>>>>> Yangze Guo
>>> > >>>>>>>>>>
>>> > >>>>>>>>
>>> >
>>>