Hi all, sorry for joining the discussion after voting has started.

I want to share my thoughts after reading the discussion above.

I think the Flink *runtime* already has an ideal granularity for resource
management: the task. If a slot is shared by multiple tasks, that slot's
resource requirement is simply the sum of all its logical slots. So
basically, there has been no resource requirement for SlotSharingGroup in
the runtime until now, right?

As discussed, we already agree that: "If all operators have their resources
properly specified, then slot sharing is no longer needed."

So it seems to me that the natural next question is: how do we bridge the
impractical operator-level resource specification to the runtime's
task-level resource requirements? This is actually a pure API matter, as
Chesnay has pointed out.

But FLIP-156 brings another direction to the table: how about using SSGs
for both API and runtime resource specification?

From the FLIP and the discussion, I assume that the SSG resource
specification will override the operator-level resource specification if
both are specified?

So, I wonder whether we could interpret the SSG resource specification as
an "add" rather than a "set" on the resource requirement?

The semantics would be that the SSG resource specification adds extra
resources to the shared slot, to express concerns about possibly high
throughput and resource requirements for the tasks in one physical slot.

The result is that, if the scheduler indeed respects slot sharing, the
allocated slot gains the extra resources specified for that SSG.

I think one coding barrier for the "add" approach is ResourceSpec.UNKNOWN,
which doesn't support the 'merge' operation. I tend to use ResourceSpec.ZERO
as the default; the task executor should be aware of this.
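
To make the difference between "set" and "add" concrete, here is a minimal
sketch in Java. SketchSpec, slotWithSet and slotWithAdd are hypothetical
names made up for illustration; this is not Flink's ResourceSpec, and the
two fields are an assumed simplification. It only shows that with ZERO as
the default, an unspecified SSG needs no special case under "add".

```java
// Hypothetical stand-in for a resource spec; NOT Flink's actual ResourceSpec.
final class SketchSpec {
    // ZERO means "unspecified" and is the identity element for merge().
    static final SketchSpec ZERO = new SketchSpec(0.0, 0);

    final double cpuCores;
    final int memoryMb;

    SketchSpec(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }

    // Component-wise sum of two specs.
    SketchSpec merge(SketchSpec other) {
        return new SketchSpec(cpuCores + other.cpuCores, memoryMb + other.memoryMb);
    }

    // "set" semantics: a specified SSG spec overrides the summed task specs.
    static SketchSpec slotWithSet(SketchSpec taskSum, SketchSpec ssg) {
        boolean ssgUnspecified = ssg.cpuCores == 0.0 && ssg.memoryMb == 0;
        return ssgUnspecified ? taskSum : ssg;
    }

    // "add" semantics: the SSG spec is extra resource on top of the task sum.
    static SketchSpec slotWithAdd(SketchSpec taskSum, SketchSpec ssg) {
        return taskSum.merge(ssg);
    }
}
```

With "add", passing SketchSpec.ZERO for the SSG simply returns the task
sum, so the unspecified case falls out of the arithmetic.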

@Chesnay
> My main worry is that it if we wire the runtime to work on SSGs it's
> gonna be difficult to implement more fine-grained approaches, which
> would not be the case if, for the runtime, they are always defined on an
> operator-level.

An "add" operation should be less invasive and set a low barrier for future
fine-grained approaches.

@Stephan
>   - Users can define different slot sharing groups for operators like
> they do now, with the exception that you cannot mix operators that have a
> resource profile and operators that have no resource profile.

@Till
> This effectively means that all unspecified operators
> will implicitly have a zero resource requirement.
> I am wondering whether this wouldn't lead to a surprising behaviour for
> the user. If the user specifies the resource requirements for a single
> operator, then he probably will assume that the other operators will get
> the default share of resources and not nothing.

I think this is inherent, because we cannot define a ResourceSpec.ONE, i.e.
the resource requirement for exactly one default slot, with concrete
numbers? I tend to squash out the unspecified operators if there are
operators in the chain with an explicit resource specification. Otherwise,
the protocol becomes verbose, as if saying "give me this much resource and
a default". I think that if we have an explicit resource specification for
only some operators, it just says "I don't care about the other operators
that much, just get them a place to run". Most likely these are stateless
filter/map or other less resource-consuming operators. If there is indeed a
problem, I think clients can specify a global default (or a default at some
other level in the future). In the job graph generation phase, we could take
that default into account for unspecified operators.

@FLIP-156
> Expose operator chaining. (Cons of task level resource specifying)

Isn't this inherent to every group-level resource specification? It will
either break chaining, obey it, or even fail to work with it.

To sum up, my suggestions are:

On the API side:
* StreamExecutionEnvironment: a global default (ResourceSpec.ZERO if
  unspecified).
* Operator: ResourceSpec.ZERO (unspecified) as the default.
* Task: sum of the requirements of its specified operators + the global
  default (if there are any unspecified operators).
* SSG: additional resources for the physical slot.

On the runtime side:
* Task: ResourceSpec.Task or ResourceSpec.ZERO
* SSG: ResourceSpec.SSG or ResourceSpec.ZERO

The physical slot gets the summed-up resources of its logical slots and the
SSG; if the result is ResourceSpec.ZERO, it is just a default-sized slot.
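
The rules above could be sketched roughly as follows in Java. Res and
SlotSizing are hypothetical names, not Flink classes, and the two-field
resource model is an assumption. One more assumption: the global default is
added once per task if any operator is unspecified, which is only one
possible reading of the Task rule above.

```java
import java.util.List;

// Hypothetical two-field resource model; NOT Flink's ResourceSpec.
final class Res {
    static final Res ZERO = new Res(0.0, 0);

    final double cpu;
    final int memMb;

    Res(double cpu, int memMb) {
        this.cpu = cpu;
        this.memMb = memMb;
    }

    Res plus(Res o) {
        return new Res(cpu + o.cpu, memMb + o.memMb);
    }

    boolean isZero() {
        return cpu == 0.0 && memMb == 0;
    }
}

final class SlotSizing {
    // Task: sum of specified operators, plus the global default once
    // if at least one operator is unspecified (ZERO). Assumed reading.
    static Res taskRequirement(List<Res> operators, Res globalDefault) {
        Res sum = Res.ZERO;
        boolean anyUnspecified = false;
        for (Res op : operators) {
            if (op.isZero()) {
                anyUnspecified = true;
            } else {
                sum = sum.plus(op);
            }
        }
        return anyUnspecified ? sum.plus(globalDefault) : sum;
    }

    // Physical slot: sum of its tasks (logical slots) plus the SSG's extra.
    static Res slotRequirement(List<Res> tasks, Res ssgExtra) {
        Res sum = Res.ZERO;
        for (Res task : tasks) {
            sum = sum.plus(task);
        }
        return sum.plus(ssgExtra);
    }

    // A ZERO result means "allocate a default-sized slot".
    static boolean usesDefaultSlot(Res slot) {
        return slot.isZero();
    }
}
```

If nothing is specified anywhere, every term is ZERO, and usesDefaultSlot
reports that a default-sized slot should be allocated.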

In short, turn the SSG resource specification into an "add" and drop
ResourceSpec.UNKNOWN.


Questions/Issues:
* Could an SSG express a negative resource requirement?
* Is there a concrete barrier that prevents partially configured resources
  from working? I saw that it fails job submission in Dispatcher.submitJob.
* An option (cluster/job level) to force slot sharing in the scheduler? This
  could be useful when migrating from FLIP-156 to a future approach.
* An option (cluster level) to ignore resource specifications (allowing a
  resource-specified job to run in an out-of-the-box environment) for
  non-production usage?



On February 1, 2021 at 11:54:10, Yangze Guo (karma...@gmail.com) wrote:

Thanks for reply, Till and Xintong!

I update the FLIP, including:
- Edit the JavaDoc of the proposed
StreamGraphGenerator#setSlotSharingGroupResource.
- Add "Future Plan" section, which contains the potential follow-up
issues and the limitations to be documented when fine-grained resource
management is exposed to users.

I'll start a vote in another thread.

Best,
Yangze Guo

On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
> Thanks for summarizing the discussion, Yangze. I agree that setting
> resource requirements per operator is not very user friendly. Moreover, I
> couldn't come up with a different proposal which would be as easy to use
> and wouldn't expose internal scheduling details. In fact, following this
> argument then we shouldn't have exposed the slot sharing groups in the
> first place.
>
> What is important for the user is that we properly document the
> limitations and constraints the fine grained resource specification has.
> For example, we should explain how optimizations like chaining are
> affected by it and how different execution modes (batch vs. streaming)
> affect the execution of operators which have specified resources. These
> things shouldn't become part of the contract of this feature and are more
> caused by internal implementation details but it will be important to
> understand these things properly in order to use this feature effectively.
>
> Hence, +1 for starting the vote for this FLIP.
>
> Cheers,
> Till
>
> On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <tonysong...@gmail.com> wrote:
>
> > Thanks for the summary, Yangze.
> >
> > The changes and follow-up issues LGTM. Let's wait for responses from the
> > others before starting a vote.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <karma...@gmail.com> wrote:
> >
> > > Thanks everyone for the lively discussion. I'd like to try to
> > > summarize the current convergence in the discussion. Please let me
> > > know if I got things wrong or missed something crucial here.
> > >
> > > Change of this FLIP:
> > > - Treat the SSG resource requirements as a hint instead of a
> > > restriction for the runtime. That should be explicitly explained in
> > > the JavaDocs.
> > >
> > > Potential follow-up issues if needed:
> > > - Provide operator-level resource configuration interface.
> > > - Provide multiple options for deciding resources for SSGs whose
> > > requirement is not specified:
> > > ** Default slot resource.
> > > ** Default operator resource times number of operators.
> > >
> > > If there are no other issues, I'll update the FLIP accordingly and
> > > start a vote thread. Thanks all for the valuable feedback again.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > >
> > >
> > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <tonysong...@gmail.com>
> > > wrote:
> > > >
> > > >
> > > > FGRuntimeInterface.png
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > >>
> > > > >> I think Chesnay's proposal could actually work. IIUC, the keypoint is
> > > > >> to derive operator requirements from SSG requirements on the API side,
> > > > >> so that the runtime only deals with operator requirements. It's
> > > > >> debatable how the deriving should be done though. E.g., an alternative
> > > > >> could be to evenly divide the SSG requirement into requirements of
> > > > >> operators in the group.
> > > >>
> > > >>
> > > > >> However, I'm not entirely sure which option is more desired.
> > > > >> Illustrating my understanding in the following figure, in which on the
> > > > >> top is Chesnay's proposal and on the bottom is the SSG-based proposal
> > > > >> in this FLIP.
> > > >>
> > > >>
> > > >>
> > > > >> I think the major difference between the two approaches is where
> > > > >> deriving operator requirements from SSG requirements happens.
> > > > >>
> > > > >> - Chesnay's proposal simplifies the runtime logic and the interface to
> > > > >> expose, at the price of moving more complexity (i.e. the deriving) to
> > > > >> the API side. The question is, where do we prefer to keep the
> > > > >> complexity? I'm slightly leaning towards having a thin API and keep the
> > > > >> complexity in runtime if possible.
> > > > >>
> > > > >> - Notice that the dash line arrows represent optional steps that are
> > > > >> needed only for schedulers that do not respect SSGs, which we don't
> > > > >> have at the moment. If we only look at the solid line arrows, then the
> > > > >> SSG-based approach is much simpler, without needing to derive and
> > > > >> aggregate the requirements back and forth. I'm not sure about
> > > > >> complicating the current design only for the potential future needs.
> > > >>
> > > >>
> > > >> Thank you~
> > > >>
> > > >> Xintong Song
> > > >>
> > > >>
> > > >>
> > > >>
> > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ches...@apache.org> wrote:
> > > >>>
> > > > >>> You're raising a good point, but I think I can rectify that with a
> > > > >>> minor adjustment.
> > > > >>>
> > > > >>> Default requirements are whatever the default requirements are,
> > > > >>> setting the requirements for one operator has no effect on other
> > > > >>> operators.
> > > > >>>
> > > > >>> With these rules, and some API enhancements, the following mockup
> > > > >>> would replicate the SSG-based behavior:
> > > >>>
> > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > >>>     vertices = slotSharingGroup.getVertices()
> > > > >>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > > > >>>     vertices.remaining().setRequirements(ZERO)
> > > > >>> }
> > > >>>
> > > >>> We could even allow setting requirements on slotsharing-groups
> > > >>> colocation-groups and internally translate them accordingly.
> > > >>> I can't help but feel this is a plain API issue.
> > > >>>
> > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > >>> > If I understand you correctly Chesnay, then you want to
decouple
> > the
> > > >>> > resource requirement specification from the slot sharing group
> > > >>> > assignment. Hence, per default all operators would be in the
same
> > > slot
> > > >>> > sharing group. If there is no operator with a resource
> > specification,
> > > >>> > then the system would allocate a default slot for it. If there
is
> > at
> > > >>> > least one operator, then the system would sum up all the
specified
> > > >>> > resources and allocate a slot of this size. This effectively
means
> > > >>> > that all unspecified operators will implicitly have a zero
resource
> > > >>> > requirement. Did I understand your idea correctly?
> > > >>> >
> > > >>> > I am wondering whether this wouldn't lead to a surprising
behaviour
> > > >>> > for the user. If the user specifies the resource requirements
for a
> > > >>> > single operator, then he probably will assume that the other
> > > operators
> > > >>> > will get the default share of resources and not nothing.
> > > >>> >
> > > >>> > Cheers,
> > > >>> > Till
> > > >>> >
> > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <
> > ches...@apache.org
> > > >>> > <mailto:ches...@apache.org>> wrote:
> > > >>> >
> > > >>> > Is there even a functional difference between specifying the
> > > >>> > requirements for an SSG vs specifying the same requirements on
> > a
> > > >>> > single
> > > >>> > operator within that group (ideally a colocation group to avoid
> > > this
> > > >>> > whole hint business)?
> > > >>> >
> > > >>> > Wouldn't we get the best of both worlds in the latter case?
> > > >>> >
> > > >>> > Users can take shortcuts to define shared requirements,
> > > >>> > but refine them further as needed on a per-operator basis,
> > > >>> > without changing semantics of slotsharing groups
> > > >>> > nor the runtime being locked into SSG-based requirements.
> > > >>> >
> > > >>> > (And before anyone argues what happens if slotsharing groups
> > > >>> > change or
> > > >>> > whatnot, that's a plain API issue that we could surely solve.
> > (A
> > > >>> > plain
> > > >>> > iteration over slotsharing groups and therein contained
> > operators
> > > >>> > would
> > > >>> > suffice)).
> > > >>> >
> > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > >>> > > Maybe a different minor idea: Would it be possible to treat
> > > the SSG
> > > >>> > > resource requirements as a hint for the runtime similar to
> > how
> > > >>> > slot sharing
> > > >>> > > groups are designed at the moment? Meaning that we don't give
> > > >>> > the guarantee
> > > >>> > > that Flink will always deploy this set of tasks together no
> > > >>> > matter what
> > > >>> > > comes. If, for example, the runtime can derive by some means
> > > the
> > > >>> > resource
> > > >>> > > requirements for each task based on the requirements for the
> > > >>> > SSG, this
> > > >>> > > could be possible. One easy strategy would be to give every
> > > task
> > > >>> > the same
> > > >>> > > resources as the whole slot sharing group. Another one could
> > be
> > > >>> > > distributing the resources equally among the tasks. This does
> > > >>> > not even have
> > > >>> > > to be implemented but we would give ourselves the freedom to
> > > change
> > > >>> > > scheduling if need should arise.
> > > >>> > >
> > > >>> > > Cheers,
> > > >>> > > Till
> > > >>> > >
> > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <
> > karma...@gmail.com
> > > >>> > <mailto:karma...@gmail.com>> wrote:
> > > >>> > >
> > > >>> > >> Thanks for the responses, Till and Xintong.
> > > >>> > >>
> > > >>> > >> I second Xintong's comment that SSG-based runtime interface
> > > >>> > will give
> > > >>> > >> us the flexibility to achieve op/task-based approach. That's
> > > one of
> > > >>> > >> the most important reasons for our design choice.
> > > >>> > >>
> > > >>> > >> Some cents regarding the default operator resource:
> > > >>> > >> - It might be good for the scenario of DataStream jobs.
> > > >>> > >> ** For light-weight operators, the accumulative
> > > >>> > configuration error
> > > >>> > >> will not be significant. Then, the resource of a task used
> > is
> > > >>> > >> proportional to the number of operators it contains.
> > > >>> > >> ** For heavy operators like join and window or operators
> > > >>> > using the
> > > >>> > >> external resources, user will turn to the fine-grained
> > > resource
> > > >>> > >> configuration.
> > > >>> > >> - It can increase the stability for the standalone cluster
> > > >>> > where task
> > > >>> > >> executors registered are heterogeneous(with different
> > default
> > > slot
> > > >>> > >> resources).
> > > >>> > >> - It might not be good for SQL users. The operators that SQL
> > > >>> > will be
> > > >>> > >> transferred to is a black box to the user. We also do not
> > > guarantee
> > > >>> > >> the cross-version of consistency of the transformation so
> > far.
> > > >>> > >>
> > > >>> > >> I think it can be treated as a follow-up work when the
> > > fine-grained
> > > >>> > >> resource management is end-to-end ready.
> > > >>> > >>
> > > >>> > >> Best,
> > > >>> > >> Yangze Guo
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
> > > >>> > <tonysong...@gmail.com <mailto:tonysong...@gmail.com>>
> > > >>> > >> wrote:
> > > >>> > >>> Thanks for the feedback, Till.
> > > >>> > >>>
> > > >>> > >>> ## I feel that what you proposed (operator-based + default
> > > >>> > value) might
> > > >>> > >> be
> > > >>> > >>> subsumed by the SSG-based approach.
> > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> > > >>> > categorized by
> > > >>> > >>> whether the resource requirements are known to the users.
> > > >>> > >>>
> > > >>> > >>> 1. *Both known.* As previously mentioned, there's no
> > > >>> > reason to put
> > > >>> > >>> multiple operators whose individual resource
> > requirements
> > > >>> > are already
> > > >>> > >> known
> > > >>> > >>> into the same group in fine-grained resource
> > management.
> > > >>> > And if op_1
> > > >>> > >> and
> > > >>> > >>> op_2 are in different groups, there should be no
> > problem
> > > >>> > switching
> > > >>> > >> data
> > > >>> > >>> exchange mode from pipelined to blocking. This is
> > > >>> > equivalent to
> > > >>> > >> specifying
> > > >>> > >>> operator resource requirements in your proposal.
> > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except
> > that
> > > >>> > op_2 is in a
> > > >>> > >>> SSG whose resource is not specified thus would have the
> > > >>> > default slot
> > > >>> > >>> resource. This is equivalent to having default operator
> > > >>> > resources in
> > > >>> > >> your
> > > >>> > >>> proposal.
> > > >>> > >>> 3. *Both unknown*. The user can either set op_1 and
> > op_2
> > > >>> > to the same
> > > >>> > >> SSG
> > > >>> > >>> or separate SSGs.
> > > >>> > >>> - If op_1 and op_2 are in the same SSG, it will be
> > > >>> > equivalent to
> > > >>> > >> the
> > > >>> > >>> coarse-grained resource management, where op_1 and
> > > op_2
> > > >>> > share a
> > > >>> > >> default
> > > >>> > >>> size slot no matter which data exchange mode is
> > used.
> > > >>> > >>> - If op_1 and op_2 are in different SSGs, then each
> > of
> > > >>> > them will
> > > >>> > >> use
> > > >>> > >>> a default size slot. This is equivalent to setting
> > > them
> > > >>> > with
> > > >>> > >> default
> > > >>> > >>> operator resources in your proposal.
> > > >>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and op_2
> > > is
> > > >>> > known.*
> > > >>> > >>> - It is possible that the user learns the total /
> > max
> > > >>> > resource
> > > >>> > >>> requirement from executing and monitoring the job,
> > > >>> > while not
> > > >>> > >>> being aware of
> > > >>> > >>> individual operator requirements.
> > > >>> > >>> - I believe this is the case your proposal does not
> > > >>> > cover. And TBH,
> > > >>> > >>> this is probably how most users learn the resource
> > > >>> > requirements,
> > > >>> > >>> according
> > > >>> > >>> to my experiences.
> > > >>> > >>> - In this case, the user might need to specify
> > > >>> > different resources
> > > >>> > >> if
> > > >>> > >>> he wants to switch the execution mode, which should
> > > not
> > > >>> > be worse
> > > >>> > >> than not
> > > >>> > >>> being able to use fine-grained resource management.
> > > >>> > >>>
> > > >>> > >>>
> > > >>> > >>> ## An additional idea inspired by your proposal.
> > > >>> > >>> We may provide multiple options for deciding resources for
> > > >>> > SSGs whose
> > > >>> > >>> requirement is not specified, if needed.
> > > >>> > >>>
> > > >>> > >>> - Default slot resource (current design)
> > > >>> > >>> - Default operator resource times number of operators
> > > >>> > (equivalent to
> > > >>> > >>> your proposal)
> > > >>> > >>>
> > > >>> > >>>
> > > >>> > >>> ## Exposing internal runtime strategies
> > > >>> > >>> Theoretically, yes. Tying to the SSGs, the resource
> > > >>> > requirements might be
> > > >>> > >>> affected if how SSGs are internally handled changes in
> > > future.
> > > >>> > >> Practically,
> > > >>> > >>> I do not concretely see at the moment what kind of changes
> > we
> > > >>> > may want in
> > > >>> > >>> future that might conflict with this FLIP proposal, as the
> > > >>> > question of
> > > >>> > >>> switching data exchange mode answered above. I'd suggest to
> > > >>> > not give up
> > > >>> > >> the
> > > >>> > >>> user friendliness we may gain now for the future problems
> > > that
> > > >>> > may or may
> > > >>> > >>> not exist.
> > > >>> > >>>
> > > >>> > >>> Moreover, the SSG-based approach has the flexibility to
> > > >>> > achieve the
> > > >>> > >>> equivalent behavior as the operator-based approach, if we
> > > set each
> > > >>> > >> operator
> > > >>> > >>> (or task) to a separate SSG. We can even provide a shortcut
> > > >>> > option to
> > > >>> > >>> automatically do that for users, if needed.
> > > >>> > >>>
> > > >>> > >>>
> > > >>> > >>> Thank you~
> > > >>> > >>>
> > > >>> > >>> Xintong Song
> > > >>> > >>>
> > > >>> > >>>
> > > >>> > >>>
> > > >>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> > > >>> > <trohrm...@apache.org <mailto:trohrm...@apache.org>>
> > > >>> > >> wrote:
> > > >>> > >>>> Thanks for the responses Xintong and Stephan,
> > > >>> > >>>>
> > > >>> > >>>> I agree that being able to define the resource
> > requirements
> > > for a
> > > >>> > >> group of
> > > >>> > >>>> operators is more user friendly. However, my concern is
> > that
> > > >>> > we are
> > > >>> > >>>> exposing thereby internal runtime strategies which might
> > > >>> > limit our
> > > >>> > >>>> flexibility to execute a given job. Moreover, the
> > semantics
> > > of
> > > >>> > >> configuring
> > > >>> > >>>> resource requirements for SSGs could break if switching
> > from
> > > >>> > streaming
> > > >>> > >> to
> > > >>> > >>>> batch execution. If one defines the resource requirements
> > > for
> > > >>> > op_1 ->
> > > >>> > >> op_2
> > > >>> > >>>> which run in pipelined mode when using the streaming
> > > >>> > execution, then
> > > >>> > >> how do
> > > >>> > >>>> we interpret these requirements when op_1 -> op_2 are
> > > >>> > executed with a
> > > >>> > >>>> blocking data exchange in batch execution mode?
> > > Consequently,
> > > >>> > I am
> > > >>> > >> still
> > > >>> > >>>> leaning towards Stephan's proposal to set the resource
> > > >>> > requirements per
> > > >>> > >>>> operator.
> > > >>> > >>>>
> > > >>> > >>>> Maybe the following proposal makes the configuration
> > easier:
> > > >>> > If the
> > > >>> > >> user
> > > >>> > >>>> wants to use fine-grained resource requirements, then she
> > > >>> > needs to
> > > >>> > >> specify
> > > >>> > >>>> the default size which is used for operators which have no
> > > >>> > explicit
> > > >>> > >>>> resource annotation. If this holds true, then every
> > operator
> > > >>> > would
> > > >>> > >> have a
> > > >>> > >>>> resource requirement and the system can try to execute the
> > > >>> > operators
> > > >>> > >> in the
> > > >>> > >>>> best possible manner w/o being constrained by how the user
> > > >>> > set the SSG
> > > >>> > >>>> requirements.
> > > >>> > >>>>
> > > >>> > >>>> Cheers,
> > > >>> > >>>> Till
> > > >>> > >>>>
> > > >>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> > > >>> > <tonysong...@gmail.com <mailto:tonysong...@gmail.com>>
> > > >>> > >>>> wrote:
> > > >>> > >>>>
> > > >>> > >>>>> Thanks for the feedback, Stephan.
> > > >>> > >>>>>
> > > >>> > >>>>> Actually, your proposal has also come to my mind at some
> > > >>> > point. And I
> > > >>> > >>>> have
> > > >>> > >>>>> some concerns about it.
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>> 1. It does not give users the same control as the
> > SSG-based
> > > >>> > approach.
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>> While both approaches do not require specifying for each
> > > >>> > operator,
> > > >>> > >>>>> SSG-based approach supports the semantic that "some
> > > operators
> > > >>> > >> together
> > > >>> > >>>> use
> > > >>> > >>>>> this much resource" while the operator-based approach
> > > doesn't.
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
> > > >>> > o_m), and
> > > >>> > >> at
> > > >>> > >>>> some
> > > >>> > >>>>> point there's an agg o_n (1 < n < m) which significantly
> > > >>> > reduces the
> > > >>> > >> data
> > > >>> > >>>>> amount. One can separate the pipeline into 2 groups SSG_1
> > > >>> > (o_1, ...,
> > > >>> > >> o_n)
> > > >>> > >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much
> > higher
> > > >>> > >> parallelisms
> > > >>> > >>>>> for operators in SSG_1 than for operators in SSG_2 won't
> > > >>> > lead to too
> > > >>> > >> much
> > > >>> > >>>>> wasting of resources. If the two SSGs end up needing
> > > different
> > > >>> > >> resources,
> > > >>> > >>>>> with the SSG-based approach one can directly specify
> > > >>> > resources for
> > > >>> > >> the
> > > >>> > >>>> two
> > > >>> > >>>>> groups. However, with the operator-based approach, the
> > > user will
> > > >>> > >> have to
> > > >>> > >>>>> specify resources for each operator in one of the two
> > > >>> > groups, and
> > > >>> > >> tune
> > > >>> > >>>> the
> > > >>> > >>>>> default slot resource via configurations to fit the other
> > > group.
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>> 2. It increases the chance of breaking operator chains.
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > > >>> > >>>>> Setting chainable operators into different slot sharing groups
> > > > >>> > >>>>> will prevent them from being chained. In the current
> > > > >>> > >>>>> implementation, downstream operators, if SSG not explicitly
> > > > >>> > >>>>> specified, will be set to the same group as the chainable
> > > > >>> > >>>>> upstream operators (unless multiple upstream operators in
> > > > >>> > >>>>> different groups), to reduce the chance of breaking chains.
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > > >>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
> > > > >>> > >>>>> deciding SSGs based on whether resource is specified we will
> > > > >>> > >>>>> easily get groups like (o_1, o_3) & (o_2, o_4), where none of
> > > > >>> > >>>>> the operators can be chained. This is also possible for the
> > > > >>> > >>>>> SSG-based approach, but I believe the chance is much smaller
> > > > >>> > >>>>> because there's no strong reason for users to specify the
> > > > >>> > >>>>> groups with alternate operators like that. We are more likely
> > > > >>> > >>>>> to get groups like (o_1, o_2) & (o_3, o_4), where the chain
> > > > >>> > >>>>> breaks only between o_2 and o_3.
> > > >>> > >>>>>
> > > >>> > >>>>> 3. It complicates the system by having two different
> > > >>> > mechanisms for
> > > >>> > >>>> sharing
> > > >>> > >>>>> managed memory in a slot.
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>> - In FLIP-141, we introduced the intra-slot managed
> > memory
> > > >>> > sharing
> > > >>> > >>>>> mechanism, where managed memory is first distributed
> > > >>> > according to the
> > > >>> > >>>>> consumer type, then further distributed across operators
> > > of that
> > > >>> > >> consumer
> > > >>> > >>>>> type.
> > > >>> > >>>>>
> > > >>> > >>>>> - With the operator-based approach, managed memory size
> > > >>> > specified
> > > >>> > >> for an
> > > >>> > >>>>> operator should account for all the consumer types of
> > that
> > > >>> > operator.
> > > >>> > >> That
> > > >>> > >>>>> means the managed memory is first distributed across
> > > >>> > operators, then
> > > >>> > >>>>> distributed to different consumer types of each operator.
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>> Unfortunately, the different order of the two calculation
> > > >>> > steps can
> > > >>> > >> lead
> > > >>> > >>>> to
> > > >>> > >>>>> different results. To be specific, the semantic of the
> > > >>> > configuration
> > > >>> > >>>> option
> > > >>> > >>>>> `consumer-weights` changed (within a slot vs. within an
> > > >>> > operator).
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>> To sum up things:
> > > >>> > >>>>>
> > > >>> > >>>>> While (3) might be a bit more implementation related, I
> > > >>> > think (1)
> > > >>> > >> and (2)
> > > >>> > >>>>> somehow suggest that, the price for the proposed approach
> > > to
> > > >>> > avoid
> > > >>> > >>>>> specifying resource for every operator is that it's not
> > as
> > > >>> > >> independent
> > > >>> > >>>> from
> > > >>> > >>>>> operator chaining and slot sharing as the operator-based
> > > >>> > approach
> > > >>> > >>>> discussed
> > > >>> > >>>>> in the FLIP.
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>> Thank you~
> > > >>> > >>>>>
> > > >>> > >>>>> Xintong Song
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> > > >>> > <se...@apache.org <mailto:se...@apache.org>>
> > > >>> > >> wrote:
> > > >>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > >>> > >>>>>>
> > > >>> > >>>>>> I want to say, first of all, that this is super well
> > > >>> > written. And
> > > >>> > >> the
> > > >>> > >>>>>> points that the FLIP makes about how to expose the
> > > >>> > configuration to
> > > >>> > >>>> users
> > > >>> > >>>>>> is exactly the right thing to figure out first.
> > > >>> > >>>>>> So good job here!
> > > >>> > >>>>>>
> > > >>> > >>>>>> About how to let users specify the resource profiles.
> > If I
> > > >>> > can sum
> > > >>> > >> the
> > > >>> > >>>>> FLIP
> > > >>> > >>>>>> and previous discussion up in my own words, the problem
> > > is the
> > > >>> > >>>> following:
> > > >>> > >>>>>> Operator-level specification is the simplest and
> > cleanest
> > > >>> > approach,
> > > >>> > >>>>> because
> > > >>> > >>>>>>> it avoids mixing operator configuration (resource) and
> > > >>> > >> scheduling. No
> > > >>> > >>>>>>> matter what other parameters change (chaining, slot
> > > sharing,
> > > >>> > >>>> switching
> > > >>> > >>>>>>> pipelined and blocking shuffles), the resource profiles
> > > >>> > stay the
> > > >>> > >>>> same.
> > > >>> > >>>>>>> But it would require that a user specifies resources on
> > > all
> > > >>> > >>>> operators,
> > > >>> > >>>>>>> which makes it hard to use. That's why the FLIP
> > suggests
> > > going
> > > >>> > >> with
> > > >>> > >>>>>>> specifying resources on a Sharing-Group.
> > > >>> > >>>>>>
> > > >>> > >>>>>> I think both thoughts are important, so can we find a
> > > solution
> > > >>> > >> where
> > > >>> > >>>> the
> > > >>> > >>>>>> Resource Profiles are specified on an Operator, but we
> > > >>> > still avoid
> > > >>> > >> that
> > > >>> > >>>>> we
> > > >>> > >>>>>> need to specify a resource profile on every operator?
> > > >>> > >>>>>>
> > > >>> > >>>>>> What do you think about something like the following:
> > > >>> > >>>>>> - Resource Profiles are specified on an operator
> > level.
> > > >>> > >>>>>> - Not all operators need profiles
> > > >>> > >>>>>> - All Operators without a Resource Profile ended up
> > in
> > > the
> > > >>> > >> default
> > > >>> > >>>> slot
> > > >>> > >>>>>> sharing group with a default profile (will get a default
> > > slot).
> > > >>> > >>>>>> - All Operators with a Resource Profile will go into
> > > >>> > another slot
> > > >>> > >>>>> sharing
> > > >>> > >>>>>> group (the resource-specified-group).
> > > >>> > >>>>>> - Users can define different slot sharing groups for
> > > >>> > operators
> > > >>> > >> like
> > > >>> > >>>>> they
> > > >>> > >>>>>> do now, with the exception that you cannot mix operators
> > > >>> > that have
> > > >>> > >> a
> > > >>> > >>>>>> resource profile and operators that have no resource
> > > profile.
> > > >>> > >>>>>> - The default case where no operator has a resource
> > > >>> > profile is
> > > >>> > >> just a
> > > >>> > >>>>>> special case of this model
> > > >>> > >>>>>> - The chaining logic sums up the profiles per
> > operator,
> > > >>> > like it
> > > >>> > >> does
> > > >>> > >>>>> now,
> > > >>> > >>>>>> and the scheduler sums up the profiles of the tasks that
> > > it
> > > >>> > >> schedules
> > > >>> > >>>>>> together.
> > > >>> > >>>>>>
> > > >>> > >>>>>>
> > > >>> > >>>>>> There is another question about reactive scaling raised
> > > in the
> > > >>> > >> FLIP. I
> > > >>> > >>>>> need
> > > >>> > >>>>>> to think a bit about that. That is indeed a bit more
> > > tricky
> > > >>> > once we
> > > >>> > >>>> have
> > > >>> > >>>>>> slots of different sizes.
> > > >>> > >>>>>> It is not clear then which of the different slot
> > requests
> > > the
> > > >>> > >>>>>> ResourceManager should fulfill when new resources (TMs)
> > > >>> > show up,
> > > >>> > >> or how
> > > >>> > >>>>> the
> > > >>> > >>>>>> JobManager redistributes the slot resources when
> > > resources
> > > >>> > (TMs)
> > > >>> > >>>>> disappear.
> > > >>> > >>>>>> This question is pretty orthogonal, though, to the "how
> > to
> > > >>> > specify
> > > >>> > >> the
> > > >>> > >>>>>> resources".
> > > >>> > >>>>>>
> > > >>> > >>>>>>
> > > >>> > >>>>>> Best,
> > > >>> > >>>>>> Stephan
> > > >>> > >>>>>>
> > > >>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > > >>> > <tonysong...@gmail.com>
> > > >>> > >>>>> wrote:
> > > >>> > >>>>>>> Thanks for drafting the FLIP and driving the
> > discussion,
> > > >>> > Yangze.
> > > >>> > >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> @Till,
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> I agree that specifying requirements for SSGs means
> > that
> > > SSGs
> > > >>> > >> need to
> > > >>> > >>>>> be
> > > >>> > >>>>>>> supported in fine-grained resource management,
> > otherwise
> > > each
> > > >>> > >>>> operator
> > > >>> > >>>>>>> might use as many resources as the whole group.
> > However,
> > > I
> > > >>> > cannot
> > > >>> > >>>> think
> > > >>> > >>>>>> of
> > > >>> > >>>>>>> a strong reason for not supporting SSGs in fine-grained
> > > >>> > resource
> > > >>> > >>>>>>> management.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>>
> > > >>> > >>>>>>>> Interestingly, if all operators have their resources
> > > properly
> > > >>> > >>>>>> specified,
> > > >>> > >>>>>>>> then slot sharing is no longer needed because Flink
> > > could
> > > >>> > >> slice off
> > > >>> > >>>>> the
> > > >>> > >>>>>>>> appropriately sized slots for every Task individually.
> > > >>> > >>>>>>>>
> > > >>> > >>>>>>> So for example, if we have a job consisting of two
> > > >>> > operators op_1
> > > >>> > >> and
> > > >>> > >>>>> op_2
> > > >>> > >>>>>>>> where each op needs 100 MB of memory, we would then
> > say
> > > that
> > > >>> > >> the
> > > >>> > >>>> slot
> > > >>> > >>>>>>>> sharing group needs 200 MB of memory to run. If we
> > have
> > > a
> > > >>> > >> cluster
> > > >>> > >>>>> with
> > > >>> > >>>>>> 2
> > > >>> > >>>>>>>> TMs with one slot of 100 MB each, then the system
> > > cannot run
> > > >>> > >> this
> > > >>> > >>>>> job.
> > > >>> > >>>>>> If
> > > >>> > >>>>>>>> the resources were specified on an operator level,
> > then
> > > the
> > > >>> > >> system
> > > >>> > >>>>>> could
> > > >>> > >>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > op_2
> > > to
> > > >>> > >> TM_2.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Couldn't agree more that if all operators' requirements
> > > are
> > > >>> > >> properly
> > > >>> > >>>>>>> specified, slot sharing should no longer be needed. I
> > > >>> > think this
> > > >>> > >>>>> exactly
> > > >>> > >>>>>>> disproves the example. If we already know op_1 and op_2
> > > each
> > > >>> > >> need
> > > >>> > >>>> 100
> > > >>> > >>>>> MB
> > > >>> > >>>>>>> of memory, why would we put them in the same group? If
> > > >>> > they are
> > > >>> > >> in
> > > >>> > >>>>>> separate
> > > >>> > >>>>>>> groups, with the proposed approach the system can
> > freely
> > > >>> > deploy
> > > >>> > >> them
> > > >>> > >>>> to
> > > >>> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Moreover, the precondition for not needing slot sharing
> > > is
> > > >>> > having
> > > >>> > >>>>>> resource
> > > >>> > >>>>>>> requirements properly specified for all operators. This
> > > is not
> > > >>> > >> always
> > > >>> > >>>>>>> possible, and usually requires tremendous efforts. One
> > > of the
> > > >>> > >>>> benefits
> > > >>> > >>>>>> for
> > > >>> > >>>>>>> SSG-based requirements is that it allows the user to
> > > freely
> > > >>> > >> decide
> > > >>> > >>>> the
> > > >>> > >>>>>>> granularity, and thus the effort they want to invest. I would
> > > >>> > consider SSG
> > > >>> > >> in
> > > >>> > >>>>>>> fine-grained resource management as a group of
> > operators
> > > >>> > that the
> > > >>> > >>>> user
> > > >>> > >>>>>>> would like to specify the total resource for. There can
> > > be
> > > >>> > only
> > > >>> > >> one
> > > >>> > >>>>> group
> > > >>> > >>>>>>> in the job, 2~3 groups dividing the job into a few
> > major
> > > >>> > parts,
> > > >>> > >> or as
> > > >>> > >>>>>> many
> > > >>> > >>>>>>> groups as the number of tasks/operators, depending on
> > how
> > > >>> > >>>> fine-grained
> > > >>> > >>>>>> the
> > > >>> > >>>>>>> user is able to specify the resources.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Having to support SSGs might be a constraint. But given
> > > >>> > that all
> > > >>> > >> the
> > > >>> > >>>>>>> current scheduler implementations already support
> > SSGs, I
> > > >>> > tend to
> > > >>> > >>>> think
> > > >>> > >>>>>>> of that as an acceptable price for the above-discussed
> > > >>> > usability and
> > > >>> > >>>>>>> flexibility.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> @Chesnay
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Will declaring them on slot sharing groups not also
> > waste
> > > >>> > >> resources
> > > >>> > >>>> if
> > > >>> > >>>>>> the
> > > >>> > >>>>>>>> parallelism of operators within that group is
> > > different?
> > > >>> > >>>>>>>>
> > > >>> > >>>>>>> Yes. It's a trade-off between usability and resource
> > > >>> > >> utilization. To
> > > >>> > >>>>>> avoid
> > > >>> > >>>>>>> such wasting, the user can define more groups, so that
> > > >>> > each group
> > > >>> > >>>>>> contains
> > > >>> > >>>>>>> fewer operators and the chance of having operators with
> > > >>> > different
> > > >>> > >>>>>>> parallelism will be reduced. The price is to have more
> > > >>> > resource
> > > >>> > >>>>>>> requirements to specify.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> It also seems like quite a hassle for users having to
> > > >>> > >> recalculate the
> > > >>> > >>>>>>>> resource requirements if they change the slot sharing.
> > > >>> > >>>>>>>> I'd think that it's not really workable for users that
> > > create
> > > >>> > >> a set
> > > >>> > >>>>> of
> > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > their
> > > >>> > >>>>> applications;
> > > >>> > >>>>>>>> managing the resource requirements in such a setting
> > > >>> > would be
> > > >>> > >> a
> > > >>> > >>>>>>>> nightmare, and in the end would require operator-level
> > > >>> > >> requirements
> > > >>> > >>>>> any
> > > >>> > >>>>>>>> way.
> > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > increases
> > > >>> > >>>>> usability.
> > > >>> > >>>>>>> - As mentioned in my reply to Till's comment,
> > > there's no
> > > >>> > >> reason to
> > > >>> > >>>>> put
> > > >>> > >>>>>>> multiple operators whose individual resource
> > > >>> > requirements are
> > > >>> > >>>>> already
> > > >>> > >>>>>>> known
> > > >>> > >>>>>>> into the same group in fine-grained resource
> > > management.
> > > >>> > >>>>>>> - Even if an operator implementation is reused for
> > > multiple
> > > >>> > >>>>> applications,
> > > >>> > >>>>>>> it does not guarantee the same resource
> > requirements.
> > > >>> > During
> > > >>> > >> our
> > > >>> > >>>>> years
> > > >>> > >>>>>>> of
> > > >>> > >>>>>>> practice at Alibaba, with per-operator
> > requirements
> > > >>> > >> specified for
> > > >>> > >>>>>>> Blink's
> > > >>> > >>>>>>> fine-grained resource management, very few users
> > > >>> > (including
> > > >>> > >> our
> > > >>> > >>>>>>> specialists
> > > >>> > >>>>>>> who are dedicated to supporting Blink users) are
> > > >>> > >> experienced enough
> > > >>> > >>>>> to
> > > >>> > >>>>>>> accurately predict/estimate the operator resource
> > > >>> > >> requirements.
> > > >>> > >>>> Most
> > > >>> > >>>>>>> people
> > > >>> > >>>>>>> rely on the execution-time metrics (throughput,
> > > delay, cpu
> > > >>> > >> load,
> > > >>> > >>>>>> memory
> > > >>> > >>>>>>> usage, GC pressure, etc.) to improve the
> > > specification.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> To sum up:
> > > >>> > >>>>>>> If the user is capable of providing proper resource
> > > >>> > requirements
> > > >>> > >> for
> > > >>> > >>>>>> every
> > > >>> > >>>>>>> operator, that's definitely a good thing and we would
> > not
> > > >>> > need to
> > > >>> > >>>> rely
> > > >>> > >>>>> on
> > > >>> > >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> > > >>> > >> fine-grained
> > > >>> > >>>>>> resource
> > > >>> > >>>>>>> management to work. For those users who are capable and
> > > do not
> > > >>> > >> like
> > > >>> > >>>>>> having
> > > >>> > >>>>>>> to set each operator to a separate SSG, I would be ok
> > to
> > > have
> > > >>> > >> both
> > > >>> > >>>>>>> SSG-based and operator-based runtime interfaces and to
> > > only
> > > >>> > >> fall back
> > > >>> > >>>> to
> > > >>> > >>>>>> the
> > > >>> > >>>>>>> SSG requirements when the operator requirements are not
> > > >>> > >> specified.
> > > >>> > >>>>>> However,
> > > >>> > >>>>>>> as the first step, I think we should prioritise the use
> > > cases
> > > >>> > >> where
> > > >>> > >>>>> users
> > > >>> > >>>>>>> are not that experienced.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Thank you~
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Xintong Song
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > > >>> > >> ches...@apache.org>
> > > >>> > >>>>>>> wrote:
> > > >>> > >>>>>>>
> > > >>> > >>>>>>>> Will declaring them on slot sharing groups not also
> > > waste
> > > >>> > >> resources
> > > >>> > >>>>> if
> > > >>> > >>>>>>>> the parallelism of operators within that group is
> > > different?
> > > >>> > >>>>>>>>
> > > >>> > >>>>>>>> It also seems like quite a hassle for users having to
> > > >>> > >> recalculate
> > > >>> > >>>> the
> > > >>> > >>>>>>>> resource requirements if they change the slot sharing.
> > > >>> > >>>>>>>> I'd think that it's not really workable for users that
> > > create
> > > >>> > >> a set
> > > >>> > >>>>> of
> > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > their
> > > >>> > >>>>> applications;
> > > >>> > >>>>>>>> managing the resource requirements in such a setting
> > > >>> > would be
> > > >>> > >> a
> > > >>> > >>>>>>>> nightmare, and in the end would require operator-level
> > > >>> > >> requirements
> > > >>> > >>>>> any
> > > >>> > >>>>>>>> way.
> > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > increases
> > > >>> > >>>>> usability.
> > > >>> > >>>>>>>> My main worry is that if we wire the runtime to
> > work
> > > >>> > on SSGs
> > > >>> > >>>> it's
> > > >>> > >>>>>>>> gonna be difficult to implement more fine-grained
> > > approaches,
> > > >>> > >> which
> > > >>> > >>>>>>>> would not be the case if, for the runtime, they are
> > > always
> > > >>> > >> defined
> > > >>> > >>>> on
> > > >>> > >>>>>> an
> > > >>> > >>>>>>>> operator-level.
> > > >>> > >>>>>>>>
> > > >>> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > >>> > >>>>>>>>> Thanks for drafting this FLIP and starting this
> > > discussion
> > > >>> > >>>> Yangze.
> > > >>> > >>>>>>>>> I like that defining resource requirements on a slot
> > > sharing
> > > >>> > >>>> group
> > > >>> > >>>>>>> makes
> > > >>> > >>>>>>>>> the overall setup easier and improves usability of
> > > resource
> > > >>> > >>>>>>> requirements.
> > > >>> > >>>>>>>>> What I do not like about it is that it changes slot
> > > sharing
> > > >>> > >>>> groups
> > > >>> > >>>>>> from
> > > >>> > >>>>>>>>> being a scheduling hint to something which needs to
> > be
> > > >>> > >> supported
> > > >>> > >>>> in
> > > >>> > >>>>>>> order
> > > >>> > >>>>>>>>> to support fine-grained resource requirements. So
> > far,
> > > the
> > > >>> > >> idea
> > > >>> > >>>> of
> > > >>> > >>>>>> slot
> > > >>> > >>>>>>>>> sharing groups was that it tells the system that a
> > set
> > > of
> > > >>> > >>>> operators
> > > >>> > >>>>>> can
> > > >>> > >>>>>>>> be
> > > >>> > >>>>>>>>> deployed in the same slot. But the system still had
> > the
> > > >>> > >> freedom
> > > >>> > >>>> to
> > > >>> > >>>>>> say
> > > >>> > >>>>>>>> that
> > > >>> > >>>>>>>>> it would rather place these tasks in different slots
> > > if it
> > > >>> > >>>> wanted.
> > > >>> > >>>>> If
> > > >>> > >>>>>>> we
> > > >>> > >>>>>>>>> now specify resource requirements on a per slot
> > sharing
> > > >>> > >> group,
> > > >>> > >>>> then
> > > >>> > >>>>>> the
> > > >>> > >>>>>>>>> only option for a scheduler which does not support
> > slot
> > > >>> > >> sharing
> > > >>> > >>>>>> groups
> > > >>> > >>>>>>> is
> > > >>> > >>>>>>>>> to say that every operator in this slot sharing group
> > > >>> > needs a
> > > >>> > >>>> slot
> > > >>> > >>>>>> with
> > > >>> > >>>>>>>> the
> > > >>> > >>>>>>>>> same resources as the whole group.
> > > >>> > >>>>>>>>>
> > > >>> > >>>>>>>>> So for example, if we have a job consisting of two
> > > operators
> > > >>> > >> op_1
> > > >>> > >>>>> and
> > > >>> > >>>>>>> op_2
> > > >>> > >>>>>>>>> where each op needs 100 MB of memory, we would then
> > > say that
> > > >>> > >> the
> > > >>> > >>>>> slot
> > > >>> > >>>>>>>>> sharing group needs 200 MB of memory to run. If we
> > > have a
> > > >>> > >> cluster
> > > >>> > >>>>>> with
> > > >>> > >>>>>>> 2
> > > >>> > >>>>>>>>> TMs with one slot of 100 MB each, then the system
> > > cannot run
> > > >>> > >> this
> > > >>> > >>>>>> job.
> > > >>> > >>>>>>> If
> > > >>> > >>>>>>>>> the resources were specified on an operator level,
> > > then the
> > > >>> > >>>> system
> > > >>> > >>>>>>> could
> > > >>> > >>>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > > op_2 to
> > > >>> > >> TM_2.
> > > >>> > >>>>>>>>> Originally, one of the primary goals of slot sharing
> > > groups
> > > >>> > >> was
> > > >>> > >>>> to
> > > >>> > >>>>>> make
> > > >>> > >>>>>>>> it
> > > >>> > >>>>>>>>> easier for the user to reason about how many slots a
> > > job
> > > >>> > >> needs
> > > >>> > >>>>>>>> independent
> > > >>> > >>>>>>>>> of the actual number of operators in the job.
> > > Interestingly,
> > > >>> > >> if
> > > >>> > >>>> all
> > > >>> > >>>>>>>>> operators have their resources properly specified,
> > > then slot
> > > >>> > >>>>> sharing
> > > >>> > >>>>>> is
> > > >>> > >>>>>>>> no
> > > >>> > >>>>>>>>> longer needed because Flink could slice off the
> > > >>> > appropriately
> > > >>> > >>>> sized
> > > >>> > >>>>>>> slots
> > > >>> > >>>>>>>>> for every Task individually. What matters is whether
> > > the
> > > >>> > >> whole
> > > >>> > >>>>>> cluster
> > > >>> > >>>>>>>> has
> > > >>> > >>>>>>>>> enough resources to run all tasks or not.
> > > >>> > >>>>>>>>>
> > > >>> > >>>>>>>>> Cheers,
> > > >>> > >>>>>>>>> Till
> > > >>> > >>>>>>>>>
> > > >>> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > > >>> > >> karma...@gmail.com>
> > > >>> > >>>>>> wrote:
> > > >>> > >>>>>>>>>> Hi, there,
> > > >>> > >>>>>>>>>>
> > > >>> > >>>>>>>>>> We would like to start a discussion thread on
> > > "FLIP-156:
> > > >>> > >> Runtime
> > > >>> > >>>>>>>>>> Interfaces for Fine-Grained Resource
> > Requirements"[1],
> > > >>> > >> where we
> > > >>> > >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime
> > > interfaces
> > > >>> > >> for
> > > >>> > >>>>>>>>>> specifying fine-grained resource requirements.
> > > >>> > >>>>>>>>>>
> > > >>> > >>>>>>>>>> In this FLIP:
> > > >>> > >>>>>>>>>> - Expound the user story of fine-grained resource
> > > >>> > >> management.
> > > >>> > >>>>>>>>>> - Propose runtime interfaces for specifying
> > SSG-based
> > > >>> > >> resource
> > > >>> > >>>>>>>>>> requirements.
> > > >>> > >>>>>>>>>> - Discuss the pros and cons of the three potential
> > > >>> > >> granularities
> > > >>> > >>>>> for
> > > >>> > >>>>>>>>>> specifying the resource requirements (op, task and
> > > slot
> > > >>> > >> sharing
> > > >>> > >>>>>> group)
> > > >>> > >>>>>>>>>> and explain why we choose the slot sharing group.
> > > >>> > >>>>>>>>>>
> > > >>> > >>>>>>>>>> Please find more details in the FLIP wiki document
> > > [1].
> > > >>> > >> Looking
> > > >>> > >>>>>>>>>> forward to your feedback.
> > > >>> > >>>>>>>>>>
> > > >>> > >>>>>>>>>> [1]
> > > >>> > >>>>>>>>>>
> > > >>> > >>
> > > >>> >
> > >
> >
https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > >>> > >>>>>>>>>> Best,
> > > >>> > >>>>>>>>>> Yangze Guo
> > > >>> > >>>>>>>>>>
> > > >>> > >>>>>>>>
> > > >>> >
> > > >>>
> > >
> >
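
For reference, Till's op_1/op_2 example quoted above can be written out as a
small Python sketch. This is not Flink code: can_place, operator_mb and
slots_mb are hypothetical names, and the model assumes fixed-size slots where
each resource request occupies exactly one slot.

```python
# Hypothetical model, not a Flink API: requirements and slots are plain MB values.

def can_place(requests_mb, slots_mb):
    """Return True if every request can get its own slot that is large enough.

    Pairing the largest request with the largest slot (and so on downwards)
    is sufficient to decide feasibility for one-request-per-slot placement.
    """
    reqs = sorted(requests_mb, reverse=True)
    slots = sorted(slots_mb, reverse=True)
    if len(reqs) > len(slots):
        return False
    return all(r <= s for r, s in zip(reqs, slots))


# Two operators that need 100 MB each (numbers from the quoted example).
operator_mb = {"op_1": 100, "op_2": 100}

# Two TMs with one 100 MB slot each.
slots_mb = [100, 100]

# SSG-level specification: one request for the sum of the group.
ssg_request = [sum(operator_mb.values())]

# Operator-level specification: one request per operator.
op_requests = list(operator_mb.values())

ssg_fits = can_place(ssg_request, slots_mb)
op_fits = can_place(op_requests, slots_mb)

print("SSG-level request fits:", ssg_fits)
print("Operator-level requests fit:", op_fits)
```

Under these assumptions the single 200 MB group request does not fit into the
two 100 MB slots, while the two per-operator requests do. Putting op_1 and
op_2 into separate groups, as Xintong suggests, yields the same two 100 MB
requests as the operator-level case.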
