FGRuntimeInterface.png <https://drive.google.com/file/d/13nYCLBd1HjdfYVWUjxzdNa8wo5n2GKbi/view?usp=drive_web>
Thank you~

Xintong Song


On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <tonysong...@gmail.com> wrote:

> I think Chesnay's proposal could actually work. IIUC, the key point is to
> derive operator requirements from SSG requirements on the API side, so that
> the runtime only deals with operator requirements. How the deriving should
> be done is debatable, though. E.g., an alternative could be to evenly
> divide the SSG requirement among the operators in the group.
>
> However, I'm not entirely sure which option is more desirable. My
> understanding is illustrated in the following figure, with Chesnay's
> proposal on the top and the SSG-based proposal of this FLIP on the bottom.
>
> [image: FGRuntimeInterface.png]
>
> I think the major difference between the two approaches is where deriving
> operator requirements from SSG requirements happens.
>
> - Chesnay's proposal simplifies the runtime logic and the interface to
> expose, at the price of moving more complexity (i.e. the deriving) to the
> API side. The question is, where do we prefer to keep the complexity? I'm
> slightly leaning towards having a thin API and keeping the complexity in
> the runtime if possible.
>
> - Notice that the dashed arrows represent optional steps that are
> needed only for schedulers that do not respect SSGs, which we don't have at
> the moment. If we only look at the solid arrows, then the SSG-based
> approach is much simpler, without needing to derive and aggregate the
> requirements back and forth. I'm not sure about complicating the current
> design only for potential future needs.
>
> Thank you~
>
> Xintong Song
>
>
> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> You're raising a good point, but I think I can rectify that with a minor
>> adjustment.
>>
>> Default requirements are whatever the default requirements are; setting
>> the requirements for one operator has no effect on other operators.
>>
>> With these rules, and some API enhancements, the following mockup would
>> replicate the SSG-based behavior:
>>
>> Map<SlotSharingGroupId, Requirements> requirements = ...
>> for slotSharingGroup in env.getSlotSharingGroups() {
>>     vertices = slotSharingGroup.getVertices()
>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
>>     vertices.remaining().setRequirements(ZERO)
>> }
>>
>> We could even allow setting requirements on slotsharing-groups and
>> colocation-groups and internally translate them accordingly.
>> I can't help but feel this is a plain API issue.
>>
>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
>> > If I understand you correctly Chesnay, then you want to decouple the
>> > resource requirement specification from the slot sharing group
>> > assignment. Hence, per default all operators would be in the same slot
>> > sharing group. If there is no operator with a resource specification,
>> > then the system would allocate a default slot for it. If there is at
>> > least one such operator, then the system would sum up all the specified
>> > resources and allocate a slot of this size. This effectively means
>> > that all unspecified operators will implicitly have a zero resource
>> > requirement. Did I understand your idea correctly?
>> >
>> > I am wondering whether this wouldn't lead to a surprising behaviour
>> > for the user. If the user specifies the resource requirements for a
>> > single operator, then he probably will assume that the other operators
>> > will get the default share of resources and not nothing.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <ches...@apache.org>
>> > wrote:
>> >
>> >     Is there even a functional difference between specifying the
>> >     requirements for an SSG vs specifying the same requirements on a
>> >     single operator within that group (ideally a colocation group to
>> >     avoid this whole hint business)?
>> > >> > Wouldn't we get the best of both worlds in the latter case? >> > >> > Users can take shortcuts to define shared requirements, >> > but refine them further as needed on a per-operator basis, >> > without changing semantics of slotsharing groups >> > nor the runtime being locked into SSG-based requirements. >> > >> > (And before anyone argues what happens if slotsharing groups >> > change or >> > whatnot, that's a plain API issue that we could surely solve. (A >> > plain >> > iteration over slotsharing groups and therein contained operators >> > would >> > suffice)). >> > >> > On 1/20/2021 6:48 PM, Till Rohrmann wrote: >> > > Maybe a different minor idea: Would it be possible to treat the >> SSG >> > > resource requirements as a hint for the runtime similar to how >> > slot sharing >> > > groups are designed at the moment? Meaning that we don't give >> > the guarantee >> > > that Flink will always deploy this set of tasks together no >> > matter what >> > > comes. If, for example, the runtime can derive by some means the >> > resource >> > > requirements for each task based on the requirements for the >> > SSG, this >> > > could be possible. One easy strategy would be to give every task >> > the same >> > > resources as the whole slot sharing group. Another one could be >> > > distributing the resources equally among the tasks. This does >> > not even have >> > > to be implemented but we would give ourselves the freedom to >> change >> > > scheduling if need should arise. >> > > >> > > Cheers, >> > > Till >> > > >> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karma...@gmail.com >> > <mailto:karma...@gmail.com>> wrote: >> > > >> > >> Thanks for the responses, Till and Xintong. >> > >> >> > >> I second Xintong's comment that SSG-based runtime interface >> > will give >> > >> us the flexibility to achieve op/task-based approach. That's one >> of >> > >> the most important reasons for our design choice. 
>> > >> >> > >> Some cents regarding the default operator resource: >> > >> - It might be good for the scenario of DataStream jobs. >> > >> ** For light-weight operators, the accumulative >> > configuration error >> > >> will not be significant. Then, the resource of a task used is >> > >> proportional to the number of operators it contains. >> > >> ** For heavy operators like join and window or operators >> > using the >> > >> external resources, user will turn to the fine-grained resource >> > >> configuration. >> > >> - It can increase the stability for the standalone cluster >> > where task >> > >> executors registered are heterogeneous(with different default >> slot >> > >> resources). >> > >> - It might not be good for SQL users. The operators that SQL >> > will be >> > >> transferred to is a black box to the user. We also do not >> guarantee >> > >> the cross-version of consistency of the transformation so far. >> > >> >> > >> I think it can be treated as a follow-up work when the >> fine-grained >> > >> resource management is end-to-end ready. >> > >> >> > >> Best, >> > >> Yangze Guo >> > >> >> > >> >> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song >> > <tonysong...@gmail.com <mailto:tonysong...@gmail.com>> >> > >> wrote: >> > >>> Thanks for the feedback, Till. >> > >>> >> > >>> ## I feel that what you proposed (operator-based + default >> > value) might >> > >> be >> > >>> subsumed by the SSG-based approach. >> > >>> Thinking of op_1 -> op_2, there are the following 4 cases, >> > categorized by >> > >>> whether the resource requirements are known to the users. >> > >>> >> > >>> 1. *Both known.* As previously mentioned, there's no >> > reason to put >> > >>> multiple operators whose individual resource requirements >> > are already >> > >> known >> > >>> into the same group in fine-grained resource management. 
>> > And if op_1 >> > >> and >> > >>> op_2 are in different groups, there should be no problem >> > switching >> > >> data >> > >>> exchange mode from pipelined to blocking. This is >> > equivalent to >> > >> specifying >> > >>> operator resource requirements in your proposal. >> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except that >> > op_2 is in a >> > >>> SSG whose resource is not specified thus would have the >> > default slot >> > >>> resource. This is equivalent to having default operator >> > resources in >> > >> your >> > >>> proposal. >> > >>> 3. *Both unknown*. The user can either set op_1 and op_2 >> > to the same >> > >> SSG >> > >>> or separate SSGs. >> > >>> - If op_1 and op_2 are in the same SSG, it will be >> > equivalent to >> > >> the >> > >>> coarse-grained resource management, where op_1 and op_2 >> > share a >> > >> default >> > >>> size slot no matter which data exchange mode is used. >> > >>> - If op_1 and op_2 are in different SSGs, then each of >> > them will >> > >> use >> > >>> a default size slot. This is equivalent to setting them >> > with >> > >> default >> > >>> operator resources in your proposal. >> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and op_2 is >> > known.* >> > >>> - It is possible that the user learns the total / max >> > resource >> > >>> requirement from executing and monitoring the job, >> > while not >> > >>> being aware of >> > >>> individual operator requirements. >> > >>> - I believe this is the case your proposal does not >> > cover. And TBH, >> > >>> this is probably how most users learn the resource >> > requirements, >> > >>> according >> > >>> to my experiences. >> > >>> - In this case, the user might need to specify >> > different resources >> > >> if >> > >>> he wants to switch the execution mode, which should not >> > be worse >> > >> than not >> > >>> being able to use fine-grained resource management. >> > >>> >> > >>> >> > >>> ## An additional idea inspired by your proposal. 
>> > >>> We may provide multiple options for deciding resources for >> > SSGs whose >> > >>> requirement is not specified, if needed. >> > >>> >> > >>> - Default slot resource (current design) >> > >>> - Default operator resource times number of operators >> > (equivalent to >> > >>> your proposal) >> > >>> >> > >>> >> > >>> ## Exposing internal runtime strategies >> > >>> Theoretically, yes. Tying to the SSGs, the resource >> > requirements might be >> > >>> affected if how SSGs are internally handled changes in future. >> > >> Practically, >> > >>> I do not concretely see at the moment what kind of changes we >> > may want in >> > >>> future that might conflict with this FLIP proposal, as the >> > question of >> > >>> switching data exchange mode answered above. I'd suggest to >> > not give up >> > >> the >> > >>> user friendliness we may gain now for the future problems that >> > may or may >> > >>> not exist. >> > >>> >> > >>> Moreover, the SSG-based approach has the flexibility to >> > achieve the >> > >>> equivalent behavior as the operator-based approach, if we set >> each >> > >> operator >> > >>> (or task) to a separate SSG. We can even provide a shortcut >> > option to >> > >>> automatically do that for users, if needed. >> > >>> >> > >>> >> > >>> Thank you~ >> > >>> >> > >>> Xintong Song >> > >>> >> > >>> >> > >>> >> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann >> > <trohrm...@apache.org <mailto:trohrm...@apache.org>> >> > >> wrote: >> > >>>> Thanks for the responses Xintong and Stephan, >> > >>>> >> > >>>> I agree that being able to define the resource requirements >> for a >> > >> group of >> > >>>> operators is more user friendly. However, my concern is that >> > we are >> > >>>> exposing thereby internal runtime strategies which might >> > limit our >> > >>>> flexibility to execute a given job. 
Moreover, the semantics of >> > >> configuring >> > >>>> resource requirements for SSGs could break if switching from >> > streaming >> > >> to >> > >>>> batch execution. If one defines the resource requirements for >> > op_1 -> >> > >> op_2 >> > >>>> which run in pipelined mode when using the streaming >> > execution, then >> > >> how do >> > >>>> we interpret these requirements when op_1 -> op_2 are >> > executed with a >> > >>>> blocking data exchange in batch execution mode? Consequently, >> > I am >> > >> still >> > >>>> leaning towards Stephan's proposal to set the resource >> > requirements per >> > >>>> operator. >> > >>>> >> > >>>> Maybe the following proposal makes the configuration easier: >> > If the >> > >> user >> > >>>> wants to use fine-grained resource requirements, then she >> > needs to >> > >> specify >> > >>>> the default size which is used for operators which have no >> > explicit >> > >>>> resource annotation. If this holds true, then every operator >> > would >> > >> have a >> > >>>> resource requirement and the system can try to execute the >> > operators >> > >> in the >> > >>>> best possible manner w/o being constrained by how the user >> > set the SSG >> > >>>> requirements. >> > >>>> >> > >>>> Cheers, >> > >>>> Till >> > >>>> >> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song >> > <tonysong...@gmail.com <mailto:tonysong...@gmail.com>> >> > >>>> wrote: >> > >>>> >> > >>>>> Thanks for the feedback, Stephan. >> > >>>>> >> > >>>>> Actually, your proposal has also come to my mind at some >> > point. And I >> > >>>> have >> > >>>>> some concerns about it. >> > >>>>> >> > >>>>> >> > >>>>> 1. It does not give users the same control as the SSG-based >> > approach. >> > >>>>> >> > >>>>> >> > >>>>> While both approaches do not require specifying for each >> > operator, >> > >>>>> SSG-based approach supports the semantic that "some operators >> > >> together >> > >>>> use >> > >>>>> this much resource" while the operator-based approach doesn't. 
>> > >>>>>
>> > >>>>> Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and
>> > >>>>> at some point there's an agg o_n (1 < n < m) which significantly
>> > >>>>> reduces the data amount. One can separate the pipeline into 2 groups
>> > >>>>> SSG_1 (o_1, ..., o_n) and SSG_2 (o_n+1, ..., o_m), so that
>> > >>>>> configuring much higher parallelisms for operators in SSG_1 than for
>> > >>>>> operators in SSG_2 won't lead to too much wasting of resources. If
>> > >>>>> the two SSGs end up needing different resources, with the SSG-based
>> > >>>>> approach one can directly specify resources for the two groups.
>> > >>>>> However, with the operator-based approach, the user will have to
>> > >>>>> specify resources for each operator in one of the two groups, and
>> > >>>>> tune the default slot resource via configurations to fit the other
>> > >>>>> group.
>> > >>>>>
>> > >>>>> 2. It increases the chance of breaking operator chains.
>> > >>>>>
>> > >>>>> Setting chainable operators into different slot sharing groups will
>> > >>>>> prevent them from being chained. In the current implementation,
>> > >>>>> downstream operators, if their SSG is not explicitly specified, will
>> > >>>>> be set to the same group as the chainable upstream operators (unless
>> > >>>>> there are multiple upstream operators in different groups), to
>> > >>>>> reduce the chance of breaking chains.
>> > >>>>>
>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, if we
>> > >>>>> decide SSGs based on whether a resource is specified, we will easily
>> > >>>>> get groups like (o_1, o_3) & (o_2, o_4), where none of the operators
>> > >>>>> can be chained.
>> > >>>>> This is also possible for the SSG-based approach, but I believe the
>> > >>>>> chance is much smaller because there's no strong reason for users to
>> > >>>>> specify the groups with alternating operators like that. We are more
>> > >>>>> likely to get groups like (o_1, o_2) & (o_3, o_4), where the chain
>> > >>>>> breaks only between o_2 and o_3.
>> > >>>>>
>> > >>>>> 3. It complicates the system by having two different mechanisms for
>> > >>>>> sharing managed memory in a slot.
>> > >>>>>
>> > >>>>> - In FLIP-141, we introduced the intra-slot managed memory sharing
>> > >>>>> mechanism, where managed memory is first distributed according to the
>> > >>>>> consumer type, then further distributed across operators of that
>> > >>>>> consumer type.
>> > >>>>>
>> > >>>>> - With the operator-based approach, the managed memory size specified
>> > >>>>> for an operator should account for all the consumer types of that
>> > >>>>> operator. That means the managed memory is first distributed across
>> > >>>>> operators, then distributed to the different consumer types of each
>> > >>>>> operator.
>> > >>>>>
>> > >>>>> Unfortunately, the different order of the two calculation steps can
>> > >>>>> lead to different results. To be specific, the semantics of the
>> > >>>>> configuration option `consumer-weights` change (within a slot vs.
>> > >>>>> within an operator).
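
To make the "different order, different results" point concrete, here is a minimal sketch under stated assumptions. It is a simplified model, not Flink's managed memory code: the class, the method, the consumer type names (STATE_BACKEND, PYTHON) and the 70/30 weights are all illustrative placeholders. It assumes a 100 MB slot with two operators, op_a using only the first consumer type and op_b only the second, and that in the operator-based case the user assigned 50 MB to each operator.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Simplified model for illustration; all names are hypothetical, not Flink APIs. */
final class ManagedMemoryOrderSketch {

    /** Splits totalMb among the consumer types that are present, proportionally to their weights. */
    static Map<String, Double> splitByWeights(
            double totalMb, Map<String, Integer> weights, Set<String> presentTypes) {
        int weightSum = 0;
        for (String type : presentTypes) {
            weightSum += weights.get(type);
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (String type : presentTypes) {
            shares.put(type, totalMb * weights.get(type) / weightSum);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Illustrative weights only.
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("STATE_BACKEND", 70);
        weights.put("PYTHON", 30);

        // Slot-first: the weights are applied across the whole 100 MB slot.
        Set<String> both = new HashSet<>(Arrays.asList("STATE_BACKEND", "PYTHON"));
        Map<String, Double> slotFirst = splitByWeights(100, weights, both);

        // Operator-first: each operator got 50 MB, and the weights are applied inside each operator.
        double opA = splitByWeights(50, weights, Collections.singleton("STATE_BACKEND"))
                .get("STATE_BACKEND");
        double opB = splitByWeights(50, weights, Collections.singleton("PYTHON"))
                .get("PYTHON");

        System.out.println("slot-first:     " + slotFirst.get("STATE_BACKEND")
                + " / " + slotFirst.get("PYTHON")); // 70.0 / 30.0
        System.out.println("operator-first: " + opA + " / " + opB); // 50.0 / 50.0
    }
}
```

In this toy setup the same weights yield 70/30 when applied per slot and 50/50 when applied per operator, which is the change in meaning of `consumer-weights` described above.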
>> > >>>>> >> > >>>>> >> > >>>>> >> > >>>>> To sum up things: >> > >>>>> >> > >>>>> While (3) might be a bit more implementation related, I >> > think (1) >> > >> and (2) >> > >>>>> somehow suggest that, the price for the proposed approach to >> > avoid >> > >>>>> specifying resource for every operator is that it's not as >> > >> independent >> > >>>> from >> > >>>>> operator chaining and slot sharing as the operator-based >> > approach >> > >>>> discussed >> > >>>>> in the FLIP. >> > >>>>> >> > >>>>> >> > >>>>> Thank you~ >> > >>>>> >> > >>>>> Xintong Song >> > >>>>> >> > >>>>> >> > >>>>> >> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen >> > <se...@apache.org <mailto:se...@apache.org>> >> > >> wrote: >> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP. >> > >>>>>> >> > >>>>>> I want to say, first of all, that this is super well >> > written. And >> > >> the >> > >>>>>> points that the FLIP makes about how to expose the >> > configuration to >> > >>>> users >> > >>>>>> is exactly the right thing to figure out first. >> > >>>>>> So good job here! >> > >>>>>> >> > >>>>>> About how to let users specify the resource profiles. If I >> > can sum >> > >> the >> > >>>>> FLIP >> > >>>>>> and previous discussion up in my own words, the problem is >> the >> > >>>> following: >> > >>>>>> Operator-level specification is the simplest and cleanest >> > approach, >> > >>>>> because >> > >>>>>>> it avoids mixing operator configuration (resource) and >> > >> scheduling. No >> > >>>>>>> matter what other parameters change (chaining, slot sharing, >> > >>>> switching >> > >>>>>>> pipelined and blocking shuffles), the resource profiles >> > stay the >> > >>>> same. >> > >>>>>>> But it would require that a user specifies resources on all >> > >>>> operators, >> > >>>>>>> which makes it hard to use. That's why the FLIP suggests >> going >> > >> with >> > >>>>>>> specifying resources on a Sharing-Group. 
>> > >>>>>> >> > >>>>>> I think both thoughts are important, so can we find a >> solution >> > >> where >> > >>>> the >> > >>>>>> Resource Profiles are specified on an Operator, but we >> > still avoid >> > >> that >> > >>>>> we >> > >>>>>> need to specify a resource profile on every operator? >> > >>>>>> >> > >>>>>> What do you think about something like the following: >> > >>>>>> - Resource Profiles are specified on an operator level. >> > >>>>>> - Not all operators need profiles >> > >>>>>> - All Operators without a Resource Profile ended up in the >> > >> default >> > >>>> slot >> > >>>>>> sharing group with a default profile (will get a default >> slot). >> > >>>>>> - All Operators with a Resource Profile will go into >> > another slot >> > >>>>> sharing >> > >>>>>> group (the resource-specified-group). >> > >>>>>> - Users can define different slot sharing groups for >> > operators >> > >> like >> > >>>>> they >> > >>>>>> do now, with the exception that you cannot mix operators >> > that have >> > >> a >> > >>>>>> resource profile and operators that have no resource profile. >> > >>>>>> - The default case where no operator has a resource >> > profile is >> > >> just a >> > >>>>>> special case of this model >> > >>>>>> - The chaining logic sums up the profiles per operator, >> > like it >> > >> does >> > >>>>> now, >> > >>>>>> and the scheduler sums up the profiles of the tasks that it >> > >> schedules >> > >>>>>> together. >> > >>>>>> >> > >>>>>> >> > >>>>>> There is another question about reactive scaling raised in >> the >> > >> FLIP. I >> > >>>>> need >> > >>>>>> to think a bit about that. That is indeed a bit more tricky >> > once we >> > >>>> have >> > >>>>>> slots of different sizes. 
>> > >>>>>> It is not clear then which of the different slot requests the >> > >>>>>> ResourceManager should fulfill when new resources (TMs) >> > show up, >> > >> or how >> > >>>>> the >> > >>>>>> JobManager redistributes the slots resources when resources >> > (TMs) >> > >>>>> disappear >> > >>>>>> This question is pretty orthogonal, though, to the "how to >> > specify >> > >> the >> > >>>>>> resources". >> > >>>>>> >> > >>>>>> >> > >>>>>> Best, >> > >>>>>> Stephan >> > >>>>>> >> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song >> > <tonysong...@gmail.com <mailto:tonysong...@gmail.com> >> > >>>>> wrote: >> > >>>>>>> Thanks for drafting the FLIP and driving the discussion, >> > Yangze. >> > >>>>>>> And Thanks for the feedback, Till and Chesnay. >> > >>>>>>> >> > >>>>>>> @Till, >> > >>>>>>> >> > >>>>>>> I agree that specifying requirements for SSGs means that >> SSGs >> > >> need to >> > >>>>> be >> > >>>>>>> supported in fine-grained resource management, otherwise >> each >> > >>>> operator >> > >>>>>>> might use as many resources as the whole group. However, I >> > cannot >> > >>>> think >> > >>>>>> of >> > >>>>>>> a strong reason for not supporting SSGs in fine-grained >> > resource >> > >>>>>>> management. >> > >>>>>>> >> > >>>>>>> >> > >>>>>>>> Interestingly, if all operators have their resources >> properly >> > >>>>>> specified, >> > >>>>>>>> then slot sharing is no longer needed because Flink could >> > >> slice off >> > >>>>> the >> > >>>>>>>> appropriately sized slots for every Task individually. >> > >>>>>>>> >> > >>>>>>> So for example, if we have a job consisting of two >> > operator op_1 >> > >> and >> > >>>>> op_2 >> > >>>>>>>> where each op needs 100 MB of memory, we would then say >> that >> > >> the >> > >>>> slot >> > >>>>>>>> sharing group needs 200 MB of memory to run. If we have a >> > >> cluster >> > >>>>> with >> > >>>>>> 2 >> > >>>>>>>> TMs with one slot of 100 MB each, then the system cannot >> run >> > >> this >> > >>>>> job. 
>> > >>>>>> If >> > >>>>>>>> the resources were specified on an operator level, then the >> > >> system >> > >>>>>> could >> > >>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to >> > >> TM_2. >> > >>>>>>> >> > >>>>>>> Couldn't agree more that if all operators' requirements are >> > >> properly >> > >>>>>>> specified, slot sharing should be no longer needed. I >> > think this >> > >>>>> exactly >> > >>>>>>> disproves the example. If we already know op_1 and op_2 each >> > >> needs >> > >>>> 100 >> > >>>>> MB >> > >>>>>>> of memory, why would we put them in the same group? If >> > they are >> > >> in >> > >>>>>> separate >> > >>>>>>> groups, with the proposed approach the system can freely >> > deploy >> > >> them >> > >>>> to >> > >>>>>>> either a 200 MB TM or two 100 MB TMs. >> > >>>>>>> >> > >>>>>>> Moreover, the precondition for not needing slot sharing is >> > having >> > >>>>>> resource >> > >>>>>>> requirements properly specified for all operators. This is >> not >> > >> always >> > >>>>>>> possible, and usually requires tremendous efforts. One of >> the >> > >>>> benefits >> > >>>>>> for >> > >>>>>>> SSG-based requirements is that it allows the user to freely >> > >> decide >> > >>>> the >> > >>>>>>> granularity, thus efforts they want to pay. I would >> > consider SSG >> > >> in >> > >>>>>>> fine-grained resource management as a group of operators >> > that the >> > >>>> user >> > >>>>>>> would like to specify the total resource for. There can be >> > only >> > >> one >> > >>>>> group >> > >>>>>>> in the job, 2~3 groups dividing the job into a few major >> > parts, >> > >> or as >> > >>>>>> many >> > >>>>>>> groups as the number of tasks/operators, depending on how >> > >>>> fine-grained >> > >>>>>> the >> > >>>>>>> user is able to specify the resources. >> > >>>>>>> >> > >>>>>>> Having to support SSGs might be a constraint. 
But given >> > that all >> > >> the >> > >>>>>>> current scheduler implementations already support SSGs, I >> > tend to >> > >>>> think >> > >>>>>>> that as an acceptable price for the above discussed >> > usability and >> > >>>>>>> flexibility. >> > >>>>>>> >> > >>>>>>> @Chesnay >> > >>>>>>> >> > >>>>>>> Will declaring them on slot sharing groups not also waste >> > >> resources >> > >>>> if >> > >>>>>> the >> > >>>>>>>> parallelism of operators within that group are different? >> > >>>>>>>> >> > >>>>>>> Yes. It's a trade-off between usability and resource >> > >> utilization. To >> > >>>>>> avoid >> > >>>>>>> such wasting, the user can define more groups, so that >> > each group >> > >>>>>> contains >> > >>>>>>> less operators and the chance of having operators with >> > different >> > >>>>>>> parallelism will be reduced. The price is to have more >> > resource >> > >>>>>>> requirements to specify. >> > >>>>>>> >> > >>>>>>> It also seems like quite a hassle for users having to >> > >> recalculate the >> > >>>>>>>> resource requirements if they change the slot sharing. >> > >>>>>>>> I'd think that it's not really workable for users that >> create >> > >> a set >> > >>>>> of >> > >>>>>>>> re-usable operators which are mixed and matched in their >> > >>>>> applications; >> > >>>>>>>> managing the resources requirements in such a setting >> > would be >> > >> a >> > >>>>>>>> nightmare, and in the end would require operator-level >> > >> requirements >> > >>>>> any >> > >>>>>>>> way. >> > >>>>>>>> In that sense, I'm not even sure whether it really >> increases >> > >>>>> usability. >> > >>>>>>> - As mentioned in my reply to Till's comment, there's no >> > >> reason to >> > >>>>> put >> > >>>>>>> multiple operators whose individual resource >> > requirements are >> > >>>>> already >> > >>>>>>> known >> > >>>>>>> into the same group in fine-grained resource management. 
>> > >>>>>>> - Even an operator implementation is reused for multiple >> > >>>>> applications, >> > >>>>>>> it does not guarantee the same resource requirements. >> > During >> > >> our >> > >>>>> years >> > >>>>>>> of >> > >>>>>>> practices in Alibaba, with per-operator requirements >> > >> specified for >> > >>>>>>> Blink's >> > >>>>>>> fine-grained resource management, very few users >> > (including >> > >> our >> > >>>>>>> specialists >> > >>>>>>> who are dedicated to supporting Blink users) are as >> > >> experienced as >> > >>>>> to >> > >>>>>>> accurately predict/estimate the operator resource >> > >> requirements. >> > >>>> Most >> > >>>>>>> people >> > >>>>>>> rely on the execution-time metrics (throughput, delay, >> cpu >> > >> load, >> > >>>>>> memory >> > >>>>>>> usage, GC pressure, etc.) to improve the specification. >> > >>>>>>> >> > >>>>>>> To sum up: >> > >>>>>>> If the user is capable of providing proper resource >> > requirements >> > >> for >> > >>>>>> every >> > >>>>>>> operator, that's definitely a good thing and we would not >> > need to >> > >>>> rely >> > >>>>> on >> > >>>>>>> the SSGs. However, that shouldn't be a *must* for the >> > >> fine-grained >> > >>>>>> resource >> > >>>>>>> management to work. For those users who are capable and do >> not >> > >> like >> > >>>>>> having >> > >>>>>>> to set each operator to a separate SSG, I would be ok to >> have >> > >> both >> > >>>>>>> SSG-based and operator-based runtime interfaces and to only >> > >> fallback >> > >>>> to >> > >>>>>> the >> > >>>>>>> SSG requirements when the operator requirements are not >> > >> specified. >> > >>>>>> However, >> > >>>>>>> as the first step, I think we should prioritise the use >> cases >> > >> where >> > >>>>> users >> > >>>>>>> are not that experienced. 
>> > >>>>>>> >> > >>>>>>> Thank you~ >> > >>>>>>> >> > >>>>>>> Xintong Song >> > >>>>>>> >> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler < >> > >> ches...@apache.org <mailto:ches...@apache.org>> >> > >>>>>>> wrote: >> > >>>>>>> >> > >>>>>>>> Will declaring them on slot sharing groups not also waste >> > >> resources >> > >>>>> if >> > >>>>>>>> the parallelism of operators within that group are >> different? >> > >>>>>>>> >> > >>>>>>>> It also seems like quite a hassle for users having to >> > >> recalculate >> > >>>> the >> > >>>>>>>> resource requirements if they change the slot sharing. >> > >>>>>>>> I'd think that it's not really workable for users that >> create >> > >> a set >> > >>>>> of >> > >>>>>>>> re-usable operators which are mixed and matched in their >> > >>>>> applications; >> > >>>>>>>> managing the resources requirements in such a setting >> > would be >> > >> a >> > >>>>>>>> nightmare, and in the end would require operator-level >> > >> requirements >> > >>>>> any >> > >>>>>>>> way. >> > >>>>>>>> In that sense, I'm not even sure whether it really >> increases >> > >>>>> usability. >> > >>>>>>>> My main worry is that it if we wire the runtime to work >> > on SSGs >> > >>>> it's >> > >>>>>>>> gonna be difficult to implement more fine-grained >> approaches, >> > >> which >> > >>>>>>>> would not be the case if, for the runtime, they are always >> > >> defined >> > >>>> on >> > >>>>>> an >> > >>>>>>>> operator-level. >> > >>>>>>>> >> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote: >> > >>>>>>>>> Thanks for drafting this FLIP and starting this discussion >> > >>>> Yangze. >> > >>>>>>>>> I like that defining resource requirements on a slot >> sharing >> > >>>> group >> > >>>>>>> makes >> > >>>>>>>>> the overall setup easier and improves usability of >> resource >> > >>>>>>> requirements. 
>> > >>>>>>>>> What I do not like about it is that it changes slot sharing groups
>> > >>>>>>>>> from being a scheduling hint to something which needs to be
>> > >>>>>>>>> supported in order to support fine-grained resource requirements.
>> > >>>>>>>>> So far, the idea of slot sharing groups was that it tells the
>> > >>>>>>>>> system that a set of operators can be deployed in the same slot.
>> > >>>>>>>>> But the system still had the freedom to say that it would rather
>> > >>>>>>>>> place these tasks in different slots if it wanted. If we now
>> > >>>>>>>>> specify resource requirements per slot sharing group, then the
>> > >>>>>>>>> only option for a scheduler which does not support slot sharing
>> > >>>>>>>>> groups is to say that every operator in this slot sharing group
>> > >>>>>>>>> needs a slot with the same resources as the whole group.
>> > >>>>>>>>>
>> > >>>>>>>>> So for example, if we have a job consisting of two operators op_1
>> > >>>>>>>>> and op_2 where each op needs 100 MB of memory, we would then say
>> > >>>>>>>>> that the slot sharing group needs 200 MB of memory to run. If we
>> > >>>>>>>>> have a cluster with 2 TMs with one slot of 100 MB each, then the
>> > >>>>>>>>> system cannot run this job. If the resources were specified on an
>> > >>>>>>>>> operator level, then the system could still make the decision to
>> > >>>>>>>>> deploy op_1 to TM_1 and op_2 to TM_2.
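
The 2-TM example above can be checked with a small sketch. This is a toy feasibility test under simplifying assumptions, not a Flink scheduler: the class and method names are hypothetical, slots have a fixed size, and each slot hosts at most one request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy feasibility check for illustration; names are hypothetical, not Flink APIs. */
final class SlotFitSketch {

    /** Returns true if every request can get its own slot that is at least as large as the request. */
    static boolean canPlaceAll(List<Integer> slotSizesMb, List<Integer> requestsMb) {
        List<Integer> freeSlots = new ArrayList<>(slotSizesMb);
        Collections.sort(freeSlots); // ascending, so the first match is the smallest fitting slot
        List<Integer> requests = new ArrayList<>(requestsMb);
        requests.sort(Collections.reverseOrder()); // place the largest request first

        for (int request : requests) {
            int chosen = -1;
            for (int i = 0; i < freeSlots.size(); i++) {
                if (freeSlots.get(i) >= request) {
                    chosen = i;
                    break;
                }
            }
            if (chosen < 0) {
                return false; // no remaining slot is large enough
            }
            freeSlots.remove(chosen); // removes by index, the slot is now taken
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> cluster = List.of(100, 100); // 2 TMs with one 100 MB slot each

        // SSG-level requirement: one 200 MB request for the group (op_1 + op_2).
        System.out.println(canPlaceAll(cluster, List.of(200))); // false

        // Operator-level requirements: two independent 100 MB requests.
        System.out.println(canPlaceAll(cluster, List.of(100, 100))); // true
    }
}
```

With a single 200 MB group request the check fails, while the two 100 MB operator requests fit on TM_1 and TM_2.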
>> > >>>>>>>>> Originally, one of the primary goals of slot sharing >> groups >> > >> was >> > >>>> to >> > >>>>>> make >> > >>>>>>>> it >> > >>>>>>>>> easier for the user to reason about how many slots a job >> > >> needs >> > >>>>>>>> independent >> > >>>>>>>>> of the actual number of operators in the job. >> Interestingly, >> > >> if >> > >>>> all >> > >>>>>>>>> operators have their resources properly specified, then >> slot >> > >>>>> sharing >> > >>>>>> is >> > >>>>>>>> no >> > >>>>>>>>> longer needed because Flink could slice off the >> > appropriately >> > >>>> sized >> > >>>>>>> slots >> > >>>>>>>>> for every Task individually. What matters is whether the >> > >> whole >> > >>>>>> cluster >> > >>>>>>>> has >> > >>>>>>>>> enough resources to run all tasks or not. >> > >>>>>>>>> >> > >>>>>>>>> Cheers, >> > >>>>>>>>> Till >> > >>>>>>>>> >> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo < >> > >> karma...@gmail.com <mailto:karma...@gmail.com>> >> > >>>>>> wrote: >> > >>>>>>>>>> Hi, there, >> > >>>>>>>>>> >> > >>>>>>>>>> We would like to start a discussion thread on "FLIP-156: >> > >> Runtime >> > >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1], >> > >> where we >> > >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime interfaces >> > >> for >> > >>>>>>>>>> specifying fine-grained resource requirements. >> > >>>>>>>>>> >> > >>>>>>>>>> In this FLIP: >> > >>>>>>>>>> - Expound the user story of fine-grained resource >> > >> management. >> > >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based >> > >> resource >> > >>>>>>>>>> requirements. >> > >>>>>>>>>> - Discuss the pros and cons of the three potential >> > >> granularities >> > >>>>> for >> > >>>>>>>>>> specifying the resource requirements (op, task and slot >> > >> sharing >> > >>>>>> group) >> > >>>>>>>>>> and explain why we choose the slot sharing group. >> > >>>>>>>>>> >> > >>>>>>>>>> Please find more details in the FLIP wiki document [1]. 
>> > >>>>>>>>>> Looking forward to your feedback.
>> > >>>>>>>>>>
>> > >>>>>>>>>> [1]
>> > >>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
>> > >>>>>>>>>>
>> > >>>>>>>>>> Best,
>> > >>>>>>>>>> Yangze Guo
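
For reference, the three ways of deriving per-operator requirements from an SSG requirement that come up in this thread (all on the first operator and zero on the rest, as in the mockup; an even split; every task getting the whole group's resources) can be sketched as below. This is only an illustration under assumptions: the class and method names are hypothetical, a requirement is reduced to a single memory number in MB, and the operator count is assumed to be at least 1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only; all names here are hypothetical, not Flink APIs. */
final class SsgDerivationSketch {

    /** Mockup-style: the first operator carries the whole SSG requirement, the rest get zero. */
    static List<Integer> firstGetsAll(int ssgMb, int operatorCount) {
        List<Integer> perOperator = new ArrayList<>(Collections.nCopies(operatorCount, 0));
        perOperator.set(0, ssgMb);
        return perOperator;
    }

    /** Even split; any remainder is handed out one MB at a time to the first operators. */
    static List<Integer> evenSplit(int ssgMb, int operatorCount) {
        int base = ssgMb / operatorCount;
        int remainder = ssgMb % operatorCount;
        List<Integer> perOperator = new ArrayList<>();
        for (int i = 0; i < operatorCount; i++) {
            perOperator.add(i < remainder ? base + 1 : base);
        }
        return perOperator;
    }

    /** Every operator gets the same resources as the whole group. */
    static List<Integer> fullCopy(int ssgMb, int operatorCount) {
        return new ArrayList<>(Collections.nCopies(operatorCount, ssgMb));
    }

    static int sum(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(firstGetsAll(200, 3)); // [200, 0, 0]
        System.out.println(evenSplit(200, 3));    // [67, 67, 66]
        System.out.println(fullCopy(200, 3));     // [200, 200, 200]
    }
}
```

The first two strategies preserve the group total when the operators are summed up again, whereas the third over-provisions by a factor equal to the number of operators, which is the cost of not relying on slot sharing at all.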