The implementation plan [1] is updated, with the following changes:

- Add the default slot resource profile to ResourceManagerGateway#registerTaskExecutor rather than #sendSlotReport.
- Swap the steps 'TaskExecutor derive and register with default slot resource profile' and 'Extend TaskExecutor to support dynamic slot allocation'.
- Add a step for updating the REST API / Web UI.

Thank you~

Xintong Song

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation

On Tue, Sep 17, 2019 at 11:49 AM Xintong Song <tonysong...@gmail.com> wrote:

> @Till
> Thanks for the reminder. I'll add a step for updating the web UI. I'll try to involve Lining to help us with this step.
>
> @Andrey
> I was thinking that after we define the RM-TM interfaces in step 2, it would be good to work concurrently on both the RM and TM sides. But yes, if we finish step 4 early, then it would make step 6 easier. We can start to have some IT/E2E tests, with the default slot resource profiles being available.
>
> Thank you~
>
> Xintong Song
>
> On Mon, Sep 16, 2019 at 9:50 PM Andrey Zagrebin <and...@ververica.com> wrote:
>
>> @Xintong
>>
>> Thanks for the feedback.
>>
>> Just to clarify step 6:
>> If the first point is done before step 5 (e.g. as part of 4) then it is just keeping the info about the default slot in the RM's data structure associated with the TM, with no real change in the behaviour.
>> When this info is available, I think it can be straightforwardly used during step 5 where we get either a concrete slot requirement or the unknown one (step 6, point 2) which simply grabs some of the concrete default ones (btw not clear which one, seems just some random?)
>>
>> For steps 5 and 7, true, it is not quite clear whether we can avoid some split, e.g. after step 5 before doing step 7.
>> I agree that we should introduce the feature flag if we clearly see that it would be a bigger effort without the flag.
>>
>> Best,
>> Andrey
>>
>> On Mon, Sep 16, 2019 at 3:21 PM Till Rohrmann <trohrm...@apache.org> wrote:
>>
>> > One thing which was briefly mentioned in the FLIP but not in the implementation plan is the update of the web UI. I think it is worth putting an extra item for updating the web UI to properly display the resources a TM has still to offer with dynamic slot allocation.
>> > I guess we need to pull in some JavaScript help in order to implement this step.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Mon, Sep 16, 2019 at 2:15 PM Xintong Song <tonysong...@gmail.com> wrote:
>> >
>> > > Thanks for the comments, Andrey.
>> > >
>> > > - I agree that instead of ResourceManagerGateway#sendSlotReport, we should add the default slot resource profile to ResourceManagerGateway#registerTaskExecutor.
>> > >
>> > > - If I understand correctly, the reason you suggest doing the default slot resource profile first and then doing step 3 in a way that supports both TaskExecutorGateway#requestSlot and TaskExecutorGateway#requestResource is to try to avoid splitting code paths with the feature option? I think we can do that, but I also want to bring up that this can only reduce the code split by the feature option (which is good) but not eliminate it. We still need the feature option for the fundamental differences, e.g. creating new SlotIDs on allocation vs. allocating to free slots with existing SlotIDs.
>> > >
>> > > - I don't really think we can do steps 5, 6 and 7 independently. Basically they are all making changes to the same component. We probably can do steps 6 and 7 independently, but I think they both depend on step 5.
>> > >
>> > > In general, I would say it's good to have as little code as possible split by the feature option, which makes the later clean-up easier. But if it cannot be easily done, I would rather not put too much effort into having a good abstraction and deduplication between the new code path and the original one that we are removing soon.
>> > >
>> > > What do you think?
>> > >
>> > > Thank you~
>> > >
>> > > Xintong Song
>> > >
>> > > On Mon, Sep 16, 2019 at 5:59 PM Andrey Zagrebin <and...@ververica.com> wrote:
>> > >
>> > > > Hi Xintong,
>> > > >
>> > > > Thanks for sharing the implementation steps. I also think they make sense with the feature option.
>> > > >
>> > > > I was wondering if we could order the steps in a way that each change does not affect other components too much, always having a working system; then maybe the feature option does not always need to split the code. Here are some thoughts.
>> > > >
>> > > > - We could do the default slot profile first and include it in the TM registration. I would suggest adding it to ResourceManagerGateway#registerTaskExecutor, not sendSlotReport. This way the RM knows about it but does not use it at this point. (parts of steps 4 and 6)
>> > > >
>> > > > - We could try to do step 3 first in a way that it also supports the current way of allocation in TaskExecutorGateway#requestSlot with the default slot profile and sends reports both with available resources and with free default slots which correspond to the available resources. We can just remove the free default slots later. The new way, TaskExecutorGateway#requestResource, could also be implemented here but not used yet.
>> > > >
>> > > > - Then step 5 can use the new TaskExecutorGateway#requestResource and the default slot profile.
>> > > >
>> > > > - Not sure steps 5 and 7 can be implemented independently without regression of what we have. Maybe if we do step 7 first, it will have only default slots at first and it will simplify step 5 later.
>> > > >
>> > > > Best,
>> > > > Andrey
>> > > >
>> > > > On Mon, Sep 16, 2019 at 5:53 AM Xintong Song <tonysong...@gmail.com> wrote:
>> > > >
>> > > > > Thanks for the comments, Till and Wenlong.
>> > > > >
>> > > > > @Wenlong
>> > > > > Regarding slot sharing, the general idea is to request a slot with resources for tasks of the entire slot sharing group. Details can be found in FLIP-53 [1], regarding how to decide the slot sharing groups and how to manage task resources within the shared slots.
>> > > > >
>> > > > > Thank you~
>> > > > >
>> > > > > Xintong Song
>> > > > >
>> > > > > On Mon, Sep 16, 2019 at 10:42 AM wenlong.lwl <wenlong88....@gmail.com> wrote:
>> > > > >
>> > > > > > Hi, Xintong, thanks for the great proposal. Big +1 for the feature! It is something like mapreduce-1.0 to mapreduce-2.0.
>> > > > > >
>> > > > > > I like the design on the whole. One point may need to be included in the proposal: how do we deal with slot sharing groups and dynamic slot allocation? It can be quite different with dynamic slot allocation.
>> > > > > >
>> > > > > > On Fri, 13 Sep 2019 at 16:42, Till Rohrmann <trohrm...@apache.org> wrote:
>> > > > > >
>> > > > > > > Thanks for the update Xintong. From a high level perspective the implementation plan looks good to me.
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Till
>> > > > > > >
>> > > > > > > On Thu, Sep 12, 2019 at 11:04 AM Xintong Song <tonysong...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Added implementation steps for this FLIP on the wiki page [1].
>> > > > > > > >
>> > > > > > > > Thank you~
>> > > > > > > >
>> > > > > > > > Xintong Song
>> > > > > > > >
>> > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>> > > > > > > >
>> > > > > > > > On Tue, Aug 20, 2019 at 3:43 PM Xintong Song <tonysong...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > @Zili
>> > > > > > > > >
>> > > > > > > > > As far as I know, Timo is drafting a FLIP that has taken the number 55. There is a round-up number maintained on the FLIP wiki page [1] that shows which number should be used for the new FLIP, and it should be increased by whoever takes the number for a new FLIP.
>> > > > > > > > >
>> > > > > > > > > Thank you~
>> > > > > > > > >
>> > > > > > > > > Xintong Song
>> > > > > > > > >
>> > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>> > > > > > > > >
>> > > > > > > > > On Tue, Aug 20, 2019 at 3:28 AM Zili Chen <wander4...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > >> We suddenly skipped FLIP-55 lol.
>> > > > > > > > >>
>> > > > > > > > >> On Mon, Aug 19, 2019 at 10:23 PM Xintong Song <tonysong...@gmail.com> wrote:
>> > > > > > > > >>
>> > > > > > > > >> > Hi everyone,
>> > > > > > > > >> >
>> > > > > > > > >> > We would like to start a discussion thread on "FLIP-56: Dynamic Slot Allocation" [1].
>> > > > > > > > >> > This is originally part of the discussion thread for "FLIP-53: Fine Grained Resource Management" [2]. As Till suggested, we would like to split the original discussion into two topics, and start a separate new discussion thread as well as FLIP process for this one.
>> > > > > > > > >> >
>> > > > > > > > >> > Thank you~
>> > > > > > > > >> >
>> > > > > > > > >> > Xintong Song
>> > > > > > > > >> >
>> > > > > > > > >> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>> > > > > > > > >> >
>> > > > > > > > >> > [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-53-Fine-Grained-Resource-Management-td31831.html
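
For readers following the thread, here is a minimal Java sketch of two ideas discussed above: the TM reporting a default slot resource profile at registration, and dynamic allocation that creates a new slot ID per request, falling back to the default profile when the requirement is unknown. Everything in it is a hypothetical simplification and not Flink code: `DynamicSlotSketch`, `ResourceProfile`, `TaskExecutorResources` and `ResourceManagerView` are illustration-only types, the method signatures only borrow the names `registerTaskExecutor` and `requestResource` from the thread, and deriving the default profile as an even share of the total resources is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified model. None of these types are actual Flink classes. */
final class DynamicSlotSketch {

    /** Simplified resource profile; UNKNOWN stands for "no concrete requirement". */
    static final class ResourceProfile {
        static final ResourceProfile UNKNOWN = new ResourceProfile(-1, -1);
        final int cpuMillis;
        final int memoryMb;

        ResourceProfile(int cpuMillis, int memoryMb) {
            this.cpuMillis = cpuMillis;
            this.memoryMb = memoryMb;
        }
    }

    /** TaskExecutor-side bookkeeping for dynamic allocation (assumed design). */
    static final class TaskExecutorResources {
        final ResourceProfile defaultSlotProfile;
        private int freeCpuMillis;
        private int freeMemoryMb;
        private int nextSlotIndex = 0;

        TaskExecutorResources(ResourceProfile total, int numDefaultSlots) {
            if (numDefaultSlots <= 0) {
                throw new IllegalArgumentException("numDefaultSlots must be positive");
            }
            this.freeCpuMillis = total.cpuMillis;
            this.freeMemoryMb = total.memoryMb;
            // Assumption: the default profile is an even share of the total resources.
            this.defaultSlotProfile = new ResourceProfile(
                    total.cpuMillis / numDefaultSlots, total.memoryMb / numDefaultSlots);
        }

        /** Dynamic path: carve a slot out of the free resources and create a new slot ID. */
        Optional<String> requestResource(ResourceProfile requested) {
            // An unknown requirement falls back to the default slot profile.
            ResourceProfile effective =
                    requested == ResourceProfile.UNKNOWN ? defaultSlotProfile : requested;
            if (effective.cpuMillis > freeCpuMillis || effective.memoryMb > freeMemoryMb) {
                return Optional.empty(); // not enough free resources left
            }
            freeCpuMillis -= effective.cpuMillis;
            freeMemoryMb -= effective.memoryMb;
            return Optional.of("slot-" + nextSlotIndex++);
        }

        int freeCpuMillis() { return freeCpuMillis; }
        int freeMemoryMb() { return freeMemoryMb; }
    }

    /** ResourceManager-side view: default slot profiles learned at registration. */
    static final class ResourceManagerView {
        private final Map<String, ResourceProfile> defaultProfiles = new HashMap<>();

        void registerTaskExecutor(String taskExecutorId, ResourceProfile defaultSlotProfile) {
            defaultProfiles.put(taskExecutorId, defaultSlotProfile);
        }

        ResourceProfile defaultProfileOf(String taskExecutorId) {
            return defaultProfiles.get(taskExecutorId);
        }
    }

    public static void main(String[] args) {
        TaskExecutorResources te = new TaskExecutorResources(new ResourceProfile(4000, 4096), 4);
        ResourceManagerView rm = new ResourceManagerView();
        rm.registerTaskExecutor("tm-1", te.defaultSlotProfile);

        System.out.println(te.requestResource(new ResourceProfile(2500, 1024)));
        System.out.println(te.requestResource(ResourceProfile.UNKNOWN));
        System.out.println(te.requestResource(ResourceProfile.UNKNOWN));
    }
}
```

In `main`, the first two requests succeed and each receives a freshly created slot ID, while the third is rejected because the remaining CPU is smaller than the default profile. The static path discussed in the thread (TaskExecutorGateway#requestSlot) would instead hand out one of a fixed set of pre-existing SlotIDs, which is the difference Xintong says still needs the feature option.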