The implementation plan [1] is updated, with the following changes:

   - Add default slot resource profile to
   ResourceManagerGateway#registerTaskExecutor rather than #sendSlotReport.
   - Swap 'TaskExecutor derive and register with default slot resource
   profile' and 'Extend TaskExecutor to support dynamic slot allocation'
   - Add step for updating RestAPI / Web UI
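
To illustrate the first change, here is a minimal sketch of the idea, not
the actual Flink code. SlotProfile, ToyResourceManager and resolveRequest
are hypothetical names made up for this example, and the signature of
registerTaskExecutor is simplified. It assumes that a request with unknown
requirements simply falls back to the default slot profile the TM
registered with.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a slot resource profile. */
final class SlotProfile {
    final double cpuCores;
    final int memoryMb;

    SlotProfile(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }
}

/** Toy resource manager; NOT the actual ResourceManagerGateway. */
final class ToyResourceManager {
    // Default slot profile per registered TM, keyed by a hypothetical TM id.
    private final Map<String, SlotProfile> defaultSlotProfiles = new HashMap<>();

    /** Registration itself carries the default slot profile. */
    void registerTaskExecutor(String taskExecutorId, SlotProfile defaultSlotProfile) {
        defaultSlotProfiles.put(taskExecutorId, defaultSlotProfile);
    }

    /** Assumption: an unknown requirement (null here) falls back to the TM's default. */
    SlotProfile resolveRequest(String taskExecutorId, SlotProfile requested) {
        SlotProfile fallback = defaultSlotProfiles.get(taskExecutorId);
        if (fallback == null) {
            throw new IllegalStateException("TaskExecutor not registered: " + taskExecutorId);
        }
        return requested != null ? requested : fallback;
    }
}

class DefaultSlotProfileSketch {
    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        rm.registerTaskExecutor("tm-1", new SlotProfile(1.0, 1024));
        // No concrete requirement given, so the registered default is used.
        SlotProfile resolved = rm.resolveRequest("tm-1", null);
        System.out.println(resolved.cpuCores + " cores, " + resolved.memoryMb + " MB");
    }
}
```

With this shape the RM can already store the default profile at
registration time without changing its allocation behaviour.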

Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation

On Tue, Sep 17, 2019 at 11:49 AM Xintong Song <tonysong...@gmail.com> wrote:

> @Till
> Thanks for the reminder. I'll add a step for updating the web ui. I'll
> try to involve Lining to help us with this step.
>
> @Andrey
> I was thinking that after we define the RM-TM interfaces in step 2, it
> would be good to concurrently work on both the RM and TM sides. But yes, if we
> finish Step 4 early, then it would make step 6 easier. We can start to have
> some IT/E2E tests, with the default slot resource profiles being available.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Mon, Sep 16, 2019 at 9:50 PM Andrey Zagrebin <and...@ververica.com>
> wrote:
>
>> @Xintong
>>
>> Thanks for the feedback.
>>
>> Just to clarify step 6:
>> If the first point is done before step 5 (e.g. as part of 4) then it is
>> just keeping the info about the default slot in RM's data structure
>> associated with the TM, with no real change in the behaviour.
>> When this info is available, I think it can be straightforwardly used
>> during step 5 where we get either concrete slot requirement
>> or the unknown one (step 6, point 2) which simply grabs some of the
>> concrete default ones (btw not clear which one, seems just some random?)
>>
>> For steps 5 and 7, true, it is not quite clear whether we can avoid some
>> split, e.g. after step 5 before doing step 7.
>> I agree that we should introduce the feature flag if we clearly see that
>> it would be a bigger effort without the flag.
>>
>> Best,
>> Andrey
>>
>> On Mon, Sep 16, 2019 at 3:21 PM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>> > One thing which was briefly mentioned in the FLIP but not in the
>> > implementation plan is the update of the web UI. I think it is worth
>> > adding an extra item for updating the web UI to properly display the
>> > resources a TM still has to offer with dynamic slot allocation. I guess
>> > we need to pull in some JavaScript help in order to implement this step.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Mon, Sep 16, 2019 at 2:15 PM Xintong Song <tonysong...@gmail.com>
>> > wrote:
>> >
>> > > Thanks for the comments, Andrey.
>> > >
>> > > - I agree that instead of ResourceManagerGateway#sendSlotReport, we
>> > > should add the default slot resource profile to
>> > > ResourceManagerGateway#registerTaskExecutor.
>> > >
>> > > - If I understand correctly, the reason you suggest doing the default
>> > > slot resource profile first, and then doing step 3 in a way that
>> > > supports both TaskExecutorGateway#requestSlot and
>> > > TaskExecutorGateway#requestResource, is to avoid splitting code paths
>> > > with the feature option? I think we can do that, but I also want to
>> > > point out that this can only reduce the code split by the feature
>> > > option (which is good), not eliminate it. We still need the feature
>> > > option for the fundamental differences, e.g. creating new SlotIDs on
>> > > allocation vs. allocating to free slots with existing SlotIDs.
>> > >
>> > > - I don't really think we can do steps 5, 6 and 7 independently.
>> > > Basically they are all making changes to the same component. We
>> > > probably can do steps 6 and 7 independently, but I think they both
>> > > depend on step 5.
>> > >
>> > > In general, I would say it's good to have as little code as possible
>> > > split by the feature option, which makes the later clean-up easier.
>> > > But if that cannot be easily done, I would rather not put too much
>> > > effort into a good abstraction and deduplication between the new code
>> > > path and the original one that we are removing soon.
>> > >
>> > > What do you think?
>> > >
>> > > Thank you~
>> > >
>> > > Xintong Song
>> > >
>> > >
>> > >
>> > > On Mon, Sep 16, 2019 at 5:59 PM Andrey Zagrebin <and...@ververica.com>
>> > > wrote:
>> > >
>> > > > Hi Xintong,
>> > > >
>> > > > Thanks for sharing the implementation steps. I also think they make
>> > > > sense with the feature option.
>> > > >
>> > > > I was wondering if we could order the steps in a way that each
>> > > > change does not affect other components too much, always having a
>> > > > working system; then maybe the feature option does not always need
>> > > > to split the code. Here are some thoughts.
>> > > >
>> > > > - We could do the default slot profile first and include it in the
>> > > > TM registration. I would suggest adding it
>> > > > to ResourceManagerGateway#registerTaskExecutor, not sendSlotReport.
>> > > >   This way the RM knows about it but does not use it at this point
>> > > > (parts of steps 4 and 6).
>> > > >
>> > > > - We could try to do step 3 first in a way that it also supports
>> > > > the current way of allocation in TaskExecutorGateway#requestSlot
>> > > > with the default slot profile,
>> > > >   and sends reports both with available resources and with free
>> > > > default slots which correspond to the available resources. We can
>> > > > just remove the free default slots later.
>> > > >   The new way, TaskExecutorGateway#requestResource, could also be
>> > > > implemented here but not used yet.
>> > > >
>> > > > - Then step 5 can use the new TaskExecutorGateway#requestResource
>> > > > and the default slot profile.
>> > > >
>> > > > - Not sure whether steps 5 and 7 can be implemented independently
>> > > > without regression of what we have. Maybe if we do step 7 first, it
>> > > > will have only default slots at first, which will simplify step 5
>> > > > later.
>> > > >
>> > > > Best,
>> > > > Andrey
>> > > >
>> > > > On Mon, Sep 16, 2019 at 5:53 AM Xintong Song <tonysong...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Thanks for the comments, Till and Wenlong.
>> > > > >
>> > > > > @Wenlong
>> > > > > Regarding slot sharing, the general idea is to request a slot with
>> > > > > resources for the tasks of the entire slot sharing group. Details
>> > > > > on how to decide the slot sharing groups and how to manage task
>> > > > > resources within the shared slots can be found in FLIP-53 [1].
>> > > > >
>> > > > > Thank you~
>> > > > >
>> > > > > Xintong Song
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Sep 16, 2019 at 10:42 AM wenlong.lwl <wenlong88....@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi, Xintong, thanks for the great proposal. Big +1 for the
>> > > > > > feature! It is something like going from mapreduce-1.0 to
>> > > > > > mapreduce-2.0.
>> > > > > >
>> > > > > > I like the design on the whole. One point may need to be
>> > > > > > included in the proposal: how do we deal with slot sharing
>> > > > > > groups and dynamic slot allocation? Slot sharing can be quite
>> > > > > > different with dynamic slot allocation.
>> > > > > >
>> > > > > > On Fri, 13 Sep 2019 at 16:42, Till Rohrmann <trohrm...@apache.org>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Thanks for the update, Xintong. From a high-level perspective
>> > > > > > > the implementation plan looks good to me.
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Till
>> > > > > > >
>> > > > > > > On Thu, Sep 12, 2019 at 11:04 AM Xintong Song <tonysong...@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Added implementation steps for this FLIP on the wiki page [1].
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Thank you~
>> > > > > > > >
>> > > > > > > > Xintong Song
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > [1]
>> > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Tue, Aug 20, 2019 at 3:43 PM Xintong Song <tonysong...@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > @Zili
>> > > > > > > > >
>> > > > > > > > > As far as I know, Timo is drafting a FLIP that has taken the
>> > > > > > > > > number 55. A number maintained on the FLIP wiki page [1]
>> > > > > > > > > shows which number should be used for the next FLIP, and it
>> > > > > > > > > should be increased by whoever takes the number for a new
>> > > > > > > > > FLIP.
>> > > > > > > > >
>> > > > > > > > > Thank you~
>> > > > > > > > >
>> > > > > > > > > Xintong Song
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > [1]
>> > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>> > > > > > > > >
>> > > > > > > > > On Tue, Aug 20, 2019 at 3:28 AM Zili Chen <wander4...@gmail.com>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > >> We suddenly skipped FLIP-55 lol.
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> Xintong Song <tonysong...@gmail.com> wrote on Mon, Aug 19, 2019 at 10:23 PM:
>> > > > > > > > >>
>> > > > > > > > >> > Hi everyone,
>> > > > > > > > >> >
>> > > > > > > > >> > We would like to start a discussion thread on "FLIP-56:
>> > > > > > > > >> > Dynamic Slot Allocation" [1]. This was originally part of
>> > > > > > > > >> > the discussion thread for "FLIP-53: Fine Grained Resource
>> > > > > > > > >> > Management" [2]. As Till suggested, we would like to split
>> > > > > > > > >> > the original discussion into two topics, and start a
>> > > > > > > > >> > separate new discussion thread as well as FLIP process for
>> > > > > > > > >> > this one.
>> > > > > > > > >> >
>> > > > > > > > >> > Thank you~
>> > > > > > > > >> >
>> > > > > > > > >> > Xintong Song
>> > > > > > > > >> >
>> > > > > > > > >> >
>> > > > > > > > >> > [1]
>> > > > > > > > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>> > > > > > > > >> >
>> > > > > > > > >> > [2]
>> > > > > > > > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-53-Fine-Grained-Resource-Management-td31831.html
>> > > > > > > > >> >
