@Till @Andrey

Following your comments, I have updated the FLIP document [1] with the
following changes:

   - Removed SlotID (in the "Protocol Changes" section).
   - Updated the implementation steps to reduce separate code paths. As far
   as I can see at the moment, we do not need the feature option. We can add
   it later if we find it necessary during implementation.
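
To illustrate the SlotID removal, here is a minimal sketch of identifying an
allocated slot by the TaskManager's ResourceID together with the AllocationID,
as suggested earlier in this thread. It is only an illustration, not code from
the FLIP or from Flink: AllocatedSlotKey and SlotBook are hypothetical names,
and plain strings stand in for the real ResourceID, AllocationID and profile
types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key: (TaskManager ResourceID, AllocationID).
// Strings are stand-ins for the real ID types (an assumption of this sketch).
final class AllocatedSlotKey {
    private final String resourceId;
    private final String allocationId;

    AllocatedSlotKey(String resourceId, String allocationId) {
        this.resourceId = Objects.requireNonNull(resourceId);
        this.allocationId = Objects.requireNonNull(allocationId);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof AllocatedSlotKey)) {
            return false;
        }
        AllocatedSlotKey other = (AllocatedSlotKey) o;
        return resourceId.equals(other.resourceId)
                && allocationId.equals(other.allocationId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(resourceId, allocationId);
    }
}

// Hypothetical bookkeeping structure keyed by the composite key above.
final class SlotBook {
    private final Map<AllocatedSlotKey, String> slots = new HashMap<>();

    // Returns true if the slot was newly recorded, false if the same
    // (ResourceID, AllocationID) pair was already registered.
    boolean register(String resourceId, String allocationId, String profile) {
        AllocatedSlotKey key = new AllocatedSlotKey(resourceId, allocationId);
        return slots.putIfAbsent(key, profile) == null;
    }

    int size() {
        return slots.size();
    }
}
```

With such a key, a repeated registration of the same pair can be detected
without a predetermined SlotID, while the same AllocationID reported by a
different TaskManager is kept as a separate entry.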


Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation

On Fri, Sep 20, 2019 at 11:01 AM Xintong Song <tonysong...@gmail.com> wrote:

> I'm not sure I understand the implementation plan you suggested correctly.
> To my understanding, all the steps except step 5 have to happen in strict
> order.
>
>    - The profiles used in step 2 are reported in step 1.
>    - The SlotProfile in TaskExecutorGateway#requestSlot in step 3 comes from
>    the profiles used in step 2.
>    - Only if the RM requests slots from the TM with profiles (step 3) would
>    the TM be able to do the proper bookkeeping (step 4).
>    - Step 5 can be done as long as we have step 2.
>    - Step 6 relies on both step 4 and step 5, for proper bookkeeping on
>    both the TM and RM sides before enabling non-default profiles.
>
> That means we can only work on the steps in the following order.
> 1-2-3-4-6
>    \-5-/
>
> What I'm trying to achieve with the current plan is to have most of the
> implementation steps run in parallel, as follows, so that Andrey and I can
> work concurrently without blocking each other too much.
> 1-2-3-4
>    \5-6-7
>
>
> I also agree that it would be good not to add too many separate code paths.
> I would suggest leaving that decision to implementation time. E.g., if by
> the time we do the TM side bookkeeping, the RM side has already implemented
> requesting slots with profiles, then we do not need to separate the code
> paths.
>
>
> To that end, I think it makes sense to adjust steps 5-7 to first use the
> default slot resource profiles for all the bookkeeping, and replace them
> with the requested profiles at the end.
>
>
> What do you think?
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, Sep 19, 2019 at 7:59 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> I think besides points 1 and 3 there are no dependencies between the RM
>> and TM side changes. Also, I'm not sure whether it makes sense to split
>> the slot manager changes up into the proposed steps 5, 6 and 7.
>>
>> I would highly recommend not adding too much duplicate logic/separate code
>> paths because it just adds blind spots which are probably not as well
>> tested as the old code paths.
>>
>> Cheers,
>> Till
>>
>> On Thu, Sep 19, 2019 at 11:58 AM Xintong Song <tonysong...@gmail.com>
>> wrote:
>>
>> > Thanks for the comments, Till.
>> >
>> > - Agree on removing SlotID.
>> >
>> > - Regarding the implementation plan, it is true that we can possibly
>> > reduce the code separated by the feature option. But I think to do that
>> > we need to introduce more dependencies between implementation steps.
>> > With the current plan, we can easily separate steps on the RM side and
>> > the TM side, and start working on them concurrently after quickly
>> > updating the interfaces in between. The feature will come alive when the
>> > steps on both RM/TM sides are finished. Since we are planning to have two
>> > people (Andrey and I) working on this FLIP, I think the current plan is
>> > probably more convenient.
>> >
>> > Thank you~
>> >
>> > Xintong Song
>> >
>> >
>> >
>> > On Thu, Sep 19, 2019 at 5:09 PM Till Rohrmann <trohrm...@apache.org>
>> > wrote:
>> >
>> > > Hi Xintong,
>> > >
>> > > thanks for starting the vote. The general plan looks good. Hence +1
>> > > from my side. I still have some minor comments one could think about:
>> > >
>> > > * As we no longer have predetermined slots on the TaskExecutor, I
>> > > think we can get rid of the SlotID. Instead, an allocated slot will be
>> > > identified by the AllocationID and the TaskManager's ResourceID in
>> > > order to differentiate duplicate registrations.
>> > > * For the implementation plan, I believe there is only one tiny part
>> > > on the SlotManager for which we need a separate code path/feature flag,
>> > > which is how we find a matching slot. Everything else should be
>> > > possible to implement in a way that it works with dynamic and static
>> > > slot allocation:
>> > > 1. Let TMs register with default slot profile at RM
>> > > 2. Change SlotManager to use reported slot profiles instead of
>> > > pre-calculated profiles
>> > > 3. Replace SlotID with SlotProfile in TaskExecutorGateway#requestSlot
>> > > 4. Extend TM to support dynamic slot allocation (aka proper
>> > > bookkeeping) (can happen concurrently to any of steps 2-3)
>> > > 5. Add bookkeeping to SlotManager (for pending TMs and registered TMs)
>> > > but still only use default slot profiles for matching with slot
>> > > requests
>> > > 6. Allow to match slot requests with reported resources instead of
>> > > default slot profiles (here we could use a feature flag to switch
>> > > between dynamic and static slot allocation)
>> > >
>> > > Wdyt?
>> > >
>> > > Cheers,
>> > > Till
>> > >
>> > > On Thu, Sep 19, 2019 at 9:45 AM Andrey Zagrebin <azagre...@apache.org>
>> > > wrote:
>> > >
>> > > > Hi Xintong,
>> > > >
>> > > > Thanks for starting the vote, +1 from my side.
>> > > >
>> > > > Best,
>> > > > Andrey
>> > > >
>> > > > On Tue, Sep 17, 2019 at 4:26 PM Xintong Song <tonysong...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > I would like to start the vote for FLIP-56 [1], on which a
>> > > > > consensus is reached in this discussion thread [2].
>> > > > >
>> > > > > The vote will be open for at least 72 hours. I'll try to close it
>> > > > > after Sep. 20 15:00 UTC, unless there is an objection or not enough
>> > > > > votes.
>> > > > >
>> > > > > Thank you~
>> > > > >
>> > > > > Xintong Song
>> > > > >
>> > > > >
>> > > > > [1]
>> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>> > > > >
>> > > > > [2]
>> > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-56-Dynamic-Slot-Allocation-td31960.html
>> > > > >
>> > > >
>> > >
>> >
>>
>
