Thanks for the votes, Gary and Kurt. @Kurt Sorry for the confusion. I've added a clarification in the section "Unknown Resource Requirement".
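The prerequisite Kurt raises (quoted below) can be written as a simple check. This is only a Python sketch under stated assumptions. `ResourceSpec` and `check_uniform_resource_requirements` are hypothetical names. An unknown requirement is modeled as `None`, which is not how Flink itself represents an unknown resource profile.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ResourceSpec:
    """Hypothetical stand-in for a specified operator resource requirement."""
    cpu_cores: float
    memory_mb: int


def check_uniform_resource_requirements(
    operator_specs: Sequence[Optional[ResourceSpec]],
) -> None:
    """Raise ValueError if a job mixes unknown (None) and specified requirements.

    Under the prerequisite discussed in this thread, either every operator of a
    job leaves its requirement unknown, or every operator specifies one.
    """
    unknown = sum(1 for spec in operator_specs if spec is None)
    if 0 < unknown < len(operator_specs):
        raise ValueError(
            f"{unknown} of {len(operator_specs)} operators have unknown resource "
            "requirements; expected all or none"
        )


# Both "all unknown" and "all specified" pass the check.
check_uniform_resource_requirements([None, None])
check_uniform_resource_requirements([ResourceSpec(1.0, 512), ResourceSpec(0.5, 256)])
```

A job that mixes the two, such as `[None, ResourceSpec(1.0, 512)]`, is rejected.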
And +1 (non-binding) from my side.

Thank you~

Xintong Song

On Tue, Sep 24, 2019 at 5:35 PM Kurt Young <ykt...@gmail.com> wrote:
> If possible, I would suggest adding a section to this doc to emphasize that the current design has a prerequisite: each job should either have all its operators using the unknown resource profile, or all using a specified amount of resources. This would make the document easier to understand.
>
> (I was confused by it and realized this after talking to Xintong offline.)
>
> But I would still +1 for this.
>
> Best,
> Kurt
>
> On Mon, Sep 23, 2019 at 10:18 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > Thanks for updating the FLIP. It looks good to me.
> >
> > +1 (binding)
> >
> > Cheers,
> > Till
> >
> > On Mon, Sep 23, 2019 at 4:12 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > @Till @Andrey
> > >
> > > According to the comments, I just updated the FLIP document [1] with the following changes:
> > >
> > > - Removed SlotID (in the section "Protocol Changes").
> > > - Updated the implementation steps to reduce separate code paths. As far as I can see at the moment, we do not need the feature option. We can add it later if we find it necessary during implementation.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > >
> > > On Fri, Sep 20, 2019 at 11:01 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > I'm not sure if I understand the implementation plan you suggested correctly. To my understanding, it seems that all the steps except step 5 have to happen in strict order.
> > > >
> > > > - Profiles to be used in step 2 are reported in step 1.
> > > > - The SlotProfile in TaskExecutorGateway#requestSlot in step 3 comes from the profiles used in step 2.
> > > > - The TM can only do proper bookkeeping (step 4) if the RM requests slots from the TM with profiles (step 3).
> > > > - Step 5 can be done as soon as we have step 2.
> > > > - Step 6 relies on both step 4 and step 5, for proper bookkeeping on both the TM and RM sides before enabling non-default profiles.
> > > >
> > > > That means we can only work on the steps in the following order:
> > > > 1-2-3-4-6
> > > >    \-5-/
> > > >
> > > > What I'm trying to achieve with the current plan is to parallelize most of the implementation steps, as follows, so that Andrey and I can work concurrently without blocking each other too much:
> > > > 1-2-3-4
> > > > \5-6-7
> > > >
> > > > I also agree that it would be good not to add too much separate code. I would suggest leaving that decision to implementation time. E.g., if by the time we do the TM-side bookkeeping the RM side has already implemented requesting slots with profiles, then we do not need to separate the code paths.
> > > >
> > > > To that end, I think it makes sense to adjust steps 5-7 to first use default slot resource profiles for all the bookkeeping, and to replace them with the requested profiles at the end.
> > > >
> > > > What do you think?
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > > On Thu, Sep 19, 2019 at 7:59 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > I think that, besides points 1 and 3, there are no dependencies between the RM and TM side changes. Also, I'm not sure whether it makes sense to split the slot manager changes into the proposed steps 5, 6 and 7.
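The ordering argument in the bullets above can be checked mechanically. The following is a minimal Python sketch. It assumes only the dependencies those bullets list for Till's proposed steps: 1 before 2 before 3 before 4, 5 after 2, and 6 after both 4 and 5. It groups the steps into "waves" whose members do not depend on each other, using the standard-library `graphlib.TopologicalSorter` (Python 3.9+). `TILL_PLAN` and `parallel_waves` are illustrative names.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each step maps to the set of steps it has to wait for.
TILL_PLAN: Dict[int, Set[int]] = {
    1: set(),
    2: {1},     # profiles used in step 2 are reported in step 1
    3: {2},     # requestSlot's SlotProfile comes from the profiles of step 2
    4: {3},     # TM bookkeeping needs the RM to request slots with profiles
    5: {2},     # RM-side bookkeeping only needs step 2
    6: {4, 5},  # enabling non-default profiles needs bookkeeping on both sides
}


def parallel_waves(deps: Dict[int, Set[int]]) -> List[List[int]]:
    """Group steps into waves; steps within one wave have no mutual dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(parallel_waves(TILL_PLAN))  # prints [[1], [2], [3, 5], [4], [6]]
```

Step 5 is the only step that can run alongside the 3-4 chain. Every other step is strictly sequential, which matches the 1-2-3-4-6 diagram with 5 branching off after step 2.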
> > > > >
> > > > > I would strongly recommend not adding too much duplicate logic or too many separate code paths, because they just add blind spots that are probably not as well tested as the old code paths.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Sep 19, 2019 at 11:58 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > Thanks for the comments, Till.
> > > > > >
> > > > > > - Agree on removing SlotID.
> > > > > >
> > > > > > - Regarding the implementation plan, it is true that we could reduce the code separated by the feature option. But I think that would require introducing more dependencies between implementation steps. With the current plan, we can easily separate the steps on the RM side from those on the TM side, and start working on them concurrently after quickly updating the interfaces in between. The feature will come alive once the steps on both the RM and TM sides are finished. Since we plan to have two people (Andrey and me) working on this FLIP, I think the current plan is probably more convenient.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > > On Thu, Sep 19, 2019 at 5:09 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > > > Hi Xintong,
> > > > > > >
> > > > > > > thanks for starting the vote. The general plan looks good. Hence +1 from my side. I still have some minor comments one could think about:
> > > > > > >
> > > > > > > * As we no longer have predetermined slots on the TaskExecutor, I think we can get rid of the SlotID.
> > > > > > >   Instead, an allocated slot will be identified by the AllocationID and the TaskManager's ResourceID in order to differentiate duplicate registrations.
> > > > > > > * For the implementation plan, I believe there is only one tiny part of the SlotManager for which we need a separate code path/feature flag, namely how we find a matching slot. Everything else should be implementable in a way that works with both dynamic and static slot allocation:
> > > > > > >   1. Let TMs register with the default slot profile at the RM.
> > > > > > >   2. Change the SlotManager to use reported slot profiles instead of pre-calculated profiles.
> > > > > > >   3. Replace SlotID with SlotProfile in TaskExecutorGateway#requestSlot.
> > > > > > >   4. Extend the TM to support dynamic slot allocation (aka proper bookkeeping); this can happen concurrently with any of steps 2-3.
> > > > > > >   5. Add bookkeeping to the SlotManager (for pending TMs and registered TMs), but still only use default slot profiles for matching slot requests.
> > > > > > >   6. Allow matching slot requests against reported resources instead of default slot profiles (here we could use a feature flag to switch between dynamic and static slot allocation).
> > > > > > >
> > > > > > > Wdyt?
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Thu, Sep 19, 2019 at 9:45 AM Andrey Zagrebin <azagre...@apache.org> wrote:
> > > > > > > > Hi Xintong,
> > > > > > > >
> > > > > > > > Thanks for starting the vote, +1 from my side.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Andrey
> > > > > > > >
> > > > > > > > On Tue, Sep 17, 2019 at 4:26 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I would like to start the vote for FLIP-56 [1], on which a consensus was reached in this discussion thread [2].
> > > > > > > > >
> > > > > > > > > The vote will be open for at least 72 hours. I'll try to close it after Sep. 20 15:00 UTC, unless there is an objection or not enough votes.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > > > > > > > > [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-56-Dynamic-Slot-Allocation-td31960.html
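Till makes two suggestions in the quoted thread. First, an allocated slot can be identified by its AllocationID together with the TaskManager's ResourceID instead of a SlotID. Second, the feature flag only needs to cover how a matching slot is found (his step 6). The following is a rough sketch of both ideas. It is a simplified Python model, not Flink's SlotManager. `ResourceProfile`, `TaskManagerRecord`, `SlotManagerSketch` and the `dynamic_matching` flag are hypothetical names. Resources are reduced to CPU and memory. A second report of the same (ResourceID, AllocationID) pair is treated as a duplicate registration, which is an assumed reading of that remark.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ResourceProfile:
    """Hypothetical, simplified resource profile: CPU cores and memory in MB."""
    cpu: float
    memory_mb: int

    def fits_in(self, other: "ResourceProfile") -> bool:
        return self.cpu <= other.cpu and self.memory_mb <= other.memory_mb

    def minus(self, other: "ResourceProfile") -> "ResourceProfile":
        return ResourceProfile(self.cpu - other.cpu, self.memory_mb - other.memory_mb)


@dataclass
class TaskManagerRecord:
    """Hypothetical RM-side bookkeeping for one registered TaskManager."""
    resource_id: str
    total: ResourceProfile
    default_slot: ResourceProfile
    free: ResourceProfile = field(init=False)

    def __post_init__(self) -> None:
        self.free = self.total


# An allocated slot is keyed by (TaskManager ResourceID, AllocationID), not by a SlotID.
SlotKey = Tuple[str, str]


class SlotManagerSketch:
    def __init__(self, dynamic_matching: bool) -> None:
        # Hypothetical feature flag: False = every slot has the default size (static),
        # True = every slot has exactly the requested size (dynamic).
        self.dynamic_matching = dynamic_matching
        self.task_managers: Dict[str, TaskManagerRecord] = {}
        self.allocated: Dict[SlotKey, ResourceProfile] = {}

    def register_task_manager(self, tm: TaskManagerRecord) -> None:
        self.task_managers[tm.resource_id] = tm

    def _slot_size(self, tm: TaskManagerRecord, requested: ResourceProfile) -> ResourceProfile:
        # The only flag-dependent decision: how much of the TM a request consumes.
        return requested if self.dynamic_matching else tm.default_slot

    def allocate(self, allocation_id: str, requested: ResourceProfile) -> Optional[SlotKey]:
        """Carve a slot on the first TM with enough free resources; None if none fits."""
        for tm in self.task_managers.values():
            slot = self._slot_size(tm, requested)
            if requested.fits_in(slot) and slot.fits_in(tm.free):
                tm.free = tm.free.minus(slot)
                key = (tm.resource_id, allocation_id)
                self.allocated[key] = slot
                return key
        return None

    def report_slot(self, resource_id: str, allocation_id: str, slot: ResourceProfile) -> bool:
        """Record a slot reported by a TM; return False if this pair is already known."""
        key = (resource_id, allocation_id)
        if key in self.allocated:
            return False  # duplicate registration of an already tracked slot
        tm = self.task_managers[resource_id]
        tm.free = tm.free.minus(slot)
        self.allocated[key] = slot
        return True
```

With `dynamic_matching=False`, every request consumes a default-sized slot, which mirrors step 5. Setting the flag to `True` corresponds to step 6, and the bookkeeping itself stays the same.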