Yup, working on addressing all of the comments! Thanks for leaving them, everyone.
Also, as @serb correctly pointed out (and Josh found out), I submitted an updated patch [1] with an updated design document [2].

[1] https://reviews.apache.org/r/57487/
[2] https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE/edit#

On Mon, Mar 13, 2017 at 2:33 PM Joshua Cohen <jco...@apache.org> wrote:

Dmitriy,

There's a fair number of comments both here and on the doc. Will you have time to respond to these so we can find a path forward?

Cheers,

Joshua

On Wed, Mar 8, 2017 at 8:44 PM, David McLaughlin <dmclaugh...@apache.org> wrote:

A ticket for the replace-task primitive already exists: https://issues.apache.org/jira/browse/MESOS-1280

On Wed, Mar 8, 2017 at 6:34 PM, David McLaughlin <dmclaugh...@apache.org> wrote:

Spoke with Zameer offline and he asked me to post additional thoughts here.

My motivation for solving this without dynamic reservations is just the sheer number of questions I have after reading the RFC and the current design doc. Most of them are not about the current proposal, its goals, or the MVP, but about how this feature will scale into persistent storage.

I think best-effort dynamic reservations are a very different problem from the reservations that would be needed to support persistent storage. My primary concern is around things like quota. For the current proposal and the small best-effort feature we're adding, it makes no sense to get into the complexities of separate quota for reserved resources vs. preferred resources, but the reality of exposing such a concept to a large organisation where we can't automatically reclaim anything reserved means we'd almost definitely want that. The issue with the iterative approach is that decisions we take here could have a huge impact on those tasks later, once we expose the reserved tier into the open. That means more upfront design and planning, which so far has blocked a super useful feature that I feel all of us want.

My gut feeling is we went about this all wrong. We started with dynamic reservations and thought about how we could speed up task scheduling with them. If we took the current problem brief and started from first principles, then I think we'd naturally look for something like a replaceTask(offerId, taskInfo)-type API from Mesos.

I'll bring this up within our team and see if we can put resources toward adding such an API. Any feedback on this approach in the meantime is welcome.
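[Editorial aside: to make the proposal above concrete, here is one possible shape for such a primitive. It is purely hypothetical: no replaceTask call exists in the Mesos scheduler API today (MESOS-1280 only tracks the idea), and the interface name, parameters, and semantics below are just a sketch of what David describes, not an actual Mesos API.]

import org.apache.mesos.Protos.OfferID;
import org.apache.mesos.Protos.TaskInfo;

/**
 * Hypothetical sketch only: no such primitive exists in the Mesos scheduler
 * API today. Names and semantics are illustrative.
 */
public interface ReplaceTaskApi {
  /**
   * Launch {@code replacement} directly against the resources referenced by
   * {@code offerId}, skipping the normal decline/re-offer/allocator round trip.
   * How the task being replaced is identified (and what happens if it is still
   * running) is exactly the kind of detail such a proposal would need to pin down.
   */
  void replaceTask(OfferID offerId, TaskInfo replacement);
}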
On Wed, Mar 8, 2017 at 5:30 PM, David McLaughlin <dmclaugh...@apache.org> wrote:

You don't have to store anything with my proposal. Preemption doesn't store anything either. The whole point is that it's just best-effort, and if the Scheduler restarts, the worst that would happen is that part of the current batch would have to go through the current scheduling loop that users tolerate and deal with today.

On Wed, Mar 8, 2017 at 5:08 PM, Zameer Manji <zma...@apache.org> wrote:

David,

I have two concerns with that idea. First, it would require persisting the relationship of <Hostname, Resources> to <Task> for every task. I'm not sure that adding more storage and storage operations is the ideal way of solving this problem. Second, in a multi-framework environment, a framework needs to use dynamic reservations, otherwise the resources might be taken by another framework.

On Wed, Mar 8, 2017 at 5:01 PM, David McLaughlin <dmclaugh...@apache.org> wrote:

So I read the docs again and I have one major question: do we even need dynamic reservations for the current proposal?

The current goal of the proposed work is to keep an offer on a host and prevent some other pending task from taking it before the next scheduling round. This exact problem is solved in preemption, and we could use a similar technique for reserving offers after killing tasks when going through the update loop. We wouldn't need to add tiers or reconciliation or solve any of these other concerns. Reusing an offer skips so much of the expensive stuff in the Scheduler that it would be a no-brainer for the operator to turn it on for every single task in the cluster.
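[Editorial aside: a rough sketch of the in-memory bookkeeping that David's "nothing persisted, purely best-effort" framing implies. None of this is taken from the patches under review; the class, the HOLD_TIMEOUT knob, and the string instance key are made up for illustration.]

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.mesos.Protos.OfferID;

/**
 * Sketch of best-effort offer holding during an update: when an instance is
 * killed, pin an offer from its host to the replacement instance for a short
 * window. State is in memory only, so a scheduler restart simply falls back
 * to the normal scheduling loop.
 */
final class HeldOffers {
  private static final Duration HOLD_TIMEOUT = Duration.ofSeconds(30);  // assumed knob

  private static final class Hold {
    final OfferID offerId;
    final Instant expiry;
    Hold(OfferID offerId, Instant expiry) { this.offerId = offerId; this.expiry = expiry; }
  }

  // Keyed by some task instance identity (job key + instance id in Aurora terms).
  private final Map<String, Hold> holds = new ConcurrentHashMap<>();

  /** Called when the updater kills an instance and an offer for that host is available. */
  void holdFor(String instanceKey, OfferID offerId) {
    holds.put(instanceKey, new Hold(offerId, Instant.now().plus(HOLD_TIMEOUT)));
  }

  /** Called when the replacement instance is scheduled: reuse the held offer if still valid. */
  Optional<OfferID> claim(String instanceKey) {
    Hold hold = holds.remove(instanceKey);
    return (hold != null && Instant.now().isBefore(hold.expiry))
        ? Optional.of(hold.offerId)
        : Optional.empty();
  }
}

The key property is that losing this map is harmless: affected instances just go through the regular scheduling path, which is the behaviour users already see today.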
On Thu, Mar 2, 2017 at 7:52 AM, Steve Niemitz <sniem...@apache.org> wrote:

I read over the docs, and it looks like a good start. Personally I don't see much of a benefit for dynamically reserved cpu/mem, but I'm excited about the possibility of building off this for dynamically reserved persistent volumes.

I would like to see more detail on how a reservation "times out", and the configuration options per job around that, as I feel like it's the most complicated part of all of this. Ideally there would also be hooks into the host maintenance APIs here.

I also didn't see any mention of it, but I believe Mesos requires the framework to reserve resources with a role. By default Aurora runs as the special "*" role; does this mean Aurora will now need a role specified for this to work? Or does Mesos allow reserving resources without a role?

On Thu, Mar 2, 2017 at 8:35 AM, Erb, Stephan <stephan....@blue-yonder.com> wrote:

Hi everyone,

There have been two documents on Dynamic Reservations as a first step towards persistent services:

· RFC: https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A/edit#heading=h.hcsc8tda08vy
· Technical Design Doc: https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE/edit#heading=h.klg3urfbnq3v

For a couple of days now there have also been two patches online for an MVP by Dmitriy:

· https://reviews.apache.org/r/56690/
· https://reviews.apache.org/r/56691/

From reading the documents, I am under the impression that there is a rough consensus on the following points:

· We want dynamic reservations. Our general goal is to enable the re-scheduling of tasks on the same host they used in a previous run.
· Dynamic reservations are a best-effort feature. If in doubt, a task will be scheduled somewhere else.
· Jobs opt into reserved resources using an appropriate tier config.
· The tier config is supposed to be neither preemptible nor revocable. Reserving resources therefore requires appropriate quota.
· Aurora will tag reserved Mesos resources by adding the unique instance key of the reserving task instance as a label. Only this task instance will be allowed to use those tagged resources.

I am unclear on the following general questions, as there is contradicting content:

a) How does the user interact with reservations? There are several proposals in the documents to auto-reserve on `aurora job create` or `aurora cron schedule` and to automatically un-reserve on the appropriate reverse actions. But will we also allow a user further control over the reservations so that they can manage those independent of the task/job lifecycle? For example, how does Borg handle this?

b) The implementation proposal and patches include an OfferReconciler, which implies we don't want to offer any control to the user. The only control mechanism will be the cluster-wide offer wait time, limiting the number of seconds unused reserved resources can linger before they are un-reserved.

c) Will we allow adhoc/cron jobs to reserve resources? Does it even matter if we don't give control to users and just rely on the OfferReconciler?

I have a couple of questions on the MVP and some implementation details. I will follow up with those in a separate mail.

Thanks and best regards,
Stephan
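[Editorial aside: the sketch below shows roughly what the labeling scheme in the consensus list could look like using the Mesos protobuf builders and acceptOffers. It is an illustration, not code from the patches: the role name "aurora", the label key "instance_key", and the helper shape are assumptions. Note that Mesos does not allow dynamic reservations for the default "*" role, which is what Steve's question gets at.]

import java.util.Collections;

import org.apache.mesos.Protos;
import org.apache.mesos.SchedulerDriver;

/**
 * Illustration only: reserve one CPU for the framework role and tag the
 * reservation with the reserving task instance's key via a reservation label.
 */
final class ReservationSketch {
  static void reserveForInstance(SchedulerDriver driver, Protos.Offer offer, String instanceKey) {
    Protos.Resource reservedCpu = Protos.Resource.newBuilder()
        .setName("cpus")
        .setType(Protos.Value.Type.SCALAR)
        .setScalar(Protos.Value.Scalar.newBuilder().setValue(1.0))
        .setRole("aurora")  // assumed framework role; cannot be "*"
        .setReservation(Protos.Resource.ReservationInfo.newBuilder()
            .setLabels(Protos.Labels.newBuilder()
                .addLabels(Protos.Label.newBuilder()
                    .setKey("instance_key")     // assumed label key
                    .setValue(instanceKey))))   // e.g. role/env/job/instance-id
        .build();

    Protos.Offer.Operation reserve = Protos.Offer.Operation.newBuilder()
        .setType(Protos.Offer.Operation.Type.RESERVE)
        .setReserve(Protos.Offer.Operation.Reserve.newBuilder().addResources(reservedCpu))
        .build();

    // Accept the offer with a RESERVE operation; a later offer then carries the
    // reserved, labeled resources back to this framework.
    driver.acceptOffers(
        Collections.singletonList(offer.getId()),
        Collections.singletonList(reserve),
        Protos.Filters.getDefaultInstance());
  }
}

Under the design being discussed, an OfferReconciler would then be responsible for un-reserving labeled resources that have lingered unused past the cluster-wide offer wait time.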