Now that persistent resources need to be considered, we revisited the maintenance design to ensure persistent frameworks were accounted for. In particular, the updated design allows operators to specify a conservative estimate of the unavailability, which is useful for persistent frameworks. There is also no longer a split between the planned schedule and the actual draining, which likewise helps persistent frameworks.

The updated high level design is here:
https://docs.google.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/edit?usp=sharing

On Mon, Aug 25, 2014 at 12:24 PM, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:

> Hi all,
>
> I wanted to take a moment to thank Alexandra Sava, who completed her OPW
> internship this past week. We worked together in the second half of her
> internship to create a design document for maintenance primitives in Mesos
> (the original ticket is MESOS-1474
> <https://issues.apache.org/jira/browse/MESOS-1474>, but the design
> document is the most up-to-date plan).
>
> Maintenance in this context consists of anything that requires the tasks
> running on the slave to be killed (e.g. kernel upgrades, machine
> decommissioning, non-recoverable mesos upgrades / configuration changes,
> etc).
>
> The desire is to expose maintenance events to frameworks in a generic
> manner, as to allow frameworks to respect their SLAs, perform better task
> placement, and migrate tasks if necessary.
>
> The design document is here:
>
> https://docs.google.com/document/d/1NjK7MQeJzTRdfZTQ9q1Q5p4dY985bZ7cFqDpX4_fgjM/edit?usp=sharing
>
> Please take a moment before the end of next week to go over this design.
> *Higher level feedback and questions can be discussed most effectively in
> this thread.*
>
> Let's thank Alexandra for her work!
>
> Ben