Hey,

This is not a design doc for supporting Mesos Maintenance, but more of a high-level overview of how we *could* support it going forward. I just wanted to get this idea out there now to see where we all stand.

As Ankit mentioned in AURORA-1800, Mesos has had Maintenance primitives since 0.25. You can read about them here <http://mesos.apache.org/documentation/latest/maintenance/>.

The primitives map pretty well to our existing concept of maintenance, but they also allow operators to coordinate work across multiple frameworks. Since the Mesos community is growing and new frameworks are emerging all the time, I think Aurora should support these primitives and drop our custom ones to be a better player in the ecosystem.

We cannot adopt them just yet, however, because they are only accessible through the Mesos HTTP API, which Aurora does not use today. Further, `aurora_admin` has some SLA-aware maintenance processes which are computed and coordinated from the client.

I think for us to successfully adopt Mesos Maintenance, we need to do at least two things:

1. Adopt the Mesos HTTP API.
2. Move the SLA-aware maintenance logic from the admin tool into the scheduler itself, so the scheduler can coordinate with the Mesos Master in an SLA-aware fashion.

What do folks think?

--
Zameer Manji