Hey,

This is not a design doc for supporting Mesos Maintenance, but more of a
high-level overview of how we *could* support it going forward. I just
wanted to get this idea out there now to see where we all stand.

As Ankit mentioned in AURORA-1800, Mesos has had Maintenance primitives
since 0.25. You can read about them here:
<http://mesos.apache.org/documentation/latest/maintenance/>. The primitives
map pretty well to our existing concept of maintenance, and they also allow
operators to coordinate maintenance work across multiple frameworks.
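
For anyone who has not looked at the primitives yet, here is a rough sketch
of what a maintenance schedule looks like, based on my reading of the Mesos
docs linked above. The field names (`windows`, `machine_ids`,
`unavailability`) and the master's `/maintenance/schedule` endpoint are
taken from that page, so please double check them against the Mesos version
you run. The helper name and the example hostname are made up purely for
illustration.

```python
import json

NANOS_PER_SECOND = 1_000_000_000


def build_maintenance_schedule(machines, start_seconds, duration_seconds):
    """Build a Mesos-style maintenance schedule with a single window.

    Hypothetical helper, not part of Aurora or Mesos.
    machines: list of (hostname, ip) tuples for the hosts to drain.
    start_seconds: window start as a Unix timestamp in seconds.
    duration_seconds: window length in seconds.
    """
    window = {
        "machine_ids": [{"hostname": h, "ip": ip} for h, ip in machines],
        # Mesos expresses both the start time and the duration in nanoseconds.
        "unavailability": {
            "start": {"nanoseconds": start_seconds * NANOS_PER_SECOND},
            "duration": {"nanoseconds": duration_seconds * NANOS_PER_SECOND},
        },
    }
    return {"windows": [window]}


if __name__ == "__main__":
    schedule = build_maintenance_schedule(
        [("agent1.example.com", "10.0.0.1")],  # example host, an assumption
        start_seconds=1443830400,
        duration_seconds=3600,
    )
    # An operator would POST this JSON body to the master's
    # /maintenance/schedule endpoint (assumed from the Mesos docs).
    print(json.dumps(schedule, indent=2))
```

Every framework registered with the master sees the same schedule, which is
what makes this work across frameworks rather than only inside Aurora.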

Since the Mesos community is growing and new frameworks are emerging all
the time, I think Aurora should support these primitives and drop our
custom primitives to be a better player in the ecosystem.

We cannot adopt these just yet, however, because they are only accessible
through the Mesos HTTP API, which Aurora does not use today. Further,
`aurora_admin` has some SLA-aware maintenance processes which are computed
and coordinated from the client. I think for us to successfully adopt Mesos
Maintenance, we need to do at least two things:

1. Adopt the Mesos HTTP API.
2. Move the SLA-aware maintenance logic from the admin tool into the
scheduler itself, so the scheduler can coordinate with the Mesos Master in
an SLA-aware fashion.

What do folks think?

-- 
Zameer Manji
