Hi All,

I'd like to mention some changes that have been discussed amongst the
committers but have not yet been shared broadly with the list.

The central component of Mesos is the "Master". The Master is responsible
for administering slaves, frameworks, and resource offers. It also handles
task launching requests, status updates, and framework messages. As you may
or may not know, the Master is currently stateless, in that it does not
persist any information across failovers. Rather, the Master currently
recovers all of its state from the slaves and frameworks that re-register
after a failover.

This design has many benefits. First, failing over a Master is a trivial
operation. Second, we do not have the performance overhead and complexity
of dealing with persistent state. However, this design allows information
to be lost in a few cases. For example, if a slave fails permanently while
no Master is running, the Master that takes over never learns that the
slave existed, let alone that it failed.
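
To make that failure case concrete, here is a toy Python sketch of a Master
that rebuilds its slave set purely from re-registrations. This is not Mesos
code: the class, method, and slave IDs are made up for illustration.

```python
class StatelessMaster:
    """Toy model: all state lives in memory and is lost on failover."""

    def __init__(self):
        self.slaves = set()

    def reregister(self, slave_id):
        # After a failover, the master only learns about slaves that contact it.
        self.slaves.add(slave_id)


# Before the failover, three slaves are registered.
old_master = StatelessMaster()
for slave_id in ("s1", "s2", "s3"):
    old_master.reregister(slave_id)

# The master goes down; while no master is running, s3 fails permanently.
alive = {"s1", "s2"}

# A new master starts with empty state and hears only from the survivors.
new_master = StatelessMaster()
for slave_id in sorted(alive):
    new_master.reregister(slave_id)

# s3 is neither known nor marked as lost: its failure is invisible.
missing = old_master.slaves - new_master.slaves
print(missing)  # {'s3'}
```

In the sketch, only the test harness can compute "missing", because it still
holds the old Master's state. A real failed-over Master has nothing to
compare against.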

In order to detect these events, we'd like to add persistence of the
registered slaves. The first step for this was creating the Registrar:
https://reviews.apache.org/r/14383/
https://reviews.apache.org/r/14384/
https://reviews.apache.org/r/15099/
https://reviews.apache.org/r/15100/

The Registrar is responsible for keeping the official records of the
master. This will initially include SlaveInfo in order to correctly handle
cases like the example I provided above. The Registrar is agnostic to the
underlying data storage and can be backed by a local LevelDB, by ZooKeeper
(for high availability Masters), and in the future by our reconfigurable
replicated log.
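
As a rough illustration of the idea (and not the interface in the reviews
above), the sketch below shows a registrar that persists the admitted
slaves through a pluggable storage backend. The names Storage,
InMemoryStorage, admit, remove, and recover are assumptions chosen for this
example; a real backend would be LevelDB or ZooKeeper rather than memory.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional, Set


class Storage(ABC):
    """Hypothetical storage interface that hides the actual backend."""

    @abstractmethod
    def load(self) -> Optional[str]:
        ...

    @abstractmethod
    def store(self, data: str) -> None:
        ...


class InMemoryStorage(Storage):
    """Stand-in backend for this example; it only lives within one process."""

    def __init__(self) -> None:
        self._data: Optional[str] = None

    def load(self) -> Optional[str]:
        return self._data

    def store(self, data: str) -> None:
        self._data = data


class Registrar:
    """Keeps the set of admitted slaves and writes every change to storage."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage
        self._slaves: Set[str] = set()

    def recover(self) -> Set[str]:
        # Reload the persisted records, e.g. right after a Master failover.
        data = self._storage.load()
        self._slaves = set(json.loads(data)) if data is not None else set()
        return set(self._slaves)

    def admit(self, slave_id: str) -> None:
        self._slaves.add(slave_id)
        self._persist()

    def remove(self, slave_id: str) -> None:
        self._slaves.discard(slave_id)
        self._persist()

    def _persist(self) -> None:
        self._storage.store(json.dumps(sorted(self._slaves)))


storage = InMemoryStorage()

# The old Master admits three slaves, then fails.
old = Registrar(storage)
for slave_id in ("s1", "s2", "s3"):
    old.admit(slave_id)

# The new Master recovers the records and compares them with the slaves
# that actually re-register; anything left over can be treated as lost.
new = Registrar(storage)
expected = new.recover()
reregistered = {"s1", "s2"}
lost = expected - reregistered
for slave_id in lost:
    new.remove(slave_id)

print(sorted(lost))  # ['s3']
```

Because the Registrar only talks to the Storage interface, swapping the
backend does not change the recovery logic.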

The next step is to implement "statefulness" in the Master using the
Registrar. So far I've sent out some of the preliminary cleanup work, and I
have a few pending patches, still being cleaned up, that implement this
fully, so keep an eye out for those.

In the longer term, we will add persistence of framework information in the
same vein. That is, we want to handle framework failures in the presence of
Master failures.

Cheers!
Ben
