Folks,

Thanks to everyone for their feedback! Based on discussions with members of the Mesos community, we've made a few changes to this proposal. To summarize:
(1) Renamed "rack" to "zone", both to be a bit more abstract and to match the terminology used by most public cloud providers. That is, a fault domain now consists of a zone and a region.

(2) To accommodate future kinds of domains, the DomainInfo message now has a nested "FaultDomain" field. New types of domains (e.g., latency domains, power domains) might be represented in the future via additional fields in DomainInfo, but such extensions are out of the scope of the current proposal.

(3) Clarified that allowing an agent to transition from "no configured domain" to "configured domain" will require an agent drain in the MVP, and added some discussion of the implementation and framework API challenges of supporting domain opt-in without a drain.

The review chain for the MVP of this feature is now up (MESOS-7607).

Neil

On Mon, Apr 17, 2017 at 9:44 AM, Neil Conway <neil.con...@gmail.com> wrote:
> Folks,
>
> I'd like to enhance Mesos to support a first-class notion of "fault
> domains" -- i.e., identifying the "rack" and "region" (DC) where a
> Mesos agent or master is located. The goal is to enable two main
> features:
>
> (1) To make it easier to write "rack-aware" Mesos frameworks that are
> portable to different Mesos clusters.
>
> (2) To improve the experience of configuring Mesos with a set of
> masters and agents in one DC, and another pool of "remote" agents in a
> different DC.
>
> For more information, please see the design doc:
>
> https://docs.google.com/document/d/1gEugdkLRbBsqsiFv3urRPRNrHwUC-i1HwfFfHR_MvC8
>
> I'd love any feedback, either directly on the Google doc or via email.
>
> Thanks,
> Neil