[ https://issues.apache.org/jira/browse/MESOS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240214#comment-15240214 ]
Deepak Vij commented on MESOS-3548:
-----------------------------------

Hi Stefano, this is still a work-in-progress and not ready yet. We are going to present a talk and demo during the upcoming MesosCon conference in June. Thanks.

- Deepak

> Investigate federations of Mesos masters
> ----------------------------------------
>
>                 Key: MESOS-3548
>                 URL: https://issues.apache.org/jira/browse/MESOS-3548
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Neil Conway
>              Labels: federation, mesosphere, multi-dc
>
> In a large Mesos installation, the operator might want to ensure that even if
> the Mesos masters are inaccessible or failed, new tasks can still be
> scheduled (across multiple different frameworks). HA masters are only a
> partial solution here: the masters might still be inaccessible due to a
> correlated failure (e.g., Zookeeper misconfiguration/human error).
> To support this, we could support the notion of "hierarchies" or
> "federations" of Mesos masters. In a Mesos installation with 10k machines,
> the operator might configure 10 Mesos masters (each of which might be HA) to
> manage 1k machines each. Then an additional "meta-Master" would manage the
> allocation of cluster resources to the 10 masters. Hence, the failure of any
> individual master would impact 1k machines at most. The meta-master might not
> have a lot of work to do: e.g., it might be limited to occasionally
> reallocating cluster resources among the 10 masters, or ensuring that newly
> added cluster resources are allocated among the masters as appropriate.
> Hence, the failure of the meta-master would not prevent any of the individual
> masters from scheduling new tasks. A single framework instance probably
> wouldn't be able to use more resources than have been assigned to a single
> Master, but that seems like a reasonable restriction.
> This feature might also be a good fit for a multi-datacenter deployment of
> Mesos: each Mesos master instance would manage a single DC. Naturally,
> reducing the traffic between frameworks and the meta-master would be
> important for performance reasons in a configuration like this.
> Operationally, this might be simpler if Mesos processes were self-hosting
> ([MESOS-3547]).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
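
For readers following the thread, the partitioning idea in the quoted description (10k machines split across 10 masters, with a meta-master that only hands out machines) can be illustrated with a small toy model. This is a rough sketch under stated assumptions, not anything from the Mesos codebase or the work-in-progress design: the names `Master`, `MetaMaster`, `add_machines`, and `blast_radius` are hypothetical, and the "give each new machine to the least-loaded master" policy is just one simple choice.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy stand-in for one (possibly HA) Mesos master and its machines."""
    name: str
    machines: set = field(default_factory=set)


class MetaMaster:
    """Hypothetical coordinator whose only job is assigning machines to masters."""

    def __init__(self, master_names):
        self.masters = [Master(name) for name in master_names]

    def add_machines(self, machine_ids):
        # Assumed policy: each new machine goes to the master that currently
        # manages the fewest machines (ties go to the earliest master).
        for machine_id in machine_ids:
            target = min(self.masters, key=lambda m: len(m.machines))
            target.machines.add(machine_id)

    def blast_radius(self, master_name):
        # Number of machines that lose their master if `master_name` fails.
        for master in self.masters:
            if master.name == master_name:
                return len(master.machines)
        raise KeyError(master_name)


# The scenario from the issue: 10 masters, 10k machines.
meta = MetaMaster([f"master-{i}" for i in range(10)])
meta.add_machines(range(10_000))
sizes = [len(m.machines) for m in meta.masters]
```

In this model each master ends up with 1,000 machines, so the failure of any one master touches at most 1,000 of them, and the `MetaMaster` object is only consulted when machines are added, which mirrors the point that losing the meta-master would not stop the individual masters from scheduling tasks.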