[jira] [Commented] (MESOS-3548) Investigate federations of Mesos masters

Deepak Vij (JIRA) Fri, 06 Nov 2015 12:47:54 -0800

    [ https://issues.apache.org/jira/browse/MESOS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994404#comment-14994404 ]


Deepak Vij commented on MESOS-3548:
-----------------------------------

Federated Mesos clustering across multiple cloud providers is very near and 
dear to our hearts. We see the following common federation use cases:
 
1) Cloud Bursting: Preferentially run my workloads in my on-premise private 
cloud environment, but automatically overflow to my public cloud-hosted 
cluster(s) if I run out of on-premise capacity (for example, during 
holiday-season traffic spikes).

2) QoS & Privacy Requirements: Most of my workloads should run in my preferred 
cloud-hosted cluster(s), but some are privacy-sensitive and should be 
automatically diverted to run in my secure, on-premise private cloud 
environment. Services can also be spread across multiple cloud providers to 
ride out a datacenter outage.

3) Avoid Cloud Provider Vendor Lock-in: Move workloads across multiple cloud 
providers periodically, depending on factors such as pricing.
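
To make the three use cases concrete, here is a minimal placement sketch. It 
is not a Mesos API: the Cluster and Workload types, the choose_cluster 
function, and the pricing field are all hypothetical names invented for 
illustration. It assumes one combined policy: privacy-sensitive work is pinned 
to on-premise clusters, other work prefers on-premise and overflows to the 
cheapest public cluster with room.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    on_premise: bool
    free_cpus: float
    price_per_cpu_hour: float  # hypothetical pricing figure, not a Mesos field


@dataclass
class Workload:
    name: str
    cpus: float
    privacy_sensitive: bool = False


def choose_cluster(workload: Workload, clusters: List[Cluster]) -> Optional[Cluster]:
    """Return the cluster to place `workload` on, or None if nothing fits."""
    # Only clusters with enough free capacity are candidates.
    candidates = [c for c in clusters if c.free_cpus >= workload.cpus]
    if workload.privacy_sensitive:
        # Use case 2: privacy-sensitive work never leaves on-premise clusters.
        candidates = [c for c in candidates if c.on_premise]
    # Use case 1: on-premise clusters sort first, so public clusters are only
    # used on overflow. Use case 3: ties are broken by the lowest price.
    candidates.sort(key=lambda c: (not c.on_premise, c.price_per_cpu_hour))
    return candidates[0] if candidates else None
```

With a small on-premise cluster and two public ones, a small job stays 
on-premise, a large job bursts to the cheaper public cluster, and a large 
privacy-sensitive job is left unplaced rather than sent off-premise.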

We have already started exploring a design internally within our lab and would 
love to collaborate on this.

Regards,
Deepak K. Vij
Huawei Software Lab, Santa Clara

> Investigate federations of Mesos masters
> ----------------------------------------
>
>                 Key: MESOS-3548
>                 URL: https://issues.apache.org/jira/browse/MESOS-3548
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Neil Conway
>              Labels: federation, mesosphere, multi-dc
>
> In a large Mesos installation, the operator might want to ensure that even if 
> the Mesos masters are inaccessible or failed, new tasks can still be 
> scheduled (across multiple different frameworks). HA masters are only a 
> partial solution here: the masters might still be inaccessible due to a 
> correlated failure (e.g., ZooKeeper misconfiguration/human error).
> To support this, we could support the notion of "hierarchies" or 
> "federations" of Mesos masters. In a Mesos installation with 10k machines, 
> the operator might configure 10 Mesos masters (each of which might be HA) to 
> manage 1k machines each. Then an additional "meta-Master" would manage the 
> allocation of cluster resources to the 10 masters. Hence, the failure of any 
> individual master would impact 1k machines at most. The meta-master might not 
> have a lot of work to do: e.g., it might be limited to occasionally 
> reallocating cluster resources among the 10 masters, or ensuring that newly 
> added cluster resources are allocated among the masters as appropriate. 
> Hence, the failure of the meta-master would not prevent any of the individual 
> masters from scheduling new tasks. A single framework instance probably 
> wouldn't be able to use more resources than have been assigned to a single 
> Master, but that seems like a reasonable restriction.
> This feature might also be a good fit for a multi-datacenter deployment of 
> Mesos: each Mesos master instance would manage a single DC. Naturally, 
> reducing the traffic between frameworks and the meta-master would be 
> important for performance reasons in a configuration like this.
> Operationally, this might be simpler if Mesos processes were self-hosting 
> ([MESOS-3547]).
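
The partitioning arithmetic in the issue description (10k machines, 10 
masters, at most 1k machines affected by one master failure) can be sketched 
as below. This is only an illustration of the idea, not a proposed design: 
partition_agents and impacted_by_failure are hypothetical helpers, and the 
round-robin assignment is an assumption, since the description does not say 
how a meta-master would divide machines.

```python
from typing import Iterable, List


def partition_agents(agent_ids: Iterable[int], num_masters: int) -> List[List[int]]:
    """Assign agents to masters round-robin (an assumed policy)."""
    groups: List[List[int]] = [[] for _ in range(num_masters)]
    for i, agent in enumerate(agent_ids):
        groups[i % num_masters].append(agent)
    return groups


def impacted_by_failure(groups: List[List[int]], failed_master: int) -> int:
    """Number of machines that lose their master if one master fails."""
    return len(groups[failed_master])


# 10k machines split across 10 masters, as in the description.
groups = partition_agents(range(10_000), 10)
worst_case = max(impacted_by_failure(groups, m) for m in range(len(groups)))
```

Because the assignment is computed once and each master then operates on its 
own group, losing the component that computed it does not stop the individual 
masters from scheduling.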



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
