Repository: mesos
Updated Branches:
  refs/heads/master 25feb869e -> 5a1433576
Added documentation for fault domains.

Fault domains are a new feature in 1.5 which did not yet have a corresponding description in the documentation.

Review: https://reviews.apache.org/r/65437/

Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/5a143357
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/5a143357
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/5a143357

Branch: refs/heads/master
Commit: 5a1433576eca20f15f1ea309fc202f4bbaf3b6c7
Parents: 25feb86
Author: Benno Evers <bev...@mesosphere.com>
Authored: Fri Feb 2 14:12:21 2018 -0800
Committer: Vinod Kone <vinodk...@gmail.com>
Committed: Fri Feb 2 14:14:04 2018 -0800

----------------------------------------------------------------------
 docs/configuration/master-and-agent.md |   3 +
 docs/fault-domains.md                  | 106 ++++++++++++++++++++++++++++
 docs/home.md                           |   1 +
 3 files changed, 110 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/mesos/blob/5a143357/docs/configuration/master-and-agent.md
----------------------------------------------------------------------
diff --git a/docs/configuration/master-and-agent.md b/docs/configuration/master-and-agent.md
index f247498..df18681 100644
--- a/docs/configuration/master-and-agent.md
+++ b/docs/configuration/master-and-agent.md
@@ -95,6 +95,9 @@
 different zones). Agents configured to use a different region than the master's region will not appear in resource offers to frameworks that have not enabled the <code>REGION_AWARE</code> capability. This value can be specified as either a JSON-formatted string or a file path containing JSON.
+
+See the [documentation](../fault-domains.md) for further details.
+
 <p/>
 Example:
 <pre><code>{

http://git-wip-us.apache.org/repos/asf/mesos/blob/5a143357/docs/fault-domains.md
----------------------------------------------------------------------
diff --git a/docs/fault-domains.md b/docs/fault-domains.md
new file mode 100644
index 0000000..08f13b5
--- /dev/null
+++ b/docs/fault-domains.md
@@ -0,0 +1,106 @@
+---
+title: Apache Mesos - Domains and Regions
+layout: documentation
+---
+
+# Regions and Fault Domains
+
+Starting with Mesos 1.5, it is possible to place Mesos masters and agents into
+*domains*, which are logical groups of machines that share some characteristics.
+
+Currently, fault domains are the only supported type of domains, which are
+groups of machines with similar failure characteristics.
+
+A fault domain is a 2 level hierarchy of regions and zones. The mapping from
+fault domains to physical infrastructure is up to the operator to configure,
+although it is recommended that machines in the same zones have low latency to
+each other.
+
+In cloud environments, regions and zones can be mapped to the "region" and
+"availability zone" concepts exposed by most cloud providers, respectively.
+In on-premise deployments, regions and zones can be mapped to data centers and
+racks, respectively.
+
+Schedulers may prefer to place network-intensive workloads in the same domain,
+as this may improve performance. Conversely, a single failure that affects a
+host in a domain may be more likely to affect other hosts in the same domain;
+hence, schedulers may prefer to place workloads that require high availability
+in multiple domains. For example, all the hosts in a single rack might lose
+power or network connectivity simultaneously.
+
+The `--domain` flag can be used to specify the fault domain of a master or
+agent node. The value of this flag must be a file path or a JSON dictionary
+with the key `fault_domain` and subkeys `region` and `zone` mapping to
+arbitrary strings:
+
+    mesos-master --domain='{"fault_domain": {"region": "eu", "zone": "rack1"}}'
+
+    mesos-agent --domain='{"fault_domain": {"region": "eu", "zone": "rack2"}}'
+
+Frameworks can learn about the domain of an agent by inspecting the `domain`
+field in the received offer, which contains a `DomainInfo` that has the
+same structure as the JSON dictionary above.
+
+
+# Constraints
+
+When configuring fault domains for the masters and agents, the following
+constraints must be obeyed:
+
+  * If a mesos master is not configured with a domain, it will reject connection
+    attempts from agents with a domain.
+
+    This is done because the master is not able to determine whether or not the
+    agent would be remote in this case.
+
+  * Agents with no configured domain are assumed to be in the same domain as the
+    master.
+
+    If this behaviour isn't desired, the `--require_agent_domain` flag on the
+    master can be used to enforce that domains are configured on all agents by
+    having the master reject all registration attempts by agents without a
+    configured domain.
+
+  * If one master is configured with a domain, all other masters must be in the
+    same "region" to avoid cross-region quorum writes. It is recommended to put
+    them in different zones within that region for high availability.
+
+  * The default DRF resource allocator will only offer resources from agents in
+    the same region as the master. To receive offers from all regions, a
+    framework must set the `REGION_AWARE` capability bit in its FrameworkInfo.
+
+
+# Example
+
+A short example will serve to illustrate these concepts. WayForward Technologies
+runs a successful website that allows users to purchase things that they want
+to have.
+
+To do this, it owns a data center in San Francisco, in which it runs a number of
+custom Mesos frameworks. All agents within the data center are configured with
+the same region `sf`, and the individual racks inside the data center are used
+as zones.
+
+The three mesos masters are placed in different server racks in the data center,
+which gives them enough isolation to withstand events like a whole rack losing
+power or network connectivity but still have low-enough latency for
+quorum writes.
+
+One of the provided services is a real-time view of the company's inventory.
+The framework providing this service is placing all of its tasks in the same
+zone as the database server, to take advantage of the high-speed, low-latency
+link so it can always display the latest results.
+
+During peak hours, it might happen that the computing power required to operate
+the website exceeds the capacity of the data center. To avoid unnecessary
+hardware purchases, WayForward Technologies contracted with a third-party cloud
+provider TPC. The machines from this provider are placed in a different
+region `tpc`, and the zones are configured to correspond to the availability
+zones provided by TPC. All relevant frameworks are updated with the
+`REGION_AWARE` bit in their `FrameworkInfo` and their scheduling logic is
+updated so that they can schedule tasks in the cloud if required.
+
+Non-region aware frameworks will now only receive offers from agents within
+the data center, where the master nodes reside. Region-aware frameworks are
+supposed to know when and if they should place their tasks in the data center
+or with the cloud provider.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/mesos/blob/5a143357/docs/home.md
----------------------------------------------------------------------
diff --git a/docs/home.md b/docs/home.md
index f5b65cc..91d5bcb 100644
--- a/docs/home.md
+++ b/docs/home.md
@@ -26,6 +26,7 @@ layout: documentation
 * [Monitoring](monitoring.md)
 * [Operational Guide](operational-guide.md)
 * [Fetcher Cache Configuration](fetcher.md)
+* [Fault Domains](fault-domains.md)
 
 ## Resource Management
 * [Attributes and Resources](attributes-resources.md) for how to describe the agents that comprise a cluster.
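
The registration and offer constraints listed in docs/fault-domains.md can be read as two small predicates. The sketch below is an illustrative Python model of those rules as the diff words them, not Mesos code: `FaultDomain`, `parse_domain`, `may_register`, and `is_offered` are hypothetical names introduced here. The parser accepts the plain-string subkeys used in the commands quoted in the diff and, as an assumption about real deployments, the nested `{"name": ...}` objects of the `DomainInfo` protobuf.

```python
import json
from typing import NamedTuple, Optional


class FaultDomain(NamedTuple):
    """Hypothetical stand-in for the region/zone pair of a fault domain."""
    region: str
    zone: str


def parse_domain(value: str) -> FaultDomain:
    """Parse the JSON form of a --domain value (the file-path form is not handled)."""
    fault_domain = json.loads(value)["fault_domain"]

    def name(field):
        # Accept a plain string, as in the commands quoted in the diff, or an
        # object with a "name" key, as in the DomainInfo protobuf.
        return field["name"] if isinstance(field, dict) else field

    return FaultDomain(name(fault_domain["region"]), name(fault_domain["zone"]))


def may_register(master: Optional[FaultDomain],
                 agent: Optional[FaultDomain],
                 require_agent_domain: bool = False) -> bool:
    """Model of the agent registration constraints described in the diff."""
    if agent is None:
        # Domain-less agents are accepted unless the master requires domains.
        return not require_agent_domain
    # An agent with a domain is rejected by a master that has none.
    return master is not None


def is_offered(master: Optional[FaultDomain],
               agent: Optional[FaultDomain],
               region_aware: bool) -> bool:
    """Model of whether a registered agent's resources reach a framework."""
    if region_aware or agent is None:
        # Domain-less agents are assumed to be in the master's own domain.
        return True
    return master is not None and agent.region == master.region


if __name__ == "__main__":
    master = parse_domain('{"fault_domain": {"region": "sf", "zone": "rack1"}}')
    cloud_agent = FaultDomain("tpc", "zone-a")
    print(may_register(master, cloud_agent))
    print(is_offered(master, cloud_agent, region_aware=False))
    print(is_offered(master, cloud_agent, region_aware=True))
```

In the WayForward example from the diff, an agent in region `tpc` registers with the `sf` masters, but its resources only reach frameworks that set `REGION_AWARE`, which is what the last two calls model.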