Repository: mesos
Updated Branches:
  refs/heads/master 25feb869e -> 5a1433576


Added documentation for fault domains.

Fault domains are a new feature in 1.5 which did not yet have
a corresponding description in the documentation.

Review: https://reviews.apache.org/r/65437/


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/5a143357
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/5a143357
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/5a143357

Branch: refs/heads/master
Commit: 5a1433576eca20f15f1ea309fc202f4bbaf3b6c7
Parents: 25feb86
Author: Benno Evers <bev...@mesosphere.com>
Authored: Fri Feb 2 14:12:21 2018 -0800
Committer: Vinod Kone <vinodk...@gmail.com>
Committed: Fri Feb 2 14:14:04 2018 -0800

----------------------------------------------------------------------
 docs/configuration/master-and-agent.md |   3 +
 docs/fault-domains.md                  | 106 ++++++++++++++++++++++++++++
 docs/home.md                           |   1 +
 3 files changed, 110 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/5a143357/docs/configuration/master-and-agent.md
----------------------------------------------------------------------
diff --git a/docs/configuration/master-and-agent.md b/docs/configuration/master-and-agent.md
index f247498..df18681 100644
--- a/docs/configuration/master-and-agent.md
+++ b/docs/configuration/master-and-agent.md
@@ -95,6 +95,9 @@ different zones). Agents configured to use a different region than the
 master's region will not appear in resource offers to frameworks that have
 not enabled the <code>REGION_AWARE</code> capability. This value can be
 specified as either a JSON-formatted string or a file path containing JSON.
+
+See the [fault domains documentation](../fault-domains.md) for further details.
+
 <p/>
 Example:
 <pre><code>{

http://git-wip-us.apache.org/repos/asf/mesos/blob/5a143357/docs/fault-domains.md
----------------------------------------------------------------------
diff --git a/docs/fault-domains.md b/docs/fault-domains.md
new file mode 100644
index 0000000..08f13b5
--- /dev/null
+++ b/docs/fault-domains.md
@@ -0,0 +1,106 @@
+---
+title: Apache Mesos - Domains and Regions
+layout: documentation
+---
+
+# Regions and Fault Domains
+
+Starting with Mesos 1.5, it is possible to place Mesos masters and agents into
+*domains*, which are logical groups of machines that share some characteristics.
+
+Currently, the only supported type of domain is the fault domain, which is a
+group of machines with similar failure characteristics.
+
+A fault domain is a two-level hierarchy of regions and zones. The mapping from
+fault domains to physical infrastructure is up to the operator to configure,
+although it is recommended that machines in the same zone have low latency to
+each other.
+
+In cloud environments, regions and zones can be mapped to the "region" and
+"availability zone" concepts exposed by most cloud providers, respectively.
+In on-premise deployments, regions and zones can be mapped to data centers and
+racks, respectively.
+
+Schedulers may prefer to place network-intensive workloads in the same domain,
+as this may improve performance. Conversely, a single failure that affects a
+host in a domain may be more likely to affect other hosts in the same domain;
+hence, schedulers may prefer to place workloads that require high availability
+in multiple domains. For example, all the hosts in a single rack might lose
+power or network connectivity simultaneously.
+
+The `--domain` flag can be used to specify the fault domain of a master or
+agent node. The value of this flag must be either a JSON dictionary or the path
+to a file containing one, with the key `fault_domain` and subkeys `region` and
+`zone` mapping to arbitrary strings:
+
+    mesos-master --domain='{"fault_domain": {"region": "eu", "zone": "rack1"}}'
+
+    mesos-agent  --domain='{"fault_domain": {"region": "eu", "zone": "rack2"}}'
+
+Frameworks can learn about the domain of an agent by inspecting the `domain`
+field in the received offer, which contains a `DomainInfo` that has the
+same structure as the JSON dictionary above.
+
+
+# Constraints
+
+When configuring fault domains for the masters and agents, the following
+constraints must be obeyed:
+
+ * If a Mesos master is not configured with a domain, it will reject connection
+   attempts from agents with a domain.
+
+   This is done because the master is not able to determine whether or not the
+   agent would be remote in this case.
+
+ * Agents with no configured domain are assumed to be in the same domain as the
+   master.
+
+   If this behaviour isn't desired, the `--require_agent_domain` flag on the
+   master can be used to enforce that domains are configured on all agents by
+   having the master reject all registration attempts by agents without a
+   configured domain.
+
+ * If one master is configured with a domain, all other masters must be in the
+   same "region" to avoid cross-region quorum writes. It is recommended to put
+   them in different zones within that region for high availability.
+
+ * The default DRF resource allocator will only offer resources from agents in
+   the same region as the master. To receive offers from all regions, a
+   framework must set the `REGION_AWARE` capability bit in its `FrameworkInfo`.
+
+
+# Example
+
+A short example will serve to illustrate these concepts. WayForward Technologies
+runs a successful website that allows users to purchase things that they want
+to have.
+
+To do this, it owns a data center in San Francisco, in which it runs a number of
+custom Mesos frameworks. All agents within the data center are configured with
+the same region `sf`, and the individual racks inside the data center are used
+as zones.
+
+The three Mesos masters are placed in different server racks in the data center,
+which gives them enough isolation to withstand events like a whole rack losing
+power or network connectivity but still have low-enough latency for
+quorum writes.
+
+One of the provided services is a real-time view of the company's inventory.
+The framework providing this service places all of its tasks in the same
+zone as the database server, to take advantage of the high-speed, low-latency
+link so that it can always display the latest results.
+
+During peak hours, it might happen that the computing power required to operate
+the website exceeds the capacity of the data center. To avoid unnecessary
+hardware purchases, WayForward Technologies has contracted with a third-party
+cloud provider, TPC. The machines from this provider are placed in a different
+region `tpc`, and the zones are configured to correspond to the availability
+zones provided by TPC. All relevant frameworks are updated with the
+`REGION_AWARE` bit in their `FrameworkInfo` and their scheduling logic is
+updated so that they can schedule tasks in the cloud if required.
+
+Frameworks that are not region-aware will now only receive offers from agents
+within the data center, where the master nodes reside. Region-aware frameworks
+are expected to decide when and whether to place their tasks in the data center
+or with the cloud provider.
\ No newline at end of file
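
The flag format and the registration and offer rules described in docs/fault-domains.md above can be illustrated with a minimal Python sketch. This is not Mesos code: the names `parse_domain`, `accepts_registration`, and `is_offered` are hypothetical, only the inline-JSON form of `--domain` is handled (not the file-path form), and the logic simply restates the Constraints section as written.

```python
import json
from typing import Optional, Tuple

# Hypothetical representation of a fault domain: (region, zone).
Domain = Tuple[str, str]


def parse_domain(value: Optional[str]) -> Optional[Domain]:
    """Parse an inline JSON --domain value; None means the flag was not set."""
    if value is None:
        return None
    fault_domain = json.loads(value)["fault_domain"]
    region, zone = fault_domain["region"], fault_domain["zone"]
    if not (isinstance(region, str) and isinstance(zone, str)):
        raise ValueError("region and zone must be strings")
    return (region, zone)


def accepts_registration(master: Optional[Domain],
                         agent: Optional[Domain],
                         require_agent_domain: bool = False) -> bool:
    """Restates the registration constraints from the documentation."""
    if agent is None:
        # Agents without a domain are assumed to share the master's domain,
        # unless the master was started with --require_agent_domain.
        return not require_agent_domain
    # A master without a domain rejects agents that have one.
    return master is not None


def is_offered(master: Optional[Domain],
               agent: Optional[Domain],
               region_aware: bool) -> bool:
    """Would a registered agent's resources be offered to a framework?"""
    if master is None or agent is None:
        # No domain information: the agent is treated as local to the master.
        return True
    # Remote-region agents are only offered to REGION_AWARE frameworks.
    return region_aware or agent[0] == master[0]


# Usage with the flag values from the documentation, plus an assumed
# remote agent in a made-up "tpc" region.
master = parse_domain('{"fault_domain": {"region": "eu", "zone": "rack1"}}')
agent = parse_domain('{"fault_domain": {"region": "eu", "zone": "rack2"}}')
remote = parse_domain('{"fault_domain": {"region": "tpc", "zone": "a"}}')
```

In this sketch, `is_offered(master, remote, region_aware=False)` is false while `is_offered(master, agent, region_aware=False)` is true, mirroring the rule that frameworks without `REGION_AWARE` only see agents in the master's region.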

http://git-wip-us.apache.org/repos/asf/mesos/blob/5a143357/docs/home.md
----------------------------------------------------------------------
diff --git a/docs/home.md b/docs/home.md
index f5b65cc..91d5bcb 100644
--- a/docs/home.md
+++ b/docs/home.md
@@ -26,6 +26,7 @@ layout: documentation
 * [Monitoring](monitoring.md)
 * [Operational Guide](operational-guide.md)
 * [Fetcher Cache Configuration](fetcher.md)
+* [Fault Domains](fault-domains.md)
 
 ## Resource Management
 * [Attributes and Resources](attributes-resources.md) for how to describe the agents that comprise a cluster.
