Repository: spark
Updated Branches:
  refs/heads/branch-1.6 a98cac26f -> 782885786


[SPARK-11667] Update dynamic allocation docs to reflect supported cluster managers

Author: Andrew Or <and...@databricks.com>

Closes #9637 from andrewor14/update-da-docs.

(cherry picked from commit 12a0784ac0f314a606f1237e7144eb1355421307)
Signed-off-by: Andrew Or <and...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/78288578
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/78288578
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/78288578

Branch: refs/heads/branch-1.6
Commit: 782885786032da72e9a76e93e1dbeb9643e572dd
Parents: a98cac2
Author: Andrew Or <and...@databricks.com>
Authored: Thu Nov 12 15:48:42 2015 -0800
Committer: Andrew Or <and...@databricks.com>
Committed: Thu Nov 12 15:48:59 2015 -0800

----------------------------------------------------------------------
 docs/job-scheduling.md | 55 ++++++++++++++++++++++-----------------------
 1 file changed, 27 insertions(+), 28 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/78288578/docs/job-scheduling.md
----------------------------------------------------------------------
diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index 8d9c2ba..a3c34cb 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -56,36 +56,32 @@ provide another approach to share RDDs.
 
 ## Dynamic Resource Allocation
 
-Spark 1.2 introduces the ability to dynamically scale the set of cluster resources allocated to
-your application up and down based on the workload. This means that your application may give
-resources back to the cluster if they are no longer used and request them again later when there
-is demand. This feature is particularly useful if multiple applications share resources in your
-Spark cluster. If a subset of the resources allocated to an application becomes idle, it can be
-returned to the cluster's pool of resources and acquired by other applications. In Spark, dynamic
-resource allocation is performed on the granularity of the executor and can be enabled through
-`spark.dynamicAllocation.enabled`.
-
-This feature is currently disabled by default and available only on [YARN](running-on-yarn.html).
-A future release will extend this to [standalone mode](spark-standalone.html) and
-[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes). Note that although Spark on
-Mesos already has a similar notion of dynamic resource sharing in fine-grained mode, enabling
-dynamic allocation allows your Mesos application to take advantage of coarse-grained low-latency
-scheduling while sharing cluster resources efficiently.
+Spark provides a mechanism to dynamically adjust the resources your application occupies based
+on the workload. This means that your application may give resources back to the cluster if they
+are no longer used and request them again later when there is demand. This feature is particularly
+useful if multiple applications share resources in your Spark cluster.
+
+This feature is disabled by default and available on all coarse-grained cluster managers, i.e.
+[standalone mode](spark-standalone.html), [YARN mode](running-on-yarn.html), and
+[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes).
 
 ### Configuration and Setup
 
-All configurations used by this feature live under the `spark.dynamicAllocation.*` namespace.
-To enable this feature, your application must set `spark.dynamicAllocation.enabled` to `true`.
-Other relevant configurations are described on the
-[configurations page](configuration.html#dynamic-allocation) and in the subsequent sections in
-detail.
+There are two requirements for using this feature. First, your application must set
+`spark.dynamicAllocation.enabled` to `true`. Second, you must set up an *external shuffle service*
+on each worker node in the same cluster and set `spark.shuffle.service.enabled` to true in your
+application. The purpose of the external shuffle service is to allow executors to be removed
+without deleting shuffle files written by them (more detail described
+[below](job-scheduling.html#graceful-decommission-of-executors)). The way to set up this service
+varies across cluster managers:
+
+In standalone mode, simply start your workers with `spark.shuffle.service.enabled` set to `true`.
 
-Additionally, your application must use an external shuffle service. The purpose of the service is
-to preserve the shuffle files written by executors so the executors can be safely removed (more
-detail described [below](job-scheduling.html#graceful-decommission-of-executors)). To enable
-this service, set `spark.shuffle.service.enabled` to `true`. In YARN, this external shuffle service
-is implemented in `org.apache.spark.yarn.network.YarnShuffleService` that runs in each `NodeManager`
-in your cluster. To start this service, follow these steps:
+In Mesos coarse-grained mode, run `$SPARK_HOME/sbin/start-mesos-shuffle-service.sh` on all
+slave nodes with `spark.shuffle.service.enabled` set to `true`. For instance, you may do so
+through Marathon.
+
+In YARN mode, start the shuffle service on each `NodeManager` as follows:
 
 1. Build Spark with the [YARN profile](building-spark.html). Skip this step if you are using a
 pre-packaged distribution.
@@ -95,10 +91,13 @@ pre-packaged distribution.
 2. Add this jar to the classpath of all `NodeManager`s in your cluster.
 3. In the `yarn-site.xml` on each node, add `spark_shuffle` to `yarn.nodemanager.aux-services`,
 then set `yarn.nodemanager.aux-services.spark_shuffle.class` to
-`org.apache.spark.network.yarn.YarnShuffleService`. Additionally, set all relevant
-`spark.shuffle.service.*` [configurations](configuration.html).
+`org.apache.spark.network.yarn.YarnShuffleService` and `spark.shuffle.service.enabled` to true.
 4. Restart all `NodeManager`s in your cluster.
 
+All other relevant configurations are optional and under the `spark.dynamicAllocation.*` and
+`spark.shuffle.service.*` namespaces. For more detail, see the
+[configurations page](configuration.html#dynamic-allocation).
+
 ### Resource Allocation Policy
 
 At a high level, Spark should relinquish executors when they are no longer used and acquire

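For readers following along, here is a minimal sketch of the application-side setup the revised
docs describe, written in Scala against the 1.6-era API. The two required `set` calls mirror the
two stated requirements; the app name, master choice, and min/max executor counts are illustrative
values, not part of the commit.

    import org.apache.spark.{SparkConf, SparkContext}

    // Requirement 1: enable dynamic executor allocation.
    // Requirement 2: point executors at the external shuffle service, so
    // their shuffle files survive when an idle executor is removed.
    val conf = new SparkConf()
      .setAppName("dynamic-allocation-demo")          // illustrative name
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")
      // Optional bounds from the spark.dynamicAllocation.* namespace
      // (values chosen for illustration only):
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "20")

    val sc = new SparkContext(conf)

The same flags can equally be passed at submit time via `--conf spark.dynamicAllocation.enabled=true
--conf spark.shuffle.service.enabled=true`; the shuffle service itself must still be set up per
cluster manager as the diff describes.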
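Separately, a hedged way to observe the behavior that the untouched "Resource Allocation Policy"
section goes on to describe (executors requested while tasks are backlogged, released once idle),
assuming the `sc` from the sketch above; the sleep durations and partition count are arbitrary:

    // Burst of work: with 200 tasks queued, dynamic allocation should request
    // executors up to spark.dynamicAllocation.maxExecutors.
    val n = sc.parallelize(1 to 1000, 200)
      .map { i => Thread.sleep(100); i }
      .count()
    println(s"counted $n elements")

    // Now go idle. After spark.dynamicAllocation.executorIdleTimeout (60s by
    // default) with no tasks running, idle executors should be released back
    // to the cluster; the Executors tab of the application UI will show this.
    Thread.sleep(120 * 1000)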

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org