[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

tnachen Wed, 11 Nov 2015 14:04:58 -0800

Github user tnachen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9637#discussion_r44593380
  
    --- Diff: docs/job-scheduling.md ---
    @@ -56,36 +56,31 @@ provide another approach to share RDDs.
     
     ## Dynamic Resource Allocation
     
    -Spark 1.2 introduces the ability to dynamically scale the set of cluster resources allocated to
    -your application up and down based on the workload. This means that your application may give
    -resources back to the cluster if they are no longer used and request them again later when there
    -is demand. This feature is particularly useful if multiple applications share resources in your
    -Spark cluster. If a subset of the resources allocated to an application becomes idle, it can be
    -returned to the cluster's pool of resources and acquired by other applications. In Spark, dynamic
    -resource allocation is performed on the granularity of the executor and can be enabled through
    -`spark.dynamicAllocation.enabled`.
    -
    -This feature is currently disabled by default and available only on [YARN](running-on-yarn.html).
    -A future release will extend this to [standalone mode](spark-standalone.html) and
    -[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes). Note that although Spark on
    -Mesos already has a similar notion of dynamic resource sharing in fine-grained mode, enabling
    -dynamic allocation allows your Mesos application to take advantage of coarse-grained low-latency
    -scheduling while sharing cluster resources efficiently.
    +Spark provides a mechanism to dynamically adjust the resources your application occupies based
    +on the workload. This means that your application may give resources back to the cluster if they
    +are no longer used and request them again later when there is demand. This feature is particularly
    +useful if multiple applications share resources in your Spark cluster.
    +
    +This feature is disabled by default and available on all coarse-grained cluster managers, i.e.
    +[standalone mode](spark-standalone.html), [YARN mode](running-on-yarn.html), and
    +[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes).
     
     ### Configuration and Setup
     
    -All configurations used by this feature live under the `spark.dynamicAllocation.*` namespace.
    -To enable this feature, your application must set `spark.dynamicAllocation.enabled` to `true`.
    -Other relevant configurations are described on the
    -[configurations page](configuration.html#dynamic-allocation) and in the subsequent sections in
    -detail.
    +There are two requirements for using this feature. First, your application must set
    +`spark.dynamicAllocation.enabled` to `true`. Second, you must set up an *external shuffle service*
    +on each worker node in the same cluster and set `spark.shuffle.service.enabled` to true in your
    +application. The purpose of the external shuffle service is to allow executors to be removed
    +without deleting shuffle files written by them (more detail described
    +[below](job-scheduling.html#graceful-decommission-of-executors)). The way to set up this service
    +varies across cluster managers:
    +
    +In standalone mode, simply start your workers with `spark.shuffle.service.enabled` set to `true`.
     
    -Additionally, your application must use an external shuffle service. The purpose of the service is
    -to preserve the shuffle files written by executors so the executors can be safely removed (more
    -detail described [below](job-scheduling.html#graceful-decommission-of-executors)). To enable
    -this service, set `spark.shuffle.service.enabled` to `true`. In YARN, this external shuffle service
    -is implemented in `org.apache.spark.yarn.network.YarnShuffleService` that runs in each `NodeManager`
    -in your cluster. To start this service, follow these steps:
    +In Mesos coarse-grained mode, run `$SPARK_HOME/sbin/start-mesos-shuffle-service.sh` on all
    +slave nodes with `spark.shuffle.service.enabled` set to `true`.
    --- End diff --
    
    I'd like to add that users can also run the Mesos shuffle service with Marathon. In that case they should start the service in the foreground by running `spark-class org.apache.spark.deploy.mesos.MesosExternalShuffleService`.
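
To make the foreground suggestion concrete, here is a rough sketch of how the command line could be assembled. The `/opt/spark` fallback for `SPARK_HOME` and the helper name `shuffle_service_cmd` are assumptions for illustration only; the class name and `spark-class` are the ones mentioned in the comment above.

```shell
#!/bin/sh
# Sketch only: the default SPARK_HOME below is an assumed install path.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"

# Hypothetical helper: print the command that runs the Mesos external
# shuffle service in the foreground, so a supervisor such as Marathon
# can track the process directly.
shuffle_service_cmd() {
  printf '%s %s\n' \
    "$SPARK_HOME/bin/spark-class" \
    "org.apache.spark.deploy.mesos.MesosExternalShuffleService"
}

# Print the command line; nothing is launched here.
shuffle_service_cmd
```

The printed line is what one would presumably put in the Marathon app's command field, with the app constrained so that one instance runs per slave node; the exact Marathon settings are left out because they depend on the cluster.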



