This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new b425f8e  [SPARK-27492][DOC][YARN][K8S][CORE] Resource scheduling high level user docs
b425f8e is described below

commit b425f8ee6599f53f47d7d4a8f0c27f2ba7d2eab9
Author: Thomas Graves <tgra...@nvidia.com>
AuthorDate: Wed Sep 11 08:22:36 2019 -0500

    [SPARK-27492][DOC][YARN][K8S][CORE] Resource scheduling high level user docs
    
    ### What changes were proposed in this pull request?
    
    Document the resource scheduling feature - https://issues.apache.org/jira/browse/SPARK-24615
    Add general docs plus YARN, Kubernetes, and Standalone cluster specific ones.
    
    ### Why are the changes needed?
    Help users understand the feature
    
    ### Does this PR introduce any user-facing change?
    docs
    
    ### How was this patch tested?
    N/A
    
    Closes #25698 from tgravescs/SPARK-27492-gpu-sched-docs.
    
    Authored-by: Thomas Graves <tgra...@nvidia.com>
    Signed-off-by: Thomas Graves <tgra...@apache.org>
---
 docs/configuration.md         | 14 +++++++++++++-
 docs/running-on-kubernetes.md | 11 +++++++++++
 docs/running-on-yarn.md       | 14 ++++++++++++++
 docs/spark-standalone.md      | 12 ++++++++++++
 4 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index 9933283..5cf42d5 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -230,7 +230,7 @@ of the most common options to set are:
     write to STDOUT a JSON string in the format of the ResourceInformation 
class. This has a
     name and an array of addresses. For a client-submitted driver in 
Standalone, discovery
     script must assign different resource addresses to this driver comparing 
to workers' and
-    other dirvers' when <code>spark.resources.coordinate.enable</code> is off.
+    other drivers' when <code>spark.resources.coordinate.enable</code> is off.
   </td>
 </tr>
 <tr>
@@ -2617,3 +2617,15 @@ Also, you can modify or add configurations at runtime:
   --conf spark.hadoop.abc.def=xyz \ 
   myApp.jar
 {% endhighlight %}
+
+# Custom Resource Scheduling and Configuration Overview
+
+GPUs and other accelerators have been widely used to accelerate specialized workloads such as deep learning and signal processing. Spark now supports requesting and scheduling generic resources, such as GPUs, with a few caveats. The current implementation requires that the resource have addresses that can be allocated by the scheduler. It also requires your cluster manager to support the resources and to be properly configured with them.
+
+There are configurations available to request resources for the driver: <code>spark.driver.resource.{resourceName}.amount</code>, to request resources for the executor(s): <code>spark.executor.resource.{resourceName}.amount</code>, and to specify the requirements for each task: <code>spark.task.resource.{resourceName}.amount</code>. The <code>spark.driver.resource.{resourceName}.discoveryScript</code> config is required on YARN, Kubernetes, and for a client-side Driver on Spark Standalone. <code>spa [...]
+
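For illustration, a minimal PySpark sketch of these configs follows; it assumes a `gpu` resource and a hypothetical discovery script path, and is not the only way to set them:

{% highlight python %}
from pyspark.sql import SparkSession

# Minimal sketch: request 1 GPU for the driver, 2 per executor, and 1 per task.
# The discovery script path below is hypothetical; point it at a script that
# prints a ResourceInformation-style JSON string (a name plus addresses).
spark = (SparkSession.builder
         .appName("resource-scheduling-sketch")
         .config("spark.driver.resource.gpu.amount", "1")
         .config("spark.driver.resource.gpu.discoveryScript", "/opt/spark/scripts/getGpusResources.sh")
         .config("spark.executor.resource.gpu.amount", "2")
         .config("spark.executor.resource.gpu.discoveryScript", "/opt/spark/scripts/getGpusResources.sh")
         .config("spark.task.resource.gpu.amount", "1")
         .getOrCreate())
{% endhighlight %}

The same settings are commonly passed as `--conf` options to `spark-submit` instead, since resource requests are read when the application and its driver start.
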
+Spark will use the configurations specified to first request containers with the corresponding resources from the cluster manager. Once it gets a container, Spark launches an Executor in that container, which discovers what resources the container has and the addresses associated with each resource. The Executor registers with the Driver and reports back the resources available to that Executor. The Spark scheduler can then schedule tasks to each Executor and assign specific reso [...]
+
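As a sketch of what this looks like from the application side (assuming a `gpu` resource was requested as above; both lookups simply return nothing if no resources were configured), the driver can inspect its assigned resources and a task can inspect the addresses assigned to it:

{% highlight python %}
from pyspark import TaskContext
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("resource-inspection-sketch").getOrCreate()
sc = spark.sparkContext

# Resources assigned to the driver (empty unless driver resources were requested).
print({name: info.addresses for name, info in sc.resources.items()})

def task_gpu_addresses(_):
    # Inside a task: addresses of the "gpu" resource assigned to this task, if any.
    info = TaskContext.get().resources().get("gpu")
    return [info.addresses if info else []]

print(sc.parallelize(range(2), 2).mapPartitions(task_gpu_addresses).collect())
{% endhighlight %}
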
+See your cluster-manager-specific page for requirements and details: [YARN](running-on-yarn.html#resource-allocation-and-configuration-overview), [Kubernetes](running-on-kubernetes.html#resource-allocation-and-configuration-overview), and [Standalone Mode](spark-standalone.html#resource-allocation-and-configuration-overview). It is currently not available with Mesos or local mode. If using local-cluster mode, see the Spark Standalone documentation, but be aware that only a single wor [...]
+
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 2d4e5cd..4ef738e 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -1266,3 +1266,14 @@ The following affect the driver and executor containers. 
All other containers in
   </td>
 </tr>
 </table>
+
+### Resource Allocation and Configuration Overview
+
+Please make sure you have read the Custom Resource Scheduling and Configuration Overview section on the [configuration page](configuration.html). This section only covers the Kubernetes-specific aspects of resource scheduling.
+
+The user is responsible for properly configuring the Kubernetes cluster so that the resources are available, and ideally for isolating each resource per container so that a resource is not shared between multiple containers. If the resource is not isolated, the user is responsible for writing a discovery script that ensures the resource is not shared between containers. See the Kubernetes documentation for specifics on configuring Kubernetes with [custom resources](https://kubernetes.io/docs/concepts/exte [...]
+
+Spark automatically translates the Spark configs <code>spark.{driver/executor}.resource.{resourceType}</code> into the corresponding Kubernetes configs as long as the Kubernetes resource type follows the Kubernetes device plugin format of `vendor-domain/resourcetype`. The user must specify the vendor using the <code>spark.{driver/executor}.resource.{resourceType}.vendor</code> config. The user does not need to explicitly add anything when using Pod templates. For reference and an exampl [...]
+
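For example, a sketch of the application-side configs, assuming an NVIDIA GPU device plugin so the resulting Kubernetes resource is `nvidia.com/gpu` (the discovery script path here is hypothetical):

{% highlight python %}
from pyspark import SparkConf

# Sketch: one GPU per executor on Kubernetes. The vendor config tells Spark
# which device-plugin domain to use, yielding an nvidia.com/gpu request.
conf = (SparkConf()
        .set("spark.executor.resource.gpu.amount", "1")
        .set("spark.executor.resource.gpu.vendor", "nvidia.com")
        .set("spark.executor.resource.gpu.discoveryScript",
             "/opt/spark/scripts/getGpusResources.sh")  # hypothetical path
        .set("spark.task.resource.gpu.amount", "1"))
# Pass this conf to SparkSession.builder.config(conf=conf) or via spark-submit.
{% endhighlight %}
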
+Kubernetes does not tell Spark the addresses of the resources allocated to each container. For that reason, the user must specify a discovery script that gets run by the executor on startup to discover what resources are available to that executor. You can find an example script in `examples/src/main/scripts/getGpusResources.sh`. The script must have execute permissions set, and the user should set up permissions so that malicious users cannot modify it. The script should write to STDOUT [...]
+
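The bundled `examples/src/main/scripts/getGpusResources.sh` covers the GPU case; the Python variant below is only an illustrative sketch of the same contract and assumes `nvidia-smi` is available on the node:

{% highlight python %}
#!/usr/bin/env python3
# Illustrative discovery script: print a single ResourceInformation-style JSON
# object (a resource name and an array of address strings) to STDOUT.
import json
import subprocess

# Assumes nvidia-smi is installed; ask it for the indices of the visible GPUs.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"]).decode()
addresses = [line.strip() for line in out.splitlines() if line.strip()]

print(json.dumps({"name": "gpu", "addresses": addresses}))
{% endhighlight %}
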
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index d3d049e..418db41 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -542,6 +542,20 @@ For example, suppose you would like to point log url link 
to Job History Server
 
  NOTE: you need to replace `<JHS_POST>` and `<JHS_PORT>` with actual value.
 
+# Resource Allocation and Configuration Overview
+
+Please make sure you have read the Custom Resource Scheduling and Configuration Overview section on the [configuration page](configuration.html). This section only covers the YARN-specific aspects of resource scheduling.
+
+YARN needs to be configured to support any resources the user wants to use with Spark. Resource scheduling on YARN was added in YARN 3.1.0. See the YARN documentation for more information on configuring resources and properly setting up isolation. Ideally the resources are set up in isolation so that an executor can only see the resources it was allocated. If isolation is not enabled, the user is responsible for creating a discovery script that ensures the resource is not shared betw [...]
+
+YARN currently supports any user-defined resource type but has built-in types for GPU (<code>yarn.io/gpu</code>) and FPGA (<code>yarn.io/fpga</code>). For that reason, if you are using either of those resources, Spark can translate your request for Spark resources into YARN resources and you only have to specify the <code>spark.{driver/executor}.resource.</code> configs. If you are using a resource other than FPGA or GPU, the user is responsible for specifying the configs for both YARN ( [...]
+
+For example, suppose the user wants to request 2 GPUs for each executor. The user can simply specify <code>spark.executor.resource.gpu.amount=2</code> and Spark will handle requesting the <code>yarn.io/gpu</code> resource type from YARN.
+
+If the user has a user-defined YARN resource, let's call it `acceleratorX`, then the user must specify both <code>spark.yarn.executor.resource.acceleratorX.amount=2</code> and <code>spark.executor.resource.acceleratorX.amount=2</code>.
+
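Putting the two cases above together, a sketch of the configs (the discovery script path is hypothetical and `acceleratorX` is the made-up resource from the example):

{% highlight python %}
from pyspark import SparkConf

conf = SparkConf()

# Built-in type: Spark translates this request into the yarn.io/gpu resource.
conf.set("spark.executor.resource.gpu.amount", "2")
conf.set("spark.executor.resource.gpu.discoveryScript",
         "/opt/spark/scripts/getGpusResources.sh")  # hypothetical path
conf.set("spark.task.resource.gpu.amount", "1")

# User-defined type: both the YARN-side and the Spark-side configs are needed.
conf.set("spark.yarn.executor.resource.acceleratorX.amount", "2")
conf.set("spark.executor.resource.acceleratorX.amount", "2")
# Pass this conf to SparkSession.builder.config(conf=conf) or via spark-submit.
{% endhighlight %}
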
+YARN does not tell Spark the addresses of the resources allocated to each container. For that reason, the user must specify a discovery script that gets run by the executor on startup to discover what resources are available to that executor. You can find an example script in `examples/src/main/scripts/getGpusResources.sh`. The script must have execute permissions set, and the user should set up permissions so that malicious users cannot modify it. The script should write to STDOUT a JSO [...]
+
 # Important notes
 
 - Whether core requests are honored in scheduling decisions depends on which 
scheduler is in use and how it is configured.
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index bc77469..1af0bef 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -340,6 +340,18 @@ SPARK_WORKER_OPTS supports the following system properties:
 </tr>
 </table>
 
+# Resource Allocation and Configuration Overview
+
+Please make sure you have read the Custom Resource Scheduling and Configuration Overview section on the [configuration page](configuration.html). This section only covers the Spark Standalone-specific aspects of resource scheduling.
+
+Resource scheduling in Spark Standalone has two parts: the first is configuring the resources for the Worker, and the second is the resource allocation for a specific application.
+
+The user must configure the Workers to have a set of resources available so that they can be assigned out to Executors. The <code>spark.worker.resource.{resourceName}.amount</code> config is used to control the amount of each resource the Worker has allocated. The user must also specify either <code>spark.worker.resourcesFile</code> or <code>spark.worker.resource.{resourceName}.discoveryScript</code> to specify how the Worker discovers the resources it is assigned. See the descriptions above for ea [...]
+
+The second part is running an application on Spark Standalone. The only special case relative to the standard Spark resource configs is when you are running the Driver in client mode. For a Driver in client mode, the user can specify the resources it uses via <code>spark.driver.resourcesFile</code> or <code>spark.driver.resource.{resourceName}.discoveryScript</code>. If the Driver is running on the same host as other Drivers or Workers, there are 2 ways to make sure they don't use the same  [...]
+
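For example, a sketch of a client-mode driver on Standalone, assuming a `gpu` resource, a hypothetical master URL, and a hypothetical discovery script that reports addresses not used by other Drivers or Workers on the host:

{% highlight python %}
from pyspark.sql import SparkSession

# Sketch: a client-mode driver declaring its own GPU and how to discover it;
# executor resources are still handed out by the Worker.
spark = (SparkSession.builder
         .master("spark://master-host:7077")   # hypothetical master URL
         .config("spark.driver.resource.gpu.amount", "1")
         .config("spark.driver.resource.gpu.discoveryScript",
                 "/opt/spark/scripts/getDriverGpu.sh")  # hypothetical path
         .config("spark.executor.resource.gpu.amount", "1")
         .config("spark.task.resource.gpu.amount", "1")
         .getOrCreate())
{% endhighlight %}
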
+Note that the user does not need to specify a discovery script when submitting an application, as the Worker will start each Executor with the resources it allocates to it.
+
 # Connecting an Application to the Cluster
 
 To run an application on the Spark cluster, simply pass the `spark://IP:PORT` 
URL of the master as to the [`SparkContext`

