This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 596aa71  [SPARK-38562][K8S][DOCS] Add doc for `Volcano` scheduler
596aa71 is described below

commit 596aa71c3a8e66a49fa9b033be67a9b939dd7580
Author: Yikun Jiang <yikunk...@gmail.com>
AuthorDate: Tue Mar 29 10:58:53 2022 -0700

    [SPARK-38562][K8S][DOCS] Add doc for `Volcano` scheduler
    
    ### What changes were proposed in this pull request?
    This PR adds documentation for the Volcano scheduler capability for Spark on K8S.
    
    ### Why are the changes needed?
    Guide users on how to use Volcano as a custom scheduler for Spark on Kubernetes.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    CI passed
    
    Closes #35870 from Yikun/SPARK-38562.
    
    Authored-by: Yikun Jiang <yikunk...@gmail.com>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
    (cherry picked from commit 80deb24588aa8ba15db592504e90980d016f881b)
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 docs/running-on-kubernetes.md | 67 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index a262f98..763a966 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -1732,6 +1732,73 @@ Spark allows users to specify a custom Kubernetes schedulers.
  - Create additional Kubernetes custom resources for driver/executor scheduling.
  - Set scheduler hints according to configuration or existing Pod info dynamically.
 
+#### Using Volcano as Customized Scheduler for Spark on Kubernetes
+
+**This feature is currently experimental. In future versions, there may be behavioral changes around configuration and feature steps.**
+
+##### Prerequisites
+* Spark on Kubernetes with [Volcano](https://volcano.sh/en) as a custom scheduler is supported since Spark v3.3.0 and Volcano v1.5.1. Below is an example of installing Volcano 1.5.1:
+
+  ```bash
+  # x86_64
+  kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.5.1/installer/volcano-development.yaml
+
+  # arm64
+  kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.5.1/installer/volcano-development-arm64.yaml
+  ```
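+
+  After applying the manifest, the installation can be sanity-checked, assuming the `volcano-system` namespace used by the installer manifests above:
+
+  ```bash
+  # The admission, controllers, and scheduler pods should reach Running state
+  kubectl get pods -n volcano-system
+
+  # The PodGroup CRD should be registered
+  kubectl get crd podgroups.scheduling.volcano.sh
+  ```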
+
+##### Build
+To create a Spark distribution with Volcano support, like those distributed by the Spark [Downloads page](https://spark.apache.org/downloads.html), see also ["Building Spark"](https://spark.apache.org/docs/latest/building-spark.html):
+
+```bash
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pkubernetes -Pvolcano
+```
+
+##### Usage
+Spark on Kubernetes allows using Volcano as a custom scheduler. Users can use Volcano to support more advanced resource scheduling: queue scheduling, resource reservation, priority scheduling, and more.
+
+To use Volcano as a custom scheduler, the user needs to specify the following configuration options:
+
+```bash
+# Specify volcano scheduler and PodGroup template
+--conf spark.kubernetes.scheduler.name=volcano
+--conf spark.kubernetes.scheduler.volcano.podGroupTemplateFile=/path/to/podgroup-template.yaml
+# Specify driver/executor VolcanoFeatureStep
+--conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
+--conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
+```
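+
+Putting these options together, a complete `spark-submit` invocation might look like the following sketch. The API server address, container image, and example jar path are placeholders to adapt to your deployment:
+
+```bash
+./bin/spark-submit \
+    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+    --deploy-mode cluster \
+    --name spark-pi \
+    --class org.apache.spark.examples.SparkPi \
+    --conf spark.executor.instances=2 \
+    --conf spark.kubernetes.container.image=<spark-image> \
+    --conf spark.kubernetes.scheduler.name=volcano \
+    --conf spark.kubernetes.scheduler.volcano.podGroupTemplateFile=/path/to/podgroup-template.yaml \
+    --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
+    --conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
+    local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar
+```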
+
+##### Volcano Feature Step
+Volcano feature steps help users create a Volcano PodGroup and set driver/executor pod annotations to link with this [PodGroup](https://volcano.sh/en/docs/podgroup/).
+
+Note that currently only a driver/job level PodGroup is supported in the Volcano Feature Step.
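+
+For illustration, the net effect on the driver pod is roughly an annotation linking it to the created PodGroup, along these lines (the pod and PodGroup names here are hypothetical, and the annotation key shown is the one Volcano conventionally recognizes, stated as an assumption rather than a guarantee):
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: spark-pi-driver
+  annotations:
+    # Links this pod to the PodGroup created by the feature step
+    scheduling.k8s.io/group-name: spark-pi-podgroup
+spec:
+  schedulerName: volcano
+```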
+
+##### Volcano PodGroup Template
+Volcano defines the PodGroup spec using [CRD yaml](https://volcano.sh/en/docs/podgroup/#example).
+
+Similar to [Pod template](#pod-template), Spark users can use a Volcano PodGroup template to define the PodGroup spec configurations.
+To do so, specify the Spark property `spark.kubernetes.scheduler.volcano.podGroupTemplateFile` to point to a file accessible to the `spark-submit` process.
+Below is an example of a PodGroup template:
+
+```yaml
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: PodGroup
+spec:
+  # Specify minMember to 1 to make a driver pod
+  minMember: 1
+  # Specify minResources to support resource reservation (both the driver pod resources and the executor pod resources should be considered).
+  # This is useful for ensuring that the available resources meet the minimum requirements of the Spark job and avoiding the
+  # situation where drivers are scheduled but are then unable to schedule sufficient executors to progress.
+  minResources:
+    cpu: "2"
+    memory: "3Gi"
+  # Specify the priority to set the job's priority in the queue during scheduling.
+  priorityClassName: system-node-critical
+  # Specify the queue, i.e. the resource queue to which the job should be submitted
+  queue: default
+```
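+
+The `queue` field above names a Volcano `Queue` resource. Queues beyond `default` can be created with the Volcano Queue CRD; a minimal sketch (the name, `weight`, and `capability` values are illustrative, not recommendations):
+
+```yaml
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: spark-queue
+spec:
+  # Relative weight used when cluster resources are divided among queues
+  weight: 4
+  # Optional cap on the total resources jobs in this queue may consume
+  capability:
+    cpu: "8"
+    memory: "16Gi"
+```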
+
 ### Stage Level Scheduling Overview
 
 Stage level scheduling is supported on Kubernetes when dynamic allocation is enabled. This also requires <code>spark.dynamicAllocation.shuffleTracking.enabled</code> to be enabled since Kubernetes doesn't support an external shuffle service at this time. The order in which containers for different profiles is requested from Kubernetes is not guaranteed. Note that since dynamic allocation on Kubernetes requires the shuffle tracking feature, this means that executors from previous stages t [...]
