GitHub user UtkarshMe opened a pull request:
https://github.com/apache/spark/pull/22822
[SPARK-25678] Requesting feedback regarding a prototype for adding PBS
Professional as a cluster manager
## What changes were proposed in this pull request?
*From Spark [JIRA
ticket](https://issues.apache.org/jira/browse/SPARK-25678):*
[PBS (Portable Batch System)
Professional](https://github.com/pbspro/pbspro) is an open-source workload
management system for HPC clusters. Many organizations that use PBS to manage
their clusters also use Spark for big data workloads, but they are forced to
split the cluster into a Spark cluster and a PBS cluster, either by physically
dividing the nodes into two groups or by starting the Spark Standalone cluster
manager's master and workers as PBS jobs, leading to underutilization of resources.
I am trying to add support in Spark for using PBS as a pluggable cluster
manager. Going through the Spark codebase and looking at the Mesos and
Kubernetes integrations, I found that we can get this working as follows:
- Extend `ExternalClusterManager`.
- Extend `CoarseGrainedSchedulerBackend`
- This class can start `Executors` as PBS jobs.
- The initial number of `Executors` is started in `onStart`.
- More `Executors` can be started as needed using
`doRequestTotalExecutors`.
- `Executors` can be killed using `doKillExecutors`.
- Extend `SparkApplication` to start `Driver` as a PBS job in cluster
deploy mode.
- This extended class can submit the Spark application again as a PBS job
with deploy mode = client, so that the application driver is started on a node
in the cluster.
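The backend described above would build PBS command lines to submit and kill executor jobs. Below is a minimal, self-contained sketch of what those command builders might look like; the object and method names (`PbsJobCommands`, `executorSubmitCommand`, `killCommand`) and the exact resource flags are illustrative assumptions, not code from this PR, though `qsub`/`qdel` and the `select=...:ncpus=...:mem=...` resource syntax are standard PBS Professional:

```scala
// Hypothetical helper sketching the PBS command lines a scheduler backend
// might issue from doRequestTotalExecutors / doKillExecutors.
// All names here are illustrative; only qsub/qdel and the PBS resource
// selection syntax are taken from PBS Professional itself.
object PbsJobCommands {

  /** Build a `qsub` command line that would launch one executor as a PBS job.
    * `launchScript` is assumed to be a script that starts the executor JVM. */
  def executorSubmitCommand(appId: String,
                            executorId: Int,
                            cores: Int,
                            memoryMb: Int,
                            launchScript: String): Seq[String] =
    Seq(
      "qsub",
      "-N", s"spark-$appId-exec-$executorId",            // PBS job name
      "-l", s"select=1:ncpus=$cores:mem=${memoryMb}mb",  // resource request
      launchScript
    )

  /** Build a `qdel` command line to kill an executor's PBS job,
    * as `doKillExecutors` would need. */
  def killCommand(pbsJobId: String): Seq[String] =
    Seq("qdel", pbsJobId)
}
```

In a real backend these command lines would be executed (e.g. via `scala.sys.process`) and the returned PBS job IDs tracked so that executors can later be mapped back to jobs for `doKillExecutors`.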
## How was this patch tested?
- Compiled with PBS support by passing the `-Ppbs` flag to `build/mvn`.
- I was able to run a basic `SparkPi` Java application in both client and
cluster deploy modes using `bin/spark-submit`:
```bash
./bin/spark-submit --master pbs --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi spark-examples.jar 100
```
- The Travis CI build seems to fail because of code lint/license-comment checks.
## I have a few questions:
- Does this seem like a good approach, or should we look at other
options?
- What are the expectations from the initial prototype?
- Would the Spark maintainers be open to merging this, or would they prefer
it to be maintained as a fork?
CC: @sakshamgarg
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/UtkarshMe/spark pbs_support
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22822.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22822
commit e3a97ebfbb8862b3a83fa5d01b1a8a3bd191f456
Author: Utkarsh
Date: 2018-08-29T13:39:01Z
Add prototype for using PBS as external cluster manager
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org