GitHub user UtkarshMe opened a pull request: https://github.com/apache/spark/pull/22822
[SPARK-25678] Requesting feedback on a prototype for adding PBS Professional as a cluster manager

## What changes were proposed in this pull request?

*From the Spark [JIRA ticket](https://issues.apache.org/jira/browse/SPARK-25678):*

[PBS (Portable Batch System) Professional](https://github.com/pbspro/pbspro) is an open-source workload management system for HPC clusters. Many organizations that use PBS to manage their clusters also use Spark for big data, but they are forced to split the cluster into a Spark cluster and a PBS cluster, either by physically dividing the nodes into two groups or by starting the Spark Standalone cluster manager's master and slaves as PBS jobs, leading to underutilization of resources.

I am trying to add support in Spark for using PBS as a pluggable cluster manager. Going through the Spark codebase and looking at the Mesos and Kubernetes integrations, I found that we can get this working as follows:

- Extend `ExternalClusterManager`.
- Extend `CoarseGrainedSchedulerBackend`:
  - This class can start `Executors` as PBS jobs.
  - The initial `Executors` are started in `onStart`.
  - More `Executors` can be started as and when required using `doRequestTotalExecutors`.
  - `Executors` can be killed using `doKillExecutors`.
- Extend `SparkApplication` to start the `Driver` as a PBS job in cluster deploy mode:
  - This extended class can resubmit the Spark application as a PBS job with deploy mode = client, so that the application driver is started on a node in the cluster.

## How was this patch tested?

- Compiled with PBS support by passing the `-Ppbs` flag to `build/mvn`.
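As a rough illustration of the plugin points listed above, a PBS cluster manager might hook into Spark's `ExternalClusterManager` SPI along these lines. This is a hypothetical sketch, not code from the PR: `PbsClusterManager` and `PbsCoarseGrainedSchedulerBackend` are illustrative names, while the `ExternalClusterManager` trait and its four methods are existing Spark API.

```scala
// Hypothetical sketch of the plugin described above; not taken from the PR.
// PbsClusterManager and PbsCoarseGrainedSchedulerBackend are illustrative names.
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend,
  TaskScheduler, TaskSchedulerImpl}

private[spark] class PbsClusterManager extends ExternalClusterManager {

  // Selected when the user runs spark-submit with --master pbs.
  override def canCreate(masterURL: String): Boolean = masterURL == "pbs"

  override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler =
    new TaskSchedulerImpl(sc)

  override def createSchedulerBackend(
      sc: SparkContext,
      masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend =
    // The backend would submit Executors as PBS jobs in onStart /
    // doRequestTotalExecutors and delete those jobs in doKillExecutors.
    new PbsCoarseGrainedSchedulerBackend(
      scheduler.asInstanceOf[TaskSchedulerImpl], sc.env.rpcEnv)

  override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
    scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
}
```

Spark discovers `ExternalClusterManager` implementations through Java's `ServiceLoader`, so the module would also register the class in a `META-INF/services/org.apache.spark.scheduler.ExternalClusterManager` resource file, as the Kubernetes and YARN modules do.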
- I was able to run the basic `SparkPi` Java application in both client and cluster deploy modes using `bin/spark-submit`:

  ```bash
  ./bin/spark-submit --master pbs --deploy-mode cluster --class org.apache.spark.examples.SparkPi spark-examples.jar 1000000
  ```

- The TravisCI build seems to fail because of code lint/license comments.

## I have a couple of questions:

- Does this seem like a good idea, or should we look at other options?
- What are the expectations for the initial prototype?
- Would the Spark maintainers look forward to merging this, or would they want it to be maintained as a fork?

CC: @sakshamgarg

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/UtkarshMe/spark pbs_support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22822.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22822

----

commit e3a97ebfbb8862b3a83fa5d01b1a8a3bd191f456
Author: Utkarsh <utkarsh.maheshwari@...>
Date: 2018-08-29T13:39:01Z

    Add prototype for using PBS as external cluster manager