GitHub user UtkarshMe opened a pull request:

    https://github.com/apache/spark/pull/22822

    [SPARK-25678] Requesting feedback regarding a prototype for adding PBS 
Professional as a cluster manager

    ## What changes were proposed in this pull request?
    *From Spark [JIRA 
ticket](https://issues.apache.org/jira/browse/SPARK-25678):*  
      
    [PBS (Portable Batch System) 
Professional](https://github.com/pbspro/pbspro) is an open-source workload 
management system for HPC clusters. Many organizations that use PBS to manage 
their clusters also use Spark for big data, but they are forced to split the 
cluster into a Spark cluster and a PBS cluster, either by physically dividing 
the nodes into two groups or by starting the Spark Standalone cluster 
manager's Master and Slaves as PBS jobs, leading to underutilization of 
resources.
    
    I am trying to add support in Spark for using PBS as a pluggable cluster 
manager. After going through the Spark codebase and studying the Mesos and 
Kubernetes integrations, I found that we can get this working as follows:
    
    - Extend `ExternalClusterManager`.
    - Extend `CoarseGrainedSchedulerBackend`.
      - This class can start `Executors` as PBS jobs.
      - The initial set of `Executors` is started in `onStart`.
      - More `Executors` can be started as and when required using 
`doRequestTotalExecutors`.
      - `Executors` can be killed using `doKillExecutors`.
    - Extend `SparkApplication` to start the `Driver` as a PBS job in cluster 
deploy mode.
      - This extended class can re-submit the Spark application as a PBS job 
with deploy mode = client, so that the application driver is started on a node 
in the cluster.
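    
    The wiring described above could be sketched roughly as follows (a minimal 
skeleton only; the class names `PbsClusterManager` and 
`PbsCoarseGrainedSchedulerBackend` and the `PbsJobClient` helper are 
hypothetical stand-ins, not the actual patch):
    
    ```scala
    import scala.concurrent.Future
    
    import org.apache.spark.SparkContext
    import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend, TaskScheduler, TaskSchedulerImpl}
    import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
    
    // Hypothetical helper that shells out to PBS (qsub/qdel); body omitted.
    private[spark] object PbsJobClient {
      def submitExecutor(id: Int): Unit = ???          // qsub an executor job
      def killExecutors(ids: Seq[String]): Unit = ???  // qdel the matching jobs
    }
    
    private[spark] class PbsClusterManager extends ExternalClusterManager {
      // Selected when the user passes --master pbs
      override def canCreate(masterURL: String): Boolean = masterURL == "pbs"
    
      override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler =
        new TaskSchedulerImpl(sc)
    
      override def createSchedulerBackend(
          sc: SparkContext,
          masterURL: String,
          scheduler: TaskScheduler): SchedulerBackend =
        new PbsCoarseGrainedSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl], sc)
    
      override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
        scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
    }
    
    private[spark] class PbsCoarseGrainedSchedulerBackend(
        scheduler: TaskSchedulerImpl,
        sc: SparkContext)
      extends CoarseGrainedSchedulerBackend(scheduler, sc.env.rpcEnv) {
    
      override def start(): Unit = {
        super.start()
        // Launch the initial executors as PBS jobs.
        val initial = sc.conf.getInt("spark.executor.instances", 1)
        (1 to initial).foreach(PbsJobClient.submitExecutor)
      }
    
      // Called (e.g. by dynamic allocation) to resize the executor pool.
      override def doRequestTotalExecutors(requestedTotal: Int): Future[Boolean] = {
        // qsub/qdel jobs until the running total matches requestedTotal ...
        Future.successful(true)
      }
    
      // Kill specific executors by deleting the corresponding PBS jobs.
      override def doKillExecutors(executorIds: Seq[String]): Future[Boolean] = {
        PbsJobClient.killExecutors(executorIds)
        Future.successful(true)
      }
    }
    ```
    
    Spark discovers `ExternalClusterManager` implementations through Java's 
`ServiceLoader`, so the new manager would also need an entry in 
`META-INF/services/org.apache.spark.scheduler.ExternalClusterManager`.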
    
    
    ## How was this patch tested?
    - Compiled with PBS support by passing the `-Ppbs` flag to `build/mvn`.
    - I was able to run a basic `SparkPi` Java application in both client and 
cluster deploy modes using `bin/spark-submit`:
    ```bash
    ./bin/spark-submit --master pbs --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi spark-examples.jar 1000000
    ./bin/spark-submit --master pbs --deploy-mode client \
      --class org.apache.spark.examples.SparkPi spark-examples.jar 1000000
    ```
    - The Travis CI build currently fails because of code lint errors and 
missing license headers.
    
    
    ## I have a few questions:
    
    - Does this approach seem like a good idea, or should we look at other 
options?
    - What are the expectations from the initial prototype?
    - Would the Spark maintainers be open to merging this, or would they 
prefer that it be maintained as a fork?
    
    CC: @sakshamgarg

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/UtkarshMe/spark pbs_support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22822.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22822
    
----
commit e3a97ebfbb8862b3a83fa5d01b1a8a3bd191f456
Author: Utkarsh <utkarsh.maheshwari@...>
Date:   2018-08-29T13:39:01Z

    Add prototype for using PBS as external cluster manager

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
