Utkarsh Maheshwari created SPARK-25678:
------------------------------------------

             Summary: SPIP: Adding support in Spark for HPC cluster manager 
(PBS Professional)
                 Key: SPARK-25678
                 URL: https://issues.apache.org/jira/browse/SPARK-25678
             Project: Spark
          Issue Type: New Feature
          Components: Scheduler
    Affects Versions: 3.0.0
            Reporter: Utkarsh Maheshwari


I sent an email to the dev mailing list but got no response, so I am filing a 
JIRA ticket.

 

PBS (Portable Batch System) Professional is an open-source workload management 
system for HPC clusters. Many organizations that use PBS to manage their 
clusters also use Spark for Big Data, but they are forced to split the cluster 
into a Spark cluster and a PBS cluster, either by physically dividing the nodes 
into two groups or by starting the Spark Standalone cluster manager's Master 
and Slaves as PBS jobs. Both approaches lead to underutilization of resources.
 
 I am trying to add support in Spark to use PBS as a pluggable cluster manager. 
Going through the Spark codebase and looking at Mesos and Kubernetes 
integration, I found that we can get this working as follows:
 
 - Extend `ExternalClusterManager`.
 - Extend `CoarseGrainedSchedulerBackend`
   - This class can start `Executors` as PBS jobs.
   - The initial number of `Executors` is started in `onStart`.
   - More `Executors` can be started as and when required using 
`doRequestTotalExecutors`.
   - `Executors` can be killed using `doKillExecutors`.
 - Extend `SparkApplication` to start `Driver` as a PBS job in cluster deploy 
mode.
   - This extended class can resubmit the Spark application as a PBS job 
with deploy mode = client, so that the application driver is started on a 
node in the cluster.
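
The executor lifecycle in the list above can be modelled roughly as follows. 
This is a standalone Python toy, not Spark code: `PbsExecutorBackend`, 
`fake_runner`, and the script name `start-executor.sh` are hypothetical, and 
the only real interfaces assumed are the PBS `qsub` and `qdel` commands 
(`qsub` prints the new job's identifier on stdout, `qdel` takes a job 
identifier). The methods mirror `doRequestTotalExecutors` and 
`doKillExecutors` in spirit only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical hook: takes an argv list, returns the command's stdout.
Runner = Callable[[List[str]], str]


@dataclass
class PbsExecutorBackend:
    """Toy model of a coarse-grained backend that runs executors as PBS jobs."""
    run: Runner            # how PBS commands are executed (injected)
    executor_script: str   # hypothetical job script that launches one executor
    jobs: Dict[str, str] = field(default_factory=dict)  # executor id -> PBS job id
    next_id: int = 0

    def start(self, initial_executors: int) -> None:
        # The initial executors are requested when the backend starts.
        self.request_total_executors(initial_executors)

    def request_total_executors(self, requested_total: int) -> bool:
        # Analogue of doRequestTotalExecutors: submit jobs until the target is met.
        while len(self.jobs) < requested_total:
            executor_id = str(self.next_id)
            self.next_id += 1
            argv = ["qsub", "-N", f"spark-exec-{executor_id}", self.executor_script]
            self.jobs[executor_id] = self.run(argv).strip()
        return True

    def kill_executors(self, executor_ids: List[str]) -> bool:
        # Analogue of doKillExecutors: delete the PBS job behind each executor.
        for executor_id in executor_ids:
            job_id = self.jobs.pop(executor_id, None)
            if job_id is not None:
                self.run(["qdel", job_id])
        return True


def fake_runner(log: List[List[str]]) -> Runner:
    """Records commands instead of running them; invents job ids for qsub."""
    state = {"next_job": 1000}

    def run(argv: List[str]) -> str:
        log.append(argv)
        if argv[0] == "qsub":
            job_id = f"{state['next_job']}.pbs-server"
            state["next_job"] += 1
            return job_id + "\n"
        return ""

    return run


log: List[List[str]] = []
backend = PbsExecutorBackend(run=fake_runner(log), executor_script="start-executor.sh")
backend.start(2)                    # two initial executors
backend.request_total_executors(3)  # scale up by one
backend.kill_executors(["0"])       # scale down by killing executor 0
```

Against a real PBS installation the injected runner could instead wrap 
`subprocess.run(argv, capture_output=True, text=True, check=True).stdout`. 
Keeping command execution behind a hook lets the bookkeeping (executor id to 
PBS job id) be exercised without a cluster.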
 
 I have a couple of questions:
 - Does this seem like a good idea, or should we look at other options?
 - What are the expectations from the initial prototype?
 - If this works, would Spark maintainers be open to merging this, or would 
they want it to be maintained as a fork?


