[ 
https://issues.apache.org/jira/browse/SPARK-29593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Doyle updated SPARK-29593:
--------------------------------
    Description: 
Today cluster managers are bundled with Spark, which makes it hard to add new ones. The Kubernetes cluster manager was developed in a fork and then brought into Spark, and significant work on it is still ongoing; it could ship more often if Spark had a pluggable way to bring in cluster managers. This would also benefit enterprises that have their own cluster managers which are not open source and therefore cannot be part of Spark itself.

High-level idea to be discussed, along with additional options:
 1. Make the cluster manager pluggable.
 2. Ship the Spark Standalone cluster manager with Spark by default and make it the base cluster manager that others can inherit from. Other cluster managers can be shipped with Spark or separately.
 3. Each cluster manager can ship additional jars that are placed inside the Spark installation; a configuration file then defines which cluster manager Spark runs with (see the sketch after the code below).
 4. The configuration file can define which classes to use for the various parts. It can reuse classes from the Spark Standalone cluster manager or point to different ones.
 5. For the classes that are allowed to be switched out in the Spark code, we can use code like the following to load a different class:

 val clazz = Class.forName("<scheduler class name read from the configuration file>")
 val cons = clazz.getConstructor(classOf[SparkContext])
 cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
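
As an illustration only, here is a minimal sketch of how items 3-5 could fit together. The file name cluster-manager.properties, the key cluster.manager.taskScheduler.class, and the ClusterManagerLoader helper are hypothetical names invented for this sketch, not existing Spark configuration; the sketch also assumes it lives inside Spark, since TaskSchedulerImpl is private[spark].

 # hypothetical conf/cluster-manager.properties shipped or edited by a plugin
 cluster.manager.taskScheduler.class=org.apache.spark.scheduler.TaskSchedulerImpl

 package org.apache.spark.scheduler

 import java.io.FileInputStream
 import java.util.Properties

 import org.apache.spark.SparkContext

 // Hypothetical helper: reads the scheduler class name from the configuration
 // file and instantiates it reflectively, defaulting to the Standalone scheduler.
 private[spark] object ClusterManagerLoader {
   def loadTaskScheduler(sc: SparkContext, confFile: String): TaskSchedulerImpl = {
     val props = new Properties()
     val in = new FileInputStream(confFile)
     try props.load(in) finally in.close()

     val className = props.getProperty(
       "cluster.manager.taskScheduler.class",
       "org.apache.spark.scheduler.TaskSchedulerImpl")

     // Reflectively construct the configured scheduler with the SparkContext.
     val clazz = Class.forName(className)
     val cons = clazz.getConstructor(classOf[SparkContext])
     cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
   }
 }

Under this sketch, a plugin's jar would provide its own subclass (for example of TaskSchedulerImpl) and only change the class name in the configuration file; the Spark code path that creates the scheduler stays the same.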


Proposal discussed at Spark + AI Summit Europe 2019: https://databricks.com/session_eu19/refactoring-apache-spark-to-allow-additional-cluster-managers


> Enhance Cluster Managers to be Pluggable
> ----------------------------------------
>
>                 Key: SPARK-29593
>                 URL: https://issues.apache.org/jira/browse/SPARK-29593
>             Project: Spark
>          Issue Type: New Feature
>          Components: Scheduler
>    Affects Versions: 2.4.4
>            Reporter: Kevin Doyle
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
