Matt Cheah created SPARK-19700:
----------------------------------

             Summary: Design an API for pluggable scheduler implementations
                 Key: SPARK-19700
                 URL: https://issues.apache.org/jira/browse/SPARK-19700
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: Matt Cheah


One point that was brought up in discussing SPARK-18278 was that schedulers 
cannot easily be added to Spark without forking the whole project. The main 
reason is that much of the scheduler's behavior fundamentally depends on the 
CoarseGrainedSchedulerBackend class, which is not part of the public API of 
Spark and is in fact quite a complex module. As resource management and 
allocation continue to evolve, Spark will need to integrate with more cluster 
managers, but maintaining support for every possible allocator inside the 
Spark project would be untenable. Furthermore, it would be impossible for Spark 
to support proprietary frameworks that specific users develop for their own 
particular use cases.

Therefore, this ticket proposes making scheduler implementations fully 
pluggable. The idea is that Spark would provide a Java/Scala interface to be 
implemented by a scheduler backed by the cluster manager of interest. The user 
would compile their scheduler's code into a JAR and place it on the driver's 
classpath. Finally, just as happens today, the scheduler implementation would 
be selected and dynamically loaded based on the master URL the user provides.
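
To make the proposal concrete, a rough sketch of what such an interface might 
look like is below. Every name here (ExternalSchedulerProvider, 
PluggableSchedulerBackend, and their methods) is an illustrative assumption for 
discussion, not an existing or final API; discovery based on the master URL 
could, for instance, be done with java.util.ServiceLoader over the JARs on the 
driver's classpath.

{code:scala}
// All names in this sketch are hypothetical and for illustration only.
package org.example.scheduler

import org.apache.spark.SparkContext

// The contract a third-party cluster manager would implement and ship as a
// JAR on the driver's classpath.
trait ExternalSchedulerProvider {
  // Spark would call this with the user's master URL to select a provider,
  // e.g. "yarn", "mesos://...", or "k8s://https://host:443".
  def canCreate(masterUrl: String): Boolean

  // Build the cluster-manager-specific backend once a provider matches.
  def createBackend(sc: SparkContext, masterUrl: String): PluggableSchedulerBackend
}

// Minimal lifecycle surface a pluggable backend would need to expose.
trait PluggableSchedulerBackend {
  def start(): Unit
  def stop(): Unit

  // Ask the cluster manager to adjust the total number of executors.
  def requestTotalExecutors(total: Int): Boolean

  // Tear down specific executors, e.g. during dynamic de-allocation.
  def killExecutors(executorIds: Seq[String]): Boolean
}
{code}

Spark would iterate over the providers found on the classpath and use the 
first one whose canCreate returns true for the configured master URL.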

Determining the correct API is the most challenging problem. The current 
CoarseGrainedSchedulerBackend handles many responsibilities, some of which will 
be common across all cluster managers and some of which will be specific to a 
particular cluster manager. For example, the particular mechanism for creating 
the executor processes will differ between YARN and Mesos, but once those 
executors have started running, the means of submitting tasks to them over 
Netty RPC is identical across the board.
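
One way to express that split, purely as an illustration (the class and method 
names below are assumptions, not code that exists in Spark today), is a base 
class that owns the common RPC task-dispatch path and leaves executor 
provisioning to cluster-manager-specific hooks:

{code:scala}
// Hypothetical split of CoarseGrainedSchedulerBackend's responsibilities;
// all names are illustrative only.
package org.example.scheduler

// Common layer: once executors have registered over RPC, serializing and
// dispatching tasks to them is the same for every cluster manager.
abstract class CommonBackendSkeleton {
  // Shared logic that Spark core would implement exactly once.
  final def launchTask(executorId: String, serializedTask: Array[Byte]): Unit = {
    // Look up the executor's RPC endpoint and send it a LaunchTask message.
    sendToExecutor(executorId, serializedTask)
  }

  // Would delegate to the Netty-based RPC layer in a real implementation.
  protected def sendToExecutor(executorId: String, payload: Array[Byte]): Unit = ()

  // Cluster-manager-specific layer: how executor processes come into
  // existence differs (YARN containers, Mesos tasks, Kubernetes pods, ...),
  // so these remain abstract hooks for each implementation to fill in.
  protected def doRequestTotalExecutors(total: Int): Boolean
  protected def doKillExecutors(executorIds: Seq[String]): Boolean
}
{code}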

We must also consider a plugin model and interface for submitting the 
application, because different cluster managers support different configuration 
options, and thus the driver must be bootstrapped accordingly. 
For example, in YARN mode the application and Hadoop configuration must be 
packaged and shipped to the distributed cache prior to launching the job. A 
prototype of a Kubernetes implementation starts a Kubernetes pod that runs the 
driver in cluster mode.
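
A submission-side plugin could look roughly like the following; again, every 
name here is a hypothetical placeholder rather than a concrete proposal:

{code:scala}
// Hypothetical submission-side plugin; every name here is a placeholder.
package org.example.submit

import org.apache.spark.SparkConf

// Each cluster manager bootstraps the driver in its own way: YARN would
// stage the application and Hadoop configuration in the distributed cache,
// while a Kubernetes implementation would create a pod that runs the driver.
trait ApplicationSubmitter {
  def supports(masterUrl: String): Boolean

  // Prepare the cluster-manager-specific resources, launch the driver, and
  // return a handle the caller can use to monitor the application.
  def submit(conf: SparkConf, appResource: String, appArgs: Seq[String]): ApplicationHandle
}

trait ApplicationHandle {
  def applicationId: String
  def awaitCompletion(): Unit
}
{code}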


