+1 (non-binding)

Regards,
Vaquar Khan
On Mon, Aug 28, 2017 at 5:09 PM, Erik Erlandson <eerla...@redhat.com> wrote:

> In addition to the engineering & software aspects of the native Kubernetes community project, we have also worked at building out the community, with the goal of providing the foundation for sustaining engineering on the Kubernetes scheduler back-end. That said, I agree 100% with your point that adding committers with kube-specific experience is a good strategy for increasing review bandwidth to help service PRs from this community.
>
> On Mon, Aug 28, 2017 at 2:16 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>
>>> In my opinion, the fact that there are nearly no changes to spark-core, and most of our changes are additive should go to prove that this adds little complexity to the workflow of the committers.
>>
>> Actually (and somewhat perversely), the otherwise praiseworthy isolation of the Kubernetes code does mean that it adds complexity to the workflow of the existing Spark committers. I'll reiterate Imran's concerns: the existing Spark committers familiar with Spark's scheduler code have adequate knowledge of the Standalone and YARN implementations, and still not sufficient coverage of Mesos. Adding k8s code to Spark would mean that that code would start seeing the issues the Mesos code in Spark currently sees: reviews and commits tend to languish because we don't currently have active committers with sufficient knowledge and cycles to deal with the Mesos PRs. Some of this is because the PMC needs to get back to addressing the issue of adding new Spark committers who do have the needed Mesos skills, but that isn't as simple as we'd like, because ideally a Spark committer has demonstrated skills across a significant portion of the Spark code, not just one tightly focused area (such as Mesos or k8s integration). In short, adding Kubernetes support directly into Spark isn't likely (at least in the short term) to be entirely positive for the spark-on-k8s project, since merging of PRs to spark-on-k8s is very likely to be quite slow at least until such time as we have k8s-focused Spark committers. If this project does end up getting pulled into the Spark codebase, then the PMC will need to start looking at bringing in one or more new committers who meet our requirements for such a role and responsibility, and who also have k8s skills. The success and pace of development of spark-on-k8s will depend in large measure on the PMC's ability to find such new committers.
>>
>> All that said, I'm +1 if those currently responsible for the spark-on-k8s project still want to bring the code into Spark.
>>
>> On Mon, Aug 21, 2017 at 11:48 AM, Anirudh Ramanathan <ramanath...@google.com.invalid> wrote:
>>
>>> Thank you for your comments, Imran.
>>>
>>> Regarding integration tests: what you inferred from the documentation is correct. Integration tests do not require any prior setup or a Kubernetes cluster to run. Minikube is a single binary that brings up a one-node cluster and exposes the full Kubernetes API. It is actively maintained and kept up to date with the rest of the project. These local integration tests on Jenkins (like the ones for spark-on-yarn) should allow the committers to merge changes with a high degree of confidence. I will update the proposal to include more information about the extent and kinds of testing we do.
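>>>
>>> To give a flavor of what these tests look like, here is a minimal sketch of a ScalaTest suite that talks to a local Minikube cluster through the fabric8 Kubernetes client (the client library the fork uses). This is illustrative only, not our exact harness; the suite name and master URL are made up for the example:
>>>
>>>   import io.fabric8.kubernetes.client.{ConfigBuilder, DefaultKubernetesClient}
>>>   import org.scalatest.{BeforeAndAfterAll, FunSuite}
>>>   import scala.collection.JavaConverters._
>>>
>>>   class MinikubeSmokeSuite extends FunSuite with BeforeAndAfterAll {
>>>     // Assumes `minikube start` has already run; the master URL is
>>>     // whatever `minikube ip` reports on the local machine.
>>>     private val client = new DefaultKubernetesClient(
>>>       new ConfigBuilder().withMasterUrl("https://192.168.99.100:8443").build())
>>>
>>>     test("cluster has at least one Ready node") {
>>>       val nodes = client.nodes().list().getItems.asScala
>>>       assert(nodes.nonEmpty)
>>>       assert(nodes.exists(_.getStatus.getConditions.asScala
>>>         .exists(c => c.getType == "Ready" && c.getStatus == "True")))
>>>     }
>>>
>>>     override def afterAll(): Unit = client.close()
>>>   }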
>>>
>>> As for (b), the people on this thread and the set of contributors on our fork are a fairly wide community of contributors and committers who would be involved in the maintenance long-term. That was one of the reasons behind developing separately as a fork. In my opinion, the fact that there are nearly no changes to spark-core, and most of our changes are additive should go to prove that this adds little complexity to the workflow of the committers.
>>>
>>> Separating out the cluster managers (into an as yet undecided new home) appears far more disruptive and a high-risk change for the short term. However, when there is enough community support behind that effort, tracked in SPARK-19700 <https://issues.apache.org/jira/browse/SPARK-19700>, and if that is realized in the future, it wouldn't be difficult to switch Kubernetes, YARN, and Mesos over to the pluggable API. Currently, in my opinion, with the integration tests, active users, and a community of maintainers, Spark-on-Kubernetes would add minimal overhead and benefit a large (and growing) class of users.
>>>
>>> Lastly, the RSS is indeed separate and a value-add that we would love to share with other cluster managers as well.
>>>
>>> On Mon, Aug 21, 2017 at 10:17 AM, Imran Rashid <iras...@cloudera.com> wrote:
>>>
>>>> Overall this looks like a good proposal. I do have some concerns which I'd like to discuss -- please understand I'm taking a "devil's advocate" stance here for discussion, not that I'm giving a -1.
>>>>
>>>> My primary concern is about testing and maintenance. My concerns might be addressed if the doc included a section on testing that might just be this: https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/README.md#running-the-kubernetes-integration-tests
>>>>
>>>> but without the concerning warning "Note that the integration test framework is currently being heavily revised and is subject to change". I'd like the proposal to clearly indicate that some baseline testing can be done by devs and in spark's regular jenkins builds without special access to kubernetes clusters.
>>>>
>>>> It's worth noting that there *are* advantages to keeping it outside Spark:
>>>> * when making changes to spark's scheduler, we do *not* have to worry about how those changes impact kubernetes. This simplifies things for those making changes to spark
>>>> * making changes to the kubernetes integration is not blocked by getting enough attention from spark's committers
>>>>
>>>> or in other words, each community of experts can maintain its focus. I have these concerns based on past experience with the mesos integration -- mesos contributors are blocked on committers reviewing their changes, and then committers have no idea how to test that the changes are correct, and find it hard to even learn the ins and outs of that code without access to a mesos cluster.
>>>>
>>>> The same could be said for the yarn integration, but I think it's helped that (a) spark-on-yarn *does* have local tests for testing basic integration and (b) there is a sufficient community of contributors and committers for spark-on-yarn. I realize (b) is a chicken-and-egg problem, but I'd like to be sure that at least (a) is addressed.
>>>> (And maybe even spark-on-yarn shouldn't be inside spark itself, as mridul said, but it's not clear what the other home should be.)
>>>>
>>>> At some point, this is just a judgement call about the value it brings to the spark community vs. the added complexity. I'm willing to believe that kubernetes will bring enough value to make this worthwhile; I'm just voicing my concerns.
>>>>
>>>> Secondary concern: the RSS doesn't seem necessary for kubernetes support, or specific to it. If it's nice to have, and you want to add it to kubernetes first before other cluster managers, fine, but it seems separate from this proposal.
>>>>
>>>> On Tue, Aug 15, 2017 at 10:32 AM, Anirudh Ramanathan <fox...@google.com.invalid> wrote:
>>>>
>>>>> The Spark on Kubernetes effort has been developed separately in a fork, and is linked back from the Apache Spark project as an experimental backend <http://spark.apache.org/docs/latest/cluster-overview.html#cluster-manager-types>. We're ~6 months in and have had 5 releases <https://github.com/apache-spark-on-k8s/spark/releases>.
>>>>>
>>>>> - 2 Spark versions maintained (2.1 and 2.2)
>>>>> - Extensive integration testing and refactoring efforts to maintain code quality
>>>>> - Developer <https://github.com/apache-spark-on-k8s/spark#getting-started> and user-facing <https://apache-spark-on-k8s.github.io/userdocs/> documentation
>>>>> - 10+ consistent code contributors from different organizations <https://apache-spark-on-k8s.github.io/userdocs/contribute.html#project-contributions> involved in actively maintaining and using the project, with several more members involved in testing and providing feedback
>>>>> - The community has delivered several talks on Spark-on-Kubernetes, generating lots of feedback from users
>>>>> - In addition to these, we've seen efforts spawn off, such as:
>>>>>   - HDFS on Kubernetes <https://github.com/apache-spark-on-k8s/kubernetes-HDFS> with Locality and Performance Experiments
>>>>>   - Kerberized access <https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg/edit> to HDFS from Spark running on Kubernetes
>>>>>
>>>>> *Following the SPIP process, I'm putting this SPIP up for a vote.*
>>>>>
>>>>> - +1: Yeah, let's go forward and implement the SPIP.
>>>>> - +0: Don't really care.
>>>>> - -1: I don't think this is a good idea because of the following technical reasons.
>>>>>
>>>>> If there is any further clarification desired, on the design or the implementation, please feel free to ask questions or provide feedback.
>>>>>
>>>>> SPIP: Kubernetes as a Native Cluster Manager
>>>>>
>>>>> Full Design Doc: link <https://issues.apache.org/jira/secure/attachment/12881586/SPARK-18278%20Spark%20on%20Kubernetes%20Design%20Proposal%20Revision%202%20%281%29.pdf>
>>>>>
>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-18278
>>>>>
>>>>> Kubernetes Issue: https://github.com/kubernetes/kubernetes/issues/34377
>>>>>
>>>>> Authors: Yinan Li, Anirudh Ramanathan, Erik Erlandson, Andrew Ash, Matt Cheah, Ilan Filonenko, Sean Suchter, Kimoon Kim
>>>>>
>>>>> Background and Motivation
>>>>>
>>>>> Containerization and cluster management technologies are constantly evolving in the cluster computing world.
>>>>> Apache Spark currently implements support for Apache Hadoop YARN and Apache Mesos, in addition to providing its own standalone cluster manager. In 2014, Google announced development of Kubernetes <https://kubernetes.io/>, which has its own unique feature set and differentiates itself from YARN and Mesos. Since its debut, it has seen contributions from over 1300 contributors with over 50000 commits. Kubernetes has cemented itself as a core player in the cluster computing world, and cloud-computing offerings such as Google Container Engine, Google Compute Engine, Amazon Web Services, and Microsoft Azure support running Kubernetes clusters.
>>>>>
>>>>> This document outlines a proposal for integrating Apache Spark with Kubernetes in a first-class way, adding Kubernetes to the list of cluster managers that Spark can be used with. Doing so would allow users to share their computing resources and containerization framework between their existing applications on Kubernetes and their computational Spark applications. Although there is existing support for running a Spark standalone cluster on Kubernetes <https://github.com/kubernetes/examples/blob/master/staging/spark/README.md>, there are still major advantages and significant interest in having native execution support. For example, this integration provides better support for multi-tenancy and dynamic resource allocation. It also allows users to run applications of different Spark versions of their choice in the same cluster.
>>>>>
>>>>> The feature is being developed in a separate fork <https://github.com/apache-spark-on-k8s/spark> in order to minimize risk to the main project during development. Since the start of development in November of 2016, it has received over 100 commits from over 20 contributors and supports two releases, based on Spark 2.1 and 2.2 respectively. Documentation is also being actively worked on, both in the main project repository and in the repository https://github.com/apache-spark-on-k8s/userdocs. Regarding real-world use cases, we have seen cluster setups that use 1000+ cores. We are also seeing growing interest in this project from more and more organizations.
>>>>>
>>>>> While it is easy to bootstrap the project in a forked repository, it is hard to maintain it in the long run because of the tricky process of rebasing onto upstream and the lack of awareness in the larger Spark community. It would be beneficial to both the Spark and Kubernetes communities to see this feature merged upstream. On one hand, it gives Spark users the option of running their Spark workloads along with other workloads that may already be running on Kubernetes, enabling better resource sharing and isolation, and better cluster administration. On the other hand, it gives Kubernetes a leap forward in the area of large-scale data processing by being an officially supported cluster manager for Spark. The risk of merging into upstream is low because most of the changes are purely incremental, i.e., new Kubernetes-aware implementations of existing interfaces/classes in Spark core are introduced. The development is also concentrated in a single place at resource-managers/kubernetes <https://github.com/apache-spark-on-k8s/spark/tree/branch-2.2-kubernetes/resource-managers/kubernetes>. The risk is further reduced by a comprehensive integration test framework, and an active and responsive community of future maintainers.
>>>>>
>>>>> Target Personas
>>>>>
>>>>> Devops, data scientists, data engineers, application developers, and anyone who can benefit from having Kubernetes <https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/> as a native cluster manager for Spark.
>>>>>
>>>>> Goals
>>>>>
>>>>> - Make Kubernetes a first-class cluster manager for Spark, alongside Spark Standalone, YARN, and Mesos.
>>>>> - Support both client and cluster deployment modes.
>>>>> - Support dynamic resource allocation <http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation>.
>>>>> - Support Spark Java/Scala, PySpark, and Spark R applications.
>>>>> - Support secure HDFS access.
>>>>> - Allow running applications of different Spark versions in the same cluster through the ability to specify the driver and executor Docker images on a per-application basis.
>>>>> - Support specification and enforcement of limits on both CPU cores and memory.
>>>>>
>>>>> Non-Goals
>>>>>
>>>>> - Support cluster resource scheduling and sharing beyond the capabilities offered natively by the Kubernetes per-namespace resource quota model.
>>>>>
>>>>> Proposed API Changes
>>>>>
>>>>> Most API changes are purely incremental, i.e., new Kubernetes-aware implementations of existing interfaces/classes in Spark core are introduced. Detailed changes are as follows; a skeletal sketch of the scheduler backend follows the list.
>>>>>
>>>>> - A new cluster manager option, KUBERNETES, is introduced, and some changes are made to SparkSubmit to make it aware of this option.
>>>>> - A new implementation of CoarseGrainedSchedulerBackend, namely KubernetesClusterSchedulerBackend, is responsible for managing the creation and deletion of executor Pods through the Kubernetes API.
>>>>> - A new implementation of TaskSchedulerImpl, namely KubernetesTaskSchedulerImpl, and a new implementation of TaskSetManager, namely KubernetesTaskSetManager, are introduced for Kubernetes-aware task scheduling.
>>>>> - When dynamic resource allocation is enabled, a new implementation of ExternalShuffleService, namely KubernetesExternalShuffleService, is introduced.
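>>>>>
>>>>> As a rough illustration of the shape of these changes (a simplified sketch, not the fork's actual code; method bodies are elided and the pod-naming scheme is made up), the scheduler backend plugs into the existing extension point roughly like this, using the Spark 2.2 method signatures:
>>>>>
>>>>>   package org.apache.spark.scheduler.cluster.kubernetes
>>>>>
>>>>>   import scala.concurrent.Future
>>>>>   import io.fabric8.kubernetes.client.KubernetesClient
>>>>>   import org.apache.spark.scheduler.TaskSchedulerImpl
>>>>>   import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
>>>>>
>>>>>   private[spark] class KubernetesClusterSchedulerBackend(
>>>>>       scheduler: TaskSchedulerImpl,
>>>>>       kubernetesClient: KubernetesClient)
>>>>>     extends CoarseGrainedSchedulerBackend(scheduler, scheduler.sc.env.rpcEnv) {
>>>>>
>>>>>     // Called by Spark core when the desired executor count changes; the new
>>>>>     // target is handed to the kubernetes-pod-allocator thread (see below).
>>>>>     override def doRequestTotalExecutors(requestedTotal: Int): Future[Boolean] = {
>>>>>       // enqueue the new target for the allocator's feedback loop
>>>>>       Future.successful(true)
>>>>>     }
>>>>>
>>>>>     // Called by Spark core to remove specific executors: delete their Pods.
>>>>>     override def doKillExecutors(executorIds: Seq[String]): Future[Boolean] = {
>>>>>       executorIds.foreach { id =>
>>>>>         kubernetesClient.pods().withName(s"executor-$id").delete()
>>>>>       }
>>>>>       Future.successful(true)
>>>>>     }
>>>>>   }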
>>>>>
>>>>> Design Sketch
>>>>>
>>>>> Below we briefly describe the design. For more details on the design and architecture, please refer to the architecture documentation <https://github.com/apache-spark-on-k8s/spark/tree/branch-2.2-kubernetes/resource-managers/kubernetes/architecture-docs>. The main idea of this design is to run the Spark driver and executors inside Kubernetes Pods <https://kubernetes.io/docs/concepts/workloads/pods/pod/>. Pods are a co-located and co-scheduled group of one or more containers running in a shared context. The driver is responsible for creating and destroying executor Pods through the Kubernetes API, while Kubernetes is fully responsible for scheduling the Pods to run on available nodes in the cluster. In cluster mode, the driver also runs in a Pod in the cluster, created through the Kubernetes API by a Kubernetes-aware submission client called by the spark-submit script. Because the driver runs in a Pod, it is reachable by the executors in the cluster using its Pod IP. In client mode, the driver runs outside the cluster and calls the Kubernetes API to create and destroy executor Pods. The driver must be routable from within the cluster for the executors to communicate with it.
>>>>>
>>>>> The main component running in the driver is the KubernetesClusterSchedulerBackend, an implementation of CoarseGrainedSchedulerBackend, which manages allocating and destroying executors via the Kubernetes API, as instructed by Spark core via calls to the methods doRequestTotalExecutors and doKillExecutors, respectively. Within the KubernetesClusterSchedulerBackend, a separate kubernetes-pod-allocator thread handles the creation of new executor Pods with appropriate throttling and monitoring. Throttling is achieved using a feedback loop that makes decisions about submitting new requests for executors based on whether previous executor Pod creation requests have completed. This indirection is necessary because the Kubernetes API server accepts requests for new Pods optimistically, with the anticipation of being able to eventually schedule them to run. However, it is undesirable to have a very large number of Pods that cannot be scheduled and stay pending within the cluster. The throttling mechanism gives us control over how fast an application scales up (which can be configured), and helps prevent Spark applications from DoS-ing the Kubernetes API server with too many Pod creation requests. The executor Pods simply run the CoarseGrainedExecutorBackend class from a pre-built Docker image that contains a Spark distribution.
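>>>>>
>>>>> The feedback loop works roughly as follows (a deliberately simplified sketch, not the fork's actual allocator; the label name, batch size, and polling interval are invented for the example):
>>>>>
>>>>>   import scala.collection.JavaConverters._
>>>>>   import io.fabric8.kubernetes.client.KubernetesClient
>>>>>
>>>>>   // Only request another batch of executor Pods once all previously
>>>>>   // requested Pods have left the Pending phase.
>>>>>   class PodAllocator(client: KubernetesClient, batchSize: Int, intervalMs: Long) {
>>>>>     @volatile var targetExecutors = 0
>>>>>
>>>>>     def run(appId: String): Unit = while (true) {
>>>>>       val pods = client.pods()
>>>>>         .withLabel("spark-app-id", appId).list().getItems.asScala
>>>>>       val pending = pods.count(_.getStatus.getPhase == "Pending")
>>>>>       val running = pods.size - pending
>>>>>       if (pending == 0 && running < targetExecutors) {
>>>>>         val toCreate = math.min(batchSize, targetExecutors - running)
>>>>>         (1 to toCreate).foreach(_ => createExecutorPod(appId))
>>>>>       }
>>>>>       Thread.sleep(intervalMs) // bounds how fast the application scales up
>>>>>     }
>>>>>
>>>>>     private def createExecutorPod(appId: String): Unit = { /* build & POST a Pod */ }
>>>>>   }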
>>>>>
>>>>> There are two auxiliary and optional components: the ResourceStagingServer and the KubernetesExternalShuffleService, which serve specific purposes described below. The ResourceStagingServer serves as a file store (in the absence of a persistent storage layer in Kubernetes) for application dependencies uploaded from the submission client machine, which then get downloaded from the server by the init-containers in the driver and executor Pods. It is a Jetty server with JAX-RS and has two endpoints, for uploading and downloading files, respectively. Security tokens are returned in the responses for file uploading and must be carried in the requests for downloading the files. The ResourceStagingServer is deployed as a Kubernetes Service <https://kubernetes.io/docs/concepts/services-networking/service/> backed by a Deployment <https://kubernetes.io/docs/concepts/workloads/controllers/deployment/> in the cluster, and multiple instances may be deployed in the same cluster. Spark applications specify which ResourceStagingServer instance to use through a configuration property.
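>>>>>
>>>>> Schematically, the two endpoints look something like this (a minimal JAX-RS sketch; the paths, media types, token transport, and helper names are illustrative, not the fork's exact API):
>>>>>
>>>>>   import java.io.InputStream
>>>>>   import javax.ws.rs.{Consumes, GET, HeaderParam, PUT, Path, PathParam}
>>>>>   import javax.ws.rs.core.{MediaType, Response}
>>>>>
>>>>>   @Path("/resources")
>>>>>   class ResourceStagingResource {
>>>>>
>>>>>     // Upload: store the bytes and hand back a security token that the
>>>>>     // init-containers must present when they download the dependency.
>>>>>     @PUT
>>>>>     @Consumes(Array(MediaType.APPLICATION_OCTET_STREAM))
>>>>>     def uploadResource(data: InputStream): Response = {
>>>>>       val resourceId = storeOnLocalDisk(data)
>>>>>       val token = issueTokenFor(resourceId)
>>>>>       Response.ok(s"""{"resourceId":"$resourceId","token":"$token"}""").build()
>>>>>     }
>>>>>
>>>>>     // Download: reject requests that do not carry a valid token.
>>>>>     @GET
>>>>>     @Path("/{resourceId}")
>>>>>     def downloadResource(
>>>>>         @PathParam("resourceId") resourceId: String,
>>>>>         @HeaderParam("Authorization") token: String): Response = {
>>>>>       require(tokenIsValid(resourceId, token))
>>>>>       Response.ok(openStoredResource(resourceId)).build()
>>>>>     }
>>>>>
>>>>>     private def storeOnLocalDisk(data: InputStream): String = ???
>>>>>     private def issueTokenFor(id: String): String = ???
>>>>>     private def tokenIsValid(id: String, token: String): Boolean = ???
>>>>>     private def openStoredResource(id: String): InputStream = ???
>>>>>   }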
>>>>>
>>>>> The KubernetesExternalShuffleService is used to support dynamic resource allocation, with which the number of executors of a Spark application can change at runtime based on the resource needs. It provides an additional endpoint for drivers that allows the shuffle service to detect driver termination and clean up the shuffle files associated with the corresponding application. There are two ways of deploying the KubernetesExternalShuffleService: running a shuffle service Pod on each node in the cluster, or on a subset of the nodes, using a DaemonSet <https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/>; or running a shuffle service container in each of the executor Pods. In the first option, each shuffle service container mounts a hostPath <https://kubernetes.io/docs/concepts/storage/volumes/#hostpath> volume. The same hostPath volume is also mounted by each of the executor containers, which must also have the environment variable SPARK_LOCAL_DIRS point to the hostPath. In the second option, a shuffle service container is co-located with an executor container in each of the executor Pods. The two containers share an emptyDir <https://kubernetes.io/docs/concepts/storage/volumes/#emptydir> volume where the shuffle data gets written. There may be multiple instances of the shuffle service deployed in a cluster, which may be used for different versions of Spark, or for different priority levels with different resource quotas.
>>>>>
>>>>> New Kubernetes-specific configuration options are also introduced to facilitate specification and customization of driver and executor Pods and related Kubernetes resources. For example, driver and executor Pods can be created in a particular Kubernetes namespace and on a particular set of the nodes in the cluster. Users are allowed to apply labels and annotations to the driver and executor Pods.
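>>>>>
>>>>> For instance, a submission might carry properties along these lines (a sketch; the property names follow the patterns in the fork's documentation, but the exact values, namespace, and API server address here are made up):
>>>>>
>>>>>   import org.apache.spark.SparkConf
>>>>>
>>>>>   val conf = new SparkConf()
>>>>>     // k8s:// master URLs point spark-submit at a Kubernetes API server
>>>>>     .setMaster("k8s://https://kubernetes.example.com:6443")
>>>>>     .set("spark.kubernetes.namespace", "spark-jobs")
>>>>>     // per-application labels/annotations applied to the created Pods
>>>>>     .set("spark.kubernetes.driver.label.team", "data-eng")
>>>>>     .set("spark.kubernetes.executor.annotation.owner", "spark")
>>>>>     .set("spark.executor.instances", "8")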
>>>>>
>>>>> Additionally, secure HDFS support is being actively worked on, following the design here <https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg/edit>. Both short-running jobs and long-running jobs that need periodic delegation token refresh are supported, leveraging built-in Kubernetes constructs like Secrets. Please refer to the design doc for details.
>>>>>
>>>>> Rejected Designs
>>>>>
>>>>> Resource Staging by the Driver
>>>>>
>>>>> A first implementation effectively included the ResourceStagingServer in the driver container itself. The driver container ran a custom command that opened an HTTP endpoint and waited for the submission client to send resources to it. The server would then run the driver code after it had received the resources from the submission client machine. The problem with this approach is that the submission client needs to deploy the driver in such a way that the driver itself is reachable from outside of the cluster, but it is difficult for an automated framework that is not aware of the cluster's configuration to expose an arbitrary pod in a generic way. The Service-based design that was chosen instead allows a cluster administrator to expose the ResourceStagingServer in a manner that makes sense for their cluster, such as with an Ingress or with a NodePort service.
>>>>>
>>>>> Kubernetes External Shuffle Service
>>>>>
>>>>> Several alternatives were considered for the design of the shuffle service. The first design postulated the use of long-lived executor Pods with sidecar containers in them running the shuffle service. The advantage of this model was that it would let us use emptyDir for sharing as opposed to using node-local storage, which guarantees better lifecycle management of storage by Kubernetes. The apparent disadvantage was that it would be a departure from the traditional Spark methodology of keeping executors for only as long as required in dynamic allocation mode. It would additionally use up more resources than strictly necessary during the course of long-running jobs, partially losing the advantage of dynamic scaling.
>>>>>
>>>>> Another alternative considered was to use a separate shuffle service manager as a nameserver. This design has a few drawbacks. First, it means another component that needs authentication/authorization management and maintenance. Second, this separate component needs to be kept in sync with the Kubernetes cluster. Last but not least, most of the functionality of this separate component can be performed by a combination of the in-cluster shuffle service and the Kubernetes API server.
>>>>>
>>>>> Pluggable Scheduler Backends
>>>>>
>>>>> Fully pluggable scheduler backends were considered as a more generalized solution, and remain interesting as a possible avenue for future-proofing against new scheduling targets. For the purposes of this project, adding a new specialized scheduler backend for Kubernetes was chosen as the approach due to its very low impact on the core Spark code; making the scheduler fully pluggable would be a high-impact, high-risk modification to Spark's core libraries. The pluggable scheduler backends effort is being tracked in SPARK-19700 <https://issues.apache.org/jira/browse/SPARK-19700>.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>> --
>>> Anirudh Ramanathan

--
Regards,
Vaquar Khan
+1 -224-436-0783
Greater Chicago