Folks have (correctly) pointed out that an operator does not need to be
coupled to the Apache Spark project. However, I believe there are some
strategic community benefits to supporting a Spark operator that should be
weighed against the costs of maintaining one.

*) The Kubernetes ecosystem is evolving toward adopting operators as the de
facto standard for deploying and manipulating software resources on a kube
cluster. Supporting an out-of-the-box operator will increase the
attractiveness of Spark for users and stakeholders in the Kubernetes
ecosystem and maximize future uptake; it will continue to keep the barrier
to entry low for Spark on Kubernetes.

*) An operator provides a unified and idiomatic kube front-end not just for
Spark job submissions, but also for standalone Spark clusters in the cloud,
the Spark history server, and eventually the modernized shuffle service,
once that is completed (see the sketch after this list).

*) It represents an additional channel for exposing kube-specific features
that might otherwise need to be plumbed through spark-submit or the k8s
backend.
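
To make the second point concrete, here is a minimal sketch of what that
front-end could look like from the user side, written in Java against the
fabric8 client (the same client the k8s backend already uses). The
SparkCluster kind, its group/version, and the spec fields below are
hypothetical placeholders, not a committed schema:

    import io.fabric8.kubernetes.client.DefaultKubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClient;
    import io.fabric8.kubernetes.client.dsl.base.CustomResourceDefinitionContext;

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class CreateSparkCluster {
      public static void main(String[] args) throws Exception {
        KubernetesClient client = new DefaultKubernetesClient();

        // Coordinates of a hypothetical SparkCluster CRD (placeholders).
        CustomResourceDefinitionContext ctx =
            new CustomResourceDefinitionContext.Builder()
                .withGroup("spark.example.org")
                .withVersion("v1")
                .withPlural("sparkclusters")
                .withScope("Namespaced")
                .build();

        // The desired state, expressed as a plain Kubernetes object,
        // the same shape a user would put in a YAML manifest.
        Map<String, Object> worker = new HashMap<>();
        worker.put("instances", 2);
        Map<String, Object> spec = new HashMap<>();
        spec.put("worker", worker);
        Map<String, Object> cluster = new HashMap<>();
        cluster.put("apiVersion", "spark.example.org/v1");
        cluster.put("kind", "SparkCluster");
        cluster.put("metadata", Collections.singletonMap("name", "my-cluster"));
        cluster.put("spec", spec);

        // Equivalent to `kubectl apply`-ing a SparkCluster manifest; the
        // operator notices the new resource and creates the actual pods.
        client.customResource(ctx).create("default", cluster);
      }
    }

The point is that clusters, job submissions, the history server, and
eventually the shuffle service all become plain Kubernetes resources that
users can manage with kubectl and the rest of the kube tooling they
already have.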

Cheers,
Erik

On Thu, Oct 10, 2019 at 9:23 PM Yinan Li <liyinan...@gmail.com> wrote:

> +1. This and the GCP Spark Operator, although very useful for k8s users,
> are not something needed by all Spark users, not even by all Spark on k8s
> users.
>
>
> On Thu, Oct 10, 2019 at 6:34 PM Stavros Kontopoulos <
> stavros.kontopou...@lightbend.com> wrote:
>
>> Hi all,
>>
>> I also left a comment on the PR with more details. I don't see why the
>> Java operator should be maintained by the Spark project.
>> This is an interesting project and could thrive on its own as an external
>> operator project.
>>
>> Best,
>> Stavros
>>
>> On Thu, Oct 10, 2019 at 7:51 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> I'd have the same question on the PR - why does this need to be in the
>>> Apache Spark project vs where it is now? Yes, it's not a Spark package
>>> per se, but it seems like this is a tool for K8S to use Spark rather
>>> than a core Spark tool.
>>>
>>> Yes, of course all the packages, licenses, etc. have to be overhauled,
>>> but that kind of underscores that this is a dump of a third-party tool
>>> that works fine on its own?
>>>
>>> On Thu, Oct 10, 2019 at 9:30 AM Jiri Kremser <jkrem...@redhat.com>
>>> wrote:
>>> >
>>> > Hello,
>>> >
>>> >
>>> > Spark Operator is a tool that can deploy, scale, and help with
>>> monitoring of Spark clusters on Kubernetes. It follows the operator
>>> pattern [1] introduced by CoreOS: it watches for changes in custom
>>> resources representing the desired state of the clusters and takes the
>>> steps needed to achieve that state in Kubernetes, using the K8s client.
>>> It’s written in Java, and there is an overlap with the Spark
>>> dependencies (logging, k8s client, apache-commons-*, fasterxml-jackson,
>>> etc.). The operator also contains metadata that allows it to be
>>> deployed smoothly via operatorhub.io [2]. For very basic info, check
>>> the readme on the project page, including the gif :) Another feature
>>> unique to this operator is the (optional) ability to compile itself to
>>> a native image using the GraalVM compiler, so that it starts fast and
>>> has a very low memory footprint.
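>>> >
>>> > To make that watch-and-reconcile loop concrete, here is a minimal
>>> sketch in Java, assuming the fabric8 K8s client (the same client that
>>> Spark’s own k8s backend uses). It watches labelled ConfigMaps purely to
>>> keep the sketch self-contained, whereas the operator itself watches
>>> custom resources as described above; the label, class names, and empty
>>> reconcile body are illustrative placeholders, not the operator’s
>>> actual code:
>>> >
>>> import io.fabric8.kubernetes.api.model.ConfigMap;
>>> import io.fabric8.kubernetes.client.DefaultKubernetesClient;
>>> import io.fabric8.kubernetes.client.KubernetesClient;
>>> import io.fabric8.kubernetes.client.KubernetesClientException;
>>> import io.fabric8.kubernetes.client.Watcher;
>>>
>>> public class MiniOperator {
>>>
>>>   static final KubernetesClient client = new DefaultKubernetesClient();
>>>
>>>   public static void main(String[] args) {
>>>     // Watch the objects that carry the desired state of a cluster.
>>>     client.configMaps()
>>>         .inAnyNamespace()
>>>         .withLabel("kind", "SparkCluster")  // placeholder label
>>>         .watch(new Watcher<ConfigMap>() {
>>>           @Override
>>>           public void eventReceived(Action action, ConfigMap desired) {
>>>             switch (action) {
>>>               case ADDED:
>>>               case MODIFIED:
>>>                 reconcile(desired);  // converge toward desired state
>>>                 break;
>>>               case DELETED:
>>>                 // tear down the cluster's deployments and services
>>>                 break;
>>>               default:
>>>                 break;
>>>             }
>>>           }
>>>           @Override
>>>           public void onClose(KubernetesClientException cause) {
>>>             // re-establish the watch or shut down cleanly
>>>           }
>>>         });
>>>   }
>>>
>>>   static void reconcile(ConfigMap desired) {
>>>     // Diff the desired state against what actually exists in the
>>>     // cluster and create/patch/delete the corresponding Deployments
>>>     // and Services.
>>>   }
>>> }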
>>> >
>>> >
>>> > We would like to contribute this project to Spark’s code base. It
>>> can’t be distributed as a Spark package, because it’s not a library that
>>> can be used from the Spark environment. So if you are interested, the
>>> directory under resource-managers/kubernetes/spark-operator/ could be a
>>> suitable destination.
>>> >
>>> >
>>> > The current repository is radanalyticsio/spark-operator [3] on GitHub,
>>> and it also contains a test suite (run on Travis CI [4]) that verifies
>>> that the operator works well on K8s (using minikube) and also on
>>> OpenShift. I am not sure how to transfer those tests, in case you would
>>> be interested in those as well.
>>> >
>>> >
>>> > I’ve already opened the PR [5], but it got closed, so I am opening the
>>> discussion here first. The PR contained old package names from our
>>> organisation, radanalytics.io, but we are willing to change those to
>>> anything more aligned with the existing Spark conventions; the same
>>> holds for the license headers in all the source files.
>>> >
>>> >
>>> > jk
>>> >
>>> >
>>> >
>>> > [1]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
>>> >
>>> > [2]: https://operatorhub.io/operator/radanalytics-spark
>>> >
>>> > [3]: https://github.com/radanalyticsio/spark-operator
>>> >
>>> > [4]: https://travis-ci.org/radanalyticsio/spark-operator
>>> >
>>> > [5]: https://github.com/apache/spark/pull/26075
>>>
>>>
>>
