Folks have (correctly) pointed out that an operator does not need to be coupled to the Apache Spark project. However, I believe there are some strategic community benefits to supporting a Spark operator that should be weighed against the costs of maintaining one.
*) The Kubernetes ecosystem is evolving toward adopting operators as the de facto standard for deploying and manipulating software resources on a kube cluster. Supporting an out-of-the-box operator will increase the attractiveness of Spark for users and stakeholders in the Kubernetes ecosystem and maximize future uptake; it will keep the barrier to entry low for Spark on Kubernetes.

*) An operator provides a unified and idiomatic kube front-end not just for Spark job submissions, but also for standalone Spark clusters in the cloud, the Spark history server, and eventually the modernized shuffle service, when that is completed.

*) It represents an additional channel for exposing kube-specific features that might otherwise need to be plumbed through spark-submit or the k8s backend.

Cheers,
Erik

On Thu, Oct 10, 2019 at 9:23 PM Yinan Li <liyinan...@gmail.com> wrote:

> +1. This and the GCP Spark Operator, although very useful for k8s users,
> are not something needed by all Spark users, nor even by all Spark on k8s
> users.
>
> On Thu, Oct 10, 2019 at 6:34 PM Stavros Kontopoulos <
> stavros.kontopou...@lightbend.com> wrote:
>
>> Hi all,
>>
>> I also left a comment on the PR with more details. I don't see why the
>> Java operator should be maintained by the Spark project. This is an
>> interesting project and could thrive on its own as an external operator
>> project.
>>
>> Best,
>> Stavros
>>
>> On Thu, Oct 10, 2019 at 7:51 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> I'd have the same question on the PR - why does this need to be in the
>>> Apache Spark project vs where it is now? Yes, it's not a Spark package
>>> per se, but it seems like this is a tool for K8S to use Spark rather
>>> than a core Spark tool.
>>>
>>> Yes, of course all the packages, licenses, etc. have to be overhauled,
>>> but that kind of underscores that this is a dump of a third-party tool
>>> that works fine on its own?
>>> On Thu, Oct 10, 2019 at 9:30 AM Jiri Kremser <jkrem...@redhat.com> wrote:
>>> >
>>> > Hello,
>>> >
>>> > Spark Operator is a tool that can deploy/scale Spark clusters on
>>> > Kubernetes and help with monitoring them. It follows the operator
>>> > pattern [1] introduced by CoreOS: it watches for changes in custom
>>> > resources representing the desired state of the clusters and takes the
>>> > steps needed to reach that state in Kubernetes, using the K8s client.
>>> > It's written in Java, and there is an overlap with the Spark
>>> > dependencies (logging, k8s client, apache-commons-*, fasterxml-jackson,
>>> > etc.). The operator also contains metadata that allows it to be
>>> > deployed smoothly via operatorhub.io [2]. For basic info, check the
>>> > README on the project page, including the gif :) Another unique feature
>>> > of this operator is its (optional) ability to compile itself to a
>>> > native image using the GraalVM compiler, so that it starts fast and has
>>> > a very low memory footprint.
>>> >
>>> > We would like to contribute this project to Spark's code base. It can't
>>> > be distributed as a Spark package, because it's not a library that can
>>> > be used from the Spark environment. So if you are interested, the
>>> > directory under resource-managers/kubernetes/spark-operator/ could be a
>>> > suitable destination.
>>> >
>>> > The current repository is radanalyticsio/spark-operator [3] on GitHub,
>>> > and it also contains a test suite [4] that verifies that the operator
>>> > works well on K8s (using minikube) and also on OpenShift. I am not sure
>>> > how to transfer those tests in case you would be interested in them as
>>> > well.
>>> >
>>> > I've already opened the PR [5], but it got closed, so I am opening the
>>> > discussion here first.
>>> > The PR contained old package names with our organisation called
>>> > radanalytics.io, but we are willing to change that to anything that
>>> > will be more aligned with the existing Spark conventions; the same
>>> > holds for the license headers in all the source files.
>>> >
>>> > jk
>>> >
>>> > [1]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
>>> > [2]: https://operatorhub.io/operator/radanalytics-spark
>>> > [3]: https://github.com/radanalyticsio/spark-operator
>>> > [4]: https://travis-ci.org/radanalyticsio/spark-operator
>>> > [5]: https://github.com/apache/spark/pull/26075
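The watch-and-reconcile loop Jiri describes can be sketched in Java, the language the operator is written in. This is a minimal illustration only: all names here are hypothetical, and the real operator watches SparkCluster custom resources through a Kubernetes client and creates or deletes actual pods rather than returning action strings.

```java
// Minimal sketch of an operator's reconcile step, decoupled from the
// Kubernetes API so it can run standalone. Hypothetical names throughout;
// the real operator reacts to custom-resource changes via a K8s client.
import java.util.ArrayList;
import java.util.List;

public class ReconcileSketch {

    /** Desired state, as declared in a (hypothetical) SparkCluster CR. */
    static final class DesiredState {
        final int workers;
        DesiredState(int workers) { this.workers = workers; }
    }

    /**
     * Compare the desired state from the custom resource against the
     * observed state of the cluster and return the actions that would
     * bring the cluster in line with the CR.
     */
    static List<String> reconcile(DesiredState desired, int runningWorkers) {
        List<String> actions = new ArrayList<>();
        for (int i = runningWorkers; i < desired.workers; i++) {
            actions.add("CREATE worker-" + i);           // scale up
        }
        for (int i = runningWorkers - 1; i >= desired.workers; i--) {
            actions.add("DELETE worker-" + i);           // scale down
        }
        return actions; // empty => observed state already matches the CR
    }

    public static void main(String[] args) {
        // CR asks for 3 workers, 1 is running: two creates are needed.
        System.out.println(reconcile(new DesiredState(3), 1));
        // CR asks for 1 worker, 3 are running: two deletes are needed.
        System.out.println(reconcile(new DesiredState(1), 3));
    }
}
```

In the real operator pattern this function would be invoked from a watch on the custom resource, and the loop runs until the observed state converges on the declared one.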