[DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

Zhou Jiang Thu, 09 Nov 2023 12:07:41 -0800

Hi Spark community,

I'm reaching out to start a conversation about developing a Java-based
Kubernetes operator for Apache Spark. Following the operator pattern (
https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), the
operator would let Spark users manage applications and related components
seamlessly using native tools like kubectl. The primary goal is to simplify
the Spark user experience on Kubernetes by minimizing the learning curve
and operational complexity, so that users can focus on Spark application
development.


Although several open-source Spark on Kubernetes operators are available,
none of them is officially part of the Apache Spark project. As a result,
these operators may lack active support and development for new features.
This proposal aims to introduce a Java-based Spark operator as an integral
component of the Apache Spark project. This solution has been used
internally at Apple for multiple years, operating millions of executors in
real production environments. Java was chosen to accommodate a wider user
and contributor audience, especially those who are familiar with Scala.

Ideally, this operator should have its own dedicated repository, similar to
Spark Connect Golang or Spark Docker, so that it stays only loosely coupled
to the Spark release cycle. The Apache Flink Kubernetes operator follows
the same model.

We believe that this project has the potential to evolve into a thriving
community project over the long run. A comparison can be drawn with the
Flink Kubernetes Operator: Apple open-sourced its internal Flink Kubernetes
operator, making it a part of the Apache Flink project (
https://github.com/apache/flink-kubernetes-operator). The operator has since
gained wide industry adoption and contributions from the community. In just
one year, it has garnered more than 600 stars and attracted contributions
from over 80 contributors. This shows the level of community interest and
collaborative momentum that can be achieved in similar scenarios.

More details can be found in the SPIP doc, "Spark Kubernetes Operator":
https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE

Thanks,
-- 
*Zhou JIANG*
