Hi Vakaris,

Sorry for the late reply. Thanks for your interest in the official operator.
For the last few months, the developers have been cleaning up and
refactoring the internal code for open sourcing.
They are ready to contribute the code to Spark.

We will create a dedicated repository and contribute the code as an
initial PR for review soon.

Liang-Chi

On Wed, Mar 20, 2024 at 8:27 AM Vakaris Baškirov
<vakaris.bashki...@gmail.com> wrote:
>
> Hi!
> Just wanted to inquire about the status of the official operator. We are 
> looking forward to contributing to, and later switching to, a Spark 
> operator, and we would prefer it to be the official one.
>
> Thanks,
> Vakaris
>
> On Thu, Nov 30, 2023 at 7:09 AM Shiqi Sun <jack.sun...@gmail.com> wrote:
>>
>> Hi Zhou,
>>
>> Thanks for the reply. On the language choice: since I don't think I've 
>> used many k8s components written in Java, I can't really compare 
>> directly, but at least the components written in Golang are 
>> well-organized, easy to read and maintain, and run well in general. In 
>> addition, goroutines ease things a lot when writing concurrent code. 
>> Golang also has a lot less boilerplate, no complicated inheritance, and 
>> easier dependency management and linting tooling. For all these 
>> reasons, I prefer Golang for this k8s operator. I understand the Spark 
>> maintainers are more familiar with JVM languages, but I think we should 
>> weigh performance and maintainability against the learning curve and 
>> choose the option that wins in the long run. Plus, I believe most of 
>> the Spark maintainers who touch the k8s-related parts of the Spark 
>> project already have experience with Golang, so it shouldn't be a big 
>> problem. Anyway, this is my opinion on the language choice, and I'll 
>> let other people comment on it as well.
>>
>> Separately, our team had some experience with the fabric8 client a 
>> couple of years ago, and we ran into some reliability issues, mainly a 
>> request-dropping issue (i.e., the client call is made but the apiserver 
>> never receives the request). That was a while ago, though, and I'm not 
>> sure whether everything is good with the client now.
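>>
>> To make the request-dropping issue concrete, here is a minimal, 
>> hypothetical sketch of the kind of fabric8 call in question (assuming 
>> the fabric8 kubernetes-client 6.x API; the read-back check at the end 
>> is just an illustrative mitigation, not what we actually did):
>>
>>   import io.fabric8.kubernetes.api.model.Pod;
>>   import io.fabric8.kubernetes.api.model.PodBuilder;
>>   import io.fabric8.kubernetes.client.KubernetesClient;
>>   import io.fabric8.kubernetes.client.KubernetesClientBuilder;
>>
>>   class DriverPodSubmitter {
>>       void submitDriverPod() {
>>           try (KubernetesClient client = new KubernetesClientBuilder().build()) {
>>               Pod driver = new PodBuilder()
>>                   .withNewMetadata()
>>                       .withName("spark-driver").withNamespace("spark")
>>                   .endMetadata()
>>                   .withNewSpec()
>>                       .addNewContainer()
>>                           .withName("driver").withImage("apache/spark")
>>                       .endContainer()
>>                   .endSpec()
>>                   .build();
>>               client.pods().inNamespace("spark").resource(driver).create();
>>               // Read back what we just wrote: if the request never
>>               // reached the apiserver, this returns null and we can
>>               // retry instead of assuming success.
>>               if (client.pods().inNamespace("spark")
>>                       .withName("spark-driver").get() == null) {
>>                   throw new IllegalStateException("create was not observed");
>>               }
>>           }
>>       }
>>   }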
>>
>> For compatibility, yes, please make the CRD compatible from the user's 
>> standpoint, so that it's easy for people to adopt the new operator. The 
>> goal is to consolidate the many Spark operators on the market into this 
>> new official operator, so an easy adoption experience is key.
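>>
>> For illustration, API-level compatibility could be as simple as the new 
>> operator registering the same group/version/kind as the existing CRD. A 
>> hypothetical sketch using the fabric8 CRD annotations (the group and 
>> version here mirror Google's operator; the Spec/Status classes are 
>> empty placeholders):
>>
>>   import io.fabric8.kubernetes.api.model.Namespaced;
>>   import io.fabric8.kubernetes.client.CustomResource;
>>   import io.fabric8.kubernetes.model.annotation.Group;
>>   import io.fabric8.kubernetes.model.annotation.Version;
>>
>>   // Reusing the existing CRD's group/version would let current
>>   // SparkApplication manifests keep working against the new operator.
>>   @Group("sparkoperator.k8s.io")
>>   @Version("v1beta2")
>>   public class SparkApplication
>>       extends CustomResource<SparkApplicationSpec, SparkApplicationStatus>
>>       implements Namespaced {}
>>
>>   class SparkApplicationSpec {}   // placeholder for the real spec
>>   class SparkApplicationStatus {} // placeholder for the real status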
>>
>> Also, I feel that the discussion is pretty high-level, and that's 
>> because the only information available about this new operator is the 
>> SPIP doc; I haven't had a chance to see the code yet. I understand the 
>> new operator project may not be open-sourced yet, but is there any way 
>> for me to take an early peek at your operator's code, so that we can 
>> discuss the language-choice and compatibility points more concretely? 
>> Thank you so much!
>>
>> Best,
>> Shiqi
>>
>> On Tue, Nov 28, 2023 at 10:42 AM Zhou Jiang <zhou.c.ji...@gmail.com> wrote:
>>>
>>> Hi Shiqi,
>>>
>>> Thanks for cross-posting here - sorry for the response delay during the 
>>> holiday break :)
>>> We prefer Java for the operator project as it's JVM-based and widely 
>>> familiar within the Spark community. This choice aims to facilitate 
>>> better adoption and ease of onboarding for future maintainers. In 
>>> addition, the Java API client can be considered a mature option, widely 
>>> used by Spark itself and by other operator implementations like 
>>> Flink's.
>>> For easier onboarding and potential migration, we'll consider 
>>> compatibility with existing CRD designs - the goal is to maintain 
>>> compatibility as far as possible while minimizing duplicated effort.
>>> I'm enthusiastic about the idea of a lean, version-agnostic submission 
>>> worker. It aligns with one of the primary goals of the operator design. 
>>> Let's continue exploring this idea in the design doc.
>>>
>>> Thanks,
>>> Zhou
>>>
>>>
>>> On Wed, Nov 22, 2023 at 3:35 PM Shiqi Sun <jack.sun...@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Sorry for being late to the party. I went through the SPIP doc and I 
>>>> think this is a great proposal! I left a comment in the SPIP doc a 
>>>> couple of days ago, but I haven't seen much activity there and no one 
>>>> replied, so I wanted to cross-post it here to get some feedback.
>>>>
>>>> I'm Shiqi Sun, and I work on the Big Data Platform at Salesforce. My 
>>>> team has been running the Spark on k8s operator (OSS from Google) in 
>>>> production to serve Spark users for 4+ years, and we've been actively 
>>>> contributing to the Spark on k8s operator OSS and also, occasionally, 
>>>> the Spark OSS. In our experience, Google's Spark Operator has its own 
>>>> problems, like its tight coupling with the Spark version and the JVM 
>>>> overhead during job submission. On the other hand, it has been a great 
>>>> component of our team's service in the company: being written in 
>>>> Golang, it's really easy to have it interact with k8s, and its CRD 
>>>> covers a lot of different use cases, built up over time thanks to many 
>>>> users' contributions over the years. There were also a handful of 
>>>> Spark Summit sessions on Google's Spark Operator that helped it become 
>>>> widely adopted.
>>>>
>>>> For this SPIP, I really love the idea of an official k8s operator in 
>>>> the Spark project, as well as the separate submission-worker layer and 
>>>> its being Spark-version agnostic. I think we can get the best of both:
>>>> 1. I would advocate that the new project still use Golang for the 
>>>> implementation, as Golang is the go-to cloud-native language and works 
>>>> best with k8s.
>>>> 2. We should make sure the functionality of the current Google Spark 
>>>> Operator CRD is preserved in the new official Spark Operator; if we 
>>>> can make it compatible, or even merge the two projects into the new 
>>>> official operator in the Spark project, that would be best.
>>>> 3. The new Spark Operator should remain Spark-version agnostic and 
>>>> keep this lightweight, separate submission-worker layer (see the 
>>>> sketch below). We've seen scalability issues caused by the heavy JVM 
>>>> during spark-submit in Google's Spark Operator, and we implemented an 
>>>> internal fix for it within our company.
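>>>>
>>>> To illustrate point 3, here is a purely hypothetical sketch of what a 
>>>> version-agnostic submission-worker contract might look like (none of 
>>>> these names come from the SPIP doc; it only shows the shape of the 
>>>> decoupling):
>>>>
>>>>   // Placeholder types, just enough to make the sketch self-contained.
>>>>   record ApplicationSpec(String sparkVersion, String image) {}
>>>>   record SubmissionResult(String driverPodName) {}
>>>>   class SubmissionException extends Exception {}
>>>>
>>>>   // Hypothetical: the operator core never links against Spark itself.
>>>>   // It hands the CRD spec to a pluggable worker, so a heavy JVM
>>>>   // spark-submit, a thin REST client, or a plain pod-spec templater
>>>>   // can all satisfy the same contract, one worker per Spark version
>>>>   // if needed.
>>>>   interface SubmissionWorker {
>>>>       /** Translate an application spec into driver resources and submit them. */
>>>>       SubmissionResult submit(ApplicationSpec spec) throws SubmissionException;
>>>>   }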
>>>>
>>>> We can continue the discussion in more detail, but generally I love 
>>>> this move toward an official Spark operator, and I really appreciate 
>>>> the effort! In the SPIP doc, I see my comment has gained several 
>>>> upvotes from people I don't know, so I believe there are other 
>>>> Spark/Spark Operator users who agree with some of my points. Let me 
>>>> know what you all think, and let's continue the discussion, so that we 
>>>> can make this operator a great new component of the open-source Spark 
>>>> project!
>>>>
>>>> Thanks!
>>>>
>>>> Shiqi
>>>>
>>>> On Mon, Nov 13, 2023 at 11:50 PM L. C. Hsieh <vii...@gmail.com> wrote:
>>>>>
>>>>> Thanks for all the support from the community for the SPIP proposal.
>>>>>
>>>>> Since all questions/discussions have settled down (if I didn't miss
>>>>> any major ones), and if there are no more questions or concerns, I'll
>>>>> be the shepherd for this SPIP proposal and call for a vote tomorrow.
>>>>>
>>>>> Thank you all!
>>>>>
>>>>> On Mon, Nov 13, 2023 at 6:43 PM Zhou Jiang <zhou.c.ji...@gmail.com> wrote:
>>>>> >
>>>>> > Hi Holden,
>>>>> >
>>>>> > Thanks a lot for your feedback!
>>>>> > Yes, this proposal attempts to integrate existing solutions, 
>>>>> > especially from the CRD perspective. The proposed schema stays 
>>>>> > similar to current designs, while reducing duplication and 
>>>>> > maintaining a single source of truth in the conf properties. It also 
>>>>> > stays close to native k8s integration to minimize schema changes for 
>>>>> > new features.
>>>>> > For dependencies, packaging everything is the easiest way to get 
>>>>> > started. It would be straightforward to add --packages and 
>>>>> > --repositories support for Maven dependencies. It's technically 
>>>>> > possible to pull dependencies from cloud storage in init containers 
>>>>> > (if defined by the user). It could be tricky to design a general 
>>>>> > solution that supports different cloud providers at the operator 
>>>>> > layer. One enhancement I can think of is to add support for profile 
>>>>> > scripts that can enable additional user-defined actions in 
>>>>> > application containers.
>>>>> > The operator does not have to build everything itself for k8s 
>>>>> > version compatibility. Like Spark, the operator can be built on the 
>>>>> > fabric8 client (https://github.com/fabric8io/kubernetes-client) for 
>>>>> > cross-version support, given that it makes API calls for resource 
>>>>> > management similar to Spark's. For tests, in addition to the fabric8 
>>>>> > mock server, we may also borrow the idea from the Flink operator of 
>>>>> > starting a minikube cluster for integration tests.
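>>>>> >
>>>>> > As a rough illustration of the fabric8 mock server approach, a 
>>>>> > hypothetical JUnit 5 sketch (it assumes the kubernetes-server-mock 
>>>>> > artifact and only exercises a trivial create/get round trip):
>>>>> >
>>>>> >   import io.fabric8.kubernetes.api.model.PodBuilder;
>>>>> >   import io.fabric8.kubernetes.client.KubernetesClient;
>>>>> >   import io.fabric8.kubernetes.client.server.mock.EnableKubernetesMockClient;
>>>>> >   import org.junit.jupiter.api.Test;
>>>>> >   import static org.junit.jupiter.api.Assertions.assertNotNull;
>>>>> >
>>>>> >   // crud = true backs the client with an in-memory CRUD apiserver,
>>>>> >   // so no real cluster is needed.
>>>>> >   @EnableKubernetesMockClient(crud = true)
>>>>> >   class OperatorSmokeTest {
>>>>> >       KubernetesClient client; // injected by the mock-client extension
>>>>> >
>>>>> >       @Test
>>>>> >       void createsDriverPod() {
>>>>> >           client.pods().inNamespace("spark").resource(
>>>>> >               new PodBuilder().withNewMetadata().withName("driver")
>>>>> >                   .endMetadata().build()
>>>>> >           ).create();
>>>>> >           assertNotNull(
>>>>> >               client.pods().inNamespace("spark").withName("driver").get());
>>>>> >       }
>>>>> >   }
>>>>> >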
>>>>> > This operator is not starting from scratch, as it is derived from an 
>>>>> > internal project that has been running at production scale for a few 
>>>>> > years. It aims to include a few new features/enhancements and some 
>>>>> > re-architecture, mostly to incorporate lessons learned in CRD and 
>>>>> > API design.
>>>>> > Benchmarking operator performance alone can be nuanced, as it is 
>>>>> > often tied to the underlying cluster. There's a testing strategy 
>>>>> > that Aaruna and I discussed at a previous Data + AI Summit that 
>>>>> > involves scheduling wide (massive numbers of lightweight 
>>>>> > applications) and deep (a single application requesting a lot of 
>>>>> > executors with heavy IO) cases, revealing typical bottlenecks in k8s 
>>>>> > API server and scheduler performance. Similar tests can be performed 
>>>>> > here as well.
>>>>> >
>>>>> > On Sun, Nov 12, 2023 at 4:32 PM Holden Karau <hol...@pigscanfly.ca> 
>>>>> > wrote:
>>>>> >>
>>>>> >> To be clear: I am generally supportive of the idea (+1) but have some 
>>>>> >> follow-up questions:
>>>>> >>
>>>>> >> Have we taken the time to learn from the other operators? Do we have a 
>>>>> >> compatible CRD/API or not (and if so why?)
>>>>> >> The API seems to assume that everything is packaged in the 
>>>>> >> container in advance, but I imagine that might not be the case for 
>>>>> >> many folks who have Java or Python packages published to cloud 
>>>>> >> storage that they want to use?
>>>>> >> What's our plan for testing the potential version explosion (not 
>>>>> >> tying ourselves to operator version -> Spark version makes a lot 
>>>>> >> of sense, but how do we reasonably assure ourselves that every 
>>>>> >> combination of operator version, Kube version, and Spark version 
>>>>> >> functions)? Do we have CI resources for this?
>>>>> >> Is there a current (non-open-source) operator that folks from 
>>>>> >> Apple are using and planning to open source, or is this a fresh 
>>>>> >> "from the ground up" operator proposal?
>>>>> >> One of the key reasons for this is listed as "An out-of-the-box 
>>>>> >> automation solution that scales effectively", but I don't see any 
>>>>> >> discussion of the target scale or plans to achieve it.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Thu, Nov 9, 2023 at 9:02 PM Zhou Jiang <zhou.c.ji...@gmail.com> 
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Hi Spark community,
>>>>> >>>
>>>>> >>> I'm reaching out to initiate a conversation about the possibility 
>>>>> >>> of developing a Java-based Kubernetes operator for Apache Spark. 
>>>>> >>> Following the operator pattern 
>>>>> >>> (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), 
>>>>> >>> Spark users could manage applications and related components 
>>>>> >>> seamlessly using native tools like kubectl. The primary goal is to 
>>>>> >>> simplify the Spark user experience on Kubernetes, minimizing the 
>>>>> >>> learning curve and operational complexity and thereby enabling 
>>>>> >>> users to focus on Spark application development.
>>>>> >>>
>>>>> >>> Although there are several open-source Spark on Kubernetes operators 
>>>>> >>> available, none of them are officially integrated into the Apache 
>>>>> >>> Spark project. As a result, these operators may lack active support 
>>>>> >>> and development for new features. Within this proposal, our aim is to 
>>>>> >>> introduce a Java-based Spark operator as an integral component of the 
>>>>> >>> Apache Spark project. This solution has been employed internally at 
>>>>> >>> Apple for multiple years, operating millions of executors in real 
>>>>> >>> production environments. The use of Java in this solution is intended 
>>>>> >>> to accommodate a wider user and contributor audience, especially 
>>>>> >>> those who are familiar with Scala.
>>>>> >>>
>>>>> >>> Ideally, this operator should have its dedicated repository, similar 
>>>>> >>> to Spark Connect Golang or Spark Docker, allowing it to maintain a 
>>>>> >>> loose connection with the Spark release cycle. This model is also 
>>>>> >>> followed by the Apache Flink Kubernetes operator.
>>>>> >>>
>>>>> >>> We believe that this project has the potential to evolve into a 
>>>>> >>> thriving community project over the long run. A comparison can be 
>>>>> >>> drawn with the Flink Kubernetes Operator: Apple open-sourced its 
>>>>> >>> internal Flink Kubernetes operator, making it part of the Apache 
>>>>> >>> Flink project (https://github.com/apache/flink-kubernetes-operator). 
>>>>> >>> This move has gained wide industry adoption and contributions from 
>>>>> >>> the community. In a mere year, the Flink operator has garnered more 
>>>>> >>> than 600 stars and attracted contributions from over 80 
>>>>> >>> contributors. This showcases the level of community interest and 
>>>>> >>> collaborative momentum that can be achieved in similar scenarios.
>>>>> >>>
>>>>> >>> More details can be found in the SPIP doc: Spark Kubernetes Operator 
>>>>> >>> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>>>>> >>>
>>>>> >>> Thanks,
>>>>> >>>
>>>>> >>> --
>>>>> >>> Zhou JIANG
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Twitter: https://twitter.com/holdenkarau
>>>>> >> Books (Learning Spark, High Performance Spark, etc.): 
>>>>> >> https://amzn.to/2MaRAG9
>>>>> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Zhou JIANG
>>>>> >
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>
>>>
>>>
>>> --
>>> Zhou JIANG
>>>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
