Hi! Just wanted to inquire about the status of the official operator. We are looking forward to contributing, and later switching to a Spark operator, and we would prefer it to be the official one.
Thanks,
Vakaris

On Thu, Nov 30, 2023 at 7:09 AM Shiqi Sun <jack.sun...@gmail.com> wrote:

> Hi Zhou,
>
> Thanks for the reply. For the language choice, since I don't think I've used many k8s components written in Java on k8s, I can't really tell, but at least the components written in Golang are well organized, easy to read and maintain, and run well in general. In addition, goroutines really ease things a lot when writing concurrent code. Golang also has much less boilerplate, no complicated inheritance, and simpler dependency management and linting tooling. Taken together, that's why I prefer Golang for this k8s operator. I understand the Spark maintainers are more familiar with JVM languages, but I think we should weigh performance and maintainability against the learning curve, and choose the option that wins in the long run. Plus, I believe most of the Spark maintainers who touch the k8s-related parts of the Spark project already have experience with Golang, so it shouldn't be a big problem. Our team had some experience with the fabric8 client a couple of years ago, and we experienced some reliability issues with it, mainly the request-dropping issue (i.e. the code call is made but the apiserver never receives the request), but that was a while ago and I'm not sure whether everything is good with the client now. Anyway, this is my opinion about the language choice, and I will let other people comment on it as well.
>
> For compatibility, yes, please make the CRD compatible from the user's standpoint, so that it's easy for people to adopt the new operator. The goal is to consolidate the many Spark operators on the market into this new official operator, so an easy adoption experience is key.
>
> Also, I feel that the discussion is pretty high level, and that's because the only info revealed about this new operator is the SPIP doc, and I haven't had a chance to see the code yet.
> I understand the new operator project might not be open-sourced yet, but is there any way for me to take an early peek at the code of your operator, so that we can discuss the language choice and compatibility points more specifically? Thank you so much!
>
> Best,
> Shiqi
>
> On Tue, Nov 28, 2023 at 10:42 AM Zhou Jiang <zhou.c.ji...@gmail.com> wrote:
>
>> Hi Shiqi,
>>
>> Thanks for the cross-posting here - sorry for the response delay during the holiday break :)
>> We prefer Java for the operator project as it's JVM-based and widely familiar within the Spark community. This choice aims to facilitate better adoption and ease of onboarding for future maintainers. In addition, the Java API client can be considered a mature, widely used option, used by Spark itself and by other operator implementations like Flink's.
>> For easier onboarding and potential migration, we'll consider compatibility with existing CRD designs - the goal is to maintain compatibility as much as possible while minimizing duplicated effort.
>> I'm enthusiastic about the idea of a lean, version-agnostic submission worker. It aligns with one of the primary goals of the operator design. Let's continue exploring this idea further in the design doc.
>>
>> Thanks,
>> Zhou
>>
>> On Wed, Nov 22, 2023 at 3:35 PM Shiqi Sun <jack.sun...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Sorry for being late to the party. I went through the SPIP doc and I think this is a great proposal! I left a comment in the SPIP doc a couple of days ago, but I don't see much activity there and no one replied, so I wanted to cross-post it here to get some feedback.
>>>
>>> I'm Shiqi Sun, and I work on the Big Data Platform at Salesforce.
>>> My team has been running the Spark on k8s operator <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator> (OSS from Google) in my company, serving Spark users in production for 4+ years, and we've been actively contributing to the Spark on k8s operator OSS and also, occasionally, to Spark OSS. In our experience, Google's Spark operator has its own problems, like its tight coupling with the Spark version, as well as the JVM overhead during job submission. On the other hand, it's been a great component of our team's service in the company: being written in Golang, it's really easy to have it interact with k8s, and its CRD covers a lot of different use cases, as it has been built up over time thanks to many users' contributions over the years. There were also a handful of Spark Summit sessions on Google's Spark operator that helped it become widely adopted.
>>>
>>> For this SPIP, I really love the idea of an official k8s operator for the Spark project, as well as the separate submission-worker layer and being Spark-version agnostic. I think we can get the best of both:
>>> 1. I would advocate that the new project still use Golang for the implementation, as Golang is the go-to cloud-native language that works best with k8s.
>>> 2. We should make sure the functionality of the current Google Spark operator CRD is preserved in the new official Spark operator; if we can make it compatible, or even merge the two projects into the new official operator in the Spark project, that would be best.
>>> 3. The new Spark operator should continue being Spark-version agnostic and keep this lightweight, separate submission-worker layer. We've seen scalability issues caused by the heavy JVM during spark-submit in Google's Spark operator, and we implemented an internal fix for it within our company.
>>>
>>> We can continue the discussion in more detail, but generally I love this move toward an official Spark operator, and I really appreciate the effort! In the SPIP doc, I see my comment has gained several upvotes from people I don't know, so I believe there are other Spark / Spark operator users who agree with some of my points. Let me know what you all think and let's continue the discussion, so that we can make this operator a great new component of the open-source Spark project!
>>>
>>> Thanks!
>>>
>>> Shiqi
>>>
>>> On Mon, Nov 13, 2023 at 11:50 PM L. C. Hsieh <vii...@gmail.com> wrote:
>>>
>>>> Thanks for all the support from the community for the SPIP proposal.
>>>>
>>>> Since all questions/discussion have settled down (if I didn't miss any major ones), and if there are no more questions or concerns, I'll be the shepherd for this SPIP proposal and call for a vote tomorrow.
>>>>
>>>> Thank you all!
>>>>
>>>> On Mon, Nov 13, 2023 at 6:43 PM Zhou Jiang <zhou.c.ji...@gmail.com> wrote:
>>>> >
>>>> > Hi Holden,
>>>> >
>>>> > Thanks a lot for your feedback!
>>>> > Yes, this proposal attempts to integrate existing solutions, especially from the CRD perspective. The proposed schema retains similarity with current designs, while reducing duplication and maintaining a single source of truth from conf properties. It also aims to stay close to native k8s integration, to minimize schema changes for new features.
>>>> > For dependencies, packing everything is the easiest way to get started. It would be straightforward to add --packages and --repositories support for Maven dependencies. It's technically possible to pull dependencies in cloud storage from init containers (if defined by the user). It could be tricky to design a general solution that supports different cloud providers at the operator layer.
>>>> > An enhancement that I can think of is to add support for profile scripts that can enable additional user-defined actions in application containers.
>>>> > The operator does not have to build everything for k8s version compatibility. Similar to Spark, the operator can be built on the Fabric8 client (https://github.com/fabric8io/kubernetes-client) for support across versions, given that it makes similar API calls for resource management as Spark. For tests, in addition to the fabric8 mock server, we may also borrow the idea from the Flink operator and start a minikube cluster for integration tests.
>>>> > This operator is not starting from scratch, as it is derived from an internal project which has been working at prod scale for a few years. It aims to include a few new features and enhancements, and some re-architecture, mostly to incorporate lessons learned in CRD / API design.
>>>> > Benchmarking operator performance alone can be nuanced, as it's often tied to the underlying cluster. There's a testing strategy that Aaruna and I discussed at a previous Data + AI Summit, involving scheduling wide (massive numbers of lightweight applications) and deep (a single application requesting a lot of executors with heavy IO) cases, revealing typical bottlenecks at the k8s API server and in scheduler performance. Similar tests can be performed for this as well.
>>>> >
>>>> > On Sun, Nov 12, 2023 at 4:32 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>> >>
>>>> >> To be clear: I am generally supportive of the idea (+1) but have some follow-up questions:
>>>> >>
>>>> >> Have we taken the time to learn from the other operators? Do we have a compatible CRD/API or not (and if so, why)?
>>>> >> The API seems to assume that everything is packaged in the container in advance, but I imagine that might not be the case for many folks who have Java or Python packages published to cloud storage that they want to use?
>>>> >> What's our plan for testing the potential version explosion (not tying ourselves to operator version -> Spark version makes a lot of sense, but how do we reasonably assure ourselves that the cross product of operator version, Kube version, and Spark version all function)? Do we have CI resources for this?
>>>> >> Is there a current (non-open-source) operator that folks from Apple are using and planning to open source, or is this a fresh "from the ground up" operator proposal?
>>>> >> One of the key reasons for this is listed as "An out-of-the-box automation solution that scales effectively", but I don't see any discussion of the target scale or plans to achieve it?
>>>> >>
>>>> >> On Thu, Nov 9, 2023 at 9:02 PM Zhou Jiang <zhou.c.ji...@gmail.com> wrote:
>>>> >>>
>>>> >>> Hi Spark community,
>>>> >>>
>>>> >>> I'm reaching out to initiate a conversation about the possibility of developing a Java-based Kubernetes operator for Apache Spark. Following the operator pattern (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark users could manage applications and related components seamlessly using native tools like kubectl. The primary goal is to simplify the Spark user experience on Kubernetes, minimizing the learning curve and operational complexity, and thereby enabling users to focus on Spark application development.
>>>> >>>
>>>> >>> Although there are several open-source Spark-on-Kubernetes operators available, none of them is officially integrated into the Apache Spark project. As a result, these operators may lack active support and development of new features.
>>>> >>> Within this proposal, our aim is to introduce a Java-based Spark operator as an integral component of the Apache Spark project. This solution has been employed internally at Apple for multiple years, operating millions of executors in real production environments. The use of Java in this solution is intended to accommodate a wider user and contributor audience, especially those who are familiar with Scala.
>>>> >>>
>>>> >>> Ideally, this operator should have its own dedicated repository, similar to Spark Connect Golang or Spark Docker, allowing it to maintain a loose connection with the Spark release cycle. This model is also followed by the Apache Flink Kubernetes operator.
>>>> >>>
>>>> >>> We believe that this project holds the potential to evolve into a thriving community project over the long run. A comparison can be drawn with the Flink Kubernetes operator: Apple open-sourced its internal Flink Kubernetes operator, making it a part of the Apache Flink project (https://github.com/apache/flink-kubernetes-operator). This move has gained wide industry adoption and contributions from the community. In a mere year, the Flink operator has garnered more than 600 stars and attracted contributions from over 80 contributors. This showcases the level of community interest and collaborative momentum that can be achieved in similar scenarios.
>>>> >>>
>>>> >>> More details can be found in the SPIP doc: Spark Kubernetes Operator
>>>> >>> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>>>> >>>
>>>> >>> Thanks,
>>>> >>>
>>>> >>> --
>>>> >>> Zhou JIANG
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Twitter: https://twitter.com/holdenkarau
>>>> >> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> >
>>>> >
>>>> > --
>>>> > Zhou JIANG
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

>> --
>> *Zhou JIANG*
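[Editorial note: the operator pattern discussed throughout this thread means a Spark job becomes a declarative custom resource that users manage with kubectl. As a purely illustrative sketch, since the proposed CRD schema was not public at the time of this thread, such a resource might look like the following; the API group, kind, and field names are hypothetical and do not reflect the final operator CRD.]

```yaml
# Hypothetical SparkApplication custom resource; all field names are
# illustrative only, not taken from the proposed operator's schema.
apiVersion: spark.apache.org/v1alpha1
kind: SparkApplication
metadata:
  name: pi-example
  namespace: spark-jobs
spec:
  mainClass: org.apache.spark.examples.SparkPi
  jars: local:///opt/spark/examples/jars/spark-examples.jar
  sparkConf:
    spark.executor.instances: "2"
    spark.kubernetes.container.image: "apache/spark:latest"
```

Users would then submit and inspect jobs with native tooling, e.g. `kubectl apply -f pi-example.yaml` and `kubectl get sparkapplications -n spark-jobs`, which is the "manage applications using native tools like kubectl" experience the proposal describes.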