Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2024-03-28 Thread L. C. Hsieh
Hi Vakaris,

Sorry for the late reply. Thanks for your interest in the official operator.
The developers have spent the last few months cleaning up and refactoring
the internal code for open-sourcing, and it is now ready to be contributed
to Spark.

We will create a dedicated repository and contribute the code as an
initial PR for review soon.

Liang-Chi

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2024-03-20 Thread Vakaris Baškirov
Hi!
Just wanted to inquire about the status of the official operator. We are
looking forward to contributing and later on switching to a Spark Operator
and we would prefer it to be the official one.

Thanks,
Vakaris


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-30 Thread Kumar K
+1

On Fri, Nov 10, 2023 at 8:51 PM Khalid Mammadov wrote:

> +1
>
> On Fri, 10 Nov 2023, 15:23 Peter Toth,  wrote:
>
>> +1
>>
>> On Fri, Nov 10, 2023, 14:09 Bjørn Jørgensen 
>> wrote:
>>
>>> +1
>>>
>>> On Fri, Nov 10, 2023 at 08:39 Nan Zhu wrote:
>>>
 just curious, what happened with Google's Spark operator?

 On Thu, Nov 9, 2023 at 19:12 Ilan Filonenko  wrote:

> +1
>
> On Thu, Nov 9, 2023 at 7:43 PM Ryan Blue  wrote:
>
>> +1
>>
>> On Thu, Nov 9, 2023 at 4:23 PM Hussein Awala 
>> wrote:
>>
>>> +1 for creating an official Kubernetes operator for Apache Spark
>>>
>>> On Fri, Nov 10, 2023 at 12:38 AM huaxin gao 
>>> wrote:
>>>
 +1

>>>
 On Thu, Nov 9, 2023 at 3:14 PM DB Tsai  wrote:

> +1
>
> To be completely transparent, I am employed in the same department
> as Zhou at Apple.
>
> I support this proposal, given the community adoption we have
> witnessed following the release of the Flink Kubernetes operator,
> which streamlined Flink deployment on Kubernetes.
>
> A well-maintained official Spark Kubernetes operator is essential
> for our Spark community as well.
>
> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>
> On Nov 9, 2023, at 12:05 PM, Zhou Jiang wrote:
>
> Hi Spark community,
>
> I'm reaching out to initiate a conversation about the possibility of
> developing a Java-based Kubernetes operator for Apache Spark. Following
> the operator pattern
> (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark
> users may manage applications and related components seamlessly using
> native tools like kubectl. The primary goal is to simplify the Spark
> user experience on Kubernetes, minimizing the learning curve and
> operational complexities and thereby enabling users to focus on Spark
> application development.
>
> Although there are several open-source Spark on Kubernetes operators
> available, none of them are officially integrated into the Apache Spark
> project. As a result, these operators may lack active support and
> development for new features. Within this proposal, our aim is to
> introduce a Java-based Spark operator as an integral component of the
> Apache Spark project. This solution has been employed internally at
> Apple for multiple years, operating millions of executors in real
> production environments. The use of Java in this solution is intended
> to accommodate a wider user and contributor audience, especially those
> who are familiar with Scala.
>
> Ideally, this operator should have its own dedicated repository,
> similar to Spark Connect Golang or Spark Docker, allowing it to
> maintain a loose connection with the Spark release cycle. This model is
> also followed by the Apache Flink Kubernetes operator.
>
> We believe that this project holds the potential to evolve into a
> thriving community project over the long run. A comparison can be drawn
> with the Flink Kubernetes Operator: Apple has open-sourced its internal
> Flink Kubernetes operator, making it a part of the Apache Flink project
> (https://github.com/apache/flink-kubernetes-operator). This move has
> gained wide industry adoption and contributions 
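The operator pattern referenced in the proposal is, at its core, a control loop that repeatedly reconciles the desired state declared in a custom resource against the observed cluster state. A minimal, dependency-free sketch of that loop (all class, field, and state names here are illustrative, not the proposed operator's API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative reconcile loop: compare the desired state declared in a
// custom resource against the observed cluster state and act on the diff.
public class ReconcileSketch {
    enum State { PENDING, SUBMITTED, RUNNING }

    // Stand-ins for applied CR specs and the cluster's actual state.
    static final Map<String, State> desired = new ConcurrentHashMap<>();
    static final Map<String, State> observed = new ConcurrentHashMap<>();

    // One reconciliation pass: drive each app one step toward desired state.
    static void reconcileOnce() {
        desired.forEach((app, want) -> {
            State have = observed.getOrDefault(app, State.PENDING);
            if (have != want) {
                // A real operator would create/patch pods, services, etc. here.
                observed.put(app, nextStep(have));
            }
        });
    }

    static State nextStep(State have) {
        return switch (have) {
            case PENDING -> State.SUBMITTED;
            case SUBMITTED, RUNNING -> State.RUNNING;
        };
    }

    public static void main(String[] args) {
        desired.put("spark-pi", State.RUNNING); // user applies a CR
        reconcileOnce();                        // operator observes and acts
        reconcileOnce();                        // converges over repeated passes
        System.out.println(observed.get("spark-pi")); // RUNNING
    }
}
```

Repeated passes converging on the declared state is what lets users drive everything through kubectl, as described above.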

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-29 Thread Shiqi Sun
Hi Zhou,

Thanks for the reply. For the language choice, since I don't think I've
used many k8s components written in Java, I can't really tell, but at
least the components written in Golang are well organized, easy to read
and maintain, and run well in general. In addition, goroutines really
ease things a lot when writing concurrent code. Golang also has much
less boilerplate, no complicated inheritance, and easier dependency
management and linting tooling. Taken together, that's why I prefer
Golang for this k8s operator. I understand the Spark maintainers are
more familiar with JVM languages, but I think we should weigh
performance and maintainability against the learning curve, and choose
the option that wins in the long run. Plus, I believe most of the Spark
maintainers who touch the k8s-related parts of the Spark project already
have experience with Golang, so it shouldn't be a big problem. Our team
used the fabric8 client a couple of years ago and experienced some
reliability issues, mainly dropped requests (i.e. the code call is made
but the apiserver never receives the request), but that was a while ago
and I'm not sure whether the client is in better shape now. Anyway, this
is my opinion about the language choice, and I will let other people
comment on it as well.
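For comparison, the JVM side of the concurrency argument can be sketched too: reconcile work for many applications can be fanned out over a worker pool with java.util.concurrent, which covers much of what goroutines are used for in a Go operator. A toy sketch, not taken from either operator's codebase:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy example: run a reconcile pass for many applications concurrently
// from a fixed worker pool, roughly what goroutines would do in Go.
public class ConcurrentReconcile {
    static int reconcileAll(List<String> apps) {
        AtomicInteger reconciled = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String app : apps) {
            // Placeholder for a real per-app reconcile pass (API calls, patches).
            pool.submit(() -> { reconciled.incrementAndGet(); });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return reconciled.get();
    }

    public static void main(String[] args) {
        System.out.println(reconcileAll(
            List.of("etl-daily", "stream-agg", "spark-pi"))); // 3
    }
}
```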

For compatibility, yes, please make the CRD compatible from the user's
standpoint, so that it's easy for people to adopt the new operator. The
goal is to consolidate the many Spark operators on the market into this
new official operator, so an easy adoption experience is key.

Also, I feel that the discussion is pretty high level, because the only
info revealed about this new operator is the SPIP doc and I haven't had
a chance to see the code yet. I understand the new operator project may
not be open-sourced yet, but is there any way for me to take an early
peek at the code of your operator, so that we can discuss the points of
language choice and compatibility more specifically? Thank you so much!

Best,
Shiqi


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-28 Thread Zhou Jiang
Hi Shiqi,

Thanks for cross-posting here - sorry for the delayed response during
the holiday break :)
We prefer Java for the operator project as it's JVM-based and widely
familiar within the Spark community. This choice aims to facilitate
adoption and ease onboarding for future maintainers. In addition, the
Java API client can be considered a mature, widely used option - used by
Spark itself and by other operator implementations like Flink's.
For easier onboarding and potential migration, we'll consider
compatibility with existing CRD designs - the goal is to maintain
compatibility as much as possible while minimizing duplicated effort.
I'm enthusiastic about the idea of a lean, version-agnostic submission
worker. It aligns with one of the primary goals of the operator design.
Let's continue exploring this idea further in the design doc.

Thanks,
Zhou



Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-22 Thread Shiqi Sun
Hi all,

Sorry for being late to the party. I went through the SPIP doc and I
think this is a great proposal! I left a comment in the SPIP doc a
couple of days ago, but I didn't see much activity there and no one
replied, so I wanted to cross-post it here to get some feedback.

I'm Shiqi Sun, and I work on the Big Data Platform at Salesforce. My
team has been running the Spark on k8s operator (OSS from Google) in my
company to serve Spark users in production for 4+ years, and we've been
actively contributing to the Spark on k8s operator OSS and also,
occasionally, to Spark itself. In our experience, Google's Spark
Operator has its own problems, like its tight coupling with the Spark
version and the JVM overhead during job submission. On the other hand,
it's been a great component of our team's service in the company: being
written in Golang, it's really easy to have it interact with k8s, and
its CRD covers a lot of different use cases, as it has been built up
over time thanks to many users' contributions over the years. There were
also a handful of Spark Summit sessions about Google's Spark Operator
that helped it become widely adopted.

For this SPIP, I really love the idea of an official k8s operator for
the Spark project, as well as the separate submission-worker layer and
being Spark-version agnostic. I think we can get the best of both worlds:
1. I would advocate that the new project still use Golang for the
implementation, as Golang is the go-to cloud-native language that works
best with k8s.
2. We should make sure the functionality of the current Google Spark
operator CRD is preserved in the new official Spark Operator; if we can
make it compatible, or even merge the two projects to make it the new
official operator in the Spark project, that would be best.
3. The new Spark Operator should continue to be Spark-version agnostic
and keep this lightweight, separate submission-worker layer. We've seen
scalability issues caused by the heavy JVM during spark-submit in
Google's Spark Operator, and we implemented an internal fix for it
within our company.
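The scalability concern in point 3 comes down to process cost: forking a full spark-submit JVM per application is heavy, while a submission worker can translate CR fields directly into a driver pod spec via plain API calls. A hand-wavy illustration (all field names and the image tag are invented for the example, not either operator's actual schema):

```java
import java.util.Map;

// Illustrative: build a driver-pod spec from CR fields directly, instead
// of forking a spark-submit JVM per application. All keys are invented.
public class SubmissionSketch {
    static Map<String, String> driverPodSpec(String appName, String image,
                                             String mainClass) {
        return Map.of(
            "metadata.name", appName + "-driver",
            "spec.container.image", image,
            "spec.container.arg.class", mainClass,
            "spec.restartPolicy", "Never");
    }

    public static void main(String[] args) {
        Map<String, String> spec = driverPodSpec(
            "spark-pi", "spark:3.5.1", "org.apache.spark.examples.SparkPi");
        System.out.println(spec.get("metadata.name")); // spark-pi-driver
    }
}
```

Building the spec in-process and POSTing it to the apiserver avoids the per-submission JVM startup cost entirely.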

We can continue the discussion in more detail, but generally I love this
move toward an official Spark operator, and I really appreciate the
effort! In the SPIP doc, I see my comment has gained several upvotes
from people I don't know, so I believe there are other Spark / Spark
operator users who agree with some of my points. Let me know what you
all think and let's continue the discussion, so that we can make this
operator a great new component of the open-source Spark project!

Thanks!

Shiqi


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-13 Thread L. C. Hsieh
Thanks for all the support from the community for the SPIP proposal.

Since the questions and discussion have settled down (if I didn't miss
any major ones), and if there are no further concerns, I'll be the
shepherd for this SPIP proposal and call for a vote tomorrow.

Thank you all!

On Mon, Nov 13, 2023 at 6:43 PM Zhou Jiang wrote:
>
> Hi Holden,
>
> Thanks a lot for your feedback!
> Yes, this proposal attempts to integrate existing solutions, especially
> from the CRD perspective. The proposed schema retains similarity with
> current designs, while reducing duplication and maintaining a single
> source of truth from conf properties. It also aims to stay close to
> native k8s integration in order to minimize schema changes for new
> features.
> For dependencies, packing everything is the easiest way to get started.
> It would be straightforward to add --packages and --repositories
> support for Maven dependencies. It's technically possible to pull
> dependencies in cloud storage from init containers (if defined by the
> user). It could be tricky to design a general solution that supports
> different cloud providers from the operator layer. An enhancement that
> I can think of is to add support for profile scripts that can enable
> additional user-defined actions in application containers.
> The operator does not have to build everything for k8s version
> compatibility. Similar to Spark, the operator can be built on the
> Fabric8 client (https://github.com/fabric8io/kubernetes-client) for
> support across versions, given that it makes similar API calls for
> resource management as Spark does. For tests, in addition to the
> fabric8 mock server, we may also borrow the idea from the Flink
> operator of starting a minikube cluster for integration tests.
> This operator is not starting from scratch, as it is derived from an
> internal project which has been working at prod scale for a few years.
> It aims to include a few new features / enhancements, and some
> re-architecture, mostly to incorporate lessons learned from a CRD / API
> design perspective.
> Benchmarking operator performance alone can be nuanced, and is often
> tied to the underlying cluster. There's a testing strategy that Aaruna
> and I discussed at a previous Data + AI Summit: it involves scheduling
> wide (massive numbers of lightweight applications) and deep (a single
> application requesting a lot of executors with heavy IO) cases,
> revealing typical bottlenecks at the k8s API server and in scheduler
> performance. Similar tests can be performed for this as well.
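The cross-version argument works because the resource operations an operator performs map onto a small, stable set of apiserver REST paths, which clients like Fabric8 wrap uniformly. A simplified sketch of that path convention (not Fabric8's actual internals):

```java
// Simplified: core k8s resource operations map to stable REST paths,
// which is why one client library can span many k8s versions.
// Core group ("v1") lives under /api, named groups under /apis.
public class ApiPathSketch {
    static String resourcePath(String apiVersion, String namespace,
                               String plural) {
        String prefix = apiVersion.contains("/") ? "/apis/" : "/api/";
        return prefix + apiVersion + "/namespaces/" + namespace + "/" + plural;
    }

    public static void main(String[] args) {
        System.out.println(resourcePath("v1", "spark-jobs", "pods"));
        // /api/v1/namespaces/spark-jobs/pods
        System.out.println(resourcePath("apps/v1", "spark-jobs", "deployments"));
        // /apis/apps/v1/namespaces/spark-jobs/deployments
    }
}
```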
>
> On Sun, Nov 12, 2023 at 4:32 PM Holden Karau wrote:
>>
>> To be clear: I am generally supportive of the idea (+1) but have some
>> follow-up questions:
>>
>> - Have we taken the time to learn from the other operators? Do we have
>>   a compatible CRD/API or not (and if so, why)?
>> - The API seems to assume that everything is packaged in the container
>>   in advance, but I imagine that might not be the case for many folks
>>   who have Java or Python packages published to cloud storage that
>>   they want to use.
>> - What's our plan for testing the potential version explosion (not
>>   tying ourselves to operator version -> Spark version makes a lot of
>>   sense, but how do we reasonably assure ourselves that the cross
>>   product of operator version, kube version, and Spark version all
>>   function)? Do we have CI resources for this?
>> - Is there a current (non-open-source) operator that folks from Apple
>>   are using and planning to open source, or is this a fresh
>>   "from the ground up" operator proposal?
>> - One of the key reasons listed for this is "an out-of-the-box
>>   automation solution that scales effectively", but I don't see any
>>   discussion of the target scale or plans to achieve it.
>> On Thu, Nov 9, 2023 at 9:02 PM Zhou Jiang  wrote:
>>>
>>> Hi Spark community,
>>>
>>> I'm reaching out to initiate a conversation about the possibility of 
>>> developing a Java-based Kubernetes operator for Apache Spark. Following the 
>>> operator pattern 
>>> (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark 
>>> users may manage applications and related components seamlessly using 
>>> native tools like kubectl. The primary goal is to simplify the Spark user 
>>> experience on Kubernetes, minimizing the learning curve and operational 
>>> complexities and therefore enable users to focus on the Spark application 
>>> development.
>>>
>>> Although there are several open-source Spark on Kubernetes operators 
>>> available, none of them are officially integrated into the Apache Spark 
>>> project. As a result, these operators may lack active support and 
>>> development for new features. Within this proposal, our aim is to introduce 
>>> a Java-based Spark operator as an integral component of the Apache Spark 
>>> project. This solution has been employed internally at Apple for multiple 
>>> years, operating millions of executors in real production environments. The 
>>> use of Java in this solution is intended to 

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-13 Thread Zhou Jiang
Hi Holden,

Thanks a lot for your feedback!
Yes, this proposal attempts to integrate learnings from existing solutions,
especially from the CRD perspective. The proposed schema retains similarity
with current designs, while reducing duplication and maintaining a single
source of truth in the conf properties. It also stays close to native k8s
integration to minimize schema changes for new features.
For dependencies, packaging everything into the image is the easiest way to
get started. It would be straightforward to add --packages and --repositories
support for Maven dependencies. It's technically possible to pull dependencies
from cloud storage via init containers (if defined by the user). It could be
tricky to design a general solution that supports different cloud providers
from the operator layer. An enhancement that I can think of is to add support
for profile scripts that can enable additional user-defined actions in
application containers.
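To make the "single source of truth" point concrete, here is a minimal,
hypothetical sketch: all Spark configuration lives in one conf map on the
custom resource, and the operator derives the spark-submit invocation from it.
The field names ("sparkConf", "mainClass", "clusterEndpoint",
"applicationResource") are illustrative placeholders, not the actual CRD
schema from the SPIP doc.

```python
# Illustrative sketch only -- field names are hypothetical, not the real CRD.

def spec_to_submit_args(spec: dict) -> list[str]:
    """Translate a hypothetical SparkApplication spec into
    spark-submit-style arguments."""
    args = ["--master", f"k8s://{spec['clusterEndpoint']}"]
    if "mainClass" in spec:
        args += ["--class", spec["mainClass"]]
    # Conf properties are the single source of truth: no duplicated
    # top-level fields that shadow spark.* settings.
    for key, value in sorted(spec.get("sparkConf", {}).items()):
        args += ["--conf", f"{key}={value}"]
    args.append(spec["applicationResource"])
    return args

spec = {
    "clusterEndpoint": "https://kubernetes.default.svc",
    "mainClass": "org.apache.spark.examples.SparkPi",
    "sparkConf": {"spark.executor.instances": "5"},
    "applicationResource": "local:///opt/spark/examples/jars/spark-examples.jar",
}
print(spec_to_submit_args(spec))
```

The point of keeping everything under one conf map is that new Spark settings
need no schema change in the CRD.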
Operator does not have to build everything for k8s version compatibility.
Similar to Spark, operator can be built on Fabric8 client(
https://github.com/fabric8io/kubernetes-client) for support across
versions, given that it makes similar API calls for resource management as
Spark. For tests, in addition to fabric8 mock server, we may also borrow
the idea from Flink operator to start minikube cluster for integration
tests.
This operator is not starting from scratch, as it is derived from an
internal project which has been working at prod scale for a few years. It
aims to include a few new features / enhancements, and some
re-architecture, mostly to incorporate lessons learnt from a CRD /
API design perspective.
Benchmarking operator performance alone can be nuanced, as it is often tied
to the underlying cluster. There's a testing strategy that Aaruna and I
discussed at a previous Data + AI Summit: it involves scheduling wide (many
light-weight applications) and deep (a single application requesting a lot of
executors with heavy IO) cases, revealing typical bottlenecks at the k8s API
server and in scheduler performance. Similar tests can be performed for this
as well.
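The wide vs. deep workload shapes above can be sketched roughly as follows.
This is an illustrative outline of the two benchmark shapes only; field names
and sizes are made up, not taken from any actual test suite.

```python
# Rough sketch of "wide vs deep" benchmark workload shapes (illustrative).

def wide_workload(num_apps: int) -> list[dict]:
    """Many light-weight applications: stresses the operator's
    reconciliation loop and the k8s API server with object churn."""
    return [
        {"name": f"wide-app-{i}", "executors": 2, "executor_cores": 1}
        for i in range(num_apps)
    ]

def deep_workload(num_executors: int) -> list[dict]:
    """One application requesting many executors (heavy IO): stresses
    scheduler throughput and pod-creation rate."""
    return [
        {"name": "deep-app", "executors": num_executors, "executor_cores": 4}
    ]

# e.g. 500 small apps vs. one app asking for 2000 executors
print(len(wide_workload(500)), deep_workload(2000)[0]["executors"])
```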

On Sun, Nov 12, 2023 at 4:32 PM Holden Karau  wrote:

> To be clear: I am generally supportive of the idea (+1) but have some
> follow-up questions:
>
> Have we taken the time to learn from the other operators? Do we have a
> compatible CRD/API or not (and if so why?)
> The API seems to assume that everything is packaged in the container in
> advance, but I imagine that might not be the case for many folks who have
> Java or Python packages published to cloud storage and they want to use?
> What's our plan for the testing on the potential version explosion (not
> tying ourselves to operator version -> spark version makes a lot of sense,
> but how do we reasonably assure ourselves that the cross product of
> Operator Version, Kube Version, and Spark Version all function)? Do we have
> CI resources for this?
> Is there a current (non-open source operator) that folks from Apple are
> using and planning to open source, or is this a fresh "from the ground up"
> operator proposal?
> One of the key reasons for this is listed as "An out-of-the-box automation
> solution that scales effectively" but I don't see any discussion of the
> target scale or plans to achieve it?
>

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Holden Karau
To be clear: I am generally supportive of the idea (+1) but have some
follow-up questions:

Have we taken the time to learn from the other operators? Do we have a
compatible CRD/API or not (and if so why?)
The API seems to assume that everything is packaged in the container in
advance, but I imagine that might not be the case for many folks who have
Java or Python packages published to cloud storage and they want to use?
What's our plan for the testing on the potential version explosion (not
tying ourselves to operator version -> spark version makes a lot of sense,
but how do we reasonably assure ourselves that the cross product of
Operator Version, Kube Version, and Spark Version all function)? Do we have
CI resources for this?
Is there a current (non-open source operator) that folks from Apple are
using and planning to open source, or is this a fresh "from the ground up"
operator proposal?
One of the key reasons for this is listed as "An out-of-the-box automation
solution that scales effectively" but I don't see any discussion of the
target scale or plans to achieve it?



On Thu, Nov 9, 2023 at 9:02 PM Zhou Jiang  wrote:

> Hi Spark community,
>
> I'm reaching out to initiate a conversation about the possibility of
> developing a Java-based Kubernetes operator for Apache Spark. Following the
> operator pattern (
> https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark
> users may manage applications and related components seamlessly using
> native tools like kubectl. The primary goal is to simplify the Spark user
> experience on Kubernetes, minimizing the learning curve and operational
> complexities and therefore enable users to focus on the Spark application
> development.
>
> Although there are several open-source Spark on Kubernetes operators
> available, none of them are officially integrated into the Apache Spark
> project. As a result, these operators may lack active support and
> development for new features. Within this proposal, our aim is to introduce
> a Java-based Spark operator as an integral component of the Apache Spark
> project. This solution has been employed internally at Apple for multiple
> years, operating millions of executors in real production environments. The
> use of Java in this solution is intended to accommodate a wider user and
> contributor audience, especially those who are familiar with Scala.
>
> Ideally, this operator should have its dedicated repository, similar to
> Spark Connect Golang or Spark Docker, allowing it to maintain a loose
> connection with the Spark release cycle. This model is also followed by the
> Apache Flink Kubernetes operator.
>
> We believe that this project holds the potential to evolve into a thriving
> community project over the long run. A comparison can be drawn with the
> Flink Kubernetes Operator: Apple has open-sourced internal Flink Kubernetes
> operator, making it a part of the Apache Flink project (
> https://github.com/apache/flink-kubernetes-operator). This move has
> gained wide industry adoption and contributions from the community. In a
> mere year, the Flink operator has garnered more than 600 stars and has
> attracted contributions from over 80 contributors. This showcases the level
> of community interest and collaborative momentum that can be achieved in
> similar scenarios.
>
> More details can be found at SPIP doc : Spark Kubernetes Operator
> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>
> Thanks,
> --
> *Zhou JIANG*
>
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Zhou Jiang
resending cc dev for record - sorry forgot to reply all earlier :)

For 1 - I'm leaning more towards 'official', as this aims to provide Spark
users a community-recommended way to automate and manage Spark deployments
on k8s. It does not mean the current / other options would become
non-standard, from my point of view.

For 2/3 - as the operator starts driver pods in the same way as
spark-submit, I would not expect start-up time to be significantly reduced
by using the operator. However, there are indeed some optimizations we can
do in practice. For example, with the operator we can enable users to
separate the application packaging from Spark: use an init container to load
the Spark binary, and apply the application jar / packages on top of that in
a different container. The benefit is that the application image or package
would be relatively lean and therefore take less time to upload to a registry
or to download onto nodes. Spark images could be relatively static (e.g. use
the official Docker images) and hence can be cached on nodes. There are more
technical details that can be discussed in the upcoming design doc if we
agree to proceed with the operator proposal.
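One way the binary/package separation above could look is sketched below.
This is a hedged illustration of a driver pod fragment, not actual operator
output; the image names, tags, and paths are placeholders.

```yaml
# Illustrative sketch only - not the actual operator-generated spec.
# An init container copies the Spark distribution into a shared volume,
# so the application image only carries the application jar.
spec:
  volumes:
    - name: spark-dist
      emptyDir: {}
  initContainers:
    - name: load-spark
      image: apache/spark:3.5.1          # relatively static, cacheable on nodes
      command: ["sh", "-c", "cp -r /opt/spark/* /mnt/spark-dist/"]
      volumeMounts:
        - name: spark-dist
          mountPath: /mnt/spark-dist
  containers:
    - name: driver
      image: example.com/my-app:latest   # lean image: application jar only
      volumeMounts:
        - name: spark-dist
          mountPath: /opt/spark
```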

On Fri, Nov 10, 2023 at 8:11 AM Mich Talebzadeh 
wrote:

> Hi,
>
> Looks like a good idea, but before committing myself I have a number of
> design questions, having looked at the SPIP itself:
>
>
>    1. Would the name "Standard add-on Kubernetes operator for Spark"
>    describe it better?
>    2. We are still struggling with improving Spark driver start-up time.
>    What would be the footprint of this add-on on the driver start-up time?
>    3. In a commercial world, will there be a static image for this besides
>    the base image maintained in a container registry (ECR, GCR, etc.)? It
>    takes time to upload these images. Will this be a static image
>    (Dockerfile)? An alternative would be for the user to create this
>    Dockerfile through a set of scripts?
>
>
> These are the things that come into my mind.
>
> HTH
>
>
> Mich Talebzadeh,
> Distinguished Technologist, Solutions Architect & Engineer
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 10 Nov 2023 at 14:19, Bjørn Jørgensen 
> wrote:
>
>> +1
>>
>> fre. 10. nov. 2023 kl. 08:39 skrev Nan Zhu :
>>
>>> just curious what happened on google’s spark operator?
>>>
>>> On Thu, Nov 9, 2023 at 19:12 Ilan Filonenko  wrote:
>>>
 +1

 On Thu, Nov 9, 2023 at 7:43 PM Ryan Blue  wrote:

> +1
>
> On Thu, Nov 9, 2023 at 4:23 PM Hussein Awala  wrote:
>
>> +1 for creating an official Kubernetes operator for Apache Spark
>>
>> On Fri, Nov 10, 2023 at 12:38 AM huaxin gao 
>> wrote:
>>
>>> +1
>>>
>>
>>> On Thu, Nov 9, 2023 at 3:14 PM DB Tsai  wrote:
>>>
 +1

 To be completely transparent, I am employed in the same department
 as Zhou at Apple.

 I support this proposal, provided that we witness community
 adoption following the release of the Flink Kubernetes operator,
 streamlining Flink deployment on Kubernetes.

 A well-maintained official Spark Kubernetes operator is essential
 for our Spark community as well.

 DB Tsai  |  https://www.dbtsai.com/
 
  |  PGP 42E5B25A8F7A82C1

 On Nov 9, 2023, at 12:05 PM, Zhou Jiang 
 wrote:

 Hi Spark community,
 I'm reaching out to initiate a conversation about the possibility
 of developing a Java-based Kubernetes operator for Apache Spark. 
 Following
 the operator pattern (
 https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
 

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Zhou Jiang
I'd say it's actually the other way round. A user may either
1. Use spark-submit - this works with or without the operator. Or,
2. Deploy the operator and create SparkApplications with kubectl /
clients, so that the operator does the spark-submit for you.
We may also continue this discussion in the proposal doc.
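For illustration, the operator-based flow might look like the manifest below.
The apiVersion, kind, and field names are hypothetical placeholders, not the
actual CRD from the SPIP doc; a user would run `kubectl apply -f
spark-pi.yaml`, and the operator, watching these resources, would perform the
spark-submit on their behalf.

```yaml
# Hypothetical manifest - the real CRD schema is defined in the SPIP doc.
apiVersion: spark.apache.org/v1alpha1
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  mainClass: org.apache.spark.examples.SparkPi
  applicationResource: local:///opt/spark/examples/jars/spark-examples.jar
  sparkConf:
    spark.executor.instances: "5"
```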

On Fri, Nov 10, 2023 at 8:57 PM Cheng Pan  wrote:

> > Not really - this is not designed to be a replacement for the current
> approach.
>
> That's what I assumed too. But my question is, as a user, how to write a
> spark-submit command to submit a Spark app to leverage this operator?
>
> Thanks,
> Cheng Pan
>
>

-- 
*Zhou JIANG*


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-11 Thread Mich Talebzadeh
Thanks Zhou for your response to my points raised (private communication).

If we start with a base model and cluster, with a minimal footprint for the
tool, then we can establish the operational parameters needed. So +1 for me too.

HTH





Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Cheng Pan
> Not really - this is not designed to be a replacement for the current 
> approach.

That's what I assumed too. But my question is, as a user, how to write a 
spark-submit command to submit a Spark app to leverage this operator?

Thanks,
Cheng Pan




-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread kazuyuki tanimura
+1

Kazu


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Khalid Mammadov
+1

On Fri, 10 Nov 2023, 15:23 Peter Toth,  wrote:

> +1

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Mich Talebzadeh
Hi,

Looks like a good idea, but before committing myself I have a number of
design questions, having looked at the SPIP itself:


   1. Would the name "Standard add-on Kubernetes operator for Spark"
   describe it better?
   2. We are still struggling with improving Spark driver start-up time.
   What would be the footprint of this add-on on the driver start-up time?
   3. In a commercial world, will there be a static image for this besides
   the base image maintained in a container registry (ECR, GCR, etc.)? It
   takes time to upload these images. Will this be a static image
   (Dockerfile)? An alternative would be for the user to create this
   Dockerfile through a set of scripts?


These are the things that come into my mind.

HTH


Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom





 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 10 Nov 2023 at 14:19, Bjørn Jørgensen 
wrote:

> +1
>
> On Fri, Nov 10, 2023 at 08:39, Nan Zhu wrote:
>
>> just curious what happened on google’s spark operator?
>>
>> On Thu, Nov 9, 2023 at 19:12 Ilan Filonenko  wrote:
>>
>>> +1
>>>
>>> On Thu, Nov 9, 2023 at 7:43 PM Ryan Blue  wrote:
>>>
 +1

 On Thu, Nov 9, 2023 at 4:23 PM Hussein Awala  wrote:

> +1 for creating an official Kubernetes operator for Apache Spark
>
> On Fri, Nov 10, 2023 at 12:38 AM huaxin gao 
> wrote:
>
>> +1
>>
>
>> On Thu, Nov 9, 2023 at 3:14 PM DB Tsai  wrote:
>>
>>> +1
>>>
>>> To be completely transparent, I am employed in the same department
>>> as Zhou at Apple.
>>>
>>> I support this proposal, provided that we witness community adoption
>>> following the release of the Flink Kubernetes operator, streamlining 
>>> Flink
>>> deployment on Kubernetes.
>>>
>>> A well-maintained official Spark Kubernetes operator is essential
>>> for our Spark community as well.
>>>
>>> DB Tsai  |  https://www.dbtsai.com/
>>> 
>>>  |  PGP 42E5B25A8F7A82C1
>>>
>>> On Nov 9, 2023, at 12:05 PM, Zhou Jiang 
>>> wrote:
>>>

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Peter Toth
+1

On Fri, Nov 10, 2023, 14:09 Bjørn Jørgensen 
wrote:

> +1
>

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Bjørn Jørgensen
+1

On Fri, Nov 10, 2023 at 08:39, Nan Zhu wrote:

> just curious what happened on google’s spark operator?
>

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Yuming Wang
+1

On Fri, Nov 10, 2023 at 10:01 AM Ilan Filonenko  wrote:

> +1
>

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Cheng Pan
Thanks for this impressive proposal. I have a basic question: how does 
spark-submit work with this operator? Or does it enforce that we must use `kubectl 
apply -f spark-job.yaml` (or a K8s client, programmatically) to submit a Spark app?
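
A minimal sketch of the declarative style this question refers to: instead of invoking spark-submit, the user describes the application in a manifest and applies it with kubectl. Note that the apiVersion, kind, and every field name below are placeholders invented for illustration; the operator's actual CRD schema is defined in the SPIP doc, not here:

```java
// Hypothetical: render a manifest that `kubectl apply -f spark-job.yaml`
// would hand to the operator. All group/version, kind, and field names
// are invented placeholders, not the operator's real CRD schema.
public class ManifestSketch {
    static String sparkJobManifest(String name, String mainClass, int executors) {
        return String.join("\n",
            "apiVersion: spark.apache.org/v1alpha1",  // placeholder API group/version
            "kind: SparkApplication",                  // placeholder kind
            "metadata:",
            "  name: " + name,
            "spec:",
            "  mainClass: " + mainClass,
            "  executor:",
            "    instances: " + executors);
    }

    public static void main(String[] args) {
        // Writing this string to spark-job.yaml and running `kubectl apply -f`
        // would submit the spec to the operator (requires a cluster; not done here).
        System.out.println(sparkJobManifest(
            "spark-pi", "org.apache.spark.examples.SparkPi", 2));
    }
}
```

The design trade-off behind the question: spark-submit is imperative (run once, fire the job), while a custom resource is declarative (the operator keeps reconciling the app toward the spec), so the two submission paths have genuinely different semantics.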

Thanks,
Cheng Pan


> On Nov 10, 2023, at 04:05, Zhou Jiang  wrote:
> 
> Hi Spark community,
> I'm reaching out to initiate a conversation about the possibility of 
> developing a Java-based Kubernetes operator for Apache Spark. Following the 
> operator pattern 
> (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark 
> users may manage applications and related components seamlessly using native 
> tools like kubectl. The primary goal is to simplify the Spark user experience 
> on Kubernetes, minimizing the learning curve and operational complexities and 
> therefore enable users to focus on the Spark application development.
> Although there are several open-source Spark on Kubernetes operators 
> available, none of them are officially integrated into the Apache Spark 
> project. As a result, these operators may lack active support and development 
> for new features. Within this proposal, our aim is to introduce a Java-based 
> Spark operator as an integral component of the Apache Spark project. This 
> solution has been employed internally at Apple for multiple years, operating 
> millions of executors in real production environments. The use of Java in 
> this solution is intended to accommodate a wider user and contributor 
> audience, especially those who are familiar with Scala.
> Ideally, this operator should have its dedicated repository, similar to Spark 
> Connect Golang or Spark Docker, allowing it to maintain a loose connection 
> with the Spark release cycle. This model is also followed by the Apache Flink 
> Kubernetes operator.
> We believe that this project holds the potential to evolve into a thriving 
> community project over the long run. A comparison can be drawn with the Flink 
> Kubernetes Operator: Apple has open-sourced internal Flink Kubernetes 
> operator, making it a part of the Apache Flink project 
> (https://github.com/apache/flink-kubernetes-operator). This move has gained 
> wide industry adoption and contributions from the community. In a mere year, 
> the Flink operator has garnered more than 600 stars and has attracted 
> contributions from over 80 contributors. This showcases the level of 
> community interest and collaborative momentum that can be achieved in similar 
> scenarios.
> More details can be found at SPIP doc : Spark Kubernetes Operator 
> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
> Thanks,
> --
> Zhou JIANG
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread L. C. Hsieh
+1

On Thu, Nov 9, 2023 at 7:57 PM Chao Sun  wrote:
>
> +1
>
>



Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Nan Zhu
just curious what happened on google’s spark operator?

On Thu, Nov 9, 2023 at 19:12 Ilan Filonenko  wrote:

> +1
>

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Chao Sun
+1


On Thu, Nov 9, 2023 at 6:36 PM Xiao Li  wrote:
>
> +1
>
> huaxin gao wrote on Thu, Nov 9, 2023 at 16:53:
>>
>> +1
>>



Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Xiao Li
+1

huaxin gao wrote on Thu, Nov 9, 2023 at 16:53:

> +1
>

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Ilan Filonenko
+1

On Thu, Nov 9, 2023 at 7:43 PM Ryan Blue  wrote:

> +1
>

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Ryan Blue
+1

On Thu, Nov 9, 2023 at 4:23 PM Hussein Awala  wrote:


-- 
Ryan Blue
Tabular


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Hussein Awala
+1 for creating an official Kubernetes operator for Apache Spark

On Fri, Nov 10, 2023 at 12:38 AM huaxin gao  wrote:



Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread huaxin gao
+1

On Thu, Nov 9, 2023 at 3:14 PM DB Tsai  wrote:



Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread DB Tsai
+1

To be completely transparent, I am employed in the same department as Zhou at 
Apple.

I support this proposal, given the community adoption we have witnessed 
following the release of the Flink Kubernetes operator, which streamlined 
Flink deployment on Kubernetes. 

A well-maintained official Spark Kubernetes operator is essential for our Spark 
community as well.

DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1

> On Nov 9, 2023, at 12:05 PM, Zhou Jiang  wrote:



[DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Zhou Jiang
Hi Spark community,

I'm reaching out to initiate a conversation about the possibility of
developing a Java-based Kubernetes operator for Apache Spark. Following the
operator pattern (
https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark
users could manage applications and related components seamlessly using
native tools like kubectl. The primary goal is to simplify the Spark user
experience on Kubernetes, minimizing the learning curve and operational
complexity, thereby enabling users to focus on Spark application
development.
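
To make the operator pattern above concrete, here is a minimal,
self-contained Java sketch of the reconciliation loop at the heart of any
Kubernetes operator: a controller repeatedly compares the user-declared
(desired) state with the observed cluster state and acts to converge the
two. The SparkApplication-style resource name and the executor-count field
are illustrative assumptions for this sketch, not the proposed operator's
actual API, and a real controller would of course create pods through the
Kubernetes API server rather than mutate an in-memory map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Conceptual sketch of the operator pattern: reconcile desired state
 * (what the user declared, e.g. via kubectl apply) against observed
 * state (what is actually running in the cluster).
 */
public class ReconcileSketch {
    // Desired state: app name -> requested executor count (stands in for a CRD spec).
    static final Map<String, Integer> desired = new ConcurrentHashMap<>();
    // Observed state: app name -> executors currently running (stands in for cluster state).
    static final Map<String, Integer> observed = new ConcurrentHashMap<>();

    /** One reconciliation pass for a single application. */
    static void reconcile(String app) {
        int want = desired.getOrDefault(app, 0);
        int have = observed.getOrDefault(app, 0);
        if (want > have) {
            System.out.println(app + ": launching " + (want - have) + " executor(s)");
            observed.put(app, want); // real operator: create executor pods via the API server
        } else if (want < have) {
            System.out.println(app + ": scaling down " + (have - want) + " executor(s)");
            observed.put(app, want); // real operator: delete surplus pods
        }
        // want == have: cluster already matches the spec; nothing to do.
    }

    public static void main(String[] args) {
        desired.put("pi-job", 3); // user applies a manifest requesting 3 executors
        reconcile("pi-job");      // prints "pi-job: launching 3 executor(s)"
        desired.put("pi-job", 1); // user edits the resource down to 1 executor
        reconcile("pi-job");      // prints "pi-job: scaling down 2 executor(s)"
    }
}
```

In a real operator this loop is driven by watch events from the API
server rather than by direct calls, but the converge-toward-declared-state
logic is the same, which is what lets users manage Spark applications with
nothing more than kubectl.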

Although there are several open-source Spark-on-Kubernetes operators
available, none of them is officially integrated into the Apache Spark
project. As a result, these operators may lack active support and
development of new features. With this proposal, we aim to introduce a
Java-based Spark operator as an integral component of the Apache Spark
project. This solution has been employed internally at Apple for multiple
years, operating millions of executors in real production environments. The
use of Java in this solution is intended to accommodate a wider user and
contributor audience, especially those who are familiar with Scala.

Ideally, this operator should have its own dedicated repository, similar to
Spark Connect Golang or Spark Docker, allowing it to maintain a loose
coupling with the Spark release cycle. This model is also followed by the
Apache Flink Kubernetes operator.

We believe that this project holds the potential to evolve into a thriving
community project over the long run. A comparison can be drawn with the
Flink Kubernetes Operator: Apple open-sourced its internal Flink Kubernetes
operator, making it part of the Apache Flink project (
https://github.com/apache/flink-kubernetes-operator). This move has gained
wide industry adoption and contributions from the community. In a mere
year, the Flink operator has garnered more than 600 stars and attracted
contributions from over 80 contributors. This showcases the level of
community interest and collaborative momentum that can be achieved in
similar scenarios.

More details can be found in the SPIP doc: Spark Kubernetes Operator
https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE

Thanks,
-- 
*Zhou JIANG*