Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Yuming Wang
+1

On Fri, Nov 10, 2023 at 10:01 AM Ilan Filonenko  wrote:

> +1
>
> On Thu, Nov 9, 2023 at 7:43 PM Ryan Blue  wrote:
>
>> +1
>>
>> On Thu, Nov 9, 2023 at 4:23 PM Hussein Awala  wrote:
>>
>>> +1 for creating an official Kubernetes operator for Apache Spark
>>>
>>> On Fri, Nov 10, 2023 at 12:38 AM huaxin gao wrote:
>>>
>>>> +1
>>>>
>>>> On Thu, Nov 9, 2023 at 3:14 PM DB Tsai wrote:
>>>>
>>>>> +1
>>>>>
>>>>> To be completely transparent, I am employed in the same department as
>>>>> Zhou at Apple.
>>>>>
>>>>> I support this proposal, provided that we witness community adoption
>>>>> following the release of the Flink Kubernetes operator, streamlining Flink
>>>>> deployment on Kubernetes.
>>>>>
>>>>> A well-maintained official Spark Kubernetes operator is essential for
>>>>> our Spark community as well.
>>>>>
>>>>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>>>>
>>>>> On Nov 9, 2023, at 12:05 PM, Zhou Jiang wrote:
>>>>>
>>>>> Hi Spark community,
>>>>>
>>>>> I'm reaching out to initiate a conversation about the possibility of
>>>>> developing a Java-based Kubernetes operator for Apache Spark. Following
>>>>> the operator pattern
>>>>> (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark
>>>>> users may manage applications and related components seamlessly using
>>>>> native tools like kubectl. The primary goal is to simplify the Spark user
>>>>> experience on Kubernetes, minimizing the learning curve and operational
>>>>> complexity so that users can focus on Spark application development.
>>>>>
>>>>> Although there are several open-source Spark on Kubernetes operators
>>>>> available, none of them are officially integrated into the Apache Spark
>>>>> project. As a result, these operators may lack active support and
>>>>> development for new features. Within this proposal, our aim is to
>>>>> introduce a Java-based Spark operator as an integral component of the
>>>>> Apache Spark project. This solution has been employed internally at Apple
>>>>> for multiple years, operating millions of executors in real production
>>>>> environments. The use of Java in this solution is intended to accommodate
>>>>> a wider user and contributor audience, especially those who are familiar
>>>>> with Scala.
>>>>>
>>>>> Ideally, this operator should have its own dedicated repository, similar
>>>>> to Spark Connect Golang or Spark Docker, allowing it to maintain a loose
>>>>> connection with the Spark release cycle. This model is also followed by
>>>>> the Apache Flink Kubernetes operator.
>>>>>
>>>>> We believe that this project holds the potential to evolve into a
>>>>> thriving community project over the long run. A comparison can be drawn
>>>>> with the Flink Kubernetes Operator: Apple open-sourced its internal Flink
>>>>> Kubernetes operator, making it a part of the Apache Flink project
>>>>> (https://github.com/apache/flink-kubernetes-operator). This move has
>>>>> gained wide industry adoption and contributions from the community. In a
>>>>> mere year, the Flink operator has garnered more than 600 stars and has
>>>>> attracted contributions from over 80 contributors. This showcases the
>>>>> level of community interest and collaborative momentum that can be
>>>>> achieved in similar scenarios.
>>>>>
>>>>> More details can be found in the SPIP doc, Spark Kubernetes Operator:
>>>>> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>>>>>

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Cheng Pan
Thanks for this impressive proposal. I have a basic question: how does 
spark-submit work with this operator? Or does it require that we use `kubectl 
apply -f spark-job.yaml` (or a K8s client, programmatically) to submit a Spark app?

Thanks,
Cheng Pan




-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread L. C. Hsieh
+1




Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Nan Zhu
Just curious: what happened to Google's Spark operator?


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Chao Sun
+1





Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Xiao Li
+1



Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Ilan Filonenko
+1


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Ryan Blue
+1


-- 
Ryan Blue
Tabular


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Hussein Awala
+1 for creating an official Kubernetes operator for Apache Spark



Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread huaxin gao
+1

On Thu, Nov 9, 2023 at 3:14 PM DB Tsai  wrote:

> +1
>
> To be completely transparent, I am employed in the same department as Zhou
> at Apple.
>
> I support this proposal, provided that we witness community adoption
> following the release of the Flink Kubernetes operator, streamlining Flink
> deployment on Kubernetes.
>
> A well-maintained official Spark Kubernetes operator is essential for our
> Spark community as well.
>
> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>
> On Nov 9, 2023, at 12:05 PM, Zhou Jiang  wrote:
>
> Hi Spark community,
> I'm reaching out to initiate a conversation about the possibility of
> developing a Java-based Kubernetes operator for Apache Spark. Following the
> operator pattern (
> https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark
> users may manage applications and related components seamlessly using
> native tools like kubectl. The primary goal is to simplify the Spark user
> experience on Kubernetes, minimizing the learning curve and operational
> complexities and therefore enable users to focus on the Spark application
> development.
> Although there are several open-source Spark on Kubernetes operators
> available, none of them are officially integrated into the Apache Spark
> project. As a result, these operators may lack active support and
> development for new features. Within this proposal, our aim is to introduce
> a Java-based Spark operator as an integral component of the Apache Spark
> project. This solution has been employed internally at Apple for multiple
> years, operating millions of executors in real production environments. The
> use of Java in this solution is intended to accommodate a wider user and
> contributor audience, especially those who are familiar with Scala.
> Ideally, this operator should have its own dedicated repository, similar to
> Spark Connect Golang or Spark Docker, allowing it to maintain a loose
> connection with the Spark release cycle. This model is also followed by the
> Apache Flink Kubernetes operator.
> We believe that this project holds the potential to evolve into a thriving
> community project over the long run. A comparison can be drawn with the
> Flink Kubernetes Operator: Apple open-sourced its internal Flink Kubernetes
> operator, making it part of the Apache Flink project (
> https://github.com/apache/flink-kubernetes-operator). This move has
> gained wide industry adoption and contributions from the community. In
> just one year, the Flink operator has garnered more than 600 stars and has
> attracted contributions from over 80 contributors. This showcases the level
> of community interest and collaborative momentum that can be achieved in
> similar scenarios.
> More details can be found in the SPIP doc, "Spark Kubernetes Operator":
> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>
> Thanks,
> --
> *Zhou JIANG*
>
>
>


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread DB Tsai
+1

To be completely transparent, I am employed in the same department as Zhou at 
Apple.

I support this proposal, given that we have witnessed community adoption
following the release of the Flink Kubernetes operator, which streamlined Flink
deployment on Kubernetes.

A well-maintained official Spark Kubernetes operator is essential for our Spark 
community as well.

DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1

> On Nov 9, 2023, at 12:05 PM, Zhou Jiang  wrote:
> 
> Hi Spark community,
> I'm reaching out to initiate a conversation about the possibility of 
> developing a Java-based Kubernetes operator for Apache Spark. Following the 
> operator pattern 
> (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark 
> users may manage applications and related components seamlessly using native 
> tools like kubectl. The primary goal is to simplify the Spark user experience
> on Kubernetes, minimizing the learning curve and operational complexity so
> that users can focus on Spark application development.
> Although there are several open-source Spark on Kubernetes operators 
> available, none of them are officially integrated into the Apache Spark 
> project. As a result, these operators may lack active support and development 
> for new features. Within this proposal, our aim is to introduce a Java-based 
> Spark operator as an integral component of the Apache Spark project. This 
> solution has been employed internally at Apple for multiple years, operating 
> millions of executors in real production environments. The use of Java in 
> this solution is intended to accommodate a wider user and contributor 
> audience, especially those who are familiar with Scala.
> Ideally, this operator should have its own dedicated repository, similar to
> Spark Connect Golang or Spark Docker, allowing it to maintain a loose
> connection with the Spark release cycle. This model is also followed by the
> Apache Flink Kubernetes operator.
> We believe that this project holds the potential to evolve into a thriving
> community project over the long run. A comparison can be drawn with the Flink
> Kubernetes Operator: Apple open-sourced its internal Flink Kubernetes
> operator, making it part of the Apache Flink project
> (https://github.com/apache/flink-kubernetes-operator). This move has gained
> wide industry adoption and contributions from the community. In just one year,
> the Flink operator has garnered more than 600 stars and has attracted 
> contributions from over 80 contributors. This showcases the level of 
> community interest and collaborative momentum that can be achieved in similar 
> scenarios.
> More details can be found in the SPIP doc, "Spark Kubernetes Operator":
> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
> Thanks,
> 
> --
> Zhou JIANG
> 



Re: ASF board report draft for Nov 2023

2023-11-09 Thread Matei Zaharia
Alright, done and posted.

> On Nov 6, 2023, at 10:55 AM, Dongjoon Hyun  wrote:
> 
> Thank you, Matei.
> 
> It would be great if we could briefly include upcoming plans.
> 
> - Apache Spark 3.4.2 
> (https://lists.apache.org/thread/35o2169l5r05k2mknqjy9mztq3ty1btr)
> - Apache Spark 3.3.4 EOL (December 16th)
> 
> Dongjoon.
> 
> On 2023/11/06 05:32:11 Matei Zaharia wrote:
>> It’s time to send our project’s quarterly report to the ASF board on 
>> Wednesday November 8th. Here’s what I wrote as a draft; let me know any 
>> suggested changes.
>> 
>> =
>> 
>> Issues for the board:
>> 
>> - None
>> 
>> Project status:
>> 
>> - We released Apache Spark 3.5 on September 13, a feature release with over
>> 1300 patches. This release brought more Spark Connect scenarios to general
>> availability, such as the Scala and Go clients, distributed training and
>> inference support, and improved compatibility for Structured Streaming. It
>> also introduced new PySpark and SQL functionality, including the SQL
>> IDENTIFIER clause, named argument support for SQL function calls, SQL
>> function support for HyperLogLog approximate aggregations, and Python
>> user-defined table functions; simplified distributed training with
>> DeepSpeed; introduced watermark propagation among operators; and added the
>> dropDuplicatesWithinWatermark operation in Structured Streaming.
>> - We made a patch release, Spark 3.3.3, on August 21, 2023.
>> - Apache Spark 4.0.0-SNAPSHOT is now ready for Java 21. [SPARK-43831]
>> - The vote on "Updating documentation hosted for EOL and maintenance 
>> releases" has passed.
>> - The vote on the Spark Project Improvement Proposal (SPIP) for "State
>> Data Source - Reader" has passed.
>> - The PMC has voted to add two new PMC members, Yuanjian Li and Yikun Jiang, 
>> and one new committer, Jiaan Geng, to the project.
>> 
>> Trademarks:
>> 
>> - No changes since the last report.
>> 
>> Latest releases:
>> 
>> - Spark 3.5.0 was released on September 13, 2023
>> - Spark 3.3.3 was released on August 21, 2023
>> - Spark 3.4.1 was released on June 23, 2023
>> 
>> Committers and PMC:
>> 
>> - The latest committer was added on Oct 2nd, 2023 (Jiaan Geng).
>> - The latest PMC members were added on Oct 2nd, 2023 (Yuanjian Li and Yikun 
>> Jiang).
>> 
>> =
>> 
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> 
>> 
> 
> 





[DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Zhou Jiang
Hi Spark community,

I'm reaching out to initiate a conversation about the possibility of
developing a Java-based Kubernetes operator for Apache Spark. Following the
operator pattern (
https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark
users may manage applications and related components seamlessly using
native tools like kubectl. The primary goal is to simplify the Spark user
experience on Kubernetes, minimizing the learning curve and operational
complexity so that users can focus on Spark application development.

Although there are several open-source Spark on Kubernetes operators
available, none of them are officially integrated into the Apache Spark
project. As a result, these operators may lack active support and
development for new features. Within this proposal, our aim is to introduce
a Java-based Spark operator as an integral component of the Apache Spark
project. This solution has been employed internally at Apple for multiple
years, operating millions of executors in real production environments. The
use of Java in this solution is intended to accommodate a wider user and
contributor audience, especially those who are familiar with Scala.

Ideally, this operator should have its own dedicated repository, similar to
Spark Connect Golang or Spark Docker, allowing it to maintain a loose
connection with the Spark release cycle. This model is also followed by the
Apache Flink Kubernetes operator.

We believe that this project holds the potential to evolve into a thriving
community project over the long run. A comparison can be drawn with the
Flink Kubernetes Operator: Apple open-sourced its internal Flink Kubernetes
operator, making it part of the Apache Flink project (
https://github.com/apache/flink-kubernetes-operator). This move has gained
wide industry adoption and contributions from the community. In just one
year, the Flink operator has garnered more than 600 stars and has attracted
contributions from over 80 contributors. This showcases the level of
community interest and collaborative momentum that can be achieved in
similar scenarios.

More details can be found in the SPIP doc, "Spark Kubernetes Operator":
https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE

Thanks,
-- 
*Zhou JIANG*


Re: Apache Spark 3.4.2 (?)

2023-11-09 Thread Maxim Gekk
+1

On Wed, Nov 8, 2023 at 5:29 AM kazuyuki tanimura
 wrote:

> +1
>
> Kazu
>
> On Nov 7, 2023, at 5:23 PM, L. C. Hsieh  wrote:
>
> +1
>
> On Tue, Nov 7, 2023 at 4:56 PM Dongjoon Hyun 
> wrote:
>
>
> Thank you all!
>
> Dongjoon
>
> On Mon, Nov 6, 2023 at 6:03 PM Holden Karau  wrote:
>
>
> +1
>
> On Mon, Nov 6, 2023 at 4:30 PM yangjie01 
> wrote:
>
>
> +1
>
>
>
> From: Yuming Wang
> Date: Tuesday, November 7, 2023, 07:00
> To: Santosh Pingale
> Cc: Dongjoon Hyun, dev
> Subject: Re: Apache Spark 3.4.2 (?)
>
>
>
> +1
>
>
>
> On Tue, Nov 7, 2023 at 3:55 AM Santosh Pingale
>  wrote:
>
> Makes sense given the nature of those commits.
>
>
>
> On Mon, Nov 6, 2023, 7:52 PM Dongjoon Hyun 
> wrote:
>
> Hi, All.
>
> Apache Spark 3.4.1 tag was created on Jun 19th and `branch-3.4` has 103
> commits including important security and correctness patches like
> SPARK-44251, SPARK-44805, and SPARK-44940.
>
>     https://github.com/apache/spark/releases/tag/v3.4.1
>
>     $ git log --oneline v3.4.1..HEAD | wc -l
>     103
>
>     SPARK-44251 Potential for incorrect results or NPE when full outer
>                 USING join has null key value
>     SPARK-44805 Data lost after union using
>                 spark.sql.parquet.enableNestedColumnVectorizedReader=true
>     SPARK-44940 Improve performance of JSON parsing when
>                 "spark.sql.json.enablePartialResults" is enabled
>
> Currently, I'm checking the following open correctness issues. I'd like to
> propose releasing Apache Spark 3.4.2 after resolving them, and I volunteer
> as the release manager for Apache Spark 3.4.2. If there are no additional
> blockers, the first tentative RC1 vote date is November 13th (Monday). If
> it takes some time to resolve the open correctness issues, we can start the
> vote after the Thanksgiving holiday.
>
>     SPARK-44512 dataset.sort.select.write.partitionBy sorts wrong column
>     SPARK-45282 Join loses records for cached datasets
>
> WDYT?
>
> Dongjoon.
>
>
>
>
>