Re: ASF board report draft for May

2024-05-06 Thread Matei Zaharia
I’ll mention that we’re working toward a preview release, even if the details
are not finalized by the time we send the report.

> On May 6, 2024, at 10:52 AM, Holden Karau wrote:
> 
> I trust Wenchen to manage the preview release effectively, but if there are
> concerns around how to manage a developer preview release, let's split that off
> from the board report discussion.

Re: ASF board report draft for May

2024-05-06 Thread Holden Karau
I trust Wenchen to manage the preview release effectively, but if there are
concerns around how to manage a developer preview release, let's split that
off from the board report discussion.

On Mon, May 6, 2024 at 10:44 AM Mich Talebzadeh 
wrote:

> I did some historical digging on this.
>
> Whilst both preview releases and RCs are pre-release versions, the main
> difference lies in their maturity and readiness for production use. Preview
> releases are early versions aimed at gathering feedback, while release
> candidates (RCs) are nearly finished versions that undergo final testing
> and voting before the official release.
>
> So in our case, we have two options:
>
>    1. Skip mentioning the Preview and focus on: "We are intending to
>    gather feedback on version 4 by releasing an earlier version to the
>    community for look-and-feel feedback, especially focused on APIs."
>    2. Mention the Preview in the form: "There will be a Preview release
>    with the aim of gathering feedback from the community, focused on APIs."
>
> IMO, a Preview release does not require a formal vote. Preview releases are
> often considered experimental or pre-alpha versions and are not expected to
> meet the same level of stability and completeness as release candidates or
> final releases.
>
> HTH

Re: ASF board report draft for May

2024-05-06 Thread Mich Talebzadeh
I did some historical digging on this.

Whilst both preview releases and RCs are pre-release versions, the main
difference lies in their maturity and readiness for production use. Preview
releases are early versions aimed at gathering feedback, while release
candidates (RCs) are nearly finished versions that undergo final testing
and voting before the official release.

So in our case, we have two options:

   1. Skip mentioning the Preview and focus on: "We are intending to
   gather feedback on version 4 by releasing an earlier version to the
   community for look-and-feel feedback, especially focused on APIs."
   2. Mention the Preview in the form: "There will be a Preview release
   with the aim of gathering feedback from the community, focused on APIs."

IMO, a Preview release does not require a formal vote. Preview releases are
often considered experimental or pre-alpha versions and are not expected to
meet the same level of stability and completeness as release candidates or
final releases.

HTH

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
London
United Kingdom

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Wernher von Braun)".


On Mon, 6 May 2024 at 14:10, Mich Talebzadeh 
wrote:

> @Wenchen Fan 
>
> Thanks for the update! To clarify, is the vote for approving a specific
> preview build, or is it for moving towards an RC stage? I gather there is a
> distinction between these two?

Re: ASF board report draft for May

2024-05-06 Thread Mich Talebzadeh
@Wenchen Fan 

Thanks for the update! To clarify, is the vote for approving a specific
preview build, or is it for moving towards an RC stage? I gather there is a
distinction between these two?


On Mon, 6 May 2024 at 13:03, Wenchen Fan  wrote:

> The preview release also needs a vote. I'll try my best to cut the RC on
> Monday, but the actual release may take some time. Hopefully we can get it
> out this week, but if the vote fails, it will take longer because we will
> need more RCs.

Re: ASF board report draft for May

2024-05-06 Thread Holden Karau
If folks are against the term “soon”, we could say “in-progress”.

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Mon, May 6, 2024 at 2:08 AM Mich Talebzadeh 
wrote:

> Hi,
>
> We should reconsider using the term "soon" for the ASF board, as it is
> subjective and gives no date (assuming this is an official communication on
> Wednesday). We ought to say:
>
> "Spark 4, the next major release after Spark 3.x, is currently under
> development. We plan to make a preview version available for evaluation as
> soon as it is feasible."
>
> HTH

Re: ASF board report draft for May

2024-05-06 Thread Mich Talebzadeh
Hi,

We should reconsider using the term "soon" for the ASF board, as it is
subjective and gives no date (assuming this is an official communication on
Wednesday). We ought to say:

"Spark 4, the next major release after Spark 3.x, is currently under
development. We plan to make a preview version available for evaluation as
soon as it is feasible."

HTH

On Mon, 6 May 2024 at 05:09, Dongjoon Hyun  wrote:

> +1 for Holden's comment. Yes, it would be great to mention `it` as "soon".
> (If Wenchen releases it on Monday, we can simply mention the release.)
>
> In addition, the Apache Spark PMC received an official notice from the ASF
> Infra team.
>
> https://lists.apache.org/thread/rgy1cg17tkd3yox7qfq87ht12sqclkbg
> > [NOTICE] Apache Spark's GitHub Actions usage exceeds allowances for ASF projects
>
> To track and comply with the new ASF Infra Policy as much as possible, we
> opened a blocker-level JIRA issue and have been working on it.
> - https://infra.apache.org/github-actions-policy.html
>
> Please include a sentence saying that the Apache Spark PMC is working on this
> under the following umbrella JIRA issue.
>
> https://issues.apache.org/jira/browse/SPARK-48094
> > Reduce GitHub Action usage according to ASF project allowance
>
> Thanks,
> Dongjoon.

Re: ASF board report draft for May

2024-05-06 Thread Wenchen Fan
The preview release also needs a vote. I'll try my best to cut the RC on
Monday, but the actual release may take some time. Hopefully we can get it
out this week, but if the vote fails, it will take longer because we will need
more RCs.

On Mon, May 6, 2024 at 7:22 AM Dongjoon Hyun 
wrote:

> +1 for Holden's comment. Yes, it would be great to mention `it` as "soon".
> (If Wenchen releases it on Monday, we can simply mention the release.)
>
> In addition, the Apache Spark PMC received an official notice from the ASF
> Infra team.
>
> https://lists.apache.org/thread/rgy1cg17tkd3yox7qfq87ht12sqclkbg
> > [NOTICE] Apache Spark's GitHub Actions usage exceeds allowances for ASF projects
>
> To track and comply with the new ASF Infra Policy as much as possible, we
> opened a blocker-level JIRA issue and have been working on it.
> - https://infra.apache.org/github-actions-policy.html
>
> Please include a sentence saying that the Apache Spark PMC is working on this
> under the following umbrella JIRA issue.
>
> https://issues.apache.org/jira/browse/SPARK-48094
> > Reduce GitHub Action usage according to ASF project allowance
>
> Thanks,
> Dongjoon.

Re: ASF board report draft for May

2024-05-05 Thread Dongjoon Hyun
+1 for Holden's comment. Yes, it would be great to mention `it` as "soon".
(If Wenchen releases it on Monday, we can simply mention the release.)

In addition, the Apache Spark PMC received an official notice from the ASF
Infra team.

https://lists.apache.org/thread/rgy1cg17tkd3yox7qfq87ht12sqclkbg
> [NOTICE] Apache Spark's GitHub Actions usage exceeds allowances for ASF projects

To track and comply with the new ASF Infra Policy as much as possible, we
opened a blocker-level JIRA issue and have been working on it.
- https://infra.apache.org/github-actions-policy.html

Please include a sentence saying that the Apache Spark PMC is working on this
under the following umbrella JIRA issue.

https://issues.apache.org/jira/browse/SPARK-48094
> Reduce GitHub Action usage according to ASF project allowance

Thanks,
Dongjoon.


On Sun, May 5, 2024 at 3:45 PM Holden Karau  wrote:

> Do we want to include that we’re planning on having a preview release of
> Spark 4 so folks can see the APIs “soon”?
>
> On Sun, May 5, 2024 at 3:24 PM Matei Zaharia 
> wrote:
>
>> It’s time for our quarterly ASF board report on Apache Spark this
>> Wednesday. Here’s a draft, feel free to suggest changes.
>>
>> 
>>
>> Description:
>>
>> Apache Spark is a fast and general purpose engine for large-scale data
>> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
>> well as a rich set of libraries including stream processing, machine
>> learning, and graph analytics.
>>
>> Issues for the board:
>>
>> - None
>>
>> Project status:
>>
>> - We made two patch releases: Spark 3.5.1 on February 28, 2024, and Spark
>> 3.4.3 on April 18, 2024.
>> - The votes on "SPIP: Structured Logging Framework for Apache Spark" and
>> "Pure Python Package in PyPI (Spark Connect)" have passed.
>> - The votes for two behavior changes have passed: "SPARK-4: Use ANSI
>> SQL mode by default" and "SPARK-46122: Set
>> spark.sql.legacy.createHiveTableByDefault to false".
>> - The community decided that the upcoming Spark 4.0 release will drop
>> support for Python 3.8.
>> - We started a discussion about the definition of behavior changes, which
>> is critical for version upgrades and user experience.
>> - We've opened a dedicated repository for the Spark Kubernetes Operator
>> at https://github.com/apache/spark-kubernetes-operator. Based on a vote
>> result, we added a new version in the Apache Spark JIRA for versioning the
>> Spark operator.
>>
>> Trademarks:
>>
>> - No changes since the last report.
>>
>> Latest releases:
>> - Spark 3.4.3 was released on April 18, 2024
>> - Spark 3.5.1 was released on February 28, 2024
>> - Spark 3.3.4 was released on December 16, 2023
>>
>> Committers and PMC:
>>
>> - The latest committer was added on Oct 2nd, 2023 (Jiaan Geng).
>> - The latest PMC members were added on Oct 2nd, 2023 (Yuanjian Li and
>> Yikun Jiang).
>>
>> 
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: ASF board report draft for May

2024-05-05 Thread Holden Karau
Do we want to include that we’re planning on having a preview release of
Spark 4 so folks can see the APIs “soon”?

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau




ASF board report draft for May

2024-05-05 Thread Matei Zaharia
It’s time for our quarterly ASF board report on Apache Spark this Wednesday. 
Here’s a draft; feel free to suggest changes.



Description:

Apache Spark is a fast and general-purpose engine for large-scale data 
processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well 
as a rich set of libraries including stream processing, machine learning, and 
graph analytics.

Issues for the board:

- None

Project status:

- We made two patch releases: Spark 3.5.1 on February 28, 2024, and Spark 3.4.3 
on April 18, 2024.
- The votes on "SPIP: Structured Logging Framework for Apache Spark" and "Pure 
Python Package in PyPI (Spark Connect)" have passed.
- The votes for two behavior changes have passed: "SPARK-4: Use ANSI SQL 
mode by default" and "SPARK-46122: Set 
spark.sql.legacy.createHiveTableByDefault to false".
- The community decided that the upcoming Spark 4.0 release will drop support for 
Python 3.8.
- We started a discussion about how to define a behavior change, which is 
critical for version upgrades and user experience.
- We've opened a dedicated repository for the Spark Kubernetes Operator at 
https://github.com/apache/spark-kubernetes-operator. Based on a vote result, we 
added a new version to the Apache Spark JIRA for versioning the Spark operator.

Trademarks:

- No changes since the last report.

Latest releases:
- Spark 3.4.3 was released on April 18, 2024
- Spark 3.5.1 was released on February 28, 2024
- Spark 3.3.4 was released on December 16, 2023

Committers and PMC:

- The latest committer was added on Oct 2nd, 2023 (Jiaan Geng).
- The latest PMC members were added on Oct 2nd, 2023 (Yuanjian Li and Yikun 
Jiang).


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



ASF board report draft for May 2023

2023-05-09 Thread Matei Zaharia
It’s time to send our ASF board report again on May 10th. I’ve put together 
this draft — let me know whether to add anything else.



Issues for the board:

- None

Project status:

- We released Apache Spark 3.4 on April 13th, a feature release with over 2600 
patches. This release introduces a Python client for Spark Connect, augments 
Structured Streaming with async progress tracking and Python arbitrary stateful 
processing, increases Pandas API coverage and provides NumPy input support, 
simplifies the migration from traditional data warehouses to Apache Spark by 
improving ANSI compliance and implementing dozens of new built-in functions, 
and boosts development productivity and debuggability with memory profiling.

- We made two patch releases: Spark 3.2.4 on April 13th and Spark 3.3.2 on 
February 17th. These contain bug fixes for the corresponding branches of the 
project.

- The PMC voted to add three new PMC members to the project (to be announced 
soon once they accept).

Trademarks:

- No changes since the last report.

Latest releases:

- Spark 3.4.0 was released on April 13, 2023
- Spark 3.2.4 was released on April 13, 2023
- Spark 3.3.2 was released on February 17, 2023

Committers and PMC:

- The latest committer was added on Oct 2nd, 2022 (Yikun Jiang).
- The latest PMC member was added on June 28th, 2022 (Huaxin Gao).





ASF board report draft for May 2022

2022-05-10 Thread Matei Zaharia
Hi all,

It’s time to submit our quarterly ASF board report again this Wednesday. I’ve 
put together the draft below. Let me know if you have any suggestions:

===

Description:

Apache Spark is a fast and general-purpose engine for large-scale data
processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
well as a rich set of libraries including stream processing, machine learning,
and graph analytics.

Issues for the board:

- None

Project status:

- We are working on the release of Spark 3.3.0, with Release Candidate 1 
currently being tested and voted on.

- We released Apache Spark 3.1.3, a bug fix release for the 3.1 line, on 
February 18th.

- We started publishing official Docker images of Apache Spark on Docker Hub, 
at https://hub.docker.com/r/apache/spark/tags
 
- A new Spark Project Improvement Proposal (SPIP) is being discussed by the 
community to offer a simplified API for deep learning inference, including 
built-in integration with popular libraries such as TensorFlow, PyTorch and 
Hugging Face (https://issues.apache.org/jira/browse/SPARK-38648).

Trademarks:

- No changes since the last report.

Latest releases:

- Spark 3.1.3 was released on February 18, 2022.
- Spark 3.2.1 was released on January 26, 2022.
- Spark 3.2.0 was released on October 13, 2021.

Committers and PMC:

- The latest committer was added on Dec 20th, 2021 (Yuanjian Li).
- The latest PMC member was added on Jan 19th, 2022 (Maciej Szymkiewicz).

===






ASF board report draft for May

2021-05-10 Thread Matei Zaharia
It’s time for our quarterly report to the ASF board, which we need to submit on 
Wednesday. I’ve put together the following draft based on activity in the 
community — let me know if you’d like to add or change anything:

==

Description:

Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set
of libraries including stream processing, machine learning, and graph
analytics.

Issues for the board:

- None

Project status:

- We released Apache Spark 3.1.1, a major update release for the 3.x branch, on 
March 2nd. This release includes updates to improve Python usability and error 
messages, ANSI SQL support, the streaming UI, and support for running Apache 
Spark on Kubernetes, which is now marked GA. Overall, the release includes 
about 1500 patches.

- We are voting on an Apache Spark 2.4.8 bug fix release for the Spark 2.x 
line. This may be the last release in the 2.x line.

- We added six new committers to the project: Attila Zsolt Piros, Gabor 
Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu.

- Several SPIPs (Spark Project Improvement Proposals) were voted on and 
accepted, including adding a Function Catalog in Spark SQL and adding a Pandas 
API layer for PySpark based on the Koalas project. We’ve also started an effort 
to standardize error message reporting in Apache Spark 
(https://spark.apache.org/error-message-guidelines.html) so that messages are 
easier to understand and users can quickly figure out how to fix them.

Trademarks:

- No changes since the last report.

Latest releases:

- Spark 3.1.1 was released on March 2nd, 2021.
- Spark 3.0.2 was released on February 19th, 2021.
- Spark 2.4.7 was released on September 12th, 2020.

Committers and PMC:

- The latest committers were added on March 11th, 2021 (Attila Zsolt Piros, 
Gabor Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu).
- The latest PMC member was added on Sept 4th, 2019 (Dongjoon Hyun). The PMC 
has been discussing some new PMC candidates.



ASF board report draft for May

2020-05-11 Thread Matei Zaharia
Hi all,

Our quarterly project board report needs to be submitted on May 13th, and I 
wanted to include anything notable going on that we want to appear in the board 
archive. Here is my draft below — let me know if you have suggested changes.

===

Apache Spark is a fast and general engine for large-scale data processing. It 
offers high-level APIs in Java, Scala, Python and R as well as a rich set of 
libraries including stream processing, machine learning, and graph analytics.

Project status:

- Progress is continuing on the upcoming Apache Spark 3.0 release, with the 
first votes on release candidates. This will be a major release with various 
API and SQL language updates, so we’ve tried to solicit broad input on it 
through two preview releases and a lot of JIRA and mailing list discussion.

- The community is also voting on a release candidate for Apache Spark 2.4.6, 
bringing bug fixes to the 2.4 branch.

Trademarks:

- Nothing new to report in the past 3 months.

Latest releases:

- Spark 2.4.5 was released on Feb 8th, 2020.
- Spark 3.0.0-preview2 was released on Dec 23rd, 2019.
- Spark 3.0.0-preview was released on Nov 6th, 2019.
- Spark 2.3.4 was released on Sept 9th, 2019.

Committers and PMC:

- The latest PMC member was added on Sept 4th, 2019 (Dongjoon Hyun).
- The latest committer was added on Sept 9th, 2019 (Weichen Xu). We also added
Ryan Blue, L.C. Hsieh, Gengliang Wang, Yuming Wang and Ruifeng Zheng as
committers in the past three months.