Re: ASF board report draft for August 2023

2023-08-09 Thread Matei Zaharia
Sounds good, I’ll add that.

> On Aug 8, 2023, at 9:34 AM, Holden Karau wrote:
> 
> Maybe add a link to the 4.0 JIRA where we are tracking the current plans for 
> 4.0?
> 
> On Tue, Aug 8, 2023 at 9:33 AM Dongjoon Hyun wrote:
>> Thank you, Matei.
>> 
>> It looks good to me.
>> 
>> Dongjoon
>> 
>> On Mon, Aug 7, 2023 at 22:54 Matei Zaharia wrote:
>>> It’s time to send our quarterly report to the ASF board on August 9th. 
>>> Here’s what I wrote as a draft — feel free to suggest changes.
>>> 
>>> =
>>> 
>>> Issues for the board:
>>> 
>>> - None
>>> 
>>> Project status:
>>> 
>>> - We cut the branch for Spark 3.5.0 on July 17th, 2023. The community is 
>>> working on bug fixes, tests, stability and documentation.
>>> - We made a patch release, Spark 3.4.1, on June 23, 2023.
>>> - We are preparing a Spark 3.3.3 release for later this month 
>>> (https://lists.apache.org/thread/0kgnw8njjnfgc5nghx60mn7oojvrqwj7).
>>> - Votes on three Spark Project Improvement Proposals (SPIP) passed: "XML 
>>> data source support", "Python Data Source API", and "PySpark Test 
>>> Framework".
>>> - A vote for "Apache Spark PMC asks Databricks to differentiate its Spark 
>>> version string" did not pass. This was asking a company to change the 
>>> string returned by Spark APIs in a product that packages a modified version 
>>> of Apache Spark.
>>> - The community decided to release Apache Spark 4.0.0 after the 3.5.0 
>>> version.
>>> - An official Apache Spark Docker image is now available at 
>>> https://hub.docker.com/_/spark
>>> - A new repository, https://github.com/apache/spark-connect-go, was created 
>>> for the Go client of Spark Connect.
>>> - The PMC voted to add two new committers to the project, XiDuo You and 
>>> Peter Toth.
>>> 
>>> Trademarks:
>>> 
>>> - No changes since the last report.
>>> 
>>> Latest releases:
>>> 
>>> - We released Apache Spark 3.4.1 on June 23, 2023.
>>> - We released Apache Spark 3.2.4 on April 13, 2023.
>>> - We released Apache Spark 3.3.2 on February 17, 2023.
>>> 
>>> Committers and PMC:
>>> 
>>> - The latest committers were added on July 11th, 2023 (XiDuo You and Peter 
>>> Toth).
>>> - The latest PMC members were added on May 10th, 2023 (Chao Sun, Xinrong 
>>> Meng and Ruifeng Zheng).
>>> 
>>> =
> -- 
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 
>  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau



Re: ASF board report draft for August 2023

2023-08-08 Thread Holden Karau
Maybe add a link to the 4.0 JIRA where we are tracking the current plans
for 4.0?

On Tue, Aug 8, 2023 at 9:33 AM Dongjoon Hyun wrote:

> Thank you, Matei.
>
> It looks good to me.
>
> Dongjoon
>
> On Mon, Aug 7, 2023 at 22:54 Matei Zaharia wrote:
>
>> It’s time to send our quarterly report to the ASF board on August 9th.
>> Here’s what I wrote as a draft — feel free to suggest changes.
>>
>> =
>>
>> Issues for the board:
>>
>> - None
>>
>> Project status:
>>
>> - We cut the branch for Spark 3.5.0 on July 17th, 2023. The community is
>> working on bug fixes, tests, stability and documentation.
>> - We made a patch release, Spark 3.4.1, on June 23, 2023.
>> - We are preparing a Spark 3.3.3 release for later this month (
>> https://lists.apache.org/thread/0kgnw8njjnfgc5nghx60mn7oojvrqwj7).
>> - Votes on three Spark Project Improvement Proposals (SPIP) passed: "XML
>> data source support", "Python Data Source API", and "PySpark Test
>> Framework".
>> - A vote for "Apache Spark PMC asks Databricks to differentiate its Spark
>> version string" did not pass. This was asking a company to change the
>> string returned by Spark APIs in a product that packages a modified version
>> of Apache Spark.
>> - The community decided to release Apache Spark 4.0.0 after the 3.5.0
>> version.
>> - An official Apache Spark Docker image is now available at
>> https://hub.docker.com/_/spark
>> - A new repository, https://github.com/apache/spark-connect-go, was
>> created for the Go client of Spark Connect.
>> - The PMC voted to add two new committers to the project, XiDuo You and
>> Peter Toth.
>>
>> Trademarks:
>>
>> - No changes since the last report.
>>
>> Latest releases:
>>
>> - We released Apache Spark 3.4.1 on June 23, 2023.
>> - We released Apache Spark 3.2.4 on April 13, 2023.
>> - We released Apache Spark 3.3.2 on February 17, 2023.
>>
>> Committers and PMC:
>>
>> - The latest committers were added on July 11th, 2023 (XiDuo You and
>> Peter Toth).
>> - The latest PMC members were added on May 10th, 2023 (Chao Sun, Xinrong
>> Meng and Ruifeng Zheng).
>>
>> =
>
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: ASF board report draft for August 2023

2023-08-08 Thread Dongjoon Hyun
Thank you, Matei.

It looks good to me.

Dongjoon

On Mon, Aug 7, 2023 at 22:54 Matei Zaharia wrote:

> It’s time to send our quarterly report to the ASF board on August 9th.
> Here’s what I wrote as a draft — feel free to suggest changes.
>
> =
>
> Issues for the board:
>
> - None
>
> Project status:
>
> - We cut the branch for Spark 3.5.0 on July 17th, 2023. The community is
> working on bug fixes, tests, stability and documentation.
> - We made a patch release, Spark 3.4.1, on June 23, 2023.
> - We are preparing a Spark 3.3.3 release for later this month (
> https://lists.apache.org/thread/0kgnw8njjnfgc5nghx60mn7oojvrqwj7).
> - Votes on three Spark Project Improvement Proposals (SPIP) passed: "XML
> data source support", "Python Data Source API", and "PySpark Test
> Framework".
> - A vote for "Apache Spark PMC asks Databricks to differentiate its Spark
> version string" did not pass. This was asking a company to change the
> string returned by Spark APIs in a product that packages a modified version
> of Apache Spark.
> - The community decided to release Apache Spark 4.0.0 after the 3.5.0
> version.
> - An official Apache Spark Docker image is now available at
> https://hub.docker.com/_/spark
> - A new repository, https://github.com/apache/spark-connect-go, was
> created for the Go client of Spark Connect.
> - The PMC voted to add two new committers to the project, XiDuo You and
> Peter Toth.
>
> Trademarks:
>
> - No changes since the last report.
>
> Latest releases:
>
> - We released Apache Spark 3.4.1 on June 23, 2023.
> - We released Apache Spark 3.2.4 on April 13, 2023.
> - We released Apache Spark 3.3.2 on February 17, 2023.
>
> Committers and PMC:
>
> - The latest committers were added on July 11th, 2023 (XiDuo You and Peter
> Toth).
> - The latest PMC members were added on May 10th, 2023 (Chao Sun, Xinrong
> Meng and Ruifeng Zheng).
>
> =


ASF board report draft for August 2023

2023-08-07 Thread Matei Zaharia
It’s time to send our quarterly report to the ASF board on August 9th. Here’s 
what I wrote as a draft — feel free to suggest changes.

=

Issues for the board:

- None

Project status:

- We cut the branch for Spark 3.5.0 on July 17th, 2023. The community is working 
on bug fixes, tests, stability and documentation.
- We made a patch release, Spark 3.4.1, on June 23, 2023.
- We are preparing a Spark 3.3.3 release for later this month 
(https://lists.apache.org/thread/0kgnw8njjnfgc5nghx60mn7oojvrqwj7).
- Votes on three Spark Project Improvement Proposals (SPIP) passed: "XML data 
source support", "Python Data Source API", and "PySpark Test Framework".
- A vote for "Apache Spark PMC asks Databricks to differentiate its Spark 
version string" did not pass. This was asking a company to change the string 
returned by Spark APIs in a product that packages a modified version of Apache 
Spark.
- The community decided to release Apache Spark 4.0.0 after the 3.5.0 version.
- An official Apache Spark Docker image is now available at 
https://hub.docker.com/_/spark
- A new repository, https://github.com/apache/spark-connect-go, was created for 
the Go client of Spark Connect.
- The PMC voted to add two new committers to the project, XiDuo You and Peter 
Toth.

Trademarks:

- No changes since the last report.

Latest releases:

- We released Apache Spark 3.4.1 on June 23, 2023.
- We released Apache Spark 3.2.4 on April 13, 2023.
- We released Apache Spark 3.3.2 on February 17, 2023.

Committers and PMC:

- The latest committers were added on July 11th, 2023 (XiDuo You and Peter 
Toth).
- The latest PMC members were added on May 10th, 2023 (Chao Sun, Xinrong Meng 
and Ruifeng Zheng).

=
-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: ASF board report draft for August

2022-08-10 Thread Matei Zaharia
Actually I forgot to add one more item. I want to mention that the community 
started a large effort to improve Structured Streaming performance, usability, 
APIs, and connectors (https://issues.apache.org/jira/browse/SPARK-40025), and 
we’d love to get feedback and contributions on that.

> On Aug 10, 2022, at 11:16 AM, Matei Zaharia wrote:
> 
> It’s time to submit our quarterly report to the ASF board on Friday. Here is 
> a draft, lmk if you have suggestions:
> 
> ===
> 
> Description:
> 
> Apache Spark is a fast and general purpose engine for large-scale data
> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
> well as a rich set of libraries including stream processing, machine learning,
> and graph analytics.
> 
> Issues for the board:
> 
> - None
> 
> Project status:
> 
> - Apache Spark was honored to receive the SIGMOD System Award this year, 
> given by SIGMOD (the ACM’s data management research organization) to 
> impactful real-world and research systems.
> 
> - We recently released Apache Spark 3.3.0, a feature release that improves 
> join query performance via Bloom filters, increases the Pandas API coverage 
> with support for popular Pandas features such as datetime.timedelta and 
> merge_asof, simplifies the migration from traditional data warehouses by 
> improving ANSI SQL compliance and supporting dozens of new built-in 
> functions, and boosts development productivity with better error handling, 
> autocompletion, performance, and profiling.
> 
> - We released Apache Spark 3.2.2, a bug fix release for the 3.2 line, on July 
> 17th.
> 
> - A Spark Project Improvement Proposal (SPIP) for Spark Connect was voted on 
> and accepted. Spark Connect introduces a lightweight client/server API for 
> Spark (https://issues.apache.org/jira/browse/SPARK-39375) that will allow 
> applications to submit work to a remote Spark cluster without running the 
> heavyweight query planner in the client, and will also decouple the client 
> version from the server version, making it possible to update Spark without 
> updating all the applications.
> 
> - We added three new PMC members, Huaxin Gao, Gengliang Wang and Maxim Gekk, 
> in June 2022.
> 
> - We added a new committer, Xinrong Meng, in July 2022.
> 
> Trademarks:
> 
> - No changes since the last report.
> 
> Latest releases:
> 
> - Spark 3.3.0 was released on June 16, 2022.
> - Spark 3.2.2 was released on July 17, 2022.
> - Spark 3.1.3 was released on February 18, 2022.
> 
> Committers and PMC:
> 
> - The latest committer was added on July 13th, 2022 (Xinrong Meng).
> - The latest PMC member was added on June 28th, 2022 (Huaxin Gao).
> 
> ===



ASF board report draft for August

2022-08-10 Thread Matei Zaharia
It’s time to submit our quarterly report to the ASF board on Friday. Here is a 
draft, lmk if you have suggestions:

===

Description:

Apache Spark is a fast and general purpose engine for large-scale data
processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
well as a rich set of libraries including stream processing, machine learning,
and graph analytics.

Issues for the board:

- None

Project status:

- Apache Spark was honored to receive the SIGMOD System Award this year, given 
by SIGMOD (the ACM’s data management research organization) to impactful 
real-world and research systems.

- We recently released Apache Spark 3.3.0, a feature release that improves join 
query performance via Bloom filters, increases the Pandas API coverage with 
support for popular Pandas features such as datetime.timedelta and merge_asof, 
simplifies the migration from traditional data warehouses by improving ANSI SQL 
compliance and supporting dozens of new built-in functions, and boosts 
development productivity with better error handling, autocompletion, 
performance, and profiling.

- We released Apache Spark 3.2.2, a bug fix release for the 3.2 line, on July 
17th.

- A Spark Project Improvement Proposal (SPIP) for Spark Connect was voted on 
and accepted. Spark Connect introduces a lightweight client/server API for 
Spark (https://issues.apache.org/jira/browse/SPARK-39375) that will allow 
applications to submit work to a remote Spark cluster without running the 
heavyweight query planner in the client, and will also decouple the client 
version from the server version, making it possible to update Spark without 
updating all the applications.

- We added three new PMC members, Huaxin Gao, Gengliang Wang and Maxim Gekk, in 
June 2022.

- We added a new committer, Xinrong Meng, in July 2022.

Trademarks:

- No changes since the last report.

Latest releases:

- Spark 3.3.0 was released on June 16, 2022.
- Spark 3.2.2 was released on July 17, 2022.
- Spark 3.1.3 was released on February 18, 2022.

Committers and PMC:

- The latest committer was added on July 13th, 2022 (Xinrong Meng).
- The latest PMC member was added on June 28th, 2022 (Huaxin Gao).

===




Re: ASF board report draft for August

2021-08-10 Thread Matei Zaharia
Good point, I’ll make sure to include that.

> On Aug 9, 2021, at 9:20 PM, Mridul Muralidharan wrote:
> 
> Hi Matei,
> 
>   3.2 will also include support for push-based shuffle (SPIP SPARK-30602).
> 
> Regards,
> Mridul
> 
> On Mon, Aug 9, 2021 at 9:26 PM Hyukjin Kwon wrote:
> > Which version of the Koalas project are you referring to? 1.8.1?
> 
> Yes, the latest version 1.8.1.
> 
> On Tue, Aug 10, 2021 at 11:07 AM, Igor Costa wrote:
> Hi Matei, nice update
> 
> 
> Just one question, when you mention “We are working on Spark 3.2.0 as our 
> next release, with a release candidate likely to come soon. Spark 3.2 
> includes a new Pandas API for Apache Spark based on the Koalas project”
> 
> 
> Which version of the Koalas project are you referring to? 1.8.1?
> 
> 
> 
> Cheers
> Igor 
> 
> On Tue, 10 Aug 2021 at 13:31, Matei Zaharia wrote:
> It’s time for our quarterly report to the ASF board, which we need to send 
> out this Wednesday. I wrote the draft below based on community activity — let 
> me know if you’d like to add or change anything:
> 
> ==
> 
> Description:
> 
> Apache Spark is a fast and general engine for large-scale data processing. It 
> offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich 
> set of libraries including stream processing, machine learning, and graph 
> analytics.
> 
> Issues for the board:
> 
> - None
> 
> Project status:
> 
> - We made a number of maintenance releases in the past three months. We 
> released Apache Spark 3.1.2 and 3.0.3 in June as maintenance releases for the 
> 3.x branches. We also released Apache Spark 2.4.8 on May 17 as a bug fix 
> release for the Spark 2.x line. This may be the last release on 2.x unless 
> major new bugs are found.
> 
> - We added three PMC members: Liang-Chi Hsieh, Kousuke Saruta and Takeshi 
> Yamamuro.
> 
> - We are working on Spark 3.2.0 as our next release, with a release candidate 
> likely to come soon. Spark 3.2 includes a new Pandas API for Apache Spark 
> based on the Koalas project, a RocksDB state store for Structured Streaming, 
> native support for session windows, error message standardization, and 
> significant improvements to Spark SQL, such as the use of adaptive query 
> execution by default.
> 
> Trademarks:
> 
> - No changes since the last report.
> 
> Latest releases:
> 
> - Spark 3.1.2 was released on June 23rd, 2021.
> - Spark 3.0.3 was released on June 1st, 2021.
> - Spark 2.4.8 was released on May 17th, 2021.
> 
> Committers and PMC:
> 
> - The latest committers were added on March 11th, 2021 (Atilla Zsolt Piros, 
> Gabor Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu).
> - The latest PMC member was added on June 20th, 2021 (Kousuke Saruta).
> 
> 
> 
> -- 
> Sent from Gmail Mobile



Re: ASF board report draft for August

2021-08-09 Thread Mridul Muralidharan
Hi Matei,

  3.2 will also include support for push-based shuffle (SPIP SPARK-30602).

Regards,
Mridul

On Mon, Aug 9, 2021 at 9:26 PM Hyukjin Kwon wrote:

> > Which version of the Koalas project are you referring to? 1.8.1?
>
> Yes, the latest version 1.8.1.
>
> On Tue, Aug 10, 2021 at 11:07 AM, Igor Costa wrote:
>
>> Hi Matei, nice update
>>
>>
>> Just one question, when you mention “We are working on Spark 3.2.0 as
>> our next release, with a release candidate likely to come soon. Spark 3.2
>> includes a new Pandas API for Apache Spark based on the Koalas project”
>>
>>
>> Which version of the Koalas project are you referring to? 1.8.1?
>>
>>
>>
>> Cheers
>> Igor
>>
>> On Tue, 10 Aug 2021 at 13:31, Matei Zaharia wrote:
>>
>>> It’s time for our quarterly report to the ASF board, which we need to
>>> send out this Wednesday. I wrote the draft below based on community
>>> activity — let me know if you’d like to add or change anything:
>>>
>>> ==
>>>
>>> Description:
>>>
>>> Apache Spark is a fast and general engine for large-scale data
>>> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
>>> well as a rich set of libraries including stream processing, machine
>>> learning, and graph analytics.
>>>
>>> Issues for the board:
>>>
>>> - None
>>>
>>> Project status:
>>>
>>> - We made a number of maintenance releases in the past three months. We
>>> released Apache Spark 3.1.2 and 3.0.3 in June as maintenance releases for
>>> the 3.x branches. We also released Apache Spark 2.4.8 on May 17 as a bug
>>> fix release for the Spark 2.x line. This may be the last release on 2.x
>>> unless major new bugs are found.
>>>
>>> - We added three PMC members: Liang-Chi Hsieh, Kousuke Saruta and
>>> Takeshi Yamamuro.
>>>
>>> - We are working on Spark 3.2.0 as our next release, with a release
>>> candidate likely to come soon. Spark 3.2 includes a new Pandas API for
>>> Apache Spark based on the Koalas project, a RocksDB state store for
>>> Structured Streaming, native support for session windows, error message
>>> standardization, and significant improvements to Spark SQL, such as the use
>>> of adaptive query execution by default.
>>>
>>> Trademarks:
>>>
>>> - No changes since the last report.
>>>
>>> Latest releases:
>>>
>>> - Spark 3.1.2 was released on June 23rd, 2021.
>>> - Spark 3.0.3 was released on June 1st, 2021.
>>> - Spark 2.4.8 was released on May 17th, 2021.
>>>
>>> Committers and PMC:
>>>
>>> - The latest committers were added on March 11th, 2021 (Atilla Zsolt
>>> Piros, Gabor Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu).
>>> - The latest PMC member was added on June 20th, 2021 (Kousuke Saruta).
>>>
>>>
>>> --
>> Sent from Gmail Mobile
>>
>


Re: ASF board report draft for August

2021-08-09 Thread Hyukjin Kwon
> Which version of the Koalas project are you referring to? 1.8.1?

Yes, the latest version 1.8.1.

On Tue, Aug 10, 2021 at 11:07 AM, Igor Costa wrote:

> Hi Matei, nice update
>
>
> Just one question, when you mention “We are working on Spark 3.2.0 as
> our next release, with a release candidate likely to come soon. Spark 3.2
> includes a new Pandas API for Apache Spark based on the Koalas project”
>
>
> Which version of the Koalas project are you referring to? 1.8.1?
>
>
>
> Cheers
> Igor
>
> On Tue, 10 Aug 2021 at 13:31, Matei Zaharia wrote:
>
>> It’s time for our quarterly report to the ASF board, which we need to
>> send out this Wednesday. I wrote the draft below based on community
>> activity — let me know if you’d like to add or change anything:
>>
>> ==
>>
>> Description:
>>
>> Apache Spark is a fast and general engine for large-scale data
>> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
>> well as a rich set of libraries including stream processing, machine
>> learning, and graph analytics.
>>
>> Issues for the board:
>>
>> - None
>>
>> Project status:
>>
>> - We made a number of maintenance releases in the past three months. We
>> released Apache Spark 3.1.2 and 3.0.3 in June as maintenance releases for
>> the 3.x branches. We also released Apache Spark 2.4.8 on May 17 as a bug
>> fix release for the Spark 2.x line. This may be the last release on 2.x
>> unless major new bugs are found.
>>
>> - We added three PMC members: Liang-Chi Hsieh, Kousuke Saruta and Takeshi
>> Yamamuro.
>>
>> - We are working on Spark 3.2.0 as our next release, with a release
>> candidate likely to come soon. Spark 3.2 includes a new Pandas API for
>> Apache Spark based on the Koalas project, a RocksDB state store for
>> Structured Streaming, native support for session windows, error message
>> standardization, and significant improvements to Spark SQL, such as the use
>> of adaptive query execution by default.
>>
>> Trademarks:
>>
>> - No changes since the last report.
>>
>> Latest releases:
>>
>> - Spark 3.1.2 was released on June 23rd, 2021.
>> - Spark 3.0.3 was released on June 1st, 2021.
>> - Spark 2.4.8 was released on May 17th, 2021.
>>
>> Committers and PMC:
>>
>> - The latest committers were added on March 11th, 2021 (Atilla Zsolt
>> Piros, Gabor Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu).
>> - The latest PMC member was added on June 20th, 2021 (Kousuke Saruta).
>>
>>
>> --
> Sent from Gmail Mobile
>


Re: ASF board report draft for August

2021-08-09 Thread Hyukjin Kwon
One SPIP has passed and is ready for Spark 3.2:

pandas API on Spark:
- JIRA: SPIP: Support pandas API layer on PySpark
  (https://issues.apache.org/jira/browse/SPARK-34849)
- Vote: [VOTE] SPIP: Support pandas API layer on PySpark
  (https://www.mail-archive.com/dev@spark.apache.org/msg27605.html)
- Design documentation: Koalas Internals
  (https://docs.google.com/document/d/1tk24aq6FV5Wu2bX_Ym606doLFnrZsh4FdUd52FqojZU)


On Tue, Aug 10, 2021 at 10:31 AM, Matei Zaharia wrote:

> It’s time for our quarterly report to the ASF board, which we need to send
> out this Wednesday. I wrote the draft below based on community activity —
> let me know if you’d like to add or change anything:
>
> ==
>
> Description:
>
> Apache Spark is a fast and general engine for large-scale data processing.
> It offers high-level APIs in Java, Scala, Python, R and SQL as well as a
> rich set of libraries including stream processing, machine learning, and
> graph analytics.
>
> Issues for the board:
>
> - None
>
> Project status:
>
> - We made a number of maintenance releases in the past three months. We
> released Apache Spark 3.1.2 and 3.0.3 in June as maintenance releases for
> the 3.x branches. We also released Apache Spark 2.4.8 on May 17 as a bug
> fix release for the Spark 2.x line. This may be the last release on 2.x
> unless major new bugs are found.
>
> - We added three PMC members: Liang-Chi Hsieh, Kousuke Saruta and Takeshi
> Yamamuro.
>
> - We are working on Spark 3.2.0 as our next release, with a release
> candidate likely to come soon. Spark 3.2 includes a new Pandas API for
> Apache Spark based on the Koalas project, a RocksDB state store for
> Structured Streaming, native support for session windows, error message
> standardization, and significant improvements to Spark SQL, such as the use
> of adaptive query execution by default.
>
> Trademarks:
>
> - No changes since the last report.
>
> Latest releases:
>
> - Spark 3.1.2 was released on June 23rd, 2021.
> - Spark 3.0.3 was released on June 1st, 2021.
> - Spark 2.4.8 was released on May 17th, 2021.
>
> Committers and PMC:
>
> - The latest committers were added on March 11th, 2021 (Atilla Zsolt
> Piros, Gabor Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu).
> - The latest PMC member was added on June 20th, 2021 (Kousuke Saruta).
>
>
>


Re: ASF board report draft for August

2021-08-09 Thread Igor Costa
Hi Matei, nice update


Just one question, when you mention “We are working on Spark 3.2.0 as our
next release, with a release candidate likely to come soon. Spark 3.2
includes a new Pandas API for Apache Spark based on the Koalas project”


Which version of the Koalas project are you referring to? 1.8.1?



Cheers
Igor

On Tue, 10 Aug 2021 at 13:31, Matei Zaharia wrote:

> It’s time for our quarterly report to the ASF board, which we need to send
> out this Wednesday. I wrote the draft below based on community activity —
> let me know if you’d like to add or change anything:
>
> ==
>
> Description:
>
> Apache Spark is a fast and general engine for large-scale data processing.
> It offers high-level APIs in Java, Scala, Python, R and SQL as well as a
> rich set of libraries including stream processing, machine learning, and
> graph analytics.
>
> Issues for the board:
>
> - None
>
> Project status:
>
> - We made a number of maintenance releases in the past three months. We
> released Apache Spark 3.1.2 and 3.0.3 in June as maintenance releases for
> the 3.x branches. We also released Apache Spark 2.4.8 on May 17 as a bug
> fix release for the Spark 2.x line. This may be the last release on 2.x
> unless major new bugs are found.
>
> - We added three PMC members: Liang-Chi Hsieh, Kousuke Saruta and Takeshi
> Yamamuro.
>
> - We are working on Spark 3.2.0 as our next release, with a release
> candidate likely to come soon. Spark 3.2 includes a new Pandas API for
> Apache Spark based on the Koalas project, a RocksDB state store for
> Structured Streaming, native support for session windows, error message
> standardization, and significant improvements to Spark SQL, such as the use
> of adaptive query execution by default.
>
> Trademarks:
>
> - No changes since the last report.
>
> Latest releases:
>
> - Spark 3.1.2 was released on June 23rd, 2021.
> - Spark 3.0.3 was released on June 1st, 2021.
> - Spark 2.4.8 was released on May 17th, 2021.
>
> Committers and PMC:
>
> - The latest committers were added on March 11th, 2021 (Atilla Zsolt
> Piros, Gabor Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu).
> - The latest PMC member was added on June 20th, 2021 (Kousuke Saruta).
>
>
> --
Sent from Gmail Mobile


ASF board report draft for August

2021-08-09 Thread Matei Zaharia
It’s time for our quarterly report to the ASF board, which we need to send out 
this Wednesday. I wrote the draft below based on community activity — let me 
know if you’d like to add or change anything:

==

Description:

Apache Spark is a fast and general engine for large-scale data processing. It 
offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set 
of libraries including stream processing, machine learning, and graph analytics.

Issues for the board:

- None

Project status:

- We made a number of maintenance releases in the past three months. We 
released Apache Spark 3.1.2 and 3.0.3 in June as maintenance releases for the 
3.x branches. We also released Apache Spark 2.4.8 on May 17 as a bug fix 
release for the Spark 2.x line. This may be the last release on 2.x unless 
major new bugs are found.

- We added three PMC members: Liang-Chi Hsieh, Kousuke Saruta and Takeshi 
Yamamuro.

- We are working on Spark 3.2.0 as our next release, with a release candidate 
likely to come soon. Spark 3.2 includes a new Pandas API for Apache Spark based 
on the Koalas project, a RocksDB state store for Structured Streaming, native 
support for session windows, error message standardization, and significant 
improvements to Spark SQL, such as the use of adaptive query execution by 
default.

Trademarks:

- No changes since the last report.

Latest releases:

- Spark 3.1.2 was released on June 23rd, 2021.
- Spark 3.0.3 was released on June 1st, 2021.
- Spark 2.4.8 was released on May 17th, 2021.

Committers and PMC:

- The latest committers were added on March 11th, 2021 (Atilla Zsolt Piros, 
Gabor Somogyi, Kent Yao, Maciej Szymkiewicz, Max Gekk, and Yi Wu).
- The latest PMC member was added on June 20th, 2021 (Kousuke Saruta).




ASF board report draft for August

2020-08-10 Thread Matei Zaharia
Hi all,

Our quarterly project board report needs to be submitted on August 12th, and I 
wanted to include anything notable going on that we want to appear in the board 
archive. Here is my draft below; let me know if you have suggested changes.

===

Apache Spark is a fast and general engine for large-scale data processing. It 
offers high-level APIs in Java, Scala, Python, R and SQL as well as a rich set 
of libraries including stream processing, machine learning, and graph analytics.

Project status:

- We released Apache Spark 3.0.0 on June 18th, 2020. This was our largest 
release yet, containing over 3400 patches from the community, including 
significant improvements to SQL performance, ANSI SQL compatibility, Python 
APIs, SparkR performance, error reporting and monitoring tools. This release 
also enhances Spark’s job scheduler to support adaptive execution (changing 
query plans at runtime to reduce the need for configuration) and workloads that 
need hardware accelerators.

- We released Apache Spark 2.4.6 on June 5th, 2020 with bug fixes to the 2.4 
line.

- The community is working on 3.0.1 and 2.4.7 releases with bug fixes to these 
two branches.

- We had a discussion on the dev list about clarifying our process for handling 
-1 votes on patches, which will go into updated guidelines on our website.

- We added three committers to the project since the last report: Huaxin Gao, 
Jungtaek Lim and Dilip Biswal.

Trademarks:

- We engaged with two companies that had created products with “Spark” in the 
name to ask them to follow our trademark guidelines.

Latest releases:

- Spark 3.0.0 was released on June 18th, 2020.
- Spark 2.4.6 was released on June 5th, 2020.
- Spark 2.4.5 was released on Feb 8th, 2020.

Committers and PMC:

- The latest PMC member was added on Sept 4th, 2019 (Dongjoon Hyun).
- The latest committers were added on July 14th, 2020 (Huaxin Gao, Jungtaek Lim 
and Dilip Biswal).



ASF board report draft for August

2019-08-12 Thread Matei Zaharia
Hi all,

It’s time to submit our quarterly report to the ASF board again this Wednesday. 
Here is my draft about what’s new — feel free to suggest changes.



Apache Spark is a fast and general engine for large-scale data processing. It
offers high-level APIs in Java, Scala, Python and R as well as a rich set of
libraries including stream processing, machine learning, and graph analytics.

Project status:

- Discussions are continuing about our next feature release, which will likely
  be Spark 3.0, on the dev and user mailing lists. Some key questions include
  whether to remove various deprecated APIs, and which minimum versions of
  Java, Python, Scala, etc to support. There are also a number of new features
  targeting this release. We encourage everyone in the community to give
  feedback on these discussions through our mailing lists or issue tracker.

- We announced a plan to stop supporting Python 2 in our next major release,
  as many other projects in the Python ecosystem are now dropping support
  (https://spark.apache.org/news/plan-for-dropping-python-2-support.html).

- We added three new PMC members to the project in May: Takuya Ueshin,
  Jerry Shao and Hyukjin Kwon.

- There is an ongoing discussion on our dev list about whether to consider
  adding project committers who do not contribute to the code or docs in the
  project, and what the criteria might be for those. (Note that the project does
  solicit committers who only work on docs, and has also added committers
  who work on other tasks, like maintaining our build infrastructure.)

Trademarks:

- We are continuing engagement with various organizations.

Latest releases:

- May 8th, 2019: Spark 2.4.3
- April 23rd, 2019: Spark 2.4.2
- March 31st, 2019: Spark 2.4.1
- Feb 15th, 2019: Spark 2.3.3

Committers and PMC:

- The latest committer was added on Jan 29th, 2019 (Jose Torres).
- The latest PMC members were added on May 21st, 2019 (Jerry Shao,
  Takuya Ueshin and Hyukjin Kwon).