Our board report is due on May 14th. Here’s a draft of what’s been happening;
feel free to provide comments:
======================
Description:
Apache Spark is a fast and general-purpose engine for large-scale data
processing. It offers high-level APIs in Java, Scala, Python, R and SQL as well
as a rich set of libraries including stream processing, machine learning, and
graph analytics.
Issues for the board:
- None
Project Status:
- Release candidates for Spark 4.0 have been created and the release has
entered the voting stage.
- Four SPIPs were recently accepted:
  1. Introduction of the time data type
  2. Support for constraints in DataSource V2 (DSv2)
  3. Declarative pipelines
  4. Add geospatial types to Spark
- The following votes have successfully passed:
  1. Publish an additional Spark distribution with Spark Connect enabled
  2. Release Apache Spark 3.5.5, deprecating spark.databricks.*
     configuration
  3. Retain migration logic for incorrect spark.databricks.*
     configurations in Spark 4.0.x
- The PySpark User Guide has been merged into the official Spark documentation
site (see https://github.com/apache/spark/pull/50589).
Latest Releases:
- Spark 3.5.5 was released on February 27, 2025
- Spark 3.5.4 was released on December 20, 2024
- Spark 3.4.4 was released on October 27, 2024
Committers and PMC:
- The latest committer was added on November 13, 2024 (Bingkun Pan).
- The latest PMC member was added on January 21, 2025 (Jie Yang).
======================