dongjoon-hyun commented on code in PR #608:
URL: https://github.com/apache/spark-website/pull/608#discussion_r2105030076
##########
releases/_posts/2025-05-23-spark-release-4-0-0.md:
##########
@@ -0,0 +1,694 @@
+---
+layout: post
+title: Spark Release 4.0.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Apache Spark 4.0.0 marks a significant milestone as the inaugural release in the 4.x series, embodying the collective effort of the vibrant open-source community. This release is a testament to tremendous collaboration, resolving over 5100 tickets with contributions from more than 390 individuals.
+
+Spark Connect continues its rapid advancement, delivering substantial improvements:
+- A new lightweight Python client ([pyspark-client](https://pypi.org/project/pyspark-client)) at just 1.5 MB.
+- Full API compatibility for the Java client.
+- Greatly expanded API coverage.
+- ML on Spark Connect.
+- A new client implementation for [Swift](https://github.com/apache/spark-connect-swift).
+
+Spark SQL is significantly enriched with powerful new features designed to boost expressiveness and versatility for SQL workloads, such as VARIANT data type support, SQL user-defined functions, session variables, pipe syntax, and string collation.
+
+PySpark sees continuous dedication to both its functional breadth and the overall developer experience, bringing a native plotting API, a new Python Data Source API, support for Python UDTFs, and unified profiling for PySpark UDFs, alongside numerous other enhancements.
+
+Structured Streaming evolves with key additions that provide greater control and ease of debugging, notably the introduction of the Arbitrary State API v2 for more flexible state management and the State Data Source for easier debugging.
+
+To download Apache Spark 4.0.0, please visit the [downloads](https://spark.apache.org/downloads.html) page. For [detailed changes](https://issues.apache.org/jira/projects/SPARK/versions/12353359), you can consult JIRA. We have also curated a list of high-level changes here, grouped by major modules.
+
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+
+### Core and Spark SQL Highlights
+
+- [[SPARK-45314]](https://issues.apache.org/jira/browse/SPARK-45314) Drop Scala 2.12 and make Scala 2.13 the default
+- [[SPARK-45315]](https://issues.apache.org/jira/browse/SPARK-45315) Drop JDK 8/11 and make JDK 17 the default
+- [[SPARK-45923]](https://issues.apache.org/jira/browse/SPARK-45923) Spark Kubernetes Operator
+- [[SPARK-45869]](https://issues.apache.org/jira/browse/SPARK-45869) Revisit and improve Spark Standalone Cluster
+- [[SPARK-42849]](https://issues.apache.org/jira/browse/SPARK-42849) Session Variables
+- [[SPARK-44444]](https://issues.apache.org/jira/browse/SPARK-44444) Use ANSI SQL mode by default
+- [[SPARK-46057]](https://issues.apache.org/jira/browse/SPARK-46057) Support SQL user-defined functions

Review Comment:
There are several open subtasks under this issue, @cloud-fan and @allisonwang-db. Could you move the remaining ones into other issues? I guess @allisonwang-db can create a new one, or we can move them under SPARK-51166.

- https://issues.apache.org/jira/browse/SPARK-46057

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org