[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links

GitBox Mon, 18 Oct 2021 02:09:52 -0700


gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730721947




##########
File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
##########
@@ -0,0 +1,318 @@
+---
+layout: post
+title: Spark Release 3.2.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+_edit_last: '4'
+_wpas_done_all: '1'
+---
+
+Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous 
contribution from the open-source community, this release managed to resolve in 
excess of 1,700 Jira tickets.
+
+In this release, Spark supports the Pandas API layer on Spark. Pandas users 
can scale out their applications on Spark with one line code change. Other 
major updates include RocksDB StateStore support, session window support, 
push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query 
Execution (AQE) by default, and ANSI SQL mode GA.
+
+To download Apache Spark 3.2.0, visit the 
[downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA 
for the [detailed 
changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407).
 We have curated a list of high level changes here, grouped by major modules.
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+### Highlights
+
+* Support Pandas API layer on PySpark 
([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849))
+* Support push-based shuffle to improve shuffle efficiency 
([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602))
+* Add RocksDB StateStore implementation 
([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198))
+* EventTime based sessionization (session window) 
([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
+* ANSI SQL mode GA 
([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
+* Support for ANSI SQL INTERVAL types 
([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* Enable adaptive query execution by default 
([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
+* Query compilation latency reduction 
([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), 
[SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), 
[SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+* Support Scala 2.13 
([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218))
+
+
+### Core and Spark SQL
+
+**ANSI SQL Compatibility Enhancements**
+
+* Support for ANSI SQL INTERVAL types 
([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* New type coercion syntax rules in ANSI mode 
([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246))
+* Support LATERAL subqueries 
([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382))
+* ANSI mode: IntegralDivide throws an exception on overflow 
([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152))
+* ANSI mode: Check for overflow in Average 
([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955))
+* Block count(table.*) to follow ANSI standard and other SQL engines 
([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199))
+
+**Performance**
+
+* Query compilation latency
+  * Support traversal pruning in transform/resolve functions and their call 
sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042))
+  * Improve the performance of mapChildren and withNewChildren methods 
([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+  * Improve the performance of type coercion rules 
([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103))
+* Query optimization
+  * Remove redundant aggregates in the Optimizer 
([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122))
+  * Push down limit through Project with Join 
([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622))
+  * Push down limit for LEFT SEMI and LEFT ANTI join 
([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), 
[SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514))
+  * Push down limit through WINDOW when partition spec is empty 
([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575))
+  * Use a relative cost comparison function in the CBO 
([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922))
+  * Cardinality estimation of union, sort, and range operator 
([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411))
+  * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as 
broadcast join 
([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081))
+  * UnwrapCastInBinaryComparison support In/InSet predicate 
([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316))
+  * Subexpression elimination enhancements 
([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448))
+  * Keep necessary stats after partition pruning 
([SPARK-34119](https://issues.apache.org/jira/browse/SPARK-34119))
+  * Decouple bucket filter pruning and bucket table scan 
([SPARK-32985](https://issues.apache.org/jira/browse/SPARK-32985))
+* Query execution
+  * Adaptive query execution
+    * Enable adaptive query execution by default 
([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
+    * Support Dynamic Partition Pruning (DPP) in AQE when the join is 
broadcast hash join at the beginning or there is no reused broadcast exchange 
([SPARK-34168](https://issues.apache.org/jira/browse/SPARK-34168), 
[SPARK-35710](https://issues.apache.org/jira/browse/SPARK-35710))
+    * Optimize skew join before coalescing shuffle partitions 
([SPARK-35447](https://issues.apache.org/jira/browse/SPARK-35447))
+    * Support AQE side shuffled hash join formula using rule 
([SPARK-35282](https://issues.apache.org/jira/browse/SPARK-35282))
+    * Support AQE side broadcast hash join threshold 
([SPARK-35264](https://issues.apache.org/jira/browse/SPARK-35264))
+    * Allow custom plugin for AQE cost evaluator 
([SPARK-35794](https://issues.apache.org/jira/browse/SPARK-35794))
+  * Enable Zstandard buffer pool by default 
([SPARK-34340](https://issues.apache.org/jira/browse/SPARK-34340), 
[SPARK-34390](https://issues.apache.org/jira/browse/SPARK-34390))
+  * Add code-gen for all join types of sort-merge join 
([SPARK-34705](https://issues.apache.org/jira/browse/SPARK-34705))
+  * Whole plan exchange and subquery reuse 
([SPARK-29375](https://issues.apache.org/jira/browse/SPARK-29375))
+  * Broadcast nested loop join improvement 
([SPARK-34706](https://issues.apache.org/jira/browse/SPARK-34706))
+  * Support two levels of hash maps for final hash aggregation 
([SPARK-35141](https://issues.apache.org/jira/browse/SPARK-35141))
+  * Allow concurrent writers for writing dynamic partitions and bucket table 
([SPARK-26164](https://issues.apache.org/jira/browse/SPARK-26164))
+  * Improve performance of processing FETCH_PRIOR in Spark Thrift server 
([SPARK-33655](https://issues.apache.org/jira/browse/SPARK-33655))
+
+**Connector Enhancements**
+
+* Parquet
+  * Upgrade Apache Parquet used to version 1.12.1 
([SPARK-36726](https://issues.apache.org/jira/browse/SPARK-36726))
+  * Support column index in Parquet vectorized reader 
([SPARK-34289](https://issues.apache.org/jira/browse/SPARK-34289))
+  * Add new parquet data source options to control datetime rebasing in read 
([SPARK-34377](https://issues.apache.org/jira/browse/SPARK-34377))
+  * Read parquet unsigned types that are stored as int32 physical type in 
parquet ([SPARK-34817](https://issues.apache.org/jira/browse/SPARK-34817))
+  * Read Parquet unsigned int64 logical type that stored as signed int64 
physical type to decimal(20, 0) 
([SPARK-34786](https://issues.apache.org/jira/browse/SPARK-34786))
+  * Handle column index when using vectorized Parquet reader 
([SPARK-34859](https://issues.apache.org/jira/browse/SPARK-34859))
+  * Improve Parquet In filter pushdown 
([SPARK-32792](https://issues.apache.org/jira/browse/SPARK-32792))
+* ORC
+  * Upgrade Apache ORC used to version 1.6.11 
([SPARK-36482](https://issues.apache.org/jira/browse/SPARK-36482))
+  * Support Apache ORC forced positional evolution 
([SPARK-32864](https://issues.apache.org/jira/browse/SPARK-32864))
+  * Support nested column in ORC vectorized reader 
([SPARK-34862](https://issues.apache.org/jira/browse/SPARK-34862))
+  * Support ZSTD, LZ4 compression in ORC data source 
([SPARK-33978](https://issues.apache.org/jira/browse/SPARK-33978), 
[SPARK-35612](https://issues.apache.org/jira/browse/SPARK-35612))
+  * Set the list of read columns in the task configuration to reduce reading 
of ORC data ([SPARK-35783](https://issues.apache.org/jira/browse/SPARK-35783))
+* Avro
+  * Upgrade Apache Avro used to version 1.10.2 
([SPARK-34778](https://issues.apache.org/jira/browse/SPARK-34778))
+  * Supporting Avro schema evolution for partitioned Hive tables with 
"avro.schema.literal" 
([SPARK-26836](https://issues.apache.org/jira/browse/SPARK-26836))
+  * Add new Avro datasource options to control datetime rebasing in read 
([SPARK-34404](https://issues.apache.org/jira/browse/SPARK-34404))
+  * Adding support for user provided schema url in Avro 
([SPARK-34416](https://issues.apache.org/jira/browse/SPARK-34416))
+  * Add support for positional Catalyst-to-Avro schema matching 
([SPARK-34365](https://issues.apache.org/jira/browse/SPARK-34365))
+* JSON
+  * Upgrade Jackson used to version 2.12.3 
([SPARK-35550](https://issues.apache.org/jira/browse/SPARK-35550))
+  * Allow JSON data sources to write non-ASCII characters as codepoints 
([SPARK-35047](https://issues.apache.org/jira/browse/SPARK-35047))
+* CSV
+  * Upgrade univocity-parsers to 2.9.1 
([SPARK-33940](https://issues.apache.org/jira/browse/SPARK-33940))
+* JDBC
+  * Represent JDBC Time type as Integer in milliseconds 
([SPARK-33888](https://issues.apache.org/jira/browse/SPARK-33888))

Review comment:
       Thanks, updated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links

Reply via email to