Apache Spark 2.4.8 (and EOL of 2.4)

Dongjoon Hyun Wed, 03 Mar 2021 10:05:09 -0800

Hi, All.

We have successfully completed the Apache Spark 3.1.1 and 3.0.2 releases and
have already started the 3.2.0 discussion.


Let's talk about branch-2.4, because there have been some discussions on JIRA
and GitHub about skipping backports to 2.4.

Since `branch-2.4` has been well maintained as an LTS branch, I'd like to
suggest making Apache Spark 2.4.8 the official EOL release of the 2.4 line
so that we can focus more on 3.x from now on. Please note that `branch-2.4`
will be officially frozen, like `branch-2.3`, after the EOL release.

- Apache Spark 2.4.0 was released on November 2, 2018.
- Apache Spark 2.4.7 was released on September 12, 2020.
- Since the v2.4.7 tag, `branch-2.4` has received 134 commits, including
fixes for the following 12 correctness issues.

## CORRECTNESS ISSUE
SPARK-30201 HiveOutputWriter standardOI should use
ObjectInspectorCopyOption.DEFAULT
SPARK-30228 Update zstd-jni to 1.4.4-3
SPARK-30894 The nullability of Size function should not depend on
SQLConf.get
SPARK-32635 When pyspark.sql.functions.lit() function is used with
dataframe cache, it returns wrong result
SPARK-32908 percentile_approx() returns incorrect results
SPARK-33183 Bug in optimizer rule EliminateSorts
SPARK-33290 REFRESH TABLE should invalidate cache even though the table
itself may not be cached
SPARK-33593 Vector reader got incorrect data with binary partition value
SPARK-33726 Duplicate field names causes wrong answers during aggregation
SPARK-34187 Use available offset range obtained during polling when
checking offset validation
SPARK-34212 For parquet table, after changing the precision and scale of
decimal type in hive, spark reads incorrect value
SPARK-34229 Avro should read decimal values with the file schema

## SECURITY ISSUE
SPARK-33333 Upgrade Jetty to 9.4.28.v20200408
SPARK-33831 Update to jetty 9.4.34
SPARK-34449 Upgrade Jetty to fix CVE-2020-27218

What do you think about this?

Bests,
Dongjoon.
