Re: Apache Spark 2.4.8 (and EOL of 2.4)

Sean Owen Wed, 03 Mar 2021 10:55:38 -0800

For reference, 2.3.x was maintained from February 2018 (2.3.0) to Sep 2019
(2.3.4), or about 19 months. The 2.4 branch should probably be maintained
longer than that, as the final 2.x branch. 2.4.0 was released in Nov 2018.
A final release in, say, April 2021 would be about 30 months. That feels
about right timing-wise.
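
As a quick sanity check on those durations, here is a minimal sketch. It only does month-level arithmetic on the release months quoted above; the April 2021 date is the hypothetical final release, and `months_between` is just an illustrative helper, not anything from the Spark tooling.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Count calendar months from start's month to end's month (days ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# 2.3.0 (Feb 2018) -> 2.3.4 (Sep 2019)
span_23 = months_between(date(2018, 2, 1), date(2019, 9, 1))

# 2.4.0 (Nov 2018) -> assumed final 2.4.x release in April 2021
span_24 = months_between(date(2018, 11, 1), date(2021, 4, 1))

print(span_23, span_24)  # prints: 19 29
```

So an April 2021 release would give 2.4 roughly 10 more months of maintenance than 2.3 got, which is in line with the "about 30 months" estimate.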


We should in any event release 2.4.8, yes. We can of course still choose to
release a 2.4.9 later if some critical issue is found.

But yeah, based on the velocity of back-ports to 2.4.x, it seems about time
to call it EOL.

Sean


On Wed, Mar 3, 2021 at 12:05 PM Dongjoon Hyun wrote:

> Hi, All.
>
> We have successfully completed the Apache Spark 3.1.1 and 3.0.2 releases
> and have already started the 3.2.0 discussion.
>
> Let's talk about branch-2.4, because there have been some discussions on
> JIRA and GitHub about skipping backports to 2.4.
>
> Since `branch-2.4` has been maintained well as LTS, I'd like to suggest
> making Apache Spark 2.4.8 the official EOL release of the 2.4 line so that
> we can focus more on 3.x from now on. Please note that `branch-2.4` will
> be officially frozen, like `branch-2.3`, after the EOL release.
>
> - Apache Spark 2.4.0 was released on November 2, 2018.
> - Apache Spark 2.4.7 was released on September 12, 2020.
> - Since the v2.4.7 tag, `branch-2.4` has gained 134 commits, including
> fixes for the following 12 correctness issues.
>
> ## CORRECTNESS ISSUE
> SPARK-30201 HiveOutputWriter standardOI should use
> ObjectInspectorCopyOption.DEFAULT
> SPARK-30228 Update zstd-jni to 1.4.4-3
> SPARK-30894 The nullability of Size function should not depend on
> SQLConf.get
> SPARK-32635 When pyspark.sql.functions.lit() function is used with
> dataframe cache, it returns wrong result
> SPARK-32908 percentile_approx() returns incorrect results
> SPARK-33183 Bug in optimizer rule EliminateSorts
> SPARK-33290 REFRESH TABLE should invalidate cache even though the table
> itself may not be cached
> SPARK-33593 Vector reader got incorrect data with binary partition value
> SPARK-33726 Duplicate field names causes wrong answers during aggregation
> SPARK-34187 Use available offset range obtained during polling when
> checking offset validation
> SPARK-34212 For parquet table, after changing the precision and scale of
> decimal type in hive, spark reads incorrect value
> SPARK-34229 Avro should read decimal values with the file schema
>
> ## SECURITY ISSUE
> SPARK-33333 Upgrade Jetty to 9.4.28.v20200408
> SPARK-33831 Update to jetty 9.4.34
> SPARK-34449 Upgrade Jetty to fix CVE-2020-27218
>
> What do you think about this?
>
> Bests,
> Dongjoon.
>
