Hi, All.

We have successfully completed the Apache Spark 3.1.1 and 3.0.2 releases and have already started the 3.2.0 discussion.
Let's talk about `branch-2.4`, because there have been some discussions on JIRA and GitHub about skipping backports to 2.4. Since `branch-2.4` has been well maintained as an LTS branch, I'd like to suggest making Apache Spark 2.4.8 the official EOL release of the 2.4 line, so that we can focus more on 3.x from now on. Please note that `branch-2.4` will be officially frozen, like `branch-2.3`, after the EOL release.

- Apache Spark 2.4.0 was released on November 2, 2018.
- Apache Spark 2.4.7 was released on September 12, 2020.
- Since the v2.4.7 tag, `branch-2.4` has 134 commits, including the following 12 correctness issues and 3 security issues.

## CORRECTNESS ISSUE

- SPARK-30201 HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT
- SPARK-30228 Update zstd-jni to 1.4.4-3
- SPARK-30894 The nullability of Size function should not depend on SQLConf.get
- SPARK-32635 When pyspark.sql.functions.lit() function is used with dataframe cache, it returns wrong result
- SPARK-32908 percentile_approx() returns incorrect results
- SPARK-33183 Bug in optimizer rule EliminateSorts
- SPARK-33290 REFRESH TABLE should invalidate cache even though the table itself may not be cached
- SPARK-33593 Vector reader got incorrect data with binary partition value
- SPARK-33726 Duplicate field names causes wrong answers during aggregation
- SPARK-34187 Use available offset range obtained during polling when checking offset validation
- SPARK-34212 For parquet table, after changing the precision and scale of decimal type in hive, spark reads incorrect value
- SPARK-34229 Avro should read decimal values with the file schema

## SECURITY ISSUE

- SPARK-33333 Upgrade Jetty to 9.4.28.v20200408
- SPARK-33831 Update to jetty 9.4.34
- SPARK-34449 Upgrade Jetty to fix CVE-2020-27218

What do you think about this?

Bests,
Dongjoon.