> - Apache Hadoop 3.3.2 becomes the default Hadoop profile for Apache Spark 3.2 via SPARK-29250 today. We are observing big improvements in S3 use cases. Please try it and share your experience.
It should be Apache Hadoop 3.3.1 [1]. : )

Note that Apache Hadoop 3.3.0 was the first Hadoop release to ship both x86 and aarch64 binaries, and 3.3.1 does so as well. Very happy to see that 3.3.1 can be the default dependency of Spark 3.2.0.

[1] https://hadoop.apache.org/release/3.3.1.html

Regards,
Yikun

Dongjoon Hyun <dongj...@apache.org> wrote on Thursday, June 17, 2021, at 5:58 AM:

> This is a continuation of the previous thread, `Apache Spark 3.2
> Expectation`, in order to give you updates.
>
> - https://lists.apache.org/thread.html/r61897da071729913bf586ddd769311ce8b5b068e7156c352b51f7a33%40%3Cdev.spark.apache.org%3E
>
> First of all, the AS-IS schedule is here:
>
> - https://spark.apache.org/versioning-policy.html
>
> July 1st: Code freeze. Release branch cut.
> Mid July: QA period. Focus on bug fixes, tests, stability, and docs.
> Generally, no new features merged.
> August: Release candidates (RC), voting, etc. until the final release passes.
>
> Second, Gengliang Wang volunteered as a release manager and has started
> working as the release manager. Thank you! He shared the ongoing issues,
> and I want to piggyback the following on his list.
>
>
> # Languages
>
> - Scala 2.13 Support: Although SPARK-25075 is almost done and we have a
> Scala 2.13 Jenkins job on the master branch, we do not support Scala
> 2.13.6. We should document it if Scala 2.13.7 does not arrive in time.
> Please see https://github.com/scala/scala/pull/9641 (Milestone Scala
> 2.13.7).
>
> - SparkR CRAN publishing: Apache SparkR 3.1.2 is on CRAN as of today, but
> we got policy violation warnings for the cache directory. The fix deadline
> is 2021-06-28. If it is removed again, we need to retry via Apache Spark
> 3.2.0 after making a fix.
> https://cran.r-project.org/web/packages/SparkR/index.html
>
>
> # Dependencies
>
> - Apache Hadoop 3.3.2 becomes the default Hadoop profile for Apache Spark
> 3.2 via SPARK-29250 today. We are observing big improvements in S3 use
> cases. Please try it and share your experience.
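[Editor's note: for anyone wanting to try the new default Hadoop profile, a source build along these lines should work. This is a sketch, not part of the original mail; the `hadoop-3.2` profile name is an assumption based on the Spark build system of this era and may differ on your branch.]

```shell
# Sketch: build Spark from the master branch against the Hadoop 3 profile.
# The profile name (-Phadoop-3.2) is an assumption and may vary by branch;
# check dev/make-distribution.sh and the "Building Spark" docs for your tree.
git clone https://github.com/apache/spark.git
cd spark
./build/mvn -Phadoop-3.2 -DskipTests clean package
```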
>
> - Apache Hive 2.3.9 becomes the built-in Hive library, with more HMS
> compatibility fixes recently. We need to re-evaluate the previous HMS
> incompatibility reports.
>
> - K8s 1.21 was released May 12th. K8s Client 5.4.1 supports it in Apache
> Spark 3.2. In addition, public cloud vendors are starting to support K8s
> 1.20. Please note that this is a breaking K8s API change from K8s Client
> 4.x to 5.x.
>
> - SPARK-33913 upgraded the Apache Kafka client dependency to 2.8.0, and
> the Kafka community is considering deprecating Scala 2.12 support in
> Apache Kafka 3.0.
>
> - SPARK-34542 upgraded the Apache Parquet dependency to 1.12.0. However,
> we need SPARK-34859 to fix a column index issue before the release. In
> addition, Apache Parquet encryption is added as a developer API. A custom
> KMS client should be implemented.
>
> - SPARK-35489 upgraded the Apache ORC dependency to 1.6.8. We still need
> ORC-804 additionally for a better masking feature.
>
> - SPARK-34651 improved ZStandard support with ZStandard 1.4.9, and we are
> also currently evaluating the newly arrived ZStandard 1.5.0. Currently,
> JDK11 performance is under investigation. In addition, SPARK-35181 (Use
> zstd for spark.io.compression.codec by default) is still in progress
> separately.
>
>
> # Newly arrived items
>
> - SPARK-35779 Dynamic filtering for Data Source V2
>
> - SPARK-35781 Support Spark on Apple Silicon on macOS natively
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
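[Editor's note: since SPARK-35181 (zstd by default) is still in progress, users can opt in today via configuration. A minimal sketch, not part of the original mail; `spark.io.compression.codec` and `spark.io.compression.zstd.level` are real Spark configuration keys, while the application jar name is a placeholder.]

```shell
# Sketch: opt in to ZStandard for Spark's internal I/O compression
# (shuffle spills, broadcasts, etc.) ahead of it becoming the default.
# "your-app.jar" is a placeholder for your actual application artifact.
spark-submit \
  --conf spark.io.compression.codec=zstd \
  --conf spark.io.compression.zstd.level=3 \
  your-app.jar
```

The same keys can be set in `spark-defaults.conf` or via `SparkConf` in application code.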