> - Apache Hadoop 3.3.2 becomes the default Hadoop profile for Apache Spark 3.2 via SPARK-29250 today. We are observing big improvements in S3 use cases. Please try it and share your experience.
It should be Apache Hadoop 3.3.1 [1]. : )

Note that Apache Hadoop 3.3.0 was the first Hadoop release to ship both x86 and aarch64 binaries, and 3.3.1 does so as well. Very happy to see that 3.3.1 can be the default dependency of Spark 3.2.0.

[1] https://hadoop.apache.org/release/3.3.1.html

Regards,
Yikun

Dongjoon Hyun <dongj...@apache.org> wrote on Thursday, June 17, 2021, at 5:58 AM:

> This is a continuation of the previous thread, `Apache Spark 3.2
> Expectation`, in order to give you updates.
>
> - https://lists.apache.org/thread.html/r61897da071729913bf586ddd769311ce8b5b068e7156c352b51f7a33%40%3Cdev.spark.apache.org%3E
>
> First of all, the AS-IS schedule is here:
>
> - https://spark.apache.org/versioning-policy.html
>
> July 1st: Code freeze. Release branch cut.
> Mid July: QA period. Focus on bug fixes, tests, stability, and docs.
> Generally, no new features merged.
> August: Release candidates (RC), voting, etc. until the final release passes.
>
> Second, Gengliang Wang volunteered as a release manager and has started
> working as the release manager. Thank you! He shared the ongoing issues,
> and I want to piggyback the following on his list.
>
>
> # Languages
>
> - Scala 2.13 Support: Although SPARK-25075 is almost done and we have a
> Scala 2.13 Jenkins job on the master branch, we do not support Scala
> 2.13.6. We should document it if Scala 2.13.7 does not arrive in time.
> Please see https://github.com/scala/scala/pull/9641 (Milestone Scala
> 2.13.7).
>
> - SparkR CRAN publishing: Apache SparkR 3.1.2 is on CRAN as of today, but
> we got policy violation warnings for the cache directory. The fix deadline
> is 2021-06-28. If it is removed again, we need to retry via Apache Spark
> 3.2.0 after making a fix.
> https://cran.r-project.org/web/packages/SparkR/index.html
>
>
> # Dependencies
>
> - Apache Hadoop 3.3.2 becomes the default Hadoop profile for Apache Spark
> 3.2 via SPARK-29250 today. We are observing big improvements in S3 use
> cases. Please try it and share your experience.
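[Editor's note: for anyone wanting to try the new default Hadoop profile, a source build along these lines should work. This is a sketch, not part of the original mail; the `hadoop-3.2` profile name is an assumption based on the Spark build system of this era and may differ on your branch.]

```shell
# Sketch: build Spark from the master branch against the Hadoop 3 profile.
# The profile name (-Phadoop-3.2) is an assumption and may vary by branch;
# check dev/make-distribution.sh and the "Building Spark" docs for your tree.
git clone https://github.com/apache/spark.git
cd spark
./build/mvn -Phadoop-3.2 -DskipTests clean package
```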
>
> - Apache Hive 2.3.9 becomes the built-in Hive library, with more HMS
> compatibility fixes recently. We need to re-evaluate the previous HMS
> incompatibility reports.
>
> - K8s 1.21 was released May 12th. K8s Client 5.4.1 supports it in Apache
> Spark 3.2. In addition, public cloud vendors are starting to support K8s
> 1.20. Please note that this is a breaking K8s API change from K8s Client
> 4.x to 5.x.
>
> - SPARK-33913 upgraded the Apache Kafka client dependency to 2.8.0, and
> the Kafka community is considering deprecating Scala 2.12 support in
> Apache Kafka 3.0.
>
> - SPARK-34542 upgraded the Apache Parquet dependency to 1.12.0. However,
> we need SPARK-34859 to fix a column index issue before the release. In
> addition, Apache Parquet encryption is added as a developer API. A custom
> KMS client should be implemented.
>
> - SPARK-35489 upgraded the Apache ORC dependency to 1.6.8. We still need
> ORC-804 additionally for a better masking feature.
>
> - SPARK-34651 improved ZStandard support with ZStandard 1.4.9, and we are
> also currently evaluating the newly arrived ZStandard 1.5.0. Currently,
> JDK11 performance is under investigation. In addition, SPARK-35181 (Use
> zstd for spark.io.compression.codec by default) is still in progress
> separately.
>
>
> # Newly arrived items
>
> - SPARK-35779 Dynamic filtering for Data Source V2
>
> - SPARK-35781 Support Spark on Apple Silicon on macOS natively
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
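[Editor's note: since SPARK-35181 (zstd by default) is still in progress, users can opt in today via configuration. A minimal sketch, not part of the original mail; `spark.io.compression.codec` and `spark.io.compression.zstd.level` are real Spark configuration keys, while the application jar name is a placeholder.]

```shell
# Sketch: opt in to ZStandard for Spark's internal I/O compression
# (shuffle spills, broadcasts, etc.) ahead of it becoming the default.
# "your-app.jar" is a placeholder for your actual application artifact.
spark-submit \
  --conf spark.io.compression.codec=zstd \
  --conf spark.io.compression.zstd.level=3 \
  your-app.jar
```

The same keys can be set in `spark-defaults.conf` or via `SparkConf` in application code.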