Hi,

As the expected release date is close, I would like to volunteer as the release manager for Apache Spark 3.2.0.

Thanks,
Gengliang

On Mon, Apr 12, 2021 at 1:59 PM Wenchen Fan <[email protected]> wrote:

> An update: we found that we mistakenly picked the Spark 3.2 release date
> based on the scheduled release date of 3.1. However, 3.1 was delayed and
> released on March 2. In order to have a full 6 months of development for
> 3.2, the target release date for 3.2 should be September 2.
>
> I'm updating the release dates in
> https://github.com/apache/spark-website/pull/331
>
> Thanks,
> Wenchen
>
> On Thu, Mar 11, 2021 at 11:17 PM Dongjoon Hyun <[email protected]>
> wrote:
>
>> Thank you, Xiao, Wenchen and Hyukjin.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Thu, Mar 11, 2021 at 2:15 AM Hyukjin Kwon <[email protected]> wrote:
>>
>>> Just for an update, I will send a discussion email about my idea late
>>> this week or early next week.
>>>
>>> On Thu, Mar 11, 2021 at 7:00 PM Wenchen Fan <[email protected]> wrote:
>>>
>>>> There are many projects going on right now, such as new DS v2 APIs,
>>>> ANSI interval types, join improvement, disaggregated shuffle, etc. I
>>>> don't think it's realistic to do the branch cut in April.
>>>>
>>>> I'm +1 to release 3.2 around July, but it doesn't mean we have to cut
>>>> the branch 3 months earlier. We should make the release process faster
>>>> and cut the branch around June probably.
>>>>
>>>>
>>>>
>>>> On Thu, Mar 11, 2021 at 4:41 AM Xiao Li <[email protected]> wrote:
>>>>
>>>>> Below are some nice-to-have features we can work on in Spark 3.2:
>>>>> Lateral Join support
>>>>> <https://issues.apache.org/jira/browse/SPARK-28379>,
>>>>> interval data type, timestamp without time zone, un-nesting arbitrary
>>>>> queries, the returned metrics of DSV2, and error message
>>>>> standardization. Spark 3.2 will be another exciting release I believe!
>>>>>
>>>>> Go Spark!
>>>>>
>>>>> Xiao
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 10, 2021 at 12:25 PM Dongjoon Hyun <[email protected]> wrote:
>>>>>
>>>>>> Hi, Xiao.
>>>>>>
>>>>>> This thread started 13 days ago.
>>>>>> Since you asked the community about major features or timelines at
>>>>>> that time, could you share your roadmap or expectations if you have
>>>>>> something in your mind?
>>>>>>
>>>>>> > Thank you, Dongjoon, for initiating this discussion. Let us keep it
>>>>>> > open. It might take 1-2 weeks to collect from the community all the
>>>>>> > features we plan to build and ship in 3.2 since we just finished the
>>>>>> > 3.1 voting.
>>>>>> > TBH, cutting the branch this April does not look good to me. That
>>>>>> > means, we only have one month left for feature development of Spark
>>>>>> > 3.2. Do we have enough features in the current master branch? If not,
>>>>>> > are we able to finish major features we collected here? Do they have
>>>>>> > a timeline or project plan?
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 3, 2021 at 2:58 PM Dongjoon Hyun <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, John.
>>>>>>>
>>>>>>> This thread aims to share your expectations and goals (and maybe
>>>>>>> work progress) to Apache Spark 3.2 because we are making this
>>>>>>> together. :)
>>>>>>>
>>>>>>> Bests,
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 3, 2021 at 1:59 PM John Zhuge <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Dongjoon,
>>>>>>>>
>>>>>>>> Is it possible to get ViewCatalog in? The community already had
>>>>>>>> fairly detailed discussions.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> John
>>>>>>>>
>>>>>>>> On Thu, Feb 25, 2021 at 8:57 AM Dongjoon Hyun <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi, All.
>>>>>>>>>
>>>>>>>>> Since we have been preparing Apache Spark 3.2.0 in master branch
>>>>>>>>> since December 2020, March seems to be a good time to share our
>>>>>>>>> thoughts and aspirations on Apache Spark 3.2.
>>>>>>>>>
>>>>>>>>> According to the progress on Apache Spark 3.1 release, Apache
>>>>>>>>> Spark 3.2 seems to be the last minor release of this year.
>>>>>>>>> Given the timeframe, we might consider the following. (This is a
>>>>>>>>> small set. Please add your thoughts to this limited list.)
>>>>>>>>>
>>>>>>>>> # Languages
>>>>>>>>>
>>>>>>>>> - Scala 2.13 Support: This was expected in 3.1 via SPARK-25075 but
>>>>>>>>> slipped out. Currently, we are trying to use Scala 2.13.5 via
>>>>>>>>> SPARK-34505 and investigating the publishing issue. Thank you for
>>>>>>>>> your contributions and feedback on this.
>>>>>>>>>
>>>>>>>>> - Java 17 LTS Support: Java 17 LTS will arrive in September 2021.
>>>>>>>>> Like Java 11, we need lots of support from our dependencies. Let's
>>>>>>>>> see.
>>>>>>>>>
>>>>>>>>> - Python 3.6 Deprecation(?): Python 3.6 community support ends on
>>>>>>>>> 2021-12-23. So, the deprecation is not required yet, but we had
>>>>>>>>> better prepare for it because we don't have an ETA for Apache Spark
>>>>>>>>> 3.3 in 2022.
>>>>>>>>>
>>>>>>>>> - SparkR CRAN publishing: As we know, it's discontinued so far.
>>>>>>>>> Resuming it depends on the success of Apache SparkR 3.1.1 CRAN
>>>>>>>>> publishing. If that succeeds in reviving it, we can keep
>>>>>>>>> publishing. Otherwise, I believe we had better drop it from the
>>>>>>>>> releasing work item list officially.
>>>>>>>>>
>>>>>>>>> # Dependencies
>>>>>>>>>
>>>>>>>>> - Apache Hadoop 3.3.2: Hadoop 3.2.0 becomes the default Hadoop
>>>>>>>>> profile in Apache Spark 3.1. Currently, Spark master branch lives
>>>>>>>>> on Hadoop 3.2.2's shaded clients via SPARK-33212. So far, there is
>>>>>>>>> one on-going report in the YARN environment. We hope it will be
>>>>>>>>> fixed soon within the Spark 3.2 timeframe and we can move toward
>>>>>>>>> Hadoop 3.3.2.
>>>>>>>>>
>>>>>>>>> - Apache Hive 2.3.9: Spark 3.0 starts to use Hive 2.3.7 by default
>>>>>>>>> instead of old Hive 1.2 fork.
>>>>>>>>> Spark 3.1 removed hive-1.2 profile completely via SPARK-32981 and
>>>>>>>>> replaced the generated hive-service-rpc code with the official
>>>>>>>>> dependency via SPARK-32981. We are steadily improving this area
>>>>>>>>> and will consume Hive 2.3.9 if available.
>>>>>>>>>
>>>>>>>>> - K8s Client 4.13.2: During K8s GA activity, Spark 3.1 upgrades
>>>>>>>>> K8s client dependency to 4.12.0. Spark 3.2 upgrades it to 4.13.2
>>>>>>>>> in order to support K8s model 1.19.
>>>>>>>>>
>>>>>>>>> - Kafka Client 2.8: To bring the client fixes, Spark 3.1 is using
>>>>>>>>> Kafka Client 2.6. For Spark 3.2, SPARK-33913 upgraded to Kafka 2.7
>>>>>>>>> with Scala 2.12.13, but it was reverted later due to a Scala
>>>>>>>>> 2.12.13 issue. Since KAFKA-12357 fixed the Scala requirement two
>>>>>>>>> days ago, Spark 3.2 will go with Kafka Client 2.8 hopefully.
>>>>>>>>>
>>>>>>>>> # Some Features
>>>>>>>>>
>>>>>>>>> - Data Source v2: Spark 3.2 will deliver much richer DSv2 with
>>>>>>>>> Apache Iceberg integration. In particular, we hope the on-going
>>>>>>>>> function catalog SPIP and up-coming storage partitioned join SPIP
>>>>>>>>> can be delivered as a part of Spark 3.2 and become an additional
>>>>>>>>> foundation.
>>>>>>>>>
>>>>>>>>> - Columnar Encryption: As of today, Apache Spark master branch
>>>>>>>>> supports columnar encryption via Apache ORC 1.6 and it's documented
>>>>>>>>> via SPARK-34036. Also, upcoming Apache Parquet 1.12 has a similar
>>>>>>>>> capability. Hopefully, Apache Spark 3.2 is going to be the first
>>>>>>>>> release to have this feature officially. Any feedback is welcome.
>>>>>>>>>
>>>>>>>>> - Improved ZStandard Support: Spark 3.2 will bring more benefits
>>>>>>>>> for ZStandard users: 1) SPARK-34340 added native ZSTD JNI buffer
>>>>>>>>> pool support for all IO operations, 2) SPARK-33978 makes ORC
>>>>>>>>> datasource support ZSTD compression, 3) SPARK-34503 sets ZSTD as
>>>>>>>>> the default codec for event log compression, 4) SPARK-34479 aims
>>>>>>>>> to support ZSTD at Avro data source. Also, the upcoming Parquet
>>>>>>>>> 1.12 supports ZSTD (and supports JNI buffer pool), too. I'm
>>>>>>>>> expecting more benefits.
>>>>>>>>>
>>>>>>>>> - Structured Streaming with RocksDB backend: According to the
>>>>>>>>> latest update, it looks active enough for merging to master branch
>>>>>>>>> in Spark 3.2.
>>>>>>>>>
>>>>>>>>> Please share your thoughts and let's build better Apache Spark 3.2
>>>>>>>>> together.
>>>>>>>>>
>>>>>>>>> Bests,
>>>>>>>>> Dongjoon.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> John Zhuge
>>>>>>>>
>>>>>>>
