Re: Apache Spark 3.2 Expectation

Gengliang Wang Tue, 15 Jun 2021 00:17:36 -0700

Hi,

As the expected release date is close, I would like to volunteer as the
release manager for Apache Spark 3.2.0.


Thanks,
Gengliang

On Mon, Apr 12, 2021 at 1:59 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> An update: we found a mistake: we picked the Spark 3.2 release date
> based on the scheduled release date of 3.1. However, 3.1 was delayed and
> released on March 2. In order to have a full 6 months of development for
> 3.2, the target release date for 3.2 should be September 2.
>
> I'm updating the release dates in
> https://github.com/apache/spark-website/pull/331
>
> Thanks,
> Wenchen
>
> On Thu, Mar 11, 2021 at 11:17 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Thank you, Xiao, Wenchen and Hyukjin.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Thu, Mar 11, 2021 at 2:15 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> Just for an update, I will send a discussion email about my idea late
>>> this week or early next week.
>>>
>>> On Thu, Mar 11, 2021 at 7:00 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>>>
>>>> There are many projects going on right now, such as new DS v2 APIs,
>>>> ANSI interval types, join improvement, disaggregated shuffle, etc. I don't
>>>> think it's realistic to do the branch cut in April.
>>>>
>>>> I'm +1 to release 3.2 around July, but it doesn't mean we have to cut
>>>> the branch 3 months earlier. We should make the release process faster and
>>>> cut the branch around June probably.
>>>>
>>>>
>>>>
>>>> On Thu, Mar 11, 2021 at 4:41 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>
>>>>> Below are some nice-to-have features we can work on in Spark 3.2: Lateral
>>>>> Join support <https://issues.apache.org/jira/browse/SPARK-28379>,
>>>>> interval data type, timestamp without time zone, un-nesting arbitrary
>>>>> queries, the returned metrics of DSV2, and error message standardization.
>>>>> Spark 3.2 will be another exciting release I believe!
>>>>>
>>>>> Go Spark!
>>>>>
>>>>> Xiao
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 10, 2021 at 12:25 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>
>>>>>> Hi, Xiao.
>>>>>>
>>>>>> This thread started 13 days ago. Since you asked the community about
>>>>>> major features or timelines at that time, could you share your roadmap or
>>>>>> expectations if you have something in mind?
>>>>>>
>>>>>> > Thank you, Dongjoon, for initiating this discussion. Let us keep it
>>>>>> > open. It might take 1-2 weeks to collect from the community all the
>>>>>> > features we plan to build and ship in 3.2 since we just finished the 3.1
>>>>>> > voting.
>>>>>> > TBH, cutting the branch this April does not look good to me. That
>>>>>> > means, we only have one month left for feature development of Spark 3.2.
>>>>>> > Do we have enough features in the current master branch? If not, are we
>>>>>> > able to finish major features we collected here? Do they have a timeline
>>>>>> > or project plan?
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 3, 2021 at 2:58 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, John.
>>>>>>>
>>>>>>> This thread aims to share your expectations and goals (and maybe
>>>>>>> work progress) for Apache Spark 3.2 because we are making this
>>>>>>> together. :)
>>>>>>>
>>>>>>> Bests,
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 3, 2021 at 1:59 PM John Zhuge <jzh...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Dongjoon,
>>>>>>>>
>>>>>>>> Is it possible to get ViewCatalog in? The community already had
>>>>>>>> fairly detailed discussions.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> John
>>>>>>>>
>>>>>>>> On Thu, Feb 25, 2021 at 8:57 AM Dongjoon Hyun <
>>>>>>>> dongjoon.h...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, All.
>>>>>>>>>
>>>>>>>>> Since we have been preparing Apache Spark 3.2.0 in master branch
>>>>>>>>> since December 2020, March seems to be a good time to share our
>>>>>>>>> thoughts and aspirations on Apache Spark 3.2.
>>>>>>>>>
>>>>>>>>> According to the progress on Apache Spark 3.1 release, Apache
>>>>>>>>> Spark 3.2 seems to be the last minor release of this year. Given the
>>>>>>>>> timeframe, we might consider the following. (This is a small set.
>>>>>>>>> Please add your thoughts to this limited list.)
>>>>>>>>>
>>>>>>>>> # Languages
>>>>>>>>>
>>>>>>>>> - Scala 2.13 Support: This was expected on 3.1 via SPARK-25075 but
>>>>>>>>> slipped out. Currently, we are trying to use Scala 2.13.5 via
>>>>>>>>> SPARK-34505 and investigating the publishing issue. Thank you for
>>>>>>>>> your contributions and feedback on this.
>>>>>>>>>
>>>>>>>>> - Java 17 LTS Support: Java 17 LTS will arrive in September 2021.
>>>>>>>>> Like Java 11, we need lots of support from our dependencies. Let's
>>>>>>>>> see.
>>>>>>>>>
>>>>>>>>> - Python 3.6 Deprecation(?): Python 3.6 community support ends on
>>>>>>>>> 2021-12-23. So, the deprecation is not required yet, but we had better
>>>>>>>>> prepare for it because we don't have an ETA for Apache Spark 3.3 in 2022.
>>>>>>>>>
>>>>>>>>> - SparkR CRAN publishing: As we know, it's discontinued so far.
>>>>>>>>> Resuming it depends on the success of Apache SparkR 3.1.1 CRAN
>>>>>>>>> publishing. If that succeeds in reviving it, we can keep publishing.
>>>>>>>>> Otherwise, I believe we had better drop it from the releasing work
>>>>>>>>> item list officially.
>>>>>>>>>
>>>>>>>>> # Dependencies
>>>>>>>>>
>>>>>>>>> - Apache Hadoop 3.3.2: Hadoop 3.2.0 becomes the default Hadoop
>>>>>>>>> profile in Apache Spark 3.1. Currently, Spark master branch lives on
>>>>>>>>> Hadoop 3.2.2's shaded clients via SPARK-33212. So far, there is one
>>>>>>>>> ongoing report in the YARN environment. We hope it will be fixed soon
>>>>>>>>> in the Spark 3.2 timeframe so that we can move toward Hadoop 3.3.2.
>>>>>>>>>
>>>>>>>>> - Apache Hive 2.3.9: Spark 3.0 starts to use Hive 2.3.7 by default
>>>>>>>>> instead of old Hive 1.2 fork. Spark 3.1 removed hive-1.2 profile
>>>>>>>>> completely via SPARK-32981 and replaced the generated hive-service-rpc
>>>>>>>>> code with the official dependency via SPARK-32981. We are steadily
>>>>>>>>> improving this area and will consume Hive 2.3.9 if available.
>>>>>>>>>
>>>>>>>>> - K8s Client 4.13.2: During K8s GA activity, Spark 3.1 upgrades
>>>>>>>>> K8s client dependency to 4.12.0. Spark 3.2 upgrades it to 4.13.2 in
>>>>>>>>> order to support K8s model 1.19.
>>>>>>>>>
>>>>>>>>> - Kafka Client 2.8: To bring the client fixes, Spark 3.1 is using
>>>>>>>>> Kafka Client 2.6. For Spark 3.2, SPARK-33913 upgraded to Kafka 2.7
>>>>>>>>> with Scala 2.12.13, but it was reverted later due to a Scala 2.12.13
>>>>>>>>> issue. Since KAFKA-12357 fixed the Scala requirement two days ago,
>>>>>>>>> Spark 3.2 will hopefully go with Kafka Client 2.8.
>>>>>>>>>
>>>>>>>>> # Some Features
>>>>>>>>>
>>>>>>>>> - Data Source v2: Spark 3.2 will deliver much richer DSv2 with
>>>>>>>>> Apache Iceberg integration. Especially, we hope the on-going function
>>>>>>>>> catalog SPIP and up-coming storage partitioned join SPIP can be
>>>>>>>>> delivered as a part of Spark 3.2 and become an additional foundation.
>>>>>>>>>
>>>>>>>>> - Columnar Encryption: As of today, Apache Spark master branch
>>>>>>>>> supports columnar encryption via Apache ORC 1.6 and it's documented
>>>>>>>>> via SPARK-34036. Also, upcoming Apache Parquet 1.12 has a similar
>>>>>>>>> capability. Hopefully, Apache Spark 3.2 is going to be the first
>>>>>>>>> release to have this feature officially. Any feedback is welcome.
>>>>>>>>>
>>>>>>>>> - Improved ZStandard Support: Spark 3.2 will bring more benefits
>>>>>>>>> for ZStandard users: 1) SPARK-34340 added native ZSTD JNI buffer pool
>>>>>>>>> support for all IO operations, 2) SPARK-33978 makes ORC datasource
>>>>>>>>> support ZSTD compression, 3) SPARK-34503 sets ZSTD as the default
>>>>>>>>> codec for event log compression, 4) SPARK-34479 aims to support ZSTD
>>>>>>>>> at Avro data source. Also, the upcoming Parquet 1.12 supports ZSTD
>>>>>>>>> (and supports JNI buffer pool), too. I'm expecting more benefits.
>>>>>>>>>
>>>>>>>>> - Structured Streaming with RocksDB backend: According to the
>>>>>>>>> latest update, it looks active enough for merging to master branch in
>>>>>>>>> Spark 3.2.
>>>>>>>>>
>>>>>>>>> Please share your thoughts and let's build better Apache Spark 3.2
>>>>>>>>> together.
>>>>>>>>>
>>>>>>>>> Bests,
>>>>>>>>> Dongjoon.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> John Zhuge
>>>>>>>>
>>>>>>>
