Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-07 Thread Dongjoon Hyun
Thank you so much for your feedback, Koert. Yes, SPARK-20202 was created in April 2017 and has been targeted for 3.1.0 since Nov 2019. However, I believe Apache Spark 3.1.0 (Hadoop 3.2/Hive 2.3 distribution) will work with old Hadoop 2.x clusters if you isolate the classpath via SPARK-31960.
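As a rough illustration of the classpath isolation mentioned above, here is a minimal sketch that assembles a `spark-submit` command with `spark.yarn.populateHadoopClasspath` set to `false`. It assumes this is the setting SPARK-31960 relates to, and that disabling it keeps the cluster's Hadoop 2.x jars off the Spark classpath. The helper name, jar name, and main class are hypothetical, and whether this is sufficient on a given CDH or HDP cluster is not confirmed in this thread.

```python
import shlex


def build_submit_command(app_jar, main_class, populate_hadoop_classpath=False):
    """Build a spark-submit argument list for YARN (hypothetical helper).

    Assumption: with populate_hadoop_classpath=False, Spark relies on its
    bundled Hadoop client jars rather than the cluster's Hadoop classpath.
    """
    conf_value = "true" if populate_hadoop_classpath else "false"
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--conf", f"spark.yarn.populateHadoopClasspath={conf_value}",
        "--class", main_class,
        app_jar,
    ]


# Placeholder application jar and main class, for illustration only.
command = build_submit_command("my-app.jar", "com.example.Main")
print(shlex.join(command))
```

The command is built as a list so it can be passed to `subprocess.run` without shell quoting issues; `shlex.join` is used only to print a copy-pasteable form.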

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-07 Thread Koert Kuipers
it seems to me with SPARK-20202 we are no longer planning to support hadoop2 + hive 1.2. is that correct? so basically spark 3.1 will no longer run on say CDH 5.x or HDP2.x with hive? my use case is building spark 3.1 and launching on these existing clusters that are not managed by me. e.g. i do

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-04 Thread Xiao Li
As pointed out by Dongjoon, the 2nd half of December is the holiday season in most countries. If we do the code freeze in mid-November and release the first RC in mid-December, I am afraid the community will not be active enough to verify the release candidates during the holiday season. Normally, the RC

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-04 Thread Xiao Li
> > Apache Spark 3.1.0 should be compared with Apache Spark 2.1.0.
I think we made a change in release cadence since Spark 2.3. See the commit: https://github.com/apache/spark-website/commit/88990968962e5cc47db8bc2c11a50742d2438daa Thus, Spark 3.1 might just follow the release cadence of Spark

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-04 Thread Dongjoon Hyun
For Xiao's comment, I want to point out that Apache Spark 3.1.0 is different from 2.3 or 2.4. Apache Spark 3.1.0 should be compared with Apache Spark 2.1.0.
- Apache Spark 2.0.0 was released on July 26, 2016.
- Apache Spark 2.1.0 was released on December 28, 2016.
Bests, Dongjoon.
On Sun, Oct

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-04 Thread Dongjoon Hyun
Thank you all. BTW, Xiao and Mridul, I'm wondering what specific date you have in mind. Usually, the `Christmas and New Year season` doesn't give us much additional time. If you think so, could you make a PR for the Apache Spark website according to your expectation?

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-04 Thread Mridul Muralidharan
+1 on pushing the branch cut for increased dev time to match previous releases.
Regards, Mridul
On Sat, Oct 3, 2020 at 10:22 PM Xiao Li wrote:
> Thank you for your updates.
>
> Spark 3.0 got released on Jun 18, 2020. If Nov 1st is the target date of
> the 3.1 branch cut, the feature

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-03 Thread Xiao Li
Thank you for your updates. Spark 3.0 got released on Jun 18, 2020. If Nov 1st is the target date of the 3.1 branch cut, the feature development time window is less than 5 months. This is shorter than what we had for the Spark 2.3 and 2.4 releases. Below are three highly desirable pieces of feature work I am

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-03 Thread Hyukjin Kwon
Nice summary. Thanks Dongjoon. One minor correction -> I believe we dropped R 3.5 and below at branch 2.4 as well.
On Sun, 4 Oct 2020, 09:17 Dongjoon Hyun wrote:
> Hi, All.
>
> As of today, master branch (Apache Spark 3.1.0) resolved
> 852+ JIRA issues and 606+ issues are 3.1.0-only patches.
>

Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-03 Thread Dongjoon Hyun
Hi, All. As of today, master branch (Apache Spark 3.1.0) resolved 852+ JIRA issues and 606+ issues are 3.1.0-only patches. According to the 3.1.0 release window, branch-3.1 will be created on November 1st and enter the QA period. Here are some notable updates I've been monitoring. *Language* 01.