Regarding Xiao's comment, I want to point out that Apache Spark 3.1.0 is different from 2.3 or 2.4.
Apache Spark 3.1.0 should be compared with Apache Spark 2.1.0.

- Apache Spark 2.0.0 was released on July 26, 2016.
- Apache Spark 2.1.0 was released on December 28, 2016.

Bests,
Dongjoon.

On Sun, Oct 4, 2020 at 10:53 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Thank you all.
>
> BTW, Xiao and Mridul, I'm wondering what date you have in mind
> specifically.
>
> Usually, the Christmas and New Year season doesn't give us much additional
> time.
>
> If you think so, could you make a PR for the Apache Spark website according
> to your expectation?
>
> https://spark.apache.org/versioning-policy.html
>
> Bests,
> Dongjoon.
>
>
> On Sun, Oct 4, 2020 at 7:18 AM Mridul Muralidharan <mri...@gmail.com>
> wrote:
>
>>
>> +1 on pushing the branch cut for increased dev time to match previous
>> releases.
>>
>> Regards,
>> Mridul
>>
>> On Sat, Oct 3, 2020 at 10:22 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>
>>> Thank you for your updates.
>>>
>>> Spark 3.0 was released on Jun 18, 2020. If Nov 1st is the target date of
>>> the 3.1 branch cut, the feature development window is less than 5
>>> months. This is shorter than what we did for the Spark 2.3 and 2.4 releases.
>>>
>>> Below are three highly desirable pieces of feature work I am watching.
>>> Hopefully, we can finish them before the branch cut.
>>>
>>> - Support push-based shuffle to improve shuffle efficiency:
>>>   https://issues.apache.org/jira/browse/SPARK-30602
>>> - Unify create table syntax:
>>>   https://issues.apache.org/jira/browse/SPARK-31257
>>> - Bloom filter join:
>>>   https://issues.apache.org/jira/browse/SPARK-32268
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>>
>>> Hyukjin Kwon <gurwls...@gmail.com> wrote on Sat, Oct 3, 2020 at 5:41 PM:
>>>
>>>> Nice summary. Thanks, Dongjoon. One minor correction: I believe we
>>>> dropped R 3.5 and below in branch 2.4 as well.
>>>>
>>>> On Sun, 4 Oct 2020, 09:17 Dongjoon Hyun, <dongjoon.h...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, All.
>>>>>
>>>>> As of today, the master branch (Apache Spark 3.1.0) has resolved
>>>>> 852+ JIRA issues, and 606+ of those are 3.1.0-only patches.
>>>>> According to the 3.1.0 release window, branch-3.1 will be
>>>>> created on November 1st and enter the QA period.
>>>>>
>>>>> Here are some notable updates I've been monitoring.
>>>>>
>>>>> *Language*
>>>>> 01. SPARK-25075 Support Scala 2.13
>>>>>     - Since SPARK-32926, the Scala 2.13 build test has
>>>>>       become a part of the GitHub Actions jobs.
>>>>>     - After SPARK-33044, the Scala 2.13 test will be
>>>>>       a part of the Jenkins jobs.
>>>>> 02. SPARK-29909 Drop Python 2 and Python 3.4 and 3.5
>>>>> 03. SPARK-32082 Project Zen: Improving Python usability
>>>>>     - 7 of 16 issues are resolved.
>>>>> 04. SPARK-32073 Drop R < 3.5 support
>>>>>     - This is done for Spark 3.0.1 and 3.1.0.
>>>>>
>>>>> *Dependency*
>>>>> 05. SPARK-32058 Use Apache Hadoop 3.2.0 dependency
>>>>>     - This changes the default distribution for better cloud support.
>>>>> 06. SPARK-32981 Remove hive-1.2 distribution
>>>>> 07. SPARK-20202 Remove references to org.spark-project.hive
>>>>>     - This will remove Hive 1.2.1 from the source code.
>>>>> 08. SPARK-29250 Upgrade to Hadoop 3.2.1 (WIP)
>>>>>
>>>>> *Core*
>>>>> 09. SPARK-27495 Support stage-level resource configuration and scheduling
>>>>>     - 11 of 15 issues are resolved.
>>>>> 10. SPARK-25299 Use remote storage for persisting shuffle data
>>>>>     - 8 of 14 issues are resolved.
>>>>>
>>>>> *Resource Manager*
>>>>> 11. SPARK-33005 Kubernetes GA preparation
>>>>>     - It is on the way, and we are waiting for more feedback.
>>>>>
>>>>> *SQL*
>>>>> 12. SPARK-30648/SPARK-32346 Support filter pushdown
>>>>>     to JSON/Avro
>>>>> 13. SPARK-32948/SPARK-32958 Add JSON expression optimizer
>>>>> 14. SPARK-12312 Support JDBC Kerberos w/ keytab
>>>>>     - 11 of 17 issues are resolved.
>>>>> 15. SPARK-27589 DSv2 was mostly completed in 3.0
>>>>>     and gained more features in 3.1, but we are still missing:
>>>>>     - All built-in DataSource v2 write paths are disabled,
>>>>>       and the v1 write path is used instead.
>>>>>     - Support partition pruning with subqueries
>>>>>     - Support bucketing
>>>>>
>>>>> We still have one month before the feature freeze
>>>>> and the start of QA. If you are working on 3.1,
>>>>> please consider the timeline and share your schedule
>>>>> with the Apache Spark community. Everything else
>>>>> can go into the 3.2 release, scheduled for June 2021.
>>>>>
>>>>> Last but not least, I want to emphasize (7) once again.
>>>>> We need to remove the forked, unofficial Hive eventually.
>>>>> Please let us know your reasons if you need to build
>>>>> Apache Spark 3.1 from source code for Hive 1.2.
>>>>>
>>>>> https://github.com/apache/spark/pull/29936
>>>>>
>>>>> As I wrote in the above PR description, for old releases,
>>>>> Apache Spark 2.4 (LTS) and 3.0 (~2021.12) will provide
>>>>> a Hive 1.2-based distribution.
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>
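[Editor's note] The filter-pushdown item (SPARK-30648/SPARK-32346) mentioned in the thread is user-visible through the ordinary DataFrame API. The sketch below is an editor's illustration, not code from the thread: it assumes a local Spark 3.1+ session, and the input path `people.json` is a hypothetical example. Whether a given filter is actually pushed into the JSON scan depends on the active read path and configuration, so inspect the physical plan rather than assuming it.

```scala
// Illustrative sketch only; requires Spark 3.1+ on the classpath.
import org.apache.spark.sql.SparkSession

object JsonPushdownDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-filter-pushdown")
      .getOrCreate()

    // With SPARK-30648, eligible filters can be pushed into the JSON scan
    // itself instead of being applied only after every row is parsed.
    val df = spark.read
      .schema("name STRING, age INT")
      .json("people.json")   // hypothetical input path
      .filter("age > 21")

    // Inspect the physical plan to see whether the filter was pushed down.
    df.explain()

    spark.stop()
  }
}
```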