+1 on pushing back the branch cut to allow dev time comparable to previous releases.
Regards,
Mridul

On Sat, Oct 3, 2020 at 10:22 PM Xiao Li <gatorsm...@gmail.com> wrote:

> Thank you for your updates.
>
> Spark 3.0 was released on Jun 18, 2020. If Nov 1st is the target date
> for the 3.1 branch cut, the feature development window is less than
> five months. This is shorter than what we had for the Spark 2.3 and
> 2.4 releases.
>
> Below are three highly desirable pieces of feature work I am watching.
> Hopefully we can finish them before the branch cut.
>
> - Support push-based shuffle to improve shuffle efficiency:
>   https://issues.apache.org/jira/browse/SPARK-30602
> - Unify create table syntax:
>   https://issues.apache.org/jira/browse/SPARK-31257
> - Bloom filter join: https://issues.apache.org/jira/browse/SPARK-32268
>
> Thanks,
>
> Xiao
>
> On Sat, Oct 3, 2020 at 5:41 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> Nice summary. Thanks, Dongjoon. One minor correction: I believe we
>> dropped R 3.5 and below at branch 2.4 as well.
>>
>> On Sun, 4 Oct 2020, 09:17 Dongjoon Hyun, <dongjoon.h...@gmail.com> wrote:
>>
>>> Hi, All.
>>>
>>> As of today, the master branch (Apache Spark 3.1.0) has resolved
>>> 852+ JIRA issues, and 606+ of those are 3.1.0-only patches.
>>> According to the 3.1.0 release window, branch-3.1 will be
>>> created on November 1st and will then enter the QA period.
>>>
>>> Here are some notable updates I've been monitoring.
>>>
>>> *Language*
>>> 01. SPARK-25075 Support Scala 2.13
>>>     - Since SPARK-32926, a Scala 2.13 build test has
>>>       been part of the GitHub Actions jobs.
>>>     - After SPARK-33044, Scala 2.13 tests will be
>>>       part of the Jenkins jobs.
>>> 02. SPARK-29909 Drop Python 2 and Python 3.4 and 3.5
>>> 03. SPARK-32082 Project Zen: Improving Python usability
>>>     - 7 of 16 issues are resolved.
>>> 04. SPARK-32073 Drop R < 3.5 support
>>>     - This is done for Spark 3.0.1 and 3.1.0.
>>>
>>> *Dependency*
>>> 05. SPARK-32058 Use the Apache Hadoop 3.2.0 dependency
>>>     - This changes the default distribution for better cloud support.
>>> 06. SPARK-32981 Remove the hive-1.2 distribution
>>> 07. SPARK-20202 Remove references to org.spark-project.hive
>>>     - This will remove Hive 1.2.1 from the source code.
>>> 08. SPARK-29250 Upgrade to Hadoop 3.2.1 (WIP)
>>>
>>> *Core*
>>> 09. SPARK-27495 Support stage-level resource configuration and scheduling
>>>     - 11 of 15 issues are resolved.
>>> 10. SPARK-25299 Use remote storage for persisting shuffle data
>>>     - 8 of 14 issues are resolved.
>>>
>>> *Resource Manager*
>>> 11. SPARK-33005 Kubernetes GA preparation
>>>     - It is on the way, and we are waiting for more feedback.
>>>
>>> *SQL*
>>> 12. SPARK-30648/SPARK-32346 Support filter pushdown
>>>     to JSON/Avro
>>> 13. SPARK-32948/SPARK-32958 Add a JSON expression optimizer
>>> 14. SPARK-12312 Support JDBC Kerberos with keytab
>>>     - 11 of 17 issues are resolved.
>>> 15. SPARK-27589 DSv2 was mostly completed in 3.0, and more
>>>     features were added in 3.1, but we are still missing:
>>>     - All built-in DataSource v2 write paths are disabled,
>>>       and the v1 write path is used instead.
>>>     - Support for partition pruning with subqueries
>>>     - Support for bucketing
>>>
>>> We still have one month before the feature freeze
>>> and the start of QA. If you are working on 3.1,
>>> please consider the timeline and share your schedule
>>> with the Apache Spark community. Everything else
>>> can go into the 3.2 release, scheduled for June 2021.
>>>
>>> Last but not least, I want to emphasize (07) once again.
>>> We need to remove the forked, unofficial Hive eventually.
>>> Please let us know your reasons if you need to build
>>> Apache Spark 3.1 from source for Hive 1.2.
>>>
>>> https://github.com/apache/spark/pull/29936
>>>
>>> As I wrote in the above PR description, among the older releases,
>>> Apache Spark 2.4 (LTS) and 3.0 (~2021.12) will provide
>>> Hive 1.2-based distributions.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>