+1 on pushing back the branch cut to allow more dev time, in line with
previous releases.

Regards,
Mridul

On Sat, Oct 3, 2020 at 10:22 PM Xiao Li <gatorsm...@gmail.com> wrote:

> Thank you for your updates.
>
> Spark 3.0 was released on Jun 18, 2020. If Nov 1st is the target date for
> the 3.1 branch cut, the feature development window is less than 5
> months, shorter than what we had for the Spark 2.3 and 2.4 releases.
>
> Below are three highly desirable pieces of feature work I am watching.
> Hopefully, we can finish them before the branch cut.
>
>    - Support push-based shuffle to improve shuffle efficiency:
>    https://issues.apache.org/jira/browse/SPARK-30602
>    - Unify create table syntax:
>    https://issues.apache.org/jira/browse/SPARK-31257
>    - Bloom filter join: https://issues.apache.org/jira/browse/SPARK-32268
>
> Thanks,
>
> Xiao
>
>
> Hyukjin Kwon <gurwls...@gmail.com> wrote on Sat, Oct 3, 2020 at 5:41 PM:
>
>> Nice summary. Thanks, Dongjoon. One minor correction: I believe we
>> dropped R 3.5 and below in branch-2.4 as well.
>>
>> On Sun, 4 Oct 2020, 09:17 Dongjoon Hyun, <dongjoon.h...@gmail.com> wrote:
>>
>>> Hi, All.
>>>
>>> As of today, the master branch (Apache Spark 3.1.0) has
>>> resolved 852+ JIRA issues, and 606+ of them are 3.1.0-only patches.
>>> According to the 3.1.0 release window, branch-3.1 will be
>>> created on November 1st and will enter the QA period.
>>>
>>> Here are some notable updates I've been monitoring.
>>>
>>> *Language*
>>> 01. SPARK-25075 Support Scala 2.13
>>>       - Since SPARK-32926, the Scala 2.13 build test has
>>>         been part of the GitHub Actions jobs.
>>>       - After SPARK-33044, the Scala 2.13 tests will be
>>>         part of the Jenkins jobs.
>>> 02. SPARK-29909 Drop Python 2 and Python 3.4 and 3.5
>>> 03. SPARK-32082 Project Zen: Improving Python usability
>>>       - 7 of 16 issues are resolved.
>>> 04. SPARK-32073 Drop R < 3.5 support
>>>       - This is done for Spark 3.0.1 and 3.1.0.
>>>
>>> *Dependency*
>>> 05. SPARK-32058 Use Apache Hadoop 3.2.0 dependency
>>>       - This changes the default distribution for better cloud support.
>>> 06. SPARK-32981 Remove hive-1.2 distribution
>>> 07. SPARK-20202 Remove references to org.spark-project.hive
>>>       - This will remove Hive 1.2.1 from the source code.
>>> 08. SPARK-29250 Upgrade to Hadoop 3.2.1 (WIP)
>>>
>>> *Core*
>>> 09. SPARK-27495 Support Stage level resource conf and scheduling
>>>       - 11 of 15 issues are resolved
>>> 10. SPARK-25299 Use remote storage for persisting shuffle data
>>>       - 8 of 14 issues are resolved
>>>
>>> *Resource Manager*
>>> 11. SPARK-33005 Kubernetes GA preparation
>>>       - It is underway and we are waiting for more feedback.
>>>
>>> *SQL*
>>> 12. SPARK-30648/SPARK-32346 Support filters pushdown
>>>       to JSON/Avro
>>> 13. SPARK-32948/SPARK-32958 Add Json expression optimizer
>>> 14. SPARK-12312 Support JDBC Kerberos w/ keytab
>>>       - 11 of 17 issues are resolved
>>> 15. SPARK-27589 DSv2 was mostly completed in 3.0,
>>>       and more features were added in 3.1, but we still missed:
>>>       - All built-in DataSource v2 write paths are disabled
>>>         and the v1 write path is used instead.
>>>       - Support for partition pruning with subqueries
>>>       - Support for bucketing
>>>
>>> We still have one month before the feature freeze
>>> and the start of QA. If you are working on 3.1,
>>> please consider the timeline and share your schedule
>>> with the Apache Spark community. Everything else
>>> can go into the 3.2 release, scheduled for June 2021.
>>>
>>> Last but not least, I want to emphasize item 07 once again.
>>> We need to remove the forked, unofficial Hive eventually.
>>> Please let us know your reasons if you need to build
>>> Apache Spark 3.1 from source for Hive 1.2.
>>>
>>> https://github.com/apache/spark/pull/29936
>>>
>>> As I wrote in the above PR description, for old releases,
>>> Apache Spark 2.4 (LTS) and 3.0 (~2021.12) will continue to
>>> provide Hive 1.2-based distributions.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>
