Thank you so much for your feedback, Koert.

Yes, SPARK-20202 was created in April 2017
and targeted for 3.1.0 since Nov 2019.

However, I believe Apache Spark 3.1.0 (the Hadoop 3.2/Hive 2.3 distribution)
will work with old Hadoop 2.x clusters
if you isolate the classpath via SPARK-31960.

SPARK-31960 Only populate Hadoop classpath for no-hadoop build
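
For example, with the Hadoop 3.2 distribution on a Hadoop 2.x YARN cluster,
the isolation can be made explicit at submit time. A minimal sketch, where
the application class and jar are placeholders:

  spark-submit \
    --master yarn \
    --conf spark.yarn.populateHadoopClasspath=false \
    --class com.example.MyApp \
    my-app.jar

This keeps the cluster's Hadoop 2.x jars out of the Spark classpath, so the
Hadoop 3.2 classes bundled in the distribution are used instead.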

Could you try with a snapshot build?

Bests,
Dongjoon.

On Wed, Oct 7, 2020 at 3:24 PM Koert Kuipers <ko...@tresata.com> wrote:

> It seems to me that with SPARK-20202 we are no longer planning to
> support Hadoop 2 + Hive 1.2. Is that correct?
>
> So basically Spark 3.1 will no longer run with Hive on, say, CDH 5.x
> or HDP 2.x?
>
> My use case is building Spark 3.1 and launching it on these existing
> clusters, which are not managed by me, i.e. I do not use the Spark
> version provided by Cloudera.
> However, there are workarounds for me (using an older Spark version to
> extract data out of Hive, then switching to the newer Spark version),
> so I am not too worried about this. Just making sure I understand.
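>
> Roughly, the workaround I mean (in spark-shell; the table and path
> names are made up):
>
>   // Step 1: with Spark 2.4 (Hive 1.2-based), copy the table out of Hive.
>   spark.table("db.events")
>     .write.mode("overwrite").parquet("hdfs:///tmp/events_export")
>
>   // Step 2: with Spark 3.1, read the exported files directly.
>   val events = spark.read.parquet("hdfs:///tmp/events_export")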
>
> thanks
>
> On Sat, Oct 3, 2020 at 8:17 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Hi, All.
>>
>> As of today, the master branch (Apache Spark 3.1.0) has
>> resolved 852+ JIRA issues, and 606+ of them are 3.1.0-only
>> patches. According to the 3.1.0 release window, branch-3.1
>> will be created on November 1st and will enter the QA period.
>>
>> Here are some notable updates I've been monitoring.
>>
>> *Language*
>> 01. SPARK-25075 Support Scala 2.13
>>       - Since SPARK-32926, the Scala 2.13 build test has
>>         been part of the GitHub Actions jobs.
>>       - After SPARK-33044, the Scala 2.13 tests will also
>>         be part of the Jenkins jobs.
>> 02. SPARK-29909 Drop Python 2 and Python 3.4 and 3.5
>> 03. SPARK-32082 Project Zen: Improving Python usability
>>       - 7 of 16 issues are resolved.
>> 04. SPARK-32073 Drop R < 3.5 support
>>       - This is done for Spark 3.0.1 and 3.1.0.
>>
>> *Dependency*
>> 05. SPARK-32058 Use Apache Hadoop 3.2.0 dependency
>>       - This changes the default distribution for better cloud support.
>> 06. SPARK-32981 Remove hive-1.2 distribution
>> 07. SPARK-20202 Remove references to org.spark-project.hive
>>       - This will remove Hive 1.2.1 from the source code.
>> 08. SPARK-29250 Upgrade to Hadoop 3.2.1 (WIP)
>>
>> *Core*
>> 09. SPARK-27495 Support Stage level resource conf and scheduling
>>       - 11 of 15 issues are resolved; a sketch of the new
>>         API follows this list.
>> 10. SPARK-25299 Use remote storage for persisting shuffle data
>>       - 8 of 14 issues are resolved
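>>
>> For (9), a minimal sketch of the new stage-level scheduling API
>> (resource amounts are arbitrary, rdd is an existing RDD, and on
>> YARN this requires dynamic allocation):
>>
>>   import org.apache.spark.resource.{ExecutorResourceRequests,
>>     ResourceProfileBuilder, TaskResourceRequests}
>>
>>   // Executors for these stages: 4 cores, 8g heap, 1 GPU each.
>>   val execReqs = new ExecutorResourceRequests()
>>     .cores(4).memory("8g").resource("gpu", 1)
>>   // Each task needs 1 CPU slot and 1 GPU.
>>   val taskReqs = new TaskResourceRequests().cpus(1).resource("gpu", 1)
>>
>>   val profile = new ResourceProfileBuilder()
>>     .require(execReqs).require(taskReqs).build()
>>
>>   // Stages computed from this RDD now use the new profile.
>>   rdd.withResources(profile).count()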
>>
>> *Resource Manager*
>> 11. SPARK-33005 Kubernetes GA preparation
>>       - It is in progress, and we are waiting for more feedback.
>>
>> *SQL*
>> 12. SPARK-30648/SPARK-32346 Support filters pushdown
>>       to JSON/Avro
>> 13. SPARK-32948/SPARK-32958 Add Json expression optimizer
>> 14. SPARK-12312 Support JDBC Kerberos w/ keytab
>>       - 11 of 17 issues are resolved; a sketch follows this list.
>> 15. SPARK-27589 DSv2 was mostly completed in 3.0
>>       and gained more features in 3.1, but we are still
>>       missing the following:
>>       - All built-in DataSource v2 write paths are disabled,
>>         and the v1 write path is used instead.
>>       - Support for partition pruning with subqueries
>>       - Support for bucketing
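>>
>> Regarding (14), a rough sketch of a keytab-based JDBC read (the URL,
>> table, principal, and keytab path are placeholders, and the keytab
>> must be reachable from the executors, e.g. shipped via --files):
>>
>>   val accounts = spark.read.format("jdbc")
>>     .option("url", "jdbc:postgresql://db.example.com:5432/mydb")
>>     .option("dbtable", "public.accounts")
>>     .option("keytab", "user.keytab")
>>     .option("principal", "user@EXAMPLE.COM")
>>     .load()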
>>
>> We still have one month before the feature freeze
>> and the start of QA. If you are working on 3.1,
>> please consider the timeline and share your schedule
>> with the Apache Spark community. Everything else
>> can go into the 3.2 release, scheduled for June 2021.
>>
>> Last but not least, I want to emphasize (7) once again.
>> We need to remove the forked, unofficial Hive eventually.
>> Please let us know your reasons if you still need to build
>> Apache Spark 3.1 from source against Hive 1.2.
>>
>> https://github.com/apache/spark/pull/29936
>>
>> As I wrote in the above PR description, the older release
>> lines, Apache Spark 2.4 (LTS) and 3.0 (~2021.12), will keep
>> providing Hive 1.2-based distributions.
>>
>> Bests,
>> Dongjoon.
>>
>
