Thank you so much for your feedback, Koert. Yes, SPARK-20202 was created in April 2017 and has been targeted for 3.1.0 since Nov 2019.
However, I believe Apache Spark 3.1.0 (Hadoop 3.2/Hive 2.3 distribution) will work with
old Hadoop 2.x clusters if you isolate the classpath via SPARK-31960
("Only populate Hadoop classpath for no-hadoop build").
Could you try with a snapshot build? (A small config sketch follows at the end of this message.)

Bests,
Dongjoon.

On Wed, Oct 7, 2020 at 3:24 PM Koert Kuipers <ko...@tresata.com> wrote:

> it seems to me with SPARK-20202 we are no longer planning to support
> hadoop2 + hive 1.2. is that correct?
>
> so basically spark 3.1 will no longer run on say CDH 5.x or HDP2.x with
> hive?
>
> my use case is building spark 3.1 and launching it on these existing
> clusters that are not managed by me. e.g. i do not use the spark version
> provided by cloudera.
> however there are workarounds for me (using an older spark version to extract
> out of hive, then switching to a newer spark version), so i am not too worried
> about this. just making sure i understand.
>
> thanks
>
> On Sat, Oct 3, 2020 at 8:17 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Hi, All.
>>
>> As of today, the master branch (Apache Spark 3.1.0) has resolved
>> 852+ JIRA issues, and 606+ of them are 3.1.0-only patches.
>> According to the 3.1.0 release window, branch-3.1 will be
>> created on November 1st and will enter the QA period.
>>
>> Here are some notable updates I've been monitoring.
>>
>> *Language*
>> 01. SPARK-25075 Support Scala 2.13
>>     - Since SPARK-32926, the Scala 2.13 build test has
>>       become a part of the GitHub Actions jobs.
>>     - After SPARK-33044, the Scala 2.13 test will be
>>       a part of the Jenkins jobs.
>> 02. SPARK-29909 Drop Python 2 and Python 3.4 and 3.5
>> 03. SPARK-32082 Project Zen: Improving Python usability
>>     - 7 of 16 issues are resolved.
>> 04. SPARK-32073 Drop R < 3.5 support
>>     - This is done for Spark 3.0.1 and 3.1.0.
>>
>> *Dependency*
>> 05. SPARK-32058 Use Apache Hadoop 3.2.0 dependency
>>     - This changes the default distribution for better cloud support.
>> 06. SPARK-32981 Remove hive-1.2 distribution
>> 07. SPARK-20202 Remove references to org.spark-project.hive
>>     - This will remove Hive 1.2.1 from the source code.
>> 08. SPARK-29250 Upgrade to Hadoop 3.2.1 (WIP)
>>
>> *Core*
>> 09. SPARK-27495 Support Stage level resource configuration and scheduling
>>     - 11 of 15 issues are resolved.
>> 10. SPARK-25299 Use remote storage for persisting shuffle data
>>     - 8 of 14 issues are resolved.
>>
>> *Resource Manager*
>> 11. SPARK-33005 Kubernetes GA preparation
>>     - It is on the way, and we are waiting for more feedback.
>>
>> *SQL*
>> 12. SPARK-30648/SPARK-32346 Support filters pushdown
>>     to JSON/Avro
>> 13. SPARK-32948/SPARK-32958 Add JSON expression optimizer
>> 14. SPARK-12312 Support JDBC Kerberos w/ keytab
>>     - 11 of 17 issues are resolved.
>> 15. SPARK-27589 DSv2 was mostly completed in 3.0
>>     and added more features in 3.1, but we still missed:
>>     - All built-in DataSource v2 write paths are disabled,
>>       and the v1 write path is used instead.
>>     - Support partition pruning with subqueries.
>>     - Support bucketing.
>>
>> We still have one month before the feature freeze
>> and the start of QA. If you are working on 3.1,
>> please consider the timeline and share your schedule
>> with the Apache Spark community. For the other stuff,
>> we can put it into the 3.2 release scheduled for June 2021.
>>
>> Last but not least, I want to emphasize (7) once again.
>> We need to remove the forked, unofficial Hive eventually.
>> Please let us know your reasons if you need to build
>> from the Apache Spark 3.1 source code for Hive 1.2.
>>
>> https://github.com/apache/spark/pull/29936
>>
>> As I wrote in the above PR description, for the older releases,
>> Apache Spark 2.4 (LTS) and 3.0 (~2021.12) will provide a
>> Hive 1.2-based distribution.
>>
>> Bests,
>> Dongjoon.
>>
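
To make the classpath isolation mentioned at the top concrete, here is a minimal sketch.
It assumes the spark.yarn.populateHadoopClasspath setting introduced by SPARK-31960 and a
Hadoop 3.2/Hive 2.3 build of Spark 3.1 running against an existing Hadoop 2.x YARN cluster;
the application name and the trivial job below are illustrative only, not part of the thread.

    // Minimal sketch, assuming the spark.yarn.populateHadoopClasspath flag from SPARK-31960.
    // On a real cluster the flag is normally supplied at submit time, for example:
    //   spark-submit --conf spark.yarn.populateHadoopClasspath=false ...
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hadoop2-classpath-isolation-check")   // illustrative name
      // Do not prepend the cluster's yarn.application.classpath /
      // mapreduce.application.classpath entries, so the Hadoop 2.x jars on the
      // cluster are not mixed with Spark's bundled Hadoop 3.2 jars.
      .config("spark.yarn.populateHadoopClasspath", "false")
      .getOrCreate()

    spark.range(10).count()   // trivial sanity check that the session works
    spark.stop()

If I understand SPARK-31960 correctly, a with-hadoop build should already default this flag
to not populating the cluster's Hadoop classpath, so setting it explicitly as above is mainly
a safeguard; the no-hadoop (hadoop-provided) build keeps populating it because it needs the
cluster's Hadoop jars.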