I also agree with Steve and Felix. Let's have another thread to discuss the Hive issue,
because this thread was originally about the Hadoop version. And now we can have a `hive-2.3` profile for both the `hadoop-2.7` and `hadoop-3.0` versions. We don't need to mix the two.

Bests,
Dongjoon.

On Mon, Nov 18, 2019 at 8:19 PM Felix Cheung <felixcheun...@hotmail.com> wrote:

> 1000% with Steve, the org.spark-project hive 1.2 will need a solution. It
> is old and rather buggy; and it’s been *years*.
>
> I think we should decouple the hive change from everything else if people
> are concerned?
>
> ------------------------------
> *From:* Steve Loughran <ste...@cloudera.com.INVALID>
> *Sent:* Sunday, November 17, 2019 9:22:09 AM
> *To:* Cheng Lian <lian.cs....@gmail.com>
> *Cc:* Sean Owen <sro...@gmail.com>; Wenchen Fan <cloud0...@gmail.com>;
> Dongjoon Hyun <dongjoon.h...@gmail.com>; dev <dev@spark.apache.org>;
> Yuming Wang <wgy...@gmail.com>
> *Subject:* Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?
>
> Can I take this moment to remind everyone that the version of Hive which
> Spark has historically bundled (the org.spark-project one) is an orphan
> project put together to deal with Hive's shading issues and a source of
> unhappiness in the Hive project. Whatever gets shipped should do its best
> to avoid including that file.
>
> Postponing a switch to Hadoop 3.x until after Spark 3.0 is probably the
> safest move from a risk-minimisation perspective. If something has broken,
> you can start with the assumption that it is in the o.a.s packages without
> having to debug o.a.hadoop and o.a.hive first. There is a cost: if there
> are problems with the Hadoop/Hive dependencies, those teams will
> inevitably ignore filed bug reports, for the same reason the Spark team
> will probably close 1.6-related JIRAs as WONTFIX. WONTFIX responses for
> the Hadoop 2.x line include any compatibility issues with Java 9+. Do bear
> that in mind. It's not been tested, it has dependencies on artifacts we
> know are incompatible, and as far as the Hadoop project is concerned,
> people should move to branch-3 if they want to run on a modern version of
> Java.
>
> It would be really, really good if the published Spark Maven artefacts
> (a) included the spark-hadoop-cloud JAR and (b) were dependent upon Hadoop
> 3.x. That way people doing things with their own projects will get
> up-to-date dependencies and won't get WONTFIX responses themselves.
>
> -Steve
>
> PS: there is a discussion on hadoop-dev about making Hadoop 2.10 the
> official "last ever" branch-2 release and then declaring its predecessors
> EOL; 2.10 will be the transition release.
>
> On Sun, Nov 17, 2019 at 1:50 AM Cheng Lian <lian.cs....@gmail.com> wrote:
>
> Dongjoon, I didn't follow the original Hive 2.3 discussion closely. I
> thought the original proposal was to replace Hive 1.2 with Hive 2.3, which
> seemed risky, and therefore we only introduced Hive 2.3 under the
> hadoop-3.2 profile without removing Hive 1.2. But maybe I'm totally wrong
> here...
>
> Sean, Yuming's PR https://github.com/apache/spark/pull/26533 showed that
> Hadoop 2 + Hive 2 + JDK 11 looks promising. My major motivation is not
> demand but risk control: coupling the Hive 2.3, Hadoop 3.2, and JDK 11
> upgrades together looks too risky.
>
> On Sat, Nov 16, 2019 at 4:03 AM Sean Owen <sro...@gmail.com> wrote:
>
> I'd prefer simply not making Hadoop 3 the default until 3.1+, rather
> than introduce yet another build combination. Does Hadoop 2 + Hive 2
> work, and is there demand for it?
>
> On Sat, Nov 16, 2019 at 3:52 AM Wenchen Fan <cloud0...@gmail.com> wrote:
> >
> > Do we have a limitation on the number of pre-built distributions? It
> > seems this time we need:
> > 1. hadoop 2.7 + hive 1.2
> > 2. hadoop 2.7 + hive 2.3
> > 3. hadoop 3 + hive 2.3
> >
> > AFAIK we always build with JDK 8 (but make it JDK 11 compatible), so we
> > don't need to add the JDK version to the combination.
> >
> > On Sat, Nov 16, 2019 at 4:05 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> > wrote:
> >>
> >> Thank you for the suggestion.
> >>
> >> Having a `hive-2.3` profile sounds good to me because it's orthogonal
> >> to Hadoop 3. IIRC, originally, it was proposed in that way, but we put
> >> it under `hadoop-3.2` to avoid adding new profiles at that time.
> >>
> >> And I'm wondering if you are considering an additional pre-built
> >> distribution and Jenkins jobs.
> >>
> >> Bests,
> >> Dongjoon.
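
[Editor's note: for concreteness, the three pre-built combinations Wenchen lists above would roughly map onto Maven invocations like the following. This is only a sketch: it assumes the `hive-1.2` and `hive-2.3` profile names discussed in this thread are adopted as-is, alongside the existing `hadoop-2.7` and `hadoop-3.2` profiles.

    # sketch: one build per proposed distribution combination
    ./build/mvn -Phadoop-2.7 -Phive-1.2 -DskipTests clean package   # 1. hadoop 2.7 + hive 1.2
    ./build/mvn -Phadoop-2.7 -Phive-2.3 -DskipTests clean package   # 2. hadoop 2.7 + hive 2.3
    ./build/mvn -Phadoop-3.2 -Phive-2.3 -DskipTests clean package   # 3. hadoop 3   + hive 2.3

The release tarballs would pass the same profile flags through dev/make-distribution.sh, which is why each extra combination also implies an extra pre-built distribution and Jenkins job, as Dongjoon notes above.]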