1000% with Steve: the org.spark-project hive 1.2 will need a solution. It is old and rather buggy, and it's been *years*.
I think we should decouple the hive change from everything else, if people are concerned?
________________________________
From: Steve Loughran <ste...@cloudera.com.INVALID>
Sent: Sunday, November 17, 2019 9:22:09 AM
To: Cheng Lian <lian.cs....@gmail.com>
Cc: Sean Owen <sro...@gmail.com>; Wenchen Fan <cloud0...@gmail.com>; Dongjoon Hyun <dongjoon.h...@gmail.com>; dev <dev@spark.apache.org>; Yuming Wang <wgy...@gmail.com>
Subject: Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

Can I take this moment to remind everyone that the version of hive which spark has historically bundled (the org.spark-project one) is an orphan project put together to deal with Hive's shading issues, and a source of unhappiness in the Hive project. Whatever gets shipped should do its best to avoid including that artifact.

Postponing the switch to hadoop 3.x until after spark 3.0 is probably the safest move from a risk-minimisation perspective. If something has broken, you can start with the assumption that it is in the o.a.s packages without having to debug o.a.hadoop and o.a.hive first. There is a cost: if there are problems with the hadoop / hive dependencies, those teams will inevitably ignore filed bug reports, for the same reason the spark team will probably close 1.6-related JIRAs as WONTFIX.

WONTFIX responses for the Hadoop 2.x line include any compatibility issues with Java 9+. Do bear that in mind. It's not been tested, it has dependencies on artifacts we know are incompatible, and as far as the Hadoop project is concerned: people should move to branch-3 if they want to run on a modern version of Java.

It would be really, really good if the published spark maven artefacts (a) included the spark-hadoop-cloud JAR and (b) were dependent upon hadoop 3.x. That way, people doing things with their own projects will get up-to-date dependencies and won't get WONTFIX responses themselves.
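[Editor's note: if suggestion (a) above were adopted, a downstream project could pull the cloud-connector module straight from Maven Central with a fragment like the following sketch. The version shown is illustrative only; at the time of this thread the spark-hadoop-cloud JAR was not yet published.]

```xml
<!-- Hypothetical downstream pom.xml fragment; version number is illustrative. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hadoop-cloud_2.12</artifactId>
  <version>3.0.0</version>
</dependency>
```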
-Steve

PS: there is discussion on hadoop-dev about making Hadoop 2.10 the official "last ever" branch-2 release and then declaring its predecessors EOL; 2.10 will be the transition release.

On Sun, Nov 17, 2019 at 1:50 AM Cheng Lian <lian.cs....@gmail.com> wrote:

Dongjoon, I didn't follow the original Hive 2.3 discussion closely. I thought the original proposal was to replace Hive 1.2 with Hive 2.3, which seemed risky, and therefore we only introduced Hive 2.3 under the hadoop-3.2 profile without removing Hive 1.2. But maybe I'm totally wrong here...

Sean, Yuming's PR https://github.com/apache/spark/pull/26533 showed that Hadoop 2 + Hive 2 + JDK 11 looks promising. My major motivation is not about demand, but risk control: coupling the Hive 2.3, Hadoop 3.2, and JDK 11 upgrades together looks too risky.

On Sat, Nov 16, 2019 at 4:03 AM Sean Owen <sro...@gmail.com> wrote:

I'd prefer simply not making Hadoop 3 the default until 3.1+, rather than introduce yet another build combination. Does Hadoop 2 + Hive 2 work, and is there demand for it?

On Sat, Nov 16, 2019 at 3:52 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
> Do we have a limitation on the number of pre-built distributions? Seems this time we need
> 1. hadoop 2.7 + hive 1.2
> 2. hadoop 2.7 + hive 2.3
> 3. hadoop 3 + hive 2.3
>
> AFAIK we always build with JDK 8 (but make it JDK 11 compatible), so we don't need to add the JDK version to the combination.
>
> On Sat, Nov 16, 2019 at 4:05 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>> Thank you for the suggestion.
>>
>> Having a `hive-2.3` profile sounds good to me because it's orthogonal to Hadoop 3.
>> IIRC, originally, it was proposed in that way, but we put it under `hadoop-3.2` to avoid adding new profiles at that time.
>>
>> And, I'm wondering if you are considering additional pre-built distributions and Jenkins jobs.
>>
>> Bests,
>> Dongjoon.
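[Editor's note: the three pre-built combinations Wenchen lists would map onto Maven profile invocations roughly as sketched below. The `hadoop-2.7` and `hadoop-3.2` profiles exist in the Spark build; the separate `hive-1.2` / `hive-2.3` profiles are what this thread is proposing, so those names are assumptions here, not commands that worked at the time.]

```
# Sketch of the three proposed pre-built distributions as Maven profile
# invocations (hive-1.2 / hive-2.3 profile names are the proposal, not
# yet real profiles):
./build/mvn -Phadoop-2.7 -Phive-1.2 -DskipTests clean package   # 1. hadoop 2.7 + hive 1.2
./build/mvn -Phadoop-2.7 -Phive-2.3 -DskipTests clean package   # 2. hadoop 2.7 + hive 2.3
./build/mvn -Phadoop-3.2 -Phive-2.3 -DskipTests clean package   # 3. hadoop 3   + hive 2.3
```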