Can I take this moment to remind everyone that the version of Hive which
Spark has historically bundled (the org.spark-project one) is an orphan
project, put together to deal with Hive's shading issues, and a source of
unhappiness in the Hive project. Whatever gets shipped should do its best
to avoid including that artifact.
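
For downstream builds that want to keep that fork off their own classpath,
a minimal sbt sketch might look like the following (the Spark version and
the particular org.spark-project.hive artifacts excluded are illustrative,
not a recommendation of exact coordinates):

  // build.sbt (sketch): depend on Spark's Hive integration but exclude the
  // org.spark-project Hive fork so it never lands on the classpath.
  libraryDependencies += ("org.apache.spark" %% "spark-hive" % "2.4.4")
    .exclude("org.spark-project.hive", "hive-exec")
    .exclude("org.spark-project.hive", "hive-metastore")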

Postponing a switch to Hadoop 3.x until after Spark 3.0 is probably the
safest move from a risk-minimisation perspective. If something has broken,
you can start with the assumption that it is in the o.a.s packages without
having to debug o.a.hadoop and o.a.hive first. There is a cost: if there
are problems with the Hadoop / Hive dependencies, those teams will
inevitably ignore filed bug reports, for the same reason the Spark team
will probably close 1.6-related JIRAs as WONTFIX. WONTFIX responses for the
Hadoop 2.x line include any compatibility issues with Java 9+. Do bear that
in mind. It's not been tested, it has dependencies on artifacts we know are
incompatible, and as far as the Hadoop project is concerned, people should
move to branch 3 if they want to run on a modern version of Java.

It would be really, really good if the published Spark Maven artefacts (a)
included the spark-hadoop-cloud JAR and (b) were dependent upon Hadoop 3.x.
That way people building their own projects will get up-to-date
dependencies and won't get WONTFIX responses themselves.
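
As a rough sketch of what that would mean for a downstream sbt build
(coordinates and the version are illustrative; don't assume
spark-hadoop-cloud is actually on Maven Central today):

  // build.sbt (sketch): what a downstream project could declare if the
  // spark-hadoop-cloud module were published alongside the other artifacts
  // and the default artifacts pulled in Hadoop 3.x transitively.
  val sparkVersion = "3.0.0"
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-sql"          % sparkVersion,
    // cloud-store committers (S3A and friends) plus their hadoop-* deps
    "org.apache.spark" %% "spark-hadoop-cloud" % sparkVersion
  )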

-Steve

PS: There is discussion on hadoop-dev about making Hadoop 2.10 the official
"last ever" branch-2 release and then declaring its predecessors EOL; 2.10
will be the transition release.

On Sun, Nov 17, 2019 at 1:50 AM Cheng Lian <lian.cs....@gmail.com> wrote:

> Dongjoon, I didn't follow the original Hive 2.3 discussion closely. I
> thought the original proposal was to replace Hive 1.2 with Hive 2.3, which
> seemed risky, and therefore we only introduced Hive 2.3 under the
> hadoop-3.2 profile without removing Hive 1.2. But maybe I'm totally wrong
> here...
>
> Sean, Yuming's PR https://github.com/apache/spark/pull/26533 showed that
> Hadoop 2 + Hive 2 + JDK 11 looks promising. My major motivation is not
> about demand, but risk control: coupling Hive 2.3, Hadoop 3.2, and JDK 11
> upgrade together looks too risky.
>
> On Sat, Nov 16, 2019 at 4:03 AM Sean Owen <sro...@gmail.com> wrote:
>
>> I'd prefer simply not making Hadoop 3 the default until 3.1+, rather
>> than introduce yet another build combination. Does Hadoop 2 + Hive 2
>> work and is there demand for it?
>>
>> On Sat, Nov 16, 2019 at 3:52 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>> >
>> > Do we have a limitation on the number of pre-built distributions? Seems
>> this time we need
>> > 1. hadoop 2.7 + hive 1.2
>> > 2. hadoop 2.7 + hive 2.3
>> > 3. hadoop 3 + hive 2.3
>> >
>> > AFAIK we always built with JDK 8 (but make it JDK 11 compatible), so
>> don't need to add JDK version to the combination.
>> >
>> > On Sat, Nov 16, 2019 at 4:05 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>> >>
>> >> Thank you for suggestion.
>> >>
>> >> Having `hive-2.3` profile sounds good to me because it's orthogonal to
>> Hadoop 3.
>> >> IIRC, originally, it was proposed in that way, but we put it under
>> `hadoop-3.2` to avoid adding new profiles at that time.
>> >>
>> >> And, I'm wondering if you are considering additional pre-built
>> distribution and Jenkins jobs.
>> >>
>> >> Bests,
>> >> Dongjoon.
>> >>
>>
>
