1000% with Steve: the org.spark-project Hive 1.2 fork will need a solution. It
is old and rather buggy, and it's been *years*.

I think we should decouple the Hive change from everything else if people are
concerned?

________________________________
From: Steve Loughran <ste...@cloudera.com.INVALID>
Sent: Sunday, November 17, 2019 9:22:09 AM
To: Cheng Lian <lian.cs....@gmail.com>
Cc: Sean Owen <sro...@gmail.com>; Wenchen Fan <cloud0...@gmail.com>; Dongjoon 
Hyun <dongjoon.h...@gmail.com>; dev <dev@spark.apache.org>; Yuming Wang 
<wgy...@gmail.com>
Subject: Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

Can I take this moment to remind everyone that the version of Hive which Spark
has historically bundled (the org.spark-project one) is an orphan project, put
together to deal with Hive's shading issues, and a source of unhappiness in the
Hive project. Whatever gets shipped should do its best to avoid including that
artifact.

Postponing the switch to Hadoop 3.x until after Spark 3.0 is probably the
safest move from a risk-minimisation perspective. If something breaks, you can
start with the assumption that it is in the o.a.s packages without having to
debug o.a.hadoop and o.a.hive first. There is a cost: if there are problems
with the Hadoop / Hive dependencies, those teams will inevitably ignore the
filed bug reports, for the same reason the Spark team will probably close
1.6-related JIRAs as WONTFIX. WONTFIX responses for the Hadoop 2.x line include
any compatibility issues with Java 9+. Do bear that in mind: it has not been
tested, it has dependencies on artifacts we know are incompatible, and as far
as the Hadoop project is concerned, people should move to branch-3 if they want
to run on a modern version of Java.

It would be really, really good if the published Spark Maven artefacts (a)
included the spark-hadoop-cloud JAR and (b) depended on Hadoop 3.x. That way
people building their own projects against Spark will get up-to-date
dependencies and won't get WONTFIX responses themselves.
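
To illustrate: with the hadoop-cloud and hadoop-3.2 profiles that already exist
in the Spark build, a distribution along the lines below can be put together
today; point (a) is really about publishing the resulting spark-hadoop-cloud
JAR to the Maven repositories by default rather than leaving it as an opt-in
build step.

    # Sketch only: bundle the cloud-storage connectors and build against Hadoop 3.x.
    # The --name value is purely illustrative.
    ./dev/make-distribution.sh --name hadoop3-cloud --tgz \
      -Phadoop-3.2 -Phadoop-cloud -Phive -Phive-thriftserver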

-Steve

PS: There is a discussion on hadoop-dev about making Hadoop 2.10 the official
"last ever" branch-2 release and then declaring its predecessors EOL; 2.10 will
be the transition release.

On Sun, Nov 17, 2019 at 1:50 AM Cheng Lian 
<lian.cs....@gmail.com> wrote:
Dongjoon, I didn't follow the original Hive 2.3 discussion closely. I thought 
the original proposal was to replace Hive 1.2 with Hive 2.3, which seemed 
risky, and therefore we only introduced Hive 2.3 under the hadoop-3.2 profile 
without removing Hive 1.2. But maybe I'm totally wrong here...

Sean, Yuming's PR https://github.com/apache/spark/pull/26533 showed that Hadoop
2 + Hive 2 + JDK 11 looks promising. My main motivation is not demand but risk
control: coupling the Hive 2.3, Hadoop 3.2, and JDK 11 upgrades together looks
too risky.

On Sat, Nov 16, 2019 at 4:03 AM Sean Owen 
<sro...@gmail.com> wrote:
I'd prefer simply not making Hadoop 3 the default until 3.1+, rather
than introducing yet another build combination. Does Hadoop 2 + Hive 2
work, and is there demand for it?

On Sat, Nov 16, 2019 at 3:52 AM Wenchen Fan 
<cloud0...@gmail.com> wrote:
>
> Do we have a limitation on the number of pre-built distributions? It seems
> this time we need (build commands sketched below):
> 1. hadoop 2.7 + hive 1.2
> 2. hadoop 2.7 + hive 2.3
> 3. hadoop 3 + hive 2.3
>
> AFAIK we always build with JDK 8 (but make it JDK 11 compatible), so we don't
> need to add the JDK version to the combination.
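>
> In build terms, the three combinations map roughly to the Maven profile sets
> sketched below. The hadoop-2.7 and hadoop-3.2 profiles already exist; hive-1.2
> and hive-2.3 are the profile names under discussion and don't exist yet, so
> treat this purely as a sketch:
>
>     # 1. hadoop 2.7 + hive 1.2
>     ./build/mvn -Phadoop-2.7 -Phive-1.2 -Phive -Phive-thriftserver -DskipTests package
>     # 2. hadoop 2.7 + hive 2.3
>     ./build/mvn -Phadoop-2.7 -Phive-2.3 -Phive -Phive-thriftserver -DskipTests package
>     # 3. hadoop 3 + hive 2.3
>     ./build/mvn -Phadoop-3.2 -Phive-2.3 -Phive -Phive-thriftserver -DskipTests package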
>
> On Sat, Nov 16, 2019 at 4:05 PM Dongjoon Hyun 
> <dongjoon.h...@gmail.com> wrote:
>>
>> Thank you for the suggestion.
>>
>> Having a `hive-2.3` profile sounds good to me because it's orthogonal to 
>> Hadoop 3.
>> IIRC, originally, it was proposed in that way, but we put it under 
>> `hadoop-3.2` to avoid adding new profiles at that time.
>>
>> And I'm wondering if you are considering additional pre-built distributions 
>> and Jenkins jobs.
>>
>> Bests,
>> Dongjoon.
>>
