I also agree with Steve and Felix. Let's have another thread to discuss the Hive issue,
because this thread was originally about the Hadoop version. And now we can have a `hive-2.3` profile for both the `hadoop-2.7` and `hadoop-3.0` versions. We don't need to mix the two.

Bests,
Dongjoon.

On Mon, Nov 18, 2019 at 8:19 PM Felix Cheung <felixcheun...@hotmail.com> wrote:

> 1000% with Steve, the org.spark-project hive 1.2 will need a solution. It
> is old and rather buggy; and it’s been *years*.
>
> I think we should decouple the hive change from everything else if people
> are concerned?
>
> ------------------------------
> *From:* Steve Loughran <ste...@cloudera.com.INVALID>
> *Sent:* Sunday, November 17, 2019 9:22:09 AM
> *To:* Cheng Lian <lian.cs....@gmail.com>
> *Cc:* Sean Owen <sro...@gmail.com>; Wenchen Fan <cloud0...@gmail.com>;
> Dongjoon Hyun <dongjoon.h...@gmail.com>; dev <dev@spark.apache.org>;
> Yuming Wang <wgy...@gmail.com>
> *Subject:* Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?
>
> Can I take this moment to remind everyone that the version of Hive which
> Spark has historically bundled (the org.spark-project one) is an orphan
> project put together to deal with Hive's shading issues and a source of
> unhappiness in the Hive project. Whatever gets shipped should do its best
> to avoid including that file.
>
> Postponing a switch to Hadoop 3.x until after Spark 3.0 is probably the
> safest move from a risk-minimisation perspective. If something has broken,
> you can start with the assumption that it is in the o.a.s packages without
> having to debug o.a.hadoop and o.a.hive first. There is a cost: if there
> are problems with the Hadoop/Hive dependencies, those teams will
> inevitably ignore filed bug reports, for the same reason the Spark team
> will probably close 1.6-related JIRAs as WONTFIX. WONTFIX responses for
> the Hadoop 2.x line include any compatibility issues with Java 9+. Do bear
> that in mind. It's not been tested, it has dependencies on artifacts we
> know are incompatible, and as far as the Hadoop project is concerned,
> people should move to branch-3 if they want to run on a modern version of
> Java.
>
> It would be really, really good if the published Spark Maven artefacts
> (a) included the spark-hadoop-cloud JAR and (b) were dependent upon Hadoop
> 3.x. That way people doing things with their own projects will get
> up-to-date dependencies and won't get WONTFIX responses themselves.
>
> -Steve
>
> PS: there is a discussion on hadoop-dev about making Hadoop 2.10 the
> official "last ever" branch-2 release and then declaring its predecessors
> EOL; 2.10 will be the transition release.
>
> On Sun, Nov 17, 2019 at 1:50 AM Cheng Lian <lian.cs....@gmail.com> wrote:
>
> Dongjoon, I didn't follow the original Hive 2.3 discussion closely. I
> thought the original proposal was to replace Hive 1.2 with Hive 2.3, which
> seemed risky, and therefore we only introduced Hive 2.3 under the
> hadoop-3.2 profile without removing Hive 1.2. But maybe I'm totally wrong
> here...
>
> Sean, Yuming's PR https://github.com/apache/spark/pull/26533 showed that
> Hadoop 2 + Hive 2 + JDK 11 looks promising. My major motivation is not
> demand but risk control: coupling the Hive 2.3, Hadoop 3.2, and JDK 11
> upgrades together looks too risky.
>
> On Sat, Nov 16, 2019 at 4:03 AM Sean Owen <sro...@gmail.com> wrote:
>
> I'd prefer simply not making Hadoop 3 the default until 3.1+, rather
> than introduce yet another build combination. Does Hadoop 2 + Hive 2
> work, and is there demand for it?
>
> On Sat, Nov 16, 2019 at 3:52 AM Wenchen Fan <cloud0...@gmail.com> wrote:
> >
> > Do we have a limitation on the number of pre-built distributions? It
> > seems this time we need:
> > 1. hadoop 2.7 + hive 1.2
> > 2. hadoop 2.7 + hive 2.3
> > 3. hadoop 3 + hive 2.3
> >
> > AFAIK we always build with JDK 8 (but make it JDK 11 compatible), so we
> > don't need to add the JDK version to the combination.
> >
> > On Sat, Nov 16, 2019 at 4:05 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
> > wrote:
> >>
> >> Thank you for the suggestion.
> >>
> >> Having a `hive-2.3` profile sounds good to me because it's orthogonal
> >> to Hadoop 3. IIRC, originally, it was proposed in that way, but we put
> >> it under `hadoop-3.2` to avoid adding new profiles at that time.
> >>
> >> And I'm wondering if you are considering an additional pre-built
> >> distribution and Jenkins jobs.
> >>
> >> Bests,
> >> Dongjoon.
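
[Editor's note: for concreteness, the three pre-built combinations Wenchen lists above would roughly map onto Maven invocations like the following. This is only a sketch: it assumes the `hive-1.2` and `hive-2.3` profile names discussed in this thread are adopted as-is, alongside the existing `hadoop-2.7` and `hadoop-3.2` profiles.

    # sketch: one build per proposed distribution combination
    ./build/mvn -Phadoop-2.7 -Phive-1.2 -DskipTests clean package   # 1. hadoop 2.7 + hive 1.2
    ./build/mvn -Phadoop-2.7 -Phive-2.3 -DskipTests clean package   # 2. hadoop 2.7 + hive 2.3
    ./build/mvn -Phadoop-3.2 -Phive-2.3 -DskipTests clean package   # 3. hadoop 3   + hive 2.3

The release tarballs would pass the same profile flags through dev/make-distribution.sh, which is why each extra combination also implies an extra pre-built distribution and Jenkins job, as Dongjoon notes above.]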