Hi, Koert.

Could you be more specific about your Hadoop version requirement?

Although we will still have a Hadoop 2.7 profile, support for Hadoop 2.6 and
older is already officially dropped in Apache Spark 3.0.0. We cannot give you
an answer for clusters on Hadoop 2.6 and older because we do not test against
them at all.

Also, Steve already pointed out that Hadoop 2.7 is EOL as well. Following
his advice, we might need to upgrade our Hadoop 2.7 profile to the latest
2.x release. I'm wondering whether you are against that because of Hadoop 2.6
or older version support.

BTW, I'm one of the users of Hadoop 3.x clusters. It's already in use, and
we are migrating more clusters to it. Apache Spark 3.0 will arrive in 2020
(not today); we need to consider that, too. Do you have any migration plan
for 2020?

In short, for clusters using Hadoop 2.6 and older versions, Apache Spark 2.4
is supported as an LTS version, so you can still get bug fixes. For Hadoop
2.7, Apache Spark 3.0 will still have the profile and a binary release.
Making the hadoop-3.2 profile the default is independent of that.
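
For reference, the switch only changes which build is the default; roughly
speaking, a downstream build could still pick either side through the existing
Maven profiles. A rough sketch, assuming the current hadoop-2.7 / hadoop-3.2
profile names (the --name values are just illustrative placeholders):

    # proposed default: Hadoop 3.2 based distribution (uses Hive 2.3.6)
    ./dev/make-distribution.sh --name hadoop-3.2 --tgz -Phadoop-3.2 -Phive -Phive-thriftserver

    # optional: Hadoop 2.7 based distribution, which keeps Hive 1.2.1 like Spark 2.4.x
    ./dev/make-distribution.sh --name hadoop-2.7 --tgz -Phadoop-2.7 -Phive -Phive-thriftserver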

Bests,
Dongjoon.


On Sat, Nov 2, 2019 at 09:35 Koert Kuipers <ko...@tresata.com> wrote:

> i dont see how we can be close to the point where we dont need to support
> hadoop 2.x. this does not agree with the reality from my perspective, which
> is that all our clients are on hadoop 2.x. not a single one is on hadoop
> 3.x currently. this includes deployments of cloudera distros, hortonworks
> distros, and cloud distros like emr and dataproc.
>
> forcing us to be on older spark versions would be unfortunate for us, and
> also bad for the community (as deployments like ours help find bugs in
> spark).
>
> On Mon, Oct 28, 2019 at 3:51 PM Sean Owen <sro...@gmail.com> wrote:
>
>> I'm OK with that, but don't have a strong opinion nor info about the
>> implications.
>> That said my guess is we're close to the point where we don't need to
>> support Hadoop 2.x anyway, so, yeah.
>>
>> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>> >
>> > Hi, All.
>> >
>> > There was a discussion on publishing artifacts built with Hadoop 3.
>> > But, we are still publishing with Hadoop 2.7.3 and `3.0-preview` will
>> be the same because we didn't change anything yet.
>> >
>> > Technically, we need to change two places for publishing.
>> >
>> > 1. Jenkins Snapshot Publishing
>> >
>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
>> >
>> > 2. Release Snapshot/Release Publishing
>> >
>> https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh
>> >
>> > To minimize the change, we need to switch our default Hadoop profile.
>> >
>> > Currently, the default is `hadoop-2.7 (2.7.4)` profile and `hadoop-3.2
>> (3.2.0)` is optional.
>> > We had better use `hadoop-3.2` profile by default and `hadoop-2.7`
>> optionally.
>> >
>> > Note that this means we use Hive 2.3.6 by default. Only `hadoop-2.7`
>> distribution will use `Hive 1.2.1` like Apache Spark 2.4.x.
>> >
>> > Bests,
>> > Dongjoon.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
