Yes, I am not against Hadoop 3 becoming the default. I was just questioning the statement that we are close to dropping support for Hadoop 2.
We build our own Spark releases that we deploy on our clients' clusters. These clusters run HDP 2.x, CDH 5, EMR, Dataproc, etc. I am aware that the hadoop-2.6 profile was dropped, and we are handling this in-house. Given that the latest HDP 2.x is still on Hadoop 2.7, bumping the Hadoop 2 profile to the latest 2.x would probably be an issue for us.

On Sat, Nov 2, 2019, 15:47 Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Hi, Koert.
>
> Could you be more specific about your Hadoop version requirement?
>
> Although we will have a Hadoop 2.7 profile, support for Hadoop 2.6 and older is
> already officially dropped in Apache Spark 3.0.0. We cannot give you an answer
> for Hadoop 2.6 and older clusters because we are not testing them at all.
>
> Also, Steve already pointed out that Hadoop 2.7 is also EOL. Following his
> advice, we might need to upgrade our Hadoop 2.7 profile to the latest 2.x.
> I'm wondering whether you are against that because of Hadoop 2.6 or older
> version support.
>
> BTW, I'm one of the users of Hadoop 3.x clusters. It's already in use and we
> are migrating more. Apache Spark 3.0 will arrive in 2020 (not today). We need
> to consider that, too. Do you have any migration plan for 2020?
>
> In short, for clusters using Hadoop 2.6 and older versions, Apache Spark 2.4
> is supported as an LTS version, and you will keep getting bug fixes there. For
> Hadoop 2.7, Apache Spark 3.0 will have the profile and the binary release.
> Making the hadoop-3.2 profile the default is irrelevant to that.
>
> Bests,
> Dongjoon.
>
>
> On Sat, Nov 2, 2019 at 09:35 Koert Kuipers <ko...@tresata.com> wrote:
>
>> I don't see how we can be close to the point where we don't need to support
>> Hadoop 2.x. That does not agree with the reality from my perspective, which
>> is that all our clients are on Hadoop 2.x; not a single one is on Hadoop 3.x
>> currently. This includes deployments of Cloudera distros, Hortonworks
>> distros, and cloud distros like EMR and Dataproc.
>>
>> Forcing us onto older Spark versions would be unfortunate for us, and also
>> bad for the community (as deployments like ours help find bugs in Spark).
>>
>> On Mon, Oct 28, 2019 at 3:51 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> I'm OK with that, but don't have a strong opinion nor info about the
>>> implications.
>>> That said, my guess is we're close to the point where we don't need to
>>> support Hadoop 2.x anyway, so, yeah.
>>>
>>> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>> >
>>> > Hi, All.
>>> >
>>> > There was a discussion on publishing artifacts built with Hadoop 3.
>>> > But we are still publishing with Hadoop 2.7.3, and `3.0-preview` will
>>> > be the same because we didn't change anything yet.
>>> >
>>> > Technically, we need to change two places for publishing.
>>> >
>>> > 1. Jenkins Snapshot Publishing
>>> > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
>>> >
>>> > 2. Release Snapshot/Release Publishing
>>> > https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh
>>> >
>>> > To minimize the change, we need to switch our default Hadoop profile.
>>> >
>>> > Currently, the default is the `hadoop-2.7` (2.7.4) profile, and
>>> > `hadoop-3.2` (3.2.0) is optional.
>>> > We had better use the `hadoop-3.2` profile by default and `hadoop-2.7`
>>> > optionally.
>>> >
>>> > Note that this means we would use Hive 2.3.6 by default. Only the
>>> > `hadoop-2.7` distribution would use Hive 1.2.1, like Apache Spark 2.4.x.
>>> >
>>> > Bests,
>>> > Dongjoon.
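For reference, a rough sketch of what the two profiles discussed above look like in a downstream build, assuming the stock hadoop-2.7 / hadoop-3.2 Maven profiles and the dev/make-distribution.sh script; the version numbers and extra profiles here are illustrative, not prescriptive:

    # Distribution against the current default hadoop-2.7 profile
    ./dev/make-distribution.sh --name hadoop2.7 --tgz \
      -Phadoop-2.7 -Dhadoop.version=2.7.4 -Phive -Phive-thriftserver -Pyarn

    # Equivalent distribution against the proposed hadoop-3.2 default
    ./dev/make-distribution.sh --name hadoop3.2 --tgz \
      -Phadoop-3.2 -Dhadoop.version=3.2.0 -Phive -Phive-thriftserver -Pyarn

Changing the project default would only affect which of these is built and published when no profile is passed explicitly; builds that pin a profile, as above, should behave the same either way.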