Unfortunately, the migration to Hadoop 3 is a hard one (maybe not for
Druid, but certainly for big organizations running large Hadoop 2
workloads). If Druid migrated to Hadoop 3 after 0.22, that would probably
prevent me from taking any new versions of Druid for at least the remainder
of the year and possibly longer.

Will



Will Lauer

Senior Principal Architect, Audience & Advertising Reporting
Data Platforms & Systems Engineering

M 508 561 6427
1908 S. First St
Champaign, IL 61822




On Tue, Jun 8, 2021 at 3:08 AM Clint Wylie <cwy...@apache.org> wrote:

> Hi all,
>
> I've been assisting with some experiments to see how we might want to
> migrate Druid to support Hadoop 3.x, and more importantly, see if maybe we
> can finally be free of some of the dependency issues it has been causing
> for as long as I can remember working with Druid.
>
> Hadoop 3 introduced shaded client jars
> (https://issues.apache.org/jira/browse/HADOOP-11804), with the purpose of
> allowing applications to talk to the Hadoop cluster without drowning in
> its transitive dependencies. The experimental branch that I have been
> helping with, which uses these new shaded client jars, can be seen in this
> PR (https://github.com/apache/druid/pull/11314), and is currently working
> with the HDFS integration tests as well as the Hadoop tutorial flow in the
> Druid docs (which is pretty much equivalent to the HDFS integration test).
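
For reference, the shaded client artifacts mentioned above are pulled in as ordinary Maven dependencies, roughly like this (the version shown is illustrative, not the one pinned in the PR):

```xml
<!-- Shaded Hadoop 3 client: compile against the API jar only -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>3.3.1</version>
</dependency>
<!-- Shaded transitive dependencies, needed only at runtime -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>3.3.1</version>
  <scope>runtime</scope>
</dependency>
```

The point of the split is that application code compiles only against `hadoop-client-api`, while `hadoop-client-runtime` carries Hadoop's relocated (shaded) transitive dependencies, so they no longer collide with the application's own versions of libraries like Guava.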
>
> The cloud deep storages still need further testing, and some minor
> cleanup still needs to be done for the docs and such. Additionally, we
> still need to figure out how to handle the Kerberos extension: because it
> extends some Hadoop classes, it isn't able to use the shaded client jars
> in a straightforward manner, so it still has heavy dependencies and hasn't
> been tested. However, the experiment has started to pan out enough that I
> think it is worth starting this discussion, because it does have some
> implications.
>
> I think making this change will allow us to update our dependencies with
> a lot more freedom (I'm looking at you, Guava), but the catch is that once
> we make this change and start updating these dependencies, it will become
> hard, nearing impossible, to support Hadoop 2.x, since as far as I know
> there isn't an equivalent set of shaded client jars for Hadoop 2. I am
> also not certain how far back the Hadoop job classpath isolation feature
> goes (mapreduce.job.classloader = true), which I think is required on
> Druid tasks for the shaded jars to work alongside updated Druid
> dependencies.
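
For concreteness, that classloader setting would be passed through the `jobProperties` of a Druid Hadoop ingestion spec's tuningConfig, sketched here (the spec is heavily abbreviated; only the classloader property is the point):

```json
{
  "type": "index_hadoop",
  "spec": {
    "tuningConfig": {
      "type": "hadoop",
      "jobProperties": {
        "mapreduce.job.classloader": "true"
      }
    }
  }
}
```

With `mapreduce.job.classloader` set to `true`, the MapReduce framework runs task code in an isolated classloader, so the job's (Druid's) dependency versions are kept separate from the Hadoop cluster's own classpath.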
>
> Is anyone opposed to or worried about dropping Hadoop 2.x support after the
> Druid 0.22 release?
>
