Dropping Apache Spark Hadoop2 Binary Distribution?

Dongjoon Hyun Mon, 03 Oct 2022 20:16:31 -0700

Hi, All.

I'm wondering whether anyone in the community still uses the following
Apache Spark Hadoop2 binary distribution. If it is no longer used or
useful, we may remove it from the Apache Spark 3.4.0 release.

https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.tgz

Here is the background of this question.
Since Apache Spark 2.2.0 (SPARK-19493, SPARK-19550), the Apache Spark
community has been building and releasing with Java 8 only.
I believe that user applications also use Java 8+ these days.
Recently, I received the following message from the Hadoop PMC.

  > "if you really want to claim hadoop 2.x compatibility, then you have to
  > be building against java 7". Otherwise a lot of people with hadoop 2.x
  > clusters won't be able to run your code. If your projects are java8+
  > only, then they are implicitly hadoop 3.1+, no matter what you use
  > in your build. Hence: no need for branch-2 branches except
  > to complicate your build/test/release processes [1]

If the Hadoop2 binary distribution is no longer used today, or is
incomplete in some way because it is built with Java 8, the following
three existing alternative binary distributions could be a better
official solution for old Hadoop 2 clusters.

    1) Scala 2.12 and without-hadoop distribution
    2) Scala 2.12 and Hadoop 3 distribution
    3) Scala 2.13 and Hadoop 3 distribution

In short, is anyone using the Apache Spark 3.3.0 Hadoop2 binary
distribution?

Dongjoon

[1]
https://issues.apache.org/jira/browse/ORC-1251?focusedCommentId=17608247&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17608247
