On Wed, 5 Oct 2022 at 21:59, Chao Sun <sunc...@apache.org> wrote:

> +1
>
> > and specifically may allow us to finally move off of the ancient version
> of Guava (?)
>
> I think the Guava issue comes from Hive 2.3 dependency, not Hadoop.
>

hadoop branch-2 has guava dependencies; not sure which one

A key lesson there is "never trust google artifacts to be stable at the
binary level"

Which is a shame, especially as there are some things in the jar (the
executors utilities in particular) for which there is still no comparable
Java equivalent.

Oh, we've also learned never to export *any* third party class in a public
API if possible.
Which is also a shame, as the Java language lacks any form of tuple type and
I do not want to reimplement all of that. Java 17 records would suffice,
though as there's no java.lang.Tuple base type, there's no way to write
methods which work on arbitrary tuples through some standard interface
(elements(): int; element(int) -> Object).
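For illustration only, here is a minimal sketch of what such a base type
might look like. Note that `Tuple`, `elements()` and `element(int)` are the
hypothetical names from the sentence above; nothing like this exists in the
JDK today:

```java
public class TupleSketch {

    // Hypothetical base type the JDK lacks: arity plus positional access.
    interface Tuple {
        int elements();             // number of elements in the tuple
        Object element(int index);  // element at a given position
    }

    // A Java 17 record can implement it with a few lines of boilerplate.
    record Pair<A, B>(A first, B second) implements Tuple {
        public int elements() { return 2; }
        public Object element(int i) {
            return switch (i) {
                case 0 -> first;
                case 1 -> second;
                default -> throw new IndexOutOfBoundsException(i);
            };
        }
    }

    // With a shared base type, one method can work on arbitrary tuples.
    static String describe(Tuple t) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < t.elements(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(t.element(i));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Pair<>("x", 1)));
    }
}
```

Without a shared base type, each record is its own nominal type, so the
generic describe() method above is exactly what cannot be written against
arbitrary records today.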

It's that cliché interview question, "implement a tree", updated for Guava:
"how would you reimplement a popular Guava class so as to get independence
from Guava releases and the ability to make it a return type in a public
API?"

Anyway, good to see the change is in. The next step would be to have a
baseline 3.x.y dependency as a minimum.

steve

>
> On Wed, Oct 5, 2022 at 1:55 PM Xinrong Meng <xinrong.apa...@gmail.com>
> wrote:
>
>> +1.
>>
>> On Wed, Oct 5, 2022 at 1:53 PM Xiao Li <lix...@databricks.com.invalid>
>> wrote:
>>
>>> +1.
>>>
>>> Xiao
>>>
>>> On Wed, Oct 5, 2022 at 12:49 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> I'm OK with this. It simplifies maintenance a bit, and specifically may
>>>> allow us to finally move off of the ancient version of Guava (?)
>>>>
>>>> On Mon, Oct 3, 2022 at 10:16 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, All.
>>>>>
>>>>> I'm wondering if the following Apache Spark Hadoop2 Binary Distribution
>>>>> is still used by someone in the community or not. If it's not used or
>>>>> not useful,
>>>>> we may remove it from Apache Spark 3.4.0 release.
>>>>>
>>>>>
>>>>> https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.tgz
>>>>>
>>>>> Here is the background of this question.
>>>>> Since Apache Spark 2.2.0 (SPARK-19493, SPARK-19550), the Apache
>>>>> Spark community has been building and releasing with Java 8 only.
>>>>> I believe that user applications also use Java 8+ these days.
>>>>> Recently, I received the following message from the Hadoop PMC.
>>>>>
>>>>>   > "if you really want to claim hadoop 2.x compatibility, then you
>>>>>   > have to be building against java 7". Otherwise a lot of people
>>>>>   > with hadoop 2.x clusters won't be able to run your code. If your
>>>>>   > projects are java8+ only, then they are implicitly hadoop 3.1+,
>>>>>   > no matter what you use in your build. Hence: no need for branch-2
>>>>>   > branches except to complicate your build/test/release processes [1]
>>>>>
>>>>> If the Hadoop 2 binary distribution is no longer used as of today,
>>>>> or is incomplete somewhere due to being built with Java 8, the
>>>>> following three existing alternative Hadoop 3 binary distributions
>>>>> could be the better official solution for old Hadoop 2 clusters:
>>>>>
>>>>>     1) Scala 2.12 and without-hadoop distribution
>>>>>     2) Scala 2.12 and Hadoop 3 distribution
>>>>>     3) Scala 2.13 and Hadoop 3 distribution
>>>>>
>>>>> In short, is there anyone who is using Apache Spark 3.3.0 Hadoop2
>>>>> Binary distribution?
>>>>>
>>>>> Dongjoon
>>>>>
>>>>> [1]
>>>>> https://issues.apache.org/jira/browse/ORC-1251?focusedCommentId=17608247&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17608247
>>>>>
>>>>
>>>
