Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

Dongjoon Hyun Tue, 23 Jun 2020 00:05:02 -0700

Hi, All.

I am bumping this thread again, now with the title "Use Hadoop-3.2 as a default
Hadoop profile in 3.1.0?"
There has been some recent discussion on the following PR. Please let us know
your thoughts.


https://github.com/apache/spark/pull/28897


Bests,
Dongjoon.


On Fri, Nov 1, 2019 at 9:41 AM Xiao Li <[email protected]> wrote:

> Hi, Steve,
>
> Thanks for your comments! My major quality concern is not about Hadoop 3.2
> itself. In this release, the Hive execution module upgrade [from 1.2 to
> 2.3], the Hive thrift-server upgrade, and JDK 11 support were added to the
> Hadoop 3.2 profile only. Compared with the Hadoop 2.x profile, the Hadoop
> 3.2 profile is riskier due to these changes.
>
> To speed up the adoption of Spark 3.0, which has many other highly
> desirable features, I am proposing to keep the Hadoop 2.x profile as the
> default.
>
> Cheers,
>
> Xiao.
>
>
>
> On Fri, Nov 1, 2019 at 5:33 AM Steve Loughran <[email protected]> wrote:
>
>> What is the current default value? The 2.x releases are becoming EOL:
>> 2.7 is dead, there might be a 2.8.x; for now 2.9 is the branch-2 release
>> getting attention. 2.10.0 shipped yesterday, but the ".0" means there will
>> inevitably be surprises.
>>
>> One issue with using older versions is that any problem reported
>> (especially with stack traces you can blame me for) will generally be met
>> by a response of "does it go away when you upgrade?" The other issue is
>> how much test coverage things are getting.
>>
>> W.r.t. Hadoop 3.2 stability, nothing major has been reported. The ABFS
>> client is there, and the big Guava update (HADOOP-16213) went in. People
>> will either love or hate that.
>>
>> No major changes in the s3a code between 3.2.0 and 3.2.1; I have a large
>> backport planned though, including changes to better handle AWS caching of
>> 404s generated from HEAD requests before an object was actually created.
>>
>> It would be really good if the Spark distributions shipped with later
>> versions of the Hadoop artifacts.
>>
>> On Mon, Oct 28, 2019 at 7:53 PM Xiao Li <[email protected]> wrote:
>>
>>> The stability and quality of the Hadoop 3.2 profile are unknown. The
>>> changes are massive, including Hive execution and a new version of the
>>> Hive thriftserver.
>>>
>>> To reduce the risk, I would like to keep the current default version
>>> unchanged. When it becomes stable, we can change the default profile to
>>> Hadoop-3.2.
>>>
>>> Cheers,
>>>
>>> Xiao
>>>
>>> On Mon, Oct 28, 2019 at 12:51 PM Sean Owen <[email protected]> wrote:
>>>
>>>> I'm OK with that, but don't have a strong opinion nor info about the
>>>> implications.
>>>> That said my guess is we're close to the point where we don't need to
>>>> support Hadoop 2.x anyway, so, yeah.
>>>>
>>>> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun <[email protected]>
>>>> wrote:
>>>> >
>>>> > Hi, All.
>>>> >
>>>> > There was a discussion on publishing artifacts built with Hadoop 3.
>>>> > But, we are still publishing with Hadoop 2.7.3 and `3.0-preview` will
>>>> > be the same because we didn't change anything yet.
>>>> >
>>>> > Technically, we need to change two places for publishing.
>>>> >
>>>> > 1. Jenkins Snapshot Publishing
>>>> >
>>>> > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
>>>> >
>>>> > 2. Release Snapshot/Release Publishing
>>>> >
>>>> > https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh
>>>> >
>>>> > To minimize the change, we need to switch our default Hadoop profile.
>>>> >
>>>> > Currently, the default is the `hadoop-2.7 (2.7.4)` profile and
>>>> > `hadoop-3.2 (3.2.0)` is optional.
>>>> > We should use the `hadoop-3.2` profile by default and make `hadoop-2.7`
>>>> > optional.
>>>> >
>>>> > Note that this means we use Hive 2.3.6 by default. Only the `hadoop-2.7`
>>>> > distribution will use `Hive 1.2.1`, like Apache Spark 2.4.x.
>>>> >
>>>> > Bests,
>>>> > Dongjoon.
>>>>
>>>
>>
>
