We'll likely need a shim (which we should have anyway) because of the
namespace/import changes.

I'm a huge +1 on this.
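
For concreteness, a minimal sketch of the kind of shim I have in mind, in
Scala (the object and parameter names are hypothetical, not Spark's actual
shim layer):

    import scala.util.Try

    // Hypothetical sketch: resolve a Hive class whose package moved between
    // releases by trying the Hive 2.x location first and falling back to the
    // Hive 1.2 location. The names passed in are placeholders.
    object HiveClassShim {
      def resolve(hive2Name: String, hive12Name: String): Class[_] =
        Try(Class.forName(hive2Name)).getOrElse(Class.forName(hive12Name))
    }

    // e.g. HiveClassShim.resolve("new.hive2.pkg.SomeClass", "old.hive12.pkg.SomeClass")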


________________________________
From: Hyukjin Kwon <gurwls...@gmail.com>
Sent: Monday, February 4, 2019 12:27 PM
To: Xiao Li
Cc: Sean Owen; Felix Cheung; Ryan Blue; Marcelo Vanzin; Yuming Wang; dev
Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

I should check the details and feasibility myself, but it sounds fine to me if
it doesn't require a big extra effort.

On Tue, 5 Feb 2019, 4:15 am Xiao Li <gatorsm...@gmail.com> wrote:
Yes. When our support/integration with Hive 2.x becomes stable, we can do it in
the Hadoop 2.x profile too, if needed. The whole proposal is to minimize the
risk and ensure release stability and quality.

Hyukjin Kwon <gurwls...@gmail.com> wrote on Mon, Feb 4, 2019 at 12:01 PM:
Xiao, to check that I understood correctly, do you mean the following?

1. Use our fork with the Hadoop 2.x profile for now, and use Hive 2.x with the
Hadoop 3.x profile.
2. Build another, newer version of the thrift server based on Hive 2.x(?) on
the Spark side.
3. Target a complete transition to Hive 2.x gradually, later in the future.



On Tue, Feb 5, 2019 at 1:16 AM, Xiao Li <gatorsm...@gmail.com> wrote:
To reduce the impact and risk of upgrading the Hive execution JARs, we can
upgrade the built-in Hive to 2.x only when using the Hadoop 3.x profile. Hadoop
3 support will still be experimental in our next release. That means the impact
and risk are minimal for most users, who are still on the Hadoop 2.x profile.
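
As a rough illustration of what the profile-gated default means (this is not
Spark's actual build logic; the real selection would happen through Maven
profiles, and the property and version strings below are placeholders):

    // Illustrative sketch only: pick the built-in Hive version based on the
    // Hadoop profile being built against.
    object BuiltinHiveVersion {
      val value: String =
        if (sys.props.getOrElse("hadoop.profile", "2.7").startsWith("3"))
          "2.3.4"  // Hive 2.x for the Hadoop 3.x profile
        else
          "1.2.1"  // keep the forked Hive 1.2.1 for the Hadoop 2.x profile
    }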

The code changes in the Spark thrift server are massive, which makes them risky
and hard to review. The original code of our Spark thrift server comes from
hive-service 1.2.1. To reduce the risk of the upgrade, we can inline the new
version. In the future, we can get rid of the thrift server entirely and build
our own high-performance JDBC server.

Does this proposal sound good to you?

Over the last two weeks, Yuming has been trying out this proposal. He is now on
vacation; in China, today is already the Lunar New Year, so I would not expect
him to reply to this email in the next 7 days.

Cheers,

Xiao



Sean Owen <sro...@gmail.com> wrote on Mon, Feb 4, 2019 at 7:56 AM:
I was unclear from this thread what the objection to these PRs is:

https://github.com/apache/spark/pull/23552
https://github.com/apache/spark/pull/23553

Would we like to specifically discuss whether to merge these or not? I hear
support for it, and concerns about continuing to support Hive too, but I wasn't
clear whether those concerns specifically argue against these PRs.


On Fri, Feb 1, 2019 at 2:03 PM Felix Cheung <felixcheun...@hotmail.com> wrote:
>
> What’s the update and next step on this?
>
> We have real users getting blocked by this issue.
>
>
> ________________________________
> From: Xiao Li <gatorsm...@gmail.com>
> Sent: Wednesday, January 16, 2019 9:37 AM
> To: Ryan Blue
> Cc: Marcelo Vanzin; Hyukjin Kwon; Sean Owen; Felix Cheung; Yuming Wang; dev
> Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4
>
> Thanks for your feedback!
>
> I am working with Yuming to reduce the risk to stability and quality. I will
> keep you posted when the proposal is ready.
>
> Cheers,
>
> Xiao
>
> Ryan Blue <rb...@netflix.com> wrote on Wed, Jan 16, 2019 at 9:27 AM:
>>
>> +1 for what Marcelo and Hyukjin said.
>>
>> In particular, I agree that we can't expect Hive to release a version that 
>> is now more than 3 years old just to solve a problem for Spark. Maybe that 
>> would have been a reasonable ask instead of publishing a fork years ago, but 
>> I think this is now Spark's problem.
>>
>> On Tue, Jan 15, 2019 at 9:02 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>>>
>>> +1 to that. HIVE-16391 by itself means we're giving up things like
>>> Hadoop 3, and we're also putting the burden on the Hive folks to fix a
>>> problem that we created.
>>>
>>> The current PR is basically a Spark-side fix for that bug. It does
>>> mean also upgrading Hive (which gives us Hadoop 3, yay!), but I think
>>> it's really the right path to take here.
>>>
>>> On Tue, Jan 15, 2019 at 6:32 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> >
>>> > Resolving HIVE-16391 means Hive releasing a 1.2.x that contains the fixes
>>> > from our Hive fork (correct me if I am mistaken).
>>> >
>>> > To be honest, and as a personal opinion, that basically asks Hive to take
>>> > care of Spark's dependency.
>>> > Hive looks to be moving ahead to 3.1.x, and no one would use a newer 1.2.x
>>> > release. In practice, Spark doesn't make 1.6.x releases anymore either,
>>> > for instance.
>>> >
>>> > Frankly, my impression is that this is our mistake to fix. Since the
>>> > Spark community is big enough, I was thinking we should try to fix it
>>> > ourselves first.
>>> > I am not saying upgrading is the only way through this, but I think we
>>> > should at least try first and see what comes next.
>>> >
>>> > Yes, it does sound riskier to upgrade it on our side, but I think it's
>>> > worth checking and trying to see if it's possible.
>>> > I think upgrading the dependency is a more standard approach than using
>>> > the fork or asking the Hive side to release another 1.2.x.
>>> >
>>> > If we somehow fail to upgrade it for critical or unavoidable reasons,
>>> > yes, we could find an alternative, but that basically means we're going
>>> > to stay on 1.2.x for a long time (say, until Spark 4.0.0?).
>>> >
>>> > I know this has somehow become a sensitive topic, but to be completely
>>> > honest with myself, I think we should give it a try.
>>> >
>>>
>>>
>>> --
>>> Marcelo
>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
