Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-22 Thread Dongjoon Hyun
Thank you, Steve and all. As a conclusion of this thread, we will merge the following PR and move forward: [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles https://github.com/apache/spark/pull/26619. Please leave your comments if you have any concerns. And, the following PRs and more will …
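For context, the profiles discussed in this thread are chosen at build time. A minimal sketch of how a Maven build might select between them — the profile names come from the PR title above, but the surrounding flags are illustrative and may differ from the final merged build:

```bash
# Build Spark against the built-in Hive 2.3 execution dependency
# (profile names per SPARK-29981; other flags shown are illustrative)
./build/mvn -DskipTests -Phive -Phive-thriftserver -Phive-2.3 clean package

# Or keep the forked Hive 1.2 dependency instead
./build/mvn -DskipTests -Phive -Phive-thriftserver -Phive-1.2 clean package
```

Making one of the two profiles active by default in the POM is exactly the question debated later in the thread.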

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-22 Thread Steve Loughran
On Thu, Nov 21, 2019 at 12:53 AM Dongjoon Hyun wrote: > Thank you for the thoughtful clarification. I agree with all of your options. > > Especially, for the Hive Metastore connection, the `Hive isolated client loader` > is also important with Hive 2.3, because the Hive 2.3 client cannot talk with > Hive 2.1 …

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Dongjoon Hyun
Thank you for the thoughtful clarification. I agree with all of your options. Especially, for the Hive Metastore connection, the `Hive isolated client loader` is also important with Hive 2.3, because the Hive 2.3 client cannot talk with Hive 2.1 and lower. The `Hive isolated client loader` is one of the good design …
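The metastore compatibility issue mentioned above is what Spark's isolated client loader addresses: Spark can load a Hive client of a different version in a separate classloader, independent of the Hive version it was built against. A hedged configuration sketch — the `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` settings are real Spark SQL options, but the version value `2.1.1` is purely illustrative:

```properties
# Talk to a Hive 2.1 metastore even when Spark's built-in execution Hive is 2.3.
# Spark loads a Hive client of the requested version in an isolated
# classloader, so the built-in Hive dependency is unaffected.
spark.sql.hive.metastore.version  2.1.1
# Where to find the matching client jars: "builtin", "maven", or a classpath
spark.sql.hive.metastore.jars     maven
```

This is why the client-loader machinery matters more once the built-in Hive moves to 2.3: the built-in client alone can no longer reach older metastores.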

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Cheng Lian
Oh, actually, in order to decouple the Hadoop 3.2 and Hive 2.3 upgrades, we will need a hive-2.3 profile anyway, regardless of whether we have a hive-1.2 profile or not. On Wed, Nov 20, 2019 at 3:33 PM Cheng Lian wrote: > Just to summarize my points: > > 1. Let's still keep the Hive 1.2 dependency in Spark …

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Cheng Lian
Just to summarize my points: 1. Let's still keep the Hive 1.2 dependency in Spark 3.0, but make it optional. End-users may choose between Hive 1.2/2.3 via a new profile (either adding a hive-1.2 profile or adding a hive-2.3 profile works for me, depending on which Hive version we pick …

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Cheng Lian
Dongjoon, I don't think we have any conflicts here. As stated in other threads multiple times, as long as the Hive 2.3 and Hadoop 3.2 version upgrades can be decoupled, I have no preference about which Hive/Hadoop version we pick as the default. So the following two plans both work for me: …

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Dongjoon Hyun
Nice. That's progress. Let's narrow down the path. We need to clarify the criteria we can agree on. 1. What does `battle-tested for years` mean exactly? How and when can we start the `battle-tested` stage for Hive 2.3? 2. What is the new "Hive integration in Spark"? During …

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Cheng Lian
Hey Dongjoon and Felix, I totally agree that Hive 2.3 is more stable than Hive 1.2. Otherwise, we wouldn't even consider integrating with Hive 2.3 in Spark 3.0. However, *"Hive" and "Hive integration in Spark" are two quite different things*, and I don't think anybody has ever mentioned "the

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Felix Cheung
Just to add - the Hive 1.2 fork is definitely not more stable. We know of a few critical bug fixes that we cherry-picked into a fork of that fork in order to maintain it ourselves.

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Dongjoon Hyun
Thanks. That will be a giant step forward, Sean! > I'd prefer making it the default in the POM for 3.0. Bests, Dongjoon. On Wed, Nov 20, 2019 at 11:02 AM Sean Owen wrote: > Yeah 'stable' is ambiguous. It's old and buggy, but at least it's the > same old and buggy that's been there a while.