Thanks. That will be a giant step forward, Sean!

> I'd prefer making it the default in the POM for 3.0.

Bests,
Dongjoon.

On Wed, Nov 20, 2019 at 11:02 AM Sean Owen <sro...@gmail.com> wrote:

> Yeah, 'stable' is ambiguous. It's old and buggy, but at least it's the
> same old and buggy that's been there a while; "stable" in that sense.
> I'm sure there is a lot more delta between Hive 1 and 2 in terms of
> bug fixes that are important; the question isn't just 1.x releases.
>
> What I don't know is how much of that affects Spark, since it's mostly
> a Hive client. Clearly some of it does.
>
> I'd prefer making it the default in the POM for 3.0. Mostly on the
> grounds that its effects are on deployed clusters, not apps. And
> deployers can still choose a binary distro with 1.x or make the choice
> they want. Those that don't care should probably be nudged to 2.x.
> Spark 3.x is already full of behavior changes and 'unstable', so I
> think this is minor relative to the overall risk question.
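As an illustration of the deployer's choice Sean describes, building two binary distros that differ only in the bundled Hive client might look roughly like this. This is a sketch: the profile names `hive-1.2` and `hive-2.3` are assumptions about how the POM would expose the switch, not confirmed names from this thread.

```shell
# Sketch: two Spark distributions differing only in the Hive client.
# Profile names hive-1.2 / hive-2.3 are assumed, not confirmed here.

# Distro keeping the forked Hive 1.2.1 client:
./dev/make-distribution.sh --name with-hive-1.2 \
  -Phive -Phive-thriftserver -Phive-1.2

# Distro with the Hive 2.3.x client (the proposed POM default for 3.0):
./dev/make-distribution.sh --name with-hive-2.3 \
  -Phive -Phive-thriftserver -Phive-2.3
```

Those who don't care about the Hive version would simply get whichever profile the POM activates by default.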
>
> On Wed, Nov 20, 2019 at 12:53 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> >
> > Hi, All.
> >
> > I'm sending this email because it's important to discuss this topic
> > narrowly and make a clear conclusion.
> >
> > `The forked Hive 1.2.1 is stable`? That sounds like a myth we created
> > by ignoring the existing bugs. If you want to say the forked Hive 1.2.1
> > is more stable than XXX, please give us the evidence. Then, we can fix it.
> > Otherwise, let's stop treating `The forked Hive 1.2.1` as invincible.
> >
> > Historically, the following forked Hive 1.2.1 has never been stable.
> > It's just frozen. Since the forked Hive is out of our control, we
> > ignored its bugs. That's all. The reality is far from stable.
> >
> >     https://mvnrepository.com/artifact/org.spark-project.hive/
> >     https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.2.1.spark (2015 August)
> >     https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.2.1.spark2 (2016 April)
> >
> > First, let's begin with Hive itself by comparing Apache Hive 1.2.2 and 1.2.3:
> >
> >     Apache Hive 1.2.2 has 50 bug fixes.
> >     Apache Hive 1.2.3 has 9 bug fixes.
> >
> > I will not cover all of them, but the Apache Hive community also backports
> > important patches, just like the Apache Spark community does.
> >
> > Second, let's move to SPARK issues, because we aren't exposed to all Hive issues.
> >
> >     SPARK-19109 ORC metadata section can sometimes exceed protobuf message size limit
> >     SPARK-22267 Spark SQL incorrectly reads ORC file when column order is different
> >
> > These have been reported since Apache Spark 1.6.x because the forked Hive
> > doesn't have proper upstream patches like HIVE-11592 (fixed in Apache Hive 1.3.0).
> >
> > Since we couldn't update the frozen forked Hive, we added the Apache ORC
> > dependency in SPARK-20682 (2.3.0), added a switching configuration in
> > SPARK-20728 (2.3.0), and turned on `spark.sql.hive.convertMetastoreOrc`
> > by default in SPARK-22279 (2.4.0). However, if you turn off that switch
> > and start to use the forked Hive, you will be exposed to the buggy
> > forked Hive 1.2.1 again.
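The switch Dongjoon mentions can be sketched as follows. The config key `spark.sql.hive.convertMetastoreOrc` is the real one from SPARK-20728/SPARK-22279; the session setup around it is a minimal illustrative sketch, not a prescribed deployment.

```scala
// Minimal sketch: choosing between Spark's native ORC reader (the 2.4.0+
// default, backed by the Apache ORC dependency from SPARK-20682) and the
// legacy Hive SerDe path served by the forked Hive 1.2.1.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("orc-reader-switch")   // hypothetical app name for illustration
  .enableHiveSupport()
  .getOrCreate()

// Default since SPARK-22279 (2.4.0): Hive ORC tables are converted to
// Spark's native data source reader.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")

// Turning the switch off routes reads back through the Hive SerDe path,
// i.e. the forked Hive 1.2.1 code discussed in this thread.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
```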
> >
> > Third, let's talk about the new features like Hadoop 3 and JDK11.
> > No one believes that the ancient forked Hive 1.2.1 will work with these.
> > I saw the following issue mentioned as evidence of a Hive 2.3.6 bug.
> >
> >     SPARK-29245 ClassCastException during creating HiveMetaStoreClient
> >
> > Yes, I know that issue, because I reported it and verified HIVE-21508.
> > It's fixed already and will be released as Apache Hive 2.3.7.
> >
> > Can we imagine something like this happening in the forked Hive 1.2.1?
> > No. There is no future for it. It's frozen.
> >
> > From now on, I claim that the forked Hive 1.2.1 is the unstable one.
> > I welcome all your opinions, positive and negative.
> > Please share your concerns and problems, and let's fix them together.
> > Apache Spark is an open source project we share.
> >
> > Bests,
> > Dongjoon.
> >
>
