Thanks. That will be a giant step forward, Sean!

> I'd prefer making it the default in the POM for 3.0.
Bests,
Dongjoon.

On Wed, Nov 20, 2019 at 11:02 AM Sean Owen <sro...@gmail.com> wrote:

> Yeah, "stable" is ambiguous. It's old and buggy, but at least it's the
> same old and buggy that's been there a while; "stable" in that sense.
> I'm sure there is a lot more delta between Hive 1 and 2 in terms of
> bug fixes that are important; the question isn't just 1.x releases.
>
> What I don't know is how much of that affects Spark, as it's mostly a
> Hive client. Clearly some of it does.
>
> I'd prefer making it the default in the POM for 3.0, mostly on the
> grounds that its effects are on deployed clusters, not apps. Deployers
> can still choose a binary distro with 1.x or make whatever choice they
> want; those that don't care should probably be nudged to 2.x. Spark
> 3.x is already full of behavior changes and is "unstable" in that
> sense, so I think this is minor relative to the overall risk question.
>
> On Wed, Nov 20, 2019 at 12:53 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> > Hi, All.
> >
> > I'm sending this email because it's important to discuss this topic
> > narrowly and reach a clear conclusion.
> >
> > "The forked Hive 1.2.1 is stable"? That sounds like a myth we created
> > by ignoring the existing bugs. If you want to say the forked Hive
> > 1.2.1 is more stable than XXX, please give us the evidence; then we
> > can fix it. Otherwise, let's stop treating the forked Hive 1.2.1 as
> > invincible.
> >
> > Historically, the following forked Hive 1.2.1 has never been stable.
> > It's just frozen. Since the forked Hive is out of our control, we
> > ignored its bugs. That's all. The reality is far from "stable".
> >
> > https://mvnrepository.com/artifact/org.spark-project.hive/
> > https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.2.1.spark (2015 August)
> > https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.2.1.spark2 (2016 April)
> >
> > First, let's begin with Hive itself by comparing the fork with Apache
> > Hive 1.2.2 and 1.2.3:
> >
> > Apache Hive 1.2.2 has 50 bug fixes.
> > Apache Hive 1.2.3 has 9 bug fixes.
> >
> > I will not cover all of them, but the Apache Hive community also
> > backports important patches, just like the Apache Spark community.
> >
> > Second, let's move to SPARK issues, because we aren't exposed to all
> > Hive issues.
> >
> > SPARK-19109 ORC metadata section can sometimes exceed protobuf message size limit
> > SPARK-22267 Spark SQL incorrectly reads ORC file when column order is different
> >
> > These have been reported since Apache Spark 1.6.x because the forked
> > Hive doesn't have the proper upstream patch, HIVE-11592 (fixed in
> > Apache Hive 1.3.0).
> >
> > Since we couldn't update the frozen forked Hive, we added an Apache
> > ORC dependency in SPARK-20682 (2.3.0), added a switching configuration
> > in SPARK-20728 (2.3.0), and turned on
> > `spark.sql.hive.convertMetastoreOrc` by default in SPARK-22279 (2.4.0).
> > However, if you turn off the switch and start to use the forked Hive,
> > you will be exposed to the buggy forked Hive 1.2.1 again.
> >
> > Third, let's talk about new features like Hadoop 3 and JDK 11.
> > No one believes that the ancient forked Hive 1.2.1 will work with
> > these. I saw the following issue mentioned as evidence of a Hive 2.3.6
> > bug:
> >
> > SPARK-29245 ClassCastException during creating HiveMetaStoreClient
> >
> > Yes, I know that issue because I reported it and verified HIVE-21508.
> > It's fixed already and will be released as Apache Hive 2.3.7.
> >
> > Can we imagine something like this in the forked Hive 1.2.1?
> > No. There is no future in it. It's frozen.
> >
> > From now on, I want to claim that the forked Hive 1.2.1 is the
> > unstable one. I welcome all your positive and negative opinions.
> > Please share your concerns and problems, and let's fix them together.
> > Apache Spark is an open source project we share.
> >
> > Bests,
> > Dongjoon.
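
[Editor's illustration, not part of the original thread.] The switching configuration discussed above (added in SPARK-20728, defaulted on in SPARK-22279) can be sketched as a Spark SQL session fragment. This is a minimal, hedged example; the default value shown assumes Spark 2.4.0 or later, as stated in the thread.

```sql
-- Default since Spark 2.4.0 (SPARK-22279): read ORC tables with
-- Spark's built-in Apache ORC reader instead of the Hive SerDe.
SET spark.sql.hive.convertMetastoreOrc=true;

-- Turning the switch off falls back to the forked Hive 1.2.1 code
-- path, re-exposing bugs such as SPARK-19109 and SPARK-22267.
SET spark.sql.hive.convertMetastoreOrc=false;
```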