Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

Felix Cheung Tue, 15 Jan 2019 09:55:50 -0800

Does resolving https://issues.apache.org/jira/browse/HIVE-16391 mean keeping 
Spark on Hive 1.2?


I’m not sure that reduces the dependency on Hive: Hive is still there, and it’s 
a very old Hive. IMO the risk increases the longer we stay on it. (And it’s 
been years.)

Looking at the two PRs, they don’t seem very drastic to me, except for the 
thrift server changes. Is there another, better approach to the thrift server?


________________________________
From: Xiao Li <gatorsm...@gmail.com>
Sent: Tuesday, January 15, 2019 9:44 AM
To: Yuming Wang
Cc: dev
Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

Hi, Yuming,

Thank you for your contributions! The community aims to reduce the dependence 
on Hive. Currently, most Spark users are not using Hive. The changes look 
risky to me.

To support Hadoop 3.x, we just need to resolve this JIRA: 
https://issues.apache.org/jira/browse/HIVE-16391

Cheers,

Xiao

Yuming Wang <wgy...@gmail.com> wrote on Tuesday, January 15, 2019 at 8:41 AM:
Dear Spark Developers and Users,

Hyukjin and I plan to upgrade the built-in Hive from 
1.2.1-spark2<https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2> to 
2.3.4<https://github.com/apache/hive/releases/tag/rel%2Frelease-2.3.4> to 
resolve several critical issues, such as supporting Hadoop 3.x and fixing some 
ORC and Parquet issues. This is the list:
Hive issues:
[SPARK-26332<https://issues.apache.org/jira/browse/SPARK-26332>][HIVE-10790] 
Spark sql write orc table on viewFS throws exception
[SPARK-25193<https://issues.apache.org/jira/browse/SPARK-25193>][HIVE-12505] 
insert overwrite doesn't throw exception when drop old data fails
[SPARK-26437<https://issues.apache.org/jira/browse/SPARK-26437>][HIVE-13083] 
Decimal data becomes bigint to query, unable to query
[SPARK-25919<https://issues.apache.org/jira/browse/SPARK-25919>][HIVE-11771] 
Date value corrupts when tables are "ParquetHiveSerDe" formatted and target 
table is Partitioned
[SPARK-12014<https://issues.apache.org/jira/browse/SPARK-12014>][HIVE-11100] 
Spark SQL query containing semicolon is broken in Beeline

Spark issues:
[SPARK-23534<https://issues.apache.org/jira/browse/SPARK-23534>] Spark run on 
Hadoop 3.0.0
[SPARK-20202<https://issues.apache.org/jira/browse/SPARK-20202>] Remove 
references to org.spark-project.hive
[SPARK-18673<https://issues.apache.org/jira/browse/SPARK-18673>] Dataframes 
doesn't work on Hadoop 3.x; Hive rejects Hadoop version
[SPARK-24766<https://issues.apache.org/jira/browse/SPARK-24766>] 
CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column 
stats in parquet


Because the hive-thriftserver module changes substantially in this upgrade, I 
split the work into two PRs for easier review.
The first PR<https://github.com/apache/spark/pull/23552> does not contain the 
hive-thriftserver changes. Please ignore the failed test in 
hive-thriftserver.
The second PR<https://github.com/apache/spark/pull/23553> contains the complete 
set of changes.

I have built a Spark distribution for Apache Hadoop 2.7; you can download it 
from Google 
Drive<https://drive.google.com/open?id=1cq2I8hUTs9F4JkFyvRfdOJ5BlxV0ujgt> or 
Baidu Pan<https://pan.baidu.com/s/1b090Ctuyf1CDYS7c0puBqQ>.
Please help review and test. Thanks.
