The metastore interactions in Spark are currently based on APIs that live in the Hive exec jar, so it is not possible for Spark to work with Hadoop 3 until that jar is upgraded.

It may be possible to re-implement those interactions based solely on the metastore client that Hive publishes, but that would be a lot of work IIRC. I can't comment on how many people use Hive serde tables (I know they are used, just not how extensively), but that is not the only reason Spark currently requires the hive-exec jar.
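To make that concrete, here is a minimal sketch of what going through only the published metastore client could look like, assuming the hive-metastore jar and its HiveConf dependency are on the classpath; the thrift URI and table name are placeholders. This is not how Spark is wired today; as noted above, Spark's metastore access currently goes through classes in hive-exec.

    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient

    object MetastoreClientSketch {
      def main(args: Array[String]): Unit = {
        // The thrift URI below is a placeholder for a real metastore endpoint.
        val conf = new HiveConf()
        conf.set("hive.metastore.uris", "thrift://metastore-host:9083")

        val client = new HiveMetaStoreClient(conf)
        try {
          // Pure metadata calls; no Hive execution (serde/QL) code is involved.
          println(client.getAllDatabases)
          val table = client.getTable("default", "some_table")
          println(table.getSd.getLocation)
        } finally {
          client.close()
        }
      }
    }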
On Tue, Jan 15, 2019 at 10:03 AM Xiao Li <gatorsm...@gmail.com> wrote:
>
> Let me take my words back. To read/write a table, Spark users do not use the
> Hive execution JARs unless they explicitly create Hive serde tables.
> Actually, I want to understand the motivation and use cases: why do your
> usage scenarios need Hive serde tables instead of our Spark native tables?
>
> BTW, we are still using the Hive metastore as our metadata store. This does
> not require the Hive execution JAR upgrade, based on my understanding. Users
> can upgrade it to a newer version of the Hive metastore.
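For context on the native vs. serde distinction above, a minimal sketch, assuming a SparkSession built with Hive support and a reachable metastore; the table names are placeholders.

    import org.apache.spark.sql.SparkSession

    // Assumes a reachable metastore; enableHiveSupport() is needed for serde tables.
    val spark = SparkSession.builder()
      .appName("native-vs-serde-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Spark native (data source) table: Spark's own Parquet reader/writer,
    // no Hive execution classes on the read/write path.
    spark.sql("CREATE TABLE native_tbl (id BIGINT, name STRING) USING parquet")

    // Hive serde table: defined with Hive DDL. Spark can convert Parquet/ORC serde
    // tables to its native readers (spark.sql.hive.convertMetastoreParquet), but
    // other serdes are read and written through classes in the hive-exec jar.
    spark.sql("CREATE TABLE serde_tbl (id BIGINT, name STRING) STORED AS parquet")

Separately, pointing Spark at a newer external metastore without touching the execution jars corresponds, IIRC, to the spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars settings.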
> On Tue, Jan 15, 2019 at 9:56 AM Felix Cheung <felixcheun...@hotmail.com> wrote:
>>
>> And we are super 100% dependent on Hive...
>>
>>
>> ________________________________
>> From: Ryan Blue <rb...@netflix.com.invalid>
>> Sent: Tuesday, January 15, 2019 9:53 AM
>> To: Xiao Li
>> Cc: Yuming Wang; dev
>> Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4
>>
>> How do we know that most Spark users are not using Hive? I wouldn't be
>> surprised either way, but I do want to make sure we aren't making decisions
>> based on any one person's (or one company's) experience about what "most"
>> Spark users do.
>>
>> On Tue, Jan 15, 2019 at 9:44 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>>
>>> Hi, Yuming,
>>>
>>> Thank you for your contributions! The community aims at reducing the
>>> dependence on Hive. Currently, most Spark users are not using Hive.
>>> The changes look risky to me.
>>>
>>> To support Hadoop 3.x, we just need to resolve this JIRA:
>>> https://issues.apache.org/jira/browse/HIVE-16391
>>>
>>> Cheers,
>>>
>>> Xiao
>>>
>>> On Tue, Jan 15, 2019 at 8:41 AM Yuming Wang <wgy...@gmail.com> wrote:
>>>>
>>>> Dear Spark Developers and Users,
>>>>
>>>> Hyukjin and I plan to upgrade the built-in Hive from 1.2.1-spark2 to 2.3.4
>>>> to solve some critical issues, such as supporting Hadoop 3.x and fixing
>>>> several ORC and Parquet issues. This is the list:
>>>>
>>>> Hive issues:
>>>>
>>>> [SPARK-26332][HIVE-10790] Spark sql write orc table on viewFS throws exception
>>>> [SPARK-25193][HIVE-12505] insert overwrite doesn't throw exception when drop old data fails
>>>> [SPARK-26437][HIVE-13083] Decimal data becomes bigint to query, unable to query
>>>> [SPARK-25919][HIVE-11771] Date value corrupts when tables are "ParquetHiveSerDe" formatted and target table is Partitioned
>>>> [SPARK-12014][HIVE-11100] Spark SQL query containing semicolon is broken in Beeline
>>>>
>>>> Spark issues:
>>>>
>>>> [SPARK-23534] Spark run on Hadoop 3.0.0
>>>> [SPARK-20202] Remove references to org.spark-project.hive
>>>> [SPARK-18673] Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version
>>>> [SPARK-24766] CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column stats in parquet
>>>>
>>>> Since the code for the hive-thriftserver module has changed too much for
>>>> this upgrade, I split the work into two PRs for easy review.
>>>>
>>>> The first PR does not contain the changes to hive-thriftserver. Please
>>>> ignore the failed tests in hive-thriftserver.
>>>>
>>>> The second PR contains the complete changes.
>>>>
>>>> I have created a Spark distribution for Apache Hadoop 2.7; you can
>>>> download it via Google Drive or Baidu Pan.
>>>>
>>>> Please help review and test. Thanks.
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix

--
Marcelo