Spark's metastore interactions are currently based on APIs that live in
the Hive exec jar, so Spark can't work with Hadoop 3 until that jar is
upgraded.

It might be possible to re-implement those interactions based solely on
the metastore client that Hive publishes, but IIRC that would be a lot
of work.
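
To make the difference concrete, here's a rough sketch of the two styles
of metastore access (simplified, not the actual HiveClientImpl code; the
table name is made up):

    import org.apache.hadoop.hive.conf.HiveConf
    // Today's style: the Hive class lives in hive-exec and drags the whole
    // execution side of Hive onto the classpath.
    import org.apache.hadoop.hive.ql.metadata.Hive
    // Client-only style: HiveMetaStoreClient comes from the hive-metastore
    // jar rather than hive-exec.
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient

    object MetastoreAccessSketch {
      def main(args: Array[String]): Unit = {
        val conf = new HiveConf()

        // Roughly what Spark's Hive client does today.
        val viaExec = Hive.get(conf).getTable("default", "some_table")
        println(viaExec.getTableName)

        // What a metastore-client-only implementation would use instead.
        val client = new HiveMetaStoreClient(conf)
        val viaClient = client.getTable("default", "some_table")
        println(viaClient.getTableName)
        client.close()
      }
    }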

I can't comment on how many people use Hive serde tables (I know they
are used, just not how extensively), but that's not the only reason
Spark currently requires the hive-exec jar.

On Tue, Jan 15, 2019 at 10:03 AM Xiao Li <gatorsm...@gmail.com> wrote:
>
> Let me take my words back. To read/write a table, Spark users do not use the
> Hive execution JARs unless they explicitly create Hive serde tables. Actually,
> I want to understand the motivation and the use cases: why do your scenarios
> need Hive serde tables instead of our Spark native tables?
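>
> To be concrete, the distinction I mean is roughly this (a simplified,
> hypothetical sketch; the table names are made up):
>
>     import org.apache.spark.sql.SparkSession
>
>     val spark = SparkSession.builder()
>       .appName("table-kinds-sketch")
>       .enableHiveSupport()
>       .getOrCreate()
>
>     // Spark native table: handled by Spark's own data source code.
>     spark.sql("CREATE TABLE native_events (id INT, ts TIMESTAMP) USING parquet")
>
>     // Hive serde table: registered as a Hive table and, depending on the
>     // conversion settings, read/written through Hive serde classes.
>     spark.sql("CREATE TABLE serde_events (id INT, ts TIMESTAMP) STORED AS ORC")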
>
> BTW, we are still using the Hive metastore as our metadata store. Based on my
> understanding, that does not require upgrading the Hive execution JARs; users
> can point Spark at a newer version of the Hive metastore themselves.
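>
> For example, something along these lines (illustrative only; check which
> metastore versions your Spark release actually supports):
>
>     import org.apache.spark.sql.SparkSession
>
>     val spark = SparkSession.builder()
>       .appName("newer-metastore-sketch")
>       // Talk to a newer metastore without touching the built-in execution
>       // jars; the version shown here is just an example.
>       .config("spark.sql.hive.metastore.version", "2.3.3")
>       .config("spark.sql.hive.metastore.jars", "maven")
>       .enableHiveSupport()
>       .getOrCreate()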
>
> Felix Cheung <felixcheun...@hotmail.com> wrote on Tue, Jan 15, 2019 at 9:56 AM:
>>
>> And we are super 100% dependent on Hive...
>>
>>
>> ________________________________
>> From: Ryan Blue <rb...@netflix.com.invalid>
>> Sent: Tuesday, January 15, 2019 9:53 AM
>> To: Xiao Li
>> Cc: Yuming Wang; dev
>> Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4
>>
>> How do we know that most Spark users are not using Hive? I wouldn't be 
>> surprised either way, but I do want to make sure we aren't making decisions 
>> based on any one person's (or one company's) experience about what "most" 
>> Spark users do.
>>
>> On Tue, Jan 15, 2019 at 9:44 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>>
>>> Hi, Yuming,
>>>
>>> Thank you for your contributions! The community aims to reduce the
>>> dependence on Hive. Currently, most Spark users are not using Hive. The
>>> changes look risky to me.
>>>
>>> To support Hadoop 3.x, we just need to resolve this JIRA: 
>>> https://issues.apache.org/jira/browse/HIVE-16391
>>>
>>> Cheers,
>>>
>>> Xiao
>>>
>>> Yuming Wang <wgy...@gmail.com> wrote on Tue, Jan 15, 2019 at 8:41 AM:
>>>>
>>>> Dear Spark Developers and Users,
>>>>
>>>>
>>>>
>>>> Hyukjin and I plan to upgrade the built-in Hive from 1.2.1-spark2 to 2.3.4
>>>> to solve some critical issues, such as supporting Hadoop 3.x and fixing
>>>> some ORC and Parquet issues. This is the list:
>>>>
>>>> Hive issues:
>>>>
>>>> [SPARK-26332][HIVE-10790] Spark sql write orc table on viewFS throws 
>>>> exception
>>>>
>>>> [SPARK-25193][HIVE-12505] insert overwrite doesn't throw exception when 
>>>> drop old data fails
>>>>
>>>> [SPARK-26437][HIVE-13083] Decimal data becomes bigint to query, unable to 
>>>> query
>>>>
>>>> [SPARK-25919][HIVE-11771] Date value corrupts when tables are 
>>>> "ParquetHiveSerDe" formatted and target table is Partitioned
>>>>
>>>> [SPARK-12014][HIVE-11100] Spark SQL query containing semicolon is broken 
>>>> in Beeline
>>>>
>>>>
>>>>
>>>> Spark issues:
>>>>
>>>> [SPARK-23534] Spark run on Hadoop 3.0.0
>>>>
>>>> [SPARK-20202] Remove references to org.spark-project.hive
>>>>
>>>> [SPARK-18673] Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop 
>>>> version
>>>>
>>>> [SPARK-24766] CreateHiveTableAsSelect and InsertIntoHiveDir won't generate 
>>>> decimal column stats in parquet
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Since this upgrade requires too many changes in the hive-thriftserver
>>>> module, I split the work into two PRs to make review easier.
>>>>
>>>> The first PR does not contain the hive-thriftserver changes, so please
>>>> ignore the failing hive-thriftserver tests there.
>>>>
>>>> The second PR contains the complete changes.
>>>>
>>>>
>>>>
>>>> I have created a Spark distribution for Apache Hadoop 2.7; you can
>>>> download it via Google Drive or Baidu Pan.
>>>>
>>>> Please help review and test. Thanks.
>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix



-- 
Marcelo

