[ 
https://issues.apache.org/jira/browse/SPARK-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221795#comment-14221795
 ] 

Cheng Lian commented on SPARK-4114:
-----------------------------------

Did some research into this and came to the following conclusions:

# The HCatalog API itself doesn’t help eliminate the shim layer
  The HCatalog API implementation simply delegates to the underlying raw 
Metastore API. It’s basically a more user-friendly but feature-incomplete 
wrapper over the raw Metastore API. For example, it doesn’t support altering 
table properties, which we use to implement the ANALYZE command.
# Talking to a remote-mode Metastore service over the Thrift protocol helps 
(with either the HCatalog API or the raw Metastore API)
  HCatalog requires users to deploy a remote-mode Metastore service. This 
forces the Metastore client to talk in the Thrift protocol, which is exactly 
the same protocol the raw Metastore API speaks, and which seems to be 
compatible between 0.12.0 and 0.13.1. Currently we usually use a local or 
embedded Metastore (esp. for regression tests) and access the underlying 
Metastore database (MySQL/Derby) via JDBC. This exposes us to, for example, 
database schema changes between Hive versions.
# We can get rid of the shim layer as long as the Metastore Thrift protocol of 
the higher Hive version (0.13.1) is strictly a superset of that of the lower 
Hive version (0.12.0) and is backward compatible with it
  (And we *don’t* need to replace the raw Metastore API with HCatalog.)

However, I haven't gone through all the Thrift protocol changes between Hive 
0.12.0 and 0.13.1, and haven't yet found any official documentation that 
claims protocol compatibility.
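
As a rough illustration of how that comparison could be started, here is a minimal Python sketch that extracts method declarations from two versions of a Thrift service definition and reports which were removed, changed, or added. Everything in it is an assumption made for illustration: the two IDL snippets are made up (they are not copied from Hive's Thrift definition), `ExampleMetastore` and `hypothetical_new_call` are invented names, and the line-based regex only handles one-line declarations. It also compares declarations textually, which is only a first filter; real wire compatibility additionally depends on field IDs and types inside the structs that the methods exchange.

```python
import re

# Matches a one-line Thrift method declaration such as
#   Table get_table(1:string dbname, 2:string tbl_name)
# Declarations spanning several lines are NOT handled by this sketch.
METHOD_RE = re.compile(
    r"^\s*(?P<ret>[\w.<>,]+)\s+(?P<name>\w+)\s*\((?P<args>[^)]*)\)"
)


def service_methods(idl_text):
    """Map method name -> (return type, normalized argument list)."""
    methods = {}
    for line in idl_text.splitlines():
        match = METHOD_RE.match(line)
        if match:
            # Collapse whitespace so formatting differences are ignored.
            args = " ".join(match.group("args").split())
            methods[match.group("name")] = (match.group("ret"), args)
    return methods


def compare(old, new):
    """Return (removed, changed, added) method names, each sorted."""
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in old if n in new and old[n] != new[n])
    added = sorted(set(new) - set(old))
    return removed, changed, added


def is_superset(old, new):
    """True if every old method survives unchanged in the new version."""
    removed, changed, _ = compare(old, new)
    return not removed and not changed


# Illustrative snippets only; NOT the real Hive Metastore IDL.
OLD_IDL = """
service ExampleMetastore {
  Table get_table(1:string dbname, 2:string tbl_name)
  void alter_table(1:string dbname, 2:string tbl_name, 3:Table new_tbl)
}
"""

NEW_IDL = """
service ExampleMetastore {
  Table get_table(1:string dbname, 2:string tbl_name)
  void alter_table(1:string dbname, 2:string tbl_name, 3:Table new_tbl)
  void hypothetical_new_call(1:string dbname)
}
"""

if __name__ == "__main__":
    old, new = service_methods(OLD_IDL), service_methods(NEW_IDL)
    removed, changed, added = compare(old, new)
    print("removed:", removed)
    print("changed:", changed)
    print("added:", added)
    print("newer is a superset:", is_superset(old, new))
```

Any method that shows up as removed or changed would be a place where a client built against the newer version could fail against an older Metastore service, so those are the entries worth inspecting by hand.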

> Use stable Hive API (if one exists) for communication with Metastore
> --------------------------------------------------------------------
>
>                 Key: SPARK-4114
>                 URL: https://issues.apache.org/jira/browse/SPARK-4114
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Patrick Wendell
>            Priority: Blocker
>
> If one exists, we should use a stable API for our communication with the Hive 
> metastore. Specifically, we don't want to have to support compiling against 
> multiple versions of the Hive library to support users with different 
> versions of the Hive metastore.
> I think this is what HCatalog APIs are intended for, but I don't know enough 
> about Hive and HCatalog to be sure.


