[ 
https://issues.apache.org/jira/browse/HIVE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16241398#comment-16241398
 ] 

Vihang Karajgaonkar commented on HIVE-17714:
--------------------------------------------

Hi [~sershe], I am still looking at the AvroSerDe (and 
SerDes/ObjectInspectors/TypeInfos in general), so maybe I don't understand the 
big picture correctly, but here are my thoughts:

I am not sure that changing all the metastore APIs to fall back on getting the 
schema from the Deserializer is the way forward. This will create a strong 
dependency on the Hive source code and make the metastore separation work 
largely irrelevant (in the sense that you won't be able to use the standalone 
metastore without having Hive jars in the classpath). I like the idea of seeing 
whether we can move the Deserializer and friends to some common project 
(storage-api?) or into the metastore instead. But I think Alan investigated 
that early on in his work and it was not trivial. SerDes, TypeInfo and 
ObjectInspector are all intertwined such that we cannot move one out without 
the others, if I understand it right.
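
To make the dependency concern concrete, here is a rough sketch of the kind of 
fallback being discussed, as I understand it. Everything in it is a 
hypothetical stand-in (SchemaFallbackSketch, Column, SchemaSource, 
resolveColumns and the SerDe names are made up for illustration); it is not 
the actual metastore or MetaStoreUtils code. The point is only that the second 
branch cannot run unless the SerDe implementation is loadable by the metastore 
process.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-ins for illustration; these are NOT the real Hive classes.
class SchemaFallbackSketch {

    /** Minimal column descriptor: a name and a type string. */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public String toString() {
            return name + ":" + type;
        }
    }

    /** Stand-in for a deserializer that can report its own schema. */
    interface SchemaSource {
        List<Column> columns();
    }

    /**
     * Returns the stored columns when the SerDe is listed as using the
     * metastore for its schema; otherwise asks the SerDe itself.
     * The serdeLoader is where the dependency on SerDe code comes in.
     */
    static List<Column> resolveColumns(String serdeClass,
                                       List<Column> storedColumns,
                                       Set<String> serdesUsingMetastore,
                                       Function<String, SchemaSource> serdeLoader) {
        if (serdesUsingMetastore.contains(serdeClass)) {
            return storedColumns;
        }
        // Fallback: requires the SerDe implementation on the classpath.
        return serdeLoader.apply(serdeClass).columns();
    }

    public static void main(String[] args) {
        List<Column> stored = Arrays.asList(new Column("col", "string"));
        List<Column> external = Arrays.asList(
                new Column("id", "int"), new Column("name", "string"));
        Set<String> usingMetastore =
                new HashSet<>(Arrays.asList("example.NativeSerDe"));
        // Assumed loader: always hands back the externally defined schema.
        Function<String, SchemaSource> loader = name -> () -> external;

        System.out.println(resolveColumns(
                "example.NativeSerDe", stored, usingMetastore, loader));
        System.out.println(resolveColumns(
                "example.ExternalSchemaSerDe", stored, usingMetastore, loader));
    }
}
```

In this sketch the loader is passed in as a parameter, so in principle it 
could be supplied from outside the metastore, but that only moves the question 
of where the SerDe classes live; it does not remove the dependency.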

I am not sure it makes sense, from a design perspective, for the metastore to 
serve something which it doesn't know in the first place and has to go to an 
external source to fetch. It's not really a metastore if it doesn't store 
metadata, is it? Do you know what the motivation was for getting the field 
information from external sources (URLs) this way?

> move custom SerDe schema considerations into metastore from QL
> --------------------------------------------------------------
>
>                 Key: HIVE-17714
>                 URL: https://issues.apache.org/jira/browse/HIVE-17714
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Alan Gates
>
> Columns in metastore for tables that use external schema don't have the type 
> information (since HIVE-11985) and may be entirely inconsistent (since 
> forever, due to issues like HIVE-17713; or for SerDes that allow a URL for 
> the schema, due to a change in the underlying file).
> Currently, if you trace the usage of ConfVars.SERDESUSINGMETASTOREFORSCHEMA, 
> and to MetaStoreUtils.getFieldsFromDeserializer, you'd see that the code in 
> QL handles this in Hive. So, for the most part metastore just returns 
> whatever is stored for columns in the database.
> One exception appears to be get_fields_with_environment_context, which is 
> interesting... so getTable will return incorrect columns (potentially), but 
> get_fields/get_schema will return correct ones from SerDe as far as I can 
> tell.
> As part of separating the metastore, we should make sure all the APIs return 
> the correct schema for the columns; it's not a good idea to have everyone 
> reimplement getFieldsFromDeserializer.
> Note: this should also remove a flag introduced in HIVE-17731



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
