[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125676#comment-15125676 ]
Swarnim Kulkarni commented on HIVE-6147:
----------------------------------------

{quote}
It is pretty common to use schema-less avro objects in HBase.
{quote}

I am not sure that is true (or even possible). As far as I understand, you almost always have to provide the exact schema that was used to persist the data when you deserialize it, and the best way to do that is to store the schema alongside the data. Schema evolution is also going to be a mess. Imagine writing a billion rows to HBase with one schema, which then evolves, and then writing another billion rows with the new schema. How do you ensure the first billion rows are still correctly readable?

{quote}
(if there are billions of rows with objects of the same type, it is not reasonable to store the same schema in all of them) and it is not convenient to write a customer schema retriever for each such case.
{quote}

Correct. I agree it is inefficient to store the schema in every single cell, although IMO that isn't a good excuse for not writing the schema at all. A better design in this case is to use some kind of schema registry: use a custom serializer, write the schema to the schema registry, generate an id of some kind, and persist the id along with the data. Then, when you read the data, use the id to pull the schema from the registry and deserialize with it. That is also where a custom implementation of an AvroSchemaRetriever makes sense: your implementation would know how to read your schema from the schema registry and hand it to Hive, and Hive would handle the deserialization from there on.
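To make the registry idea above concrete, here is a minimal sketch of the write and read paths. It is not taken from the patch. All names in it (InMemorySchemaRegistry, encode_cell, decode_cell) are hypothetical, and so is the 4-byte id prefix layout. It treats the Avro-encoded payload as opaque bytes so that it runs without an Avro library. A real registry would be an external, durable service rather than a dict.

```python
import hashlib
import struct


class InMemorySchemaRegistry:
    """Hypothetical stand-in for an external schema registry service."""

    def __init__(self):
        self._id_by_fingerprint = {}
        self._schema_by_id = {}

    def register(self, schema_json: str) -> int:
        """Return the id for a schema, assigning a new id the first time it is seen."""
        fingerprint = hashlib.sha256(schema_json.encode("utf-8")).hexdigest()
        if fingerprint not in self._id_by_fingerprint:
            schema_id = len(self._schema_by_id) + 1
            self._id_by_fingerprint[fingerprint] = schema_id
            self._schema_by_id[schema_id] = schema_json
        return self._id_by_fingerprint[fingerprint]

    def lookup(self, schema_id: int) -> str:
        """Return the schema JSON that was registered under schema_id."""
        return self._schema_by_id[schema_id]


# Assumed cell layout: 4-byte big-endian schema id, then the Avro bytes.
_HEADER = struct.Struct(">I")


def encode_cell(schema_id: int, avro_bytes: bytes) -> bytes:
    """Prefix the serialized record with the id of its writer schema."""
    return _HEADER.pack(schema_id) + avro_bytes


def decode_cell(cell: bytes, registry: InMemorySchemaRegistry):
    """Split a cell into (writer schema JSON, Avro bytes) via the registry."""
    (schema_id,) = _HEADER.unpack_from(cell)
    return registry.lookup(schema_id), cell[_HEADER.size:]


registry = InMemorySchemaRegistry()
v1 = '{"type":"record","name":"User","fields":[{"name":"name","type":"string"}]}'
v2 = ('{"type":"record","name":"User","fields":[{"name":"name","type":"string"},'
      '{"name":"email","type":["null","string"],"default":null}]}')

id_v1 = registry.register(v1)  # rows written before the schema evolved
id_v2 = registry.register(v2)  # rows written after

old_cell = encode_cell(id_v1, b"\x06bob")  # placeholder for Avro-encoded bytes
writer_schema, payload = decode_cell(old_cell, registry)
# writer_schema is v1 here, so old rows stay readable after v2 is introduced.
```

Each cell carries only four extra bytes instead of the full schema, and rows written under the old schema still resolve to it. A custom AvroSchemaRetriever would presumably do the equivalent of the lookup step in decode_cell and hand the resulting schema to Hive.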
> Support avro data stored in HBase columns
> -----------------------------------------
>
>                 Key: HIVE-6147
>                 URL: https://issues.apache.org/jira/browse/HIVE-6147
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Swarnim Kulkarni
>            Assignee: Swarnim Kulkarni
>              Labels: TODOC14
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, 
> HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt, 
> HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt
>
>
> Presently, the HBase Hive integration supports querying only primitive data 
> types in columns. It would be nice to be able to store and query Avro objects 
> in HBase columns by making them visible as structs to Hive. This will allow 
> Hive to perform ad hoc analysis of HBase data which can be deeply structured.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)