[
https://issues.apache.org/jira/browse/HCATALOG-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Travis Crawford updated HCATALOG-443:
-------------------------------------
Attachment: HCATALOG-443_api_to_metadata_deserializer.2.patch
This patch version adds a few more changes necessary to get our data working
with HCat trunk.
Summarizing the patch as a whole:
* Switch to "org.apache.hadoop.hive.ql.metadata" classes except when
serializing. These versions add logic on top of the stored metadata.
Specifically I care about dynamically reported schemas, but there's some
other logic too.
* Add support for a deserializer-only read path. For example,
ThriftDeserializer does not work with the current HCatRecordReader because
ThriftDeserializer does not implement Serializer.
* Bugfix to actually use SerDeInfo properties to initialize the deserializer.
* Allow users to get string values of enum fields. By default you still get
struct<value:int>, but if you specify "string" as the field type, enums will
be returned as strings. This is necessary for integration in our environment
as Elephant-Bird behaves this way.
* Small updates to how binary fields are handled based on Hive changes.
* Add TestHCatHiveThriftCompatibility test to ensure HCat works with
serde-reported schemas.
* Update some tests to use HCatBaseTest (and junit 4 style) so they run in my
IDE, which was needed to debug them.
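To illustrate the enum handling described in the list above, here is a minimal
sketch. It is not code from the patch: EnumFieldSketch, Color, and read are
hypothetical names, and the single-entry map is only a stand-in for a
struct<value:int> value. The getValue() accessor is assumed to behave like the
integer accessor on Thrift-generated enums.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in HCatalog.
final class EnumFieldSketch {

    // Stand-in for a Thrift-generated enum with an explicit integer value.
    enum Color {
        RED(1), GREEN(2), BLUE(3);

        private final int value;

        Color(int value) {
            this.value = value;
        }

        int getValue() {
            return value;
        }
    }

    // Returns the enum name when the declared field type is "string";
    // otherwise returns a single-entry map that models struct<value:int>.
    static Object read(Color field, String declaredType) {
        if ("string".equalsIgnoreCase(declaredType)) {
            return field.name();
        }
        Map<String, Integer> struct =
                Collections.singletonMap("value", field.getValue());
        return struct;
    }

    public static void main(String[] args) {
        System.out.println(read(Color.GREEN, "string"));
        System.out.println(read(Color.GREEN, "struct<value:int>"));
    }
}
```

The default branch keeps the existing behavior, so only tables that explicitly
declare the field as "string" would see the new representation.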
> Use "metadata" Table/Partition classes, and Deserializer when reading
> ---------------------------------------------------------------------
>
> Key: HCATALOG-443
> URL: https://issues.apache.org/jira/browse/HCATALOG-443
> Project: HCatalog
> Issue Type: Bug
> Reporter: Travis Crawford
> Assignee: Travis Crawford
> Attachments: HCATALOG-443_api_to_metadata_deserializer.1.patch,
> HCATALOG-443_api_to_metadata_deserializer.2.patch
>
>
> This issue is related to HIVE-2950.
> When HCatalog queries the HiveMetaStore it gets back classes in the
> "org.apache.hadoop.hive.metastore.api" package. This represents exactly what
> is stored in the metastore database.
> Hive has companion classes in "org.apache.hadoop.hive.ql.metadata" that
> provide some logic on top of what's stored in the actual database. For
> example:
> * org.apache.hadoop.hive.metastore.api.Table.getCols shows columns explicitly
> stored in the database
> * org.apache.hadoop.hive.ql.metadata.Table.getCols shows columns reported by
> the serde if there are any.
> Except when serializing into the job configuration, HCatalog should use
> the "metadata" version of these classes so that the additional logic is
> applied.
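
The difference between the two getCols variants described above can be
sketched roughly as follows. These are simplified, hypothetical stand-ins
(TableColsSketch, StoredTable, MetadataTable, SchemaSource), not the real Hive
classes, and the real "metadata" Table likely applies more conditions than
this sketch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; these are not the Hive classes themselves.
final class TableColsSketch {

    // Models a deserializer that may report its own schema.
    interface SchemaSource {
        List<String> reportedColumns();
    }

    // Models the "api" Table: exactly what the metastore database stores.
    static final class StoredTable {
        private final List<String> cols;

        StoredTable(List<String> cols) {
            this.cols = new ArrayList<>(cols);
        }

        List<String> getCols() {
            return cols;
        }
    }

    // Models the "metadata" Table: wraps the stored table and adds logic.
    static final class MetadataTable {
        private final StoredTable stored;
        private final SchemaSource serde; // may be null

        MetadataTable(StoredTable stored, SchemaSource serde) {
            this.stored = stored;
            this.serde = serde;
        }

        // Prefers serde-reported columns; falls back to the stored columns.
        List<String> getCols() {
            if (serde != null) {
                List<String> reported = serde.reportedColumns();
                if (reported != null && !reported.isEmpty()) {
                    return reported;
                }
            }
            return stored.getCols();
        }

        // The stored form is still what gets serialized into the job config.
        StoredTable getStoredTable() {
            return stored;
        }
    }
}
```

A caller that reads the stored table directly would miss any columns the serde
reports, which is the gap this issue describes.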