[
https://issues.apache.org/jira/browse/HIVE-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252649#comment-13252649
]
Travis Crawford commented on HIVE-2941:
---------------------------------------
Here are some additional details about the issue. Consider the following create
table statement. The table's columns are discovered by reflecting on the
{{Person}} class rather than being specified explicitly.
{code}
hive> create external table travis_test.person_test
> partitioned by (part_dt string)
> row format serde "com.twitter.elephantbird.hive.serde.ThriftSerDe"
> with serdeproperties ("serialization.class"="com.twitter.elephantbird.examples.thrift.Person")
> stored as
> inputformat "com.twitter.elephantbird.mapred.input.HiveMultiInputFormat"
> outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
{code}
The current behavior does not expand nested structures; it lists the class name
of each nested struct as the field type. As a result, users browsing the schema
do not see the full table definition.
{code}
hive> describe extended person_test;
OK
name com.twitter.elephantbird.examples.thrift.Name from deserializer
id int from deserializer
email string from deserializer
phones array<com.twitter.elephantbird.examples.thrift.PhoneNumber> from deserializer
part_dt string
{code}
This patch expands nested structures, showing the full table schema. Here's an
example of what the table looks like with the patch:
{code}
hive> describe extended person_test;
OK
name struct<first_name:string,last_name:string> from deserializer
id int from deserializer
email string from deserializer
phones array<struct<number:string,type:struct<value:int>>> from deserializer
part_dt string
{code}
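For readers who want a feel for the expansion, here is a minimal, self-contained sketch of the recursive idea. It is not the code from the patch. The {{NestedTypeExpander}} class and its {{PrimitiveType}}, {{ListType}} and {{StructType}} helpers are hypothetical names invented for this illustration, and it deliberately avoids Hive's own object inspector classes. It also assumes the type graph has no cycles.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustration only: a hypothetical, simplified type model. None of these
// names come from Hive or from the HIVE-2941 patch.
final class NestedTypeExpander {

    interface FieldType {}

    // A leaf type such as "int" or "string".
    static final class PrimitiveType implements FieldType {
        final String name;
        PrimitiveType(String name) { this.name = name; }
    }

    // A list whose elements all share one type.
    static final class ListType implements FieldType {
        final FieldType element;
        ListType(FieldType element) { this.element = element; }
    }

    // A struct with named fields, kept in declaration order.
    static final class StructType implements FieldType {
        final Map<String, FieldType> fields = new LinkedHashMap<>();
        StructType field(String name, FieldType type) {
            fields.put(name, type);
            return this;
        }
    }

    // Renders a type name, recursing into lists and structs so that no
    // class name is left in place of a nested struct's definition.
    // Assumes an acyclic type graph; a self-referential struct would recurse forever.
    static String expand(FieldType type) {
        if (type instanceof PrimitiveType) {
            return ((PrimitiveType) type).name;
        }
        if (type instanceof ListType) {
            return "array<" + expand(((ListType) type).element) + ">";
        }
        StructType struct = (StructType) type;
        StringJoiner joiner = new StringJoiner(",", "struct<", ">");
        for (Map.Entry<String, FieldType> entry : struct.fields.entrySet()) {
            joiner.add(entry.getKey() + ":" + expand(entry.getValue()));
        }
        return joiner.toString();
    }
}
```

Given a struct with the string fields {{first_name}} and {{last_name}}, {{expand}} returns {{struct<first_name:string,last_name:string>}}, which matches the {{name}} column in the output above.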
In both cases, the table storage descriptor is unchanged - both list the
columns as {{cols:[]}}.
I believe the reflected table schema should be copied into the partition
storage descriptor when adding a new partition, but that could be a separate
change.
> Hive should expand nested structs when setting the table schema from thrift
> structs
> -----------------------------------------------------------------------------------
>
> Key: HIVE-2941
> URL: https://issues.apache.org/jira/browse/HIVE-2941
> Project: Hive
> Issue Type: Bug
> Reporter: Travis Crawford
> Assignee: Travis Crawford
> Attachments: HIVE-2941.D2721.1.patch
>
>
> When setting a table serde, the deserializer is queried for its schema, which
> is used to set the metastore table schema. The current implementation uses
> the class name stored in the field as the field type.
> By storing the class name as the field type, users cannot see the contents of
> a struct with "describe tblname". Applications that query HiveMetaStore for
> the table schema (specifically HCatalog in this case) see an unknown field
> type, rather than a struct containing known field types.
> Hive should store the expanded schema in the metastore so users browsing the
> schema see expanded fields, and applications querying metastore see familiar
> types.
> DETAILS
> Set the table serde to something like this. This serde uses the built-in
> {{ThriftStructObjectInspector}}.
> {code}
> alter table foo_test
> set serde "com.twitter.elephantbird.hive.serde.ThriftSerDe"
> with serdeproperties ("serialization.class"="com.foo.Foo");
> {code}
> This causes a call to {{MetaStoreUtils.getFieldsFromDeserializer}}, which
> returns a list of fields and their schemas. However, it currently does not
> handle nested structs: if {{com.foo.Foo}} above contains a field of type
> {{com.foo.Bar}}, the class name {{com.foo.Bar}} appears as the field
> type. Instead, nested structs should be expanded.