[
https://issues.apache.org/jira/browse/HIVE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jakob Homan updated HIVE-2171:
------------------------------
Attachment: HIVE-2171.patch
Patch:
* Adds a comment field to the StructField interface and provides a reasonable
implementation in each of its implementing classes.
* Adds overloaded versions of each of the struct-based ObjectInspector
factories to allow the comments to be set.
* Adjusts MetastoreUtils to check whether the field's comment is null. If it
is, the previous behavior is maintained; otherwise the comment is used.
* Adds a new unit test for MetastoreUtils. For this, mockito was added as a
dependency. Right now it looks like Hive's Ivy conf isn't set up to include
only some jars in the package. If this patch goes in, I'll open another
jira to make sure the mockito and other test-related jars aren't included in
jars where they aren't needed.
* Refactors the TestStandardObjectInspectors test to test both with and without
comments.
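The null-check fallback described above can be sketched roughly as follows. This is a simplified stand-in, not the code in the patch: SketchStructField, SimpleField, FieldCommentSketch, and getTypeName are hypothetical names invented for illustration, and the sketch assumes only that a field's comment may be null and that "from deserializer" is the default text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a struct field that carries an optional comment.
interface SketchStructField {
    String getFieldName();
    String getTypeName();
    String getFieldComment(); // null when the serde supplies no comment
}

// Minimal implementation with an overloaded constructor, mirroring the idea
// of overloaded factories that optionally accept comments.
final class SimpleField implements SketchStructField {
    private final String name;
    private final String typeName;
    private final String comment;

    SimpleField(String name, String typeName) {
        this(name, typeName, null); // no comment supplied
    }

    SimpleField(String name, String typeName, String comment) {
        this.name = name;
        this.typeName = typeName;
        this.comment = comment;
    }

    public String getFieldName() { return name; }
    public String getTypeName() { return typeName; }
    public String getFieldComment() { return comment; }
}

final class FieldCommentSketch {
    // Assumed default text, taken from the describe output in this message.
    static final String FROM_DESERIALIZER = "from deserializer";

    // Null comment: keep the previous behavior. Otherwise use the comment.
    static String determineFieldComment(String comment) {
        return comment == null ? FROM_DESERIALIZER : comment;
    }

    // Builds one tab-separated row per field: name, type, comment.
    static List<String> describe(List<? extends SketchStructField> fields) {
        List<String> rows = new ArrayList<>();
        for (SketchStructField f : fields) {
            rows.add(f.getFieldName() + "\t" + f.getTypeName() + "\t"
                    + determineFieldComment(f.getFieldComment()));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<SketchStructField> fields = new ArrayList<>();
        fields.add(new SimpleField("string1", "string", "this field is string1"));
        fields.add(new SimpleField("x", "int")); // falls back to the default
        for (String row : describe(fields)) {
            System.out.println(row);
        }
    }
}
```

Keeping the comment nullable means existing serdes that never set one need no changes and still get the old default text.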
After this patch, a serde that wants to specify field comments can do so and
have them show up in the table description. For example, a table kst created
by an implementation of SerDe, with one example field for each type, can now
set its field comments (each field has its own comment, all of the same boring
form: this is field BLAH):
{noformat}hive> describe kst;
OK
string1 string this field is string1
string2 string this field is string2
int1 int this field is int1
boolean1 boolean this field is boolean1
long1 bigint this field is long1
float1 float this field is float1
double1 double this field is double1
inner_record1 struct<int_in_inner_record1:int,string_in_inner_record1:string> this field is inner_record1
enum1 string this field is enum1
array1 array<string> this field is array1
map1 map<string,string> this field is map1
union1 uniontype<float,boolean,string> this field is union1
fixed1 array<tinyint> this field is fixed1
null1 void this field is null1
unionnullint int this field is UnionNullInt
bytes1 array<tinyint> this field is bytes1
ds string
Time taken: 0.286 seconds{noformat}
One thing I noticed is that field comments on structs should extend to
substructures, and with this new patch they do for custom serdes:
{noformat}hive> describe kst.inner_record1;
OK
int_in_inner_record1 int this field is int_in_inner_record1
string_in_inner_record1 string this field is string_in_inner_record1
Time taken: 0.113 seconds{noformat}
However, this doesn't work correctly with built-in serdes:
{noformat}hive> create table test_table(a STRUCT<z:string COMMENT 'comment for z',x:int> COMMENT 'comment for a');
OK
Time taken: 2.565 seconds
hive> describe test_table;
OK
a struct<z:string,x:int> comment for a
Time taken: 0.139 seconds
hive> describe test_table.a;
OK
z string from deserializer
x int from deserializer
Time taken: 0.096 seconds
hive> describe test_table.a.z;
OK
z string from deserializer
Time taken: 0.089 seconds
hive>{noformat}
The comment for field z is lost: it is replaced by the boilerplate text "from
deserializer" and can't be retrieved from the CLI. I'll open a JIRA for this.
This is my first Hive patch, so please check to see if I missed anything.
> Allow custom serdes to set field comments
> -----------------------------------------
>
> Key: HIVE-2171
> URL: https://issues.apache.org/jira/browse/HIVE-2171
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 0.7.0
> Reporter: Jakob Homan
> Assignee: Jakob Homan
> Fix For: 0.7.1
>
> Attachments: HIVE-2171.patch
>
>
> Currently, while serde implementations can set a field's name, they can't set
> its comment. These are set in the metastore utils to {{(from
> deserializer)}}. When a serde can provide meaningful comments for its fields,
> those comments should be propagated to the table description. These
> serde-provided comments could be prepended to "(from deserializer)" if others
> feel that's a meaningful distinction. This change involves updating
> {{StructField}} to support a (possibly null) comment field and then
> propagating this change out to the myriad places {{StructField}} is thrown
> around.