[
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910305#action_12910305
]
John Sichi commented on HIVE-1634:
----------------------------------
Hey Basab,
This is a great start. Beyond the review comments I added, I do have some
higher-level suggestions:
* For the column mapping, the reason I suggested "a:b:string" in the original
JIRA description is that it's a pain to keep everything lined up by column
position. It's already less than ideal that we do the column name mapping by
position, so I don't think we should make it worse by having a separate
property for type. Using the s/b shorthand is fine, and if you think that we
shouldn't overload the colon, we can use a different separator, e.g. "cf:cq#s".
Since the existing property name is hbase.columns.mapping, I don't think it
will be confusing to roll in the (optional) type info as well.
* I'm wondering whether we can just use the existing classes like
LazyBinaryByte in package org.apache.hadoop.hive.serde2.lazybinary instead of
creating new ones. Or are these not compatible with hbase.util.Bytes?
* For the tests, I noticed that you have attached TestHiveHBaseExternalTable.
I think it would be a good idea if you can create and populate such a fixture
table in HBaseTestSetup; that way it can be available (treated as read-only) to
all of the HBase .q tests. Otherwise, it's hard to verify that we're
compatible with a table created directly through the HBase APIs rather than
through Hive.
* Also for the tests, it would be good if you can filter the initial test data
set pulled from the Hive src table down to a small number of representative
rows. That way, we can keep the .q.out files smaller.
* Once we get this one committed, be sure to update the wiki.
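
To make the first suggestion concrete, here is a minimal sketch of how a combined mapping string with an optional "#" type suffix could be split. It is only an illustration: the class ColumnMappingSketch, its Entry type, and the choice of "-" for "no suffix given" are hypothetical and are not taken from the patch or from the existing handler code.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnMappingSketch {

    /** One parsed entry of a combined mapping string (hypothetical type). */
    public static final class Entry {
        public final String column;   // e.g. ":key" or "cf:cq"
        public final String storage;  // "s", "b", or "-" when no suffix is given

        Entry(String column, String storage) {
            this.column = column;
            this.storage = storage;
        }
    }

    /** Splits "col#type,col#type,..." into entries; the "#type" part is optional. */
    public static List<Entry> parse(String mapping) {
        List<Entry> entries = new ArrayList<>();
        for (String part : mapping.split(",")) {
            String spec = part.trim();
            int hash = spec.lastIndexOf('#');
            if (hash < 0) {
                // No suffix: assume the table-level default applies.
                entries.add(new Entry(spec, "-"));
            } else {
                entries.add(new Entry(spec.substring(0, hash), spec.substring(hash + 1)));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        for (Entry e : parse(":key,cf:int#b,cf:string#s,cf:double")) {
            System.out.println(e.column + " -> " + e.storage);
        }
    }
}
```

Because the type travels with the column it describes, adding or removing a column cannot push the two lists out of alignment, which is the problem with a separate positional property.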
> Allow access to Primitive types stored in binary format in HBase
> ----------------------------------------------------------------
>
> Key: HIVE-1634
> URL: https://issues.apache.org/jira/browse/HIVE-1634
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.7.0
> Reporter: Basab Maulik
> Assignee: Basab Maulik
> Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java
>
>
> This addresses HIVE-1245 in part, for atomic or primitive types.
> The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a
> specification of the storage option for the corresponding column in the serde
> property "hbase.columns.mapping". Allowed values are '-' for table default,
> 's' for standard string storage, and 'b' for binary storage as would be
> obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families
> use a colon-separated pair such as 's:b' for the key and value part
> specifiers respectively. See the test cases and queries for HBase handler for
> additional examples.
> There is also a table property "hbase.table.default.storage.type" = "string"
> to specify a table level default storage type. The other valid specification
> is "binary". The table level default is overridden by a column level
> specification.
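
The override rule described above can be sketched roughly as follows. This is an illustration rather than the patch's code: the class and method names are hypothetical, and treating a missing table property as "string" is an assumption based on the examples in this description.

```java
import java.util.ArrayList;
import java.util.List;

public class StorageTypeSketch {

    /** Maps the table-level property value to 's' or 'b'; null is assumed to mean "string". */
    static char tableDefault(String property) {
        if (property == null || property.equals("string")) {
            return 's';
        }
        if (property.equals("binary")) {
            return 'b';
        }
        throw new IllegalArgumentException("unknown table default: " + property);
    }

    /** Resolves a single specifier, replacing '-' with the table default. */
    static char resolveOne(String spec, char dflt) {
        switch (spec) {
            case "-": return dflt;
            case "s": return 's';
            case "b": return 'b';
            default: throw new IllegalArgumentException("unknown storage type: " + spec);
        }
    }

    /** Resolves every column; map columns keep their "key:value" pair form. */
    static List<String> resolve(String storageTypes, String tableDefaultProperty) {
        char dflt = tableDefault(tableDefaultProperty);
        List<String> resolved = new ArrayList<>();
        for (String spec : storageTypes.split(",")) {
            StringBuilder sb = new StringBuilder();
            for (String part : spec.trim().split(":")) {
                if (sb.length() > 0) {
                    sb.append(':');
                }
                sb.append(resolveOne(part, dflt));
            }
            resolved.add(sb.toString());
        }
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolve("-,b,b,b,b,b,-,b,b", null));
        System.out.println(resolve("-,s:b,-:b", "binary"));
    }
}
```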
> This control is available for the boolean, tinyint, smallint, int, bigint,
> float, and double primitive types. The attached patch also relaxes the
> mapping of map types to HBase column families to allow any primitive type to
> be the map key.
> Attached is a program for creating a table and populating it in HBase. The
> external table in Hive can access the data as shown in the example below.
> hive> create external table TestHiveHBaseExternalTable
> > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
> > c_int int, c_long bigint, c_string string, c_float float, c_double double)
> > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> > with serdeproperties ("hbase.columns.mapping" =
> ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
> > tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.691 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 NULL NULL NULL NULL NULL Test-String NULL NULL
> Time taken: 0.346 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.139 seconds
> hive> create external table TestHiveHBaseExternalTable
> > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
> > c_int int, c_long bigint, c_string string, c_float float, c_double double)
> > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> > with serdeproperties (
> > "hbase.columns.mapping" =
> ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
> > "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
> > tblproperties (
> > "hbase.table.name" = "TestHiveHBaseExternalTable",
> > "hbase.table.default.storage.type" = "string");
> OK
> Time taken: 0.139 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 true -128 -32768 -2147483648 -9223372036854775808 Test-String -2.1793132E-11 2.01345E291
> Time taken: 0.151 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.154 seconds
> hive> create external table TestHiveHBaseExternalTable
> > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
> > c_int int, c_long bigint, c_string string, c_float float, c_double double)
> > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> > with serdeproperties (
> > "hbase.columns.mapping" =
> ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
> > "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b" )
> > tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.347 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 true -128 -32768 -2147483648 -9223372036854775808 Test-String -2.1793132E-11 2.01345E291
> Time taken: 0.245 seconds
> hive>
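
One plausible reading of the NULLs in the first query is that the cells hold fixed-width binary values, which do not parse as decimal text. The sketch below illustrates this with the JDK only. It assumes that the HBase Bytes utility writes an int as 4 big-endian bytes, which is what ByteBuffer does by default. The class and helper names are hypothetical, and returning null on a parse failure is a simplification of what a string-based deserializer might do.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BinaryVsStringSketch {

    /** Big-endian 4-byte encoding; assumed to match the binary storage format. */
    static byte[] intToBinary(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    /** Standard string storage: the decimal text of the value. */
    static byte[] intToStringBytes(int v) {
        return Integer.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    /** Reads a cell as decimal text, returning null when it does not parse. */
    static Integer readAsString(byte[] cell) {
        try {
            return Integer.valueOf(new String(cell, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Reads a cell as a 4-byte big-endian int. */
    static int readAsBinary(byte[] cell) {
        return ByteBuffer.wrap(cell).getInt();
    }

    public static void main(String[] args) {
        byte[] cell = intToBinary(Integer.MIN_VALUE);
        System.out.println("as string: " + readAsString(cell));
        System.out.println("as binary: " + readAsBinary(cell));
    }
}
```

Under these assumptions, the same cell comes back as null when read as text and as -2147483648 when read as binary, which matches the difference between the first and second queries for c_int.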
--
This message is automatically generated by JIRA.