[ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923776#action_12923776 ]

Basab Maulik commented on HIVE-1634:
------------------------------------

Re: Beyond the review comments I added, I do have some higher-level suggestions:

    * For the column mapping, the reason I suggested "a:b:string" in the 
original JIRA description is that it's a pain to keep everything lined up by 
column position. It's already less than ideal that we do the column name 
mapping by position, so I don't think we should make it worse by having a 
separate property for type. Using the s/b shorthand is fine, and if you think 
that we shouldn't overload the colon, we can use a different separator, e.g. 
"cf:cq#s". Since the existing property name is hbase.columns.mapping, I don't 
think it will be confusing to roll in the (optional) type info as well.

I have adopted your suggestion of '#' as the separator for the storage 
information, and 'hbase.columns.mapping' now optionally carries it. I have 
also made a small change so that any prefix of 'string' or 'binary' is valid, 
e.g. s/b, str/bin, or string/binary.
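A minimal sketch of how one mapping entry with the optional '#' suffix might be parsed. It assumes the suffix follows the last '#' in the entry, that prefix matching is case-sensitive, and that an entry without a suffix falls back to the table default. The names MappingEntrySketch, parseEntry, and parseStorage are hypothetical and not taken from the patch; the record syntax needs Java 16 or later.

```java
import java.util.Optional;

/** Hypothetical sketch of parsing "family:qualifier#storage" mapping entries. */
public final class MappingEntrySketch {

    enum Storage { STRING, BINARY }

    /** One parsed entry; an empty storage means "use the table default" (assumption). */
    record Entry(String column, Optional<Storage> storage) {}

    /** Accepts any non-empty prefix of "string" or "binary" (case-sensitive, by assumption). */
    static Storage parseStorage(String spec) {
        if (!spec.isEmpty() && "string".startsWith(spec)) {
            return Storage.STRING;
        }
        if (!spec.isEmpty() && "binary".startsWith(spec)) {
            return Storage.BINARY;
        }
        throw new IllegalArgumentException("Invalid storage spec: '" + spec + "'");
    }

    /** Splits an entry such as "cf:int#b" at its last '#'. */
    static Entry parseEntry(String entry) {
        int hash = entry.lastIndexOf('#');
        if (hash < 0) {
            return new Entry(entry, Optional.empty());
        }
        String column = entry.substring(0, hash);
        return new Entry(column, Optional.of(parseStorage(entry.substring(hash + 1))));
    }

    public static void main(String[] args) {
        String mapping = ":key,cf:int#b,cf:string#str,cf:double#binary,cf:float";
        for (String e : mapping.split(",")) {
            System.out.println(parseEntry(e));
        }
    }
}
```

Because 'string' and 'binary' start with different letters, every non-empty prefix maps to exactly one storage type, which is what makes the shorthand unambiguous.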

    * I'm wondering whether we can just use the existing classes like 
LazyBinaryByte in package org.apache.hadoop.hive.serde2.lazybinary instead of 
creating new ones. Or are these not compatible with hbase.util.Bytes?

I think the incompatibility stems more from trying to stay within the 
serde2.lazy.Lazy family of objects, which HBaseSerDe, LazyHBaseRow, and 
LazyHBaseCellMap extend or depend on. It would be useful to make these two 
families of classes compatible (e.g. by inheriting from a common base class). 
Small differences in the object inspector classes that type-parameterize these 
classes further complicate getting past the type system. This should be doable, 
but perhaps as a separate patch?
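For reference when weighing the LazyBinary classes against hbase.util.Bytes, the following sketch shows the fixed-width, big-endian layout that Bytes.toBytes(...) is assumed to produce for primitives (boolean as a single 0/-1 byte, tinyint as one raw byte). It uses only java.nio.ByteBuffer, so it runs without HBase on the classpath; BinaryCellSketch and its methods are hypothetical names. It also illustrates why reading binary cells with string storage can yield NULL: the raw bytes do not parse as decimal text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch: fixed-width, big-endian primitive encodings assumed to
 * match org.apache.hadoop.hbase.util.Bytes. ByteBuffer is big-endian by default.
 */
public final class BinaryCellSketch {

    // Encoders: what a binary-storage column is assumed to hold in HBase.
    static byte[] encodeBoolean(boolean v) { return new byte[] { v ? (byte) -1 : (byte) 0 }; }
    static byte[] encodeByte(byte v)       { return new byte[] { v }; }
    static byte[] encodeShort(short v)     { return ByteBuffer.allocate(Short.BYTES).putShort(v).array(); }
    static byte[] encodeInt(int v)         { return ByteBuffer.allocate(Integer.BYTES).putInt(v).array(); }
    static byte[] encodeLong(long v)       { return ByteBuffer.allocate(Long.BYTES).putLong(v).array(); }
    static byte[] encodeFloat(float v)     { return ByteBuffer.allocate(Float.BYTES).putFloat(v).array(); }
    static byte[] encodeDouble(double v)   { return ByteBuffer.allocate(Double.BYTES).putDouble(v).array(); }

    // Decoders: what binary storage reads back.
    static boolean decodeBoolean(byte[] b) { return b[0] != 0; }
    static byte decodeByte(byte[] b)       { return b[0]; }
    static short decodeShort(byte[] b)     { return ByteBuffer.wrap(b).getShort(); }
    static int decodeInt(byte[] b)         { return ByteBuffer.wrap(b).getInt(); }
    static long decodeLong(byte[] b)       { return ByteBuffer.wrap(b).getLong(); }
    static float decodeFloat(byte[] b)     { return ByteBuffer.wrap(b).getFloat(); }
    static double decodeDouble(byte[] b)   { return ByteBuffer.wrap(b).getDouble(); }

    /** Roughly mimics string storage: parse the cell as decimal text, or null if it is not a number. */
    static Integer parseAsText(byte[] b) {
        try {
            return Integer.parseInt(new String(b, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] cell = encodeInt(Integer.MIN_VALUE);
        System.out.println(Arrays.toString(cell)); // [-128, 0, 0, 0]
        System.out.println(decodeInt(cell));       // -2147483648
        System.out.println(parseAsText(cell));     // null
    }
}
```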

    * For the tests, I noticed that you have attached 
TestHiveHBaseExternalTable. I think it would be a good idea if you can create 
and populate such a fixture table in HBaseTestSetup; that way it can be 
available (treated as read-only) to all of the HBase .q tests. Otherwise, it's 
hard to verify that we're compatible with a table created directly through 
HBase APIs rather than Hive.

Done. Added tests that create a Hive external table associated with this HBase 
table and run queries against it.

    * Also for the tests, it would be good if you can filter it down to only a 
small number of representative rows when pulling the initial test data set from 
the Hive src table. That way, we can keep the .q.out files smaller.

Done; the .q.out files are much smaller than in the initial patch.

    * Once we get this one committed, be sure to update the wiki.

Will do once this is committed.


> Allow access to Primitive types stored in binary format in HBase
> ----------------------------------------------------------------
>
>                 Key: HIVE-1634
>                 URL: https://issues.apache.org/jira/browse/HIVE-1634
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.7.0
>            Reporter: Basab Maulik
>            Assignee: Basab Maulik
>         Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java
>
>
> This addresses HIVE-1245 in part, for atomic or primitive types.
> The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a 
> specification of the storage option for the corresponding column in the serde 
> property "hbase.columns.mapping". Allowed values are '-' for table default, 
> 's' for standard string storage, and 'b' for binary storage as would be 
> obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families 
> use a colon-separated pair such as 's:b' for the key and value part 
> specifiers respectively. See the test cases and queries for HBase handler for 
> additional examples.
> There is also a table property "hbase.table.default.storage.type" = "string" 
> to specify a table level default storage type. The other valid specification 
> is "binary". The table level default is overridden by a column level 
> specification.
> This control is available for the boolean, tinyint, smallint, int, bigint, 
> float, and double primitive types. The attached patch also relaxes the 
> mapping of map types to HBase column families to allow any primitive type to 
> be the map key.
> Attached is a program for creating a table and populating it in HBase. The 
> external table in Hive can access the data as shown in the example below.
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
>     >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.691 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 NULL    NULL    NULL    NULL    NULL    Test-String     NULL    NULL
> Time taken: 0.346 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.139 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties (
>     >  "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
>     >  tblproperties (
>     >  "hbase.table.name" = "TestHiveHBaseExternalTable",
>     >  "hbase.table.default.storage.type" = "string");
> OK
> Time taken: 0.139 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 true    -128    -32768  -2147483648     -9223372036854775808    Test-String     -2.1793132E-11  2.01345E291
> Time taken: 0.151 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.154 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties (
>     >  "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b" )
>     >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.347 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 true    -128    -32768  -2147483648     -9223372036854775808    Test-String     -2.1793132E-11  2.01345E291
> Time taken: 0.245 seconds
> hive> 
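A sketch of the override rule in the issue description, using its original "hbase.columns.storage.types" syntax ('-' for the table default, 's', 'b', and a colon-separated pair such as 's:b' for map key/value). It assumes the table default is "string" when "hbase.table.default.storage.type" is absent, which is consistent with the third query in the transcript, where the '-' column c_string still reads as Test-String. StorageTypeResolutionSketch and resolve are hypothetical names, not the patch's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: resolve per-column storage types against a table-level default. */
public final class StorageTypeResolutionSketch {

    enum Storage { STRING, BINARY }

    /** Table-level default; a missing property is assumed to mean "string". */
    static Storage parseDefault(String tableDefault) {
        if (tableDefault == null || tableDefault.equals("string")) {
            return Storage.STRING;
        }
        if (tableDefault.equals("binary")) {
            return Storage.BINARY;
        }
        throw new IllegalArgumentException("Invalid table default: '" + tableDefault + "'");
    }

    /** A column-level spec overrides the table default; '-' defers to it. */
    static Storage resolveOne(String spec, Storage tableDefault) {
        switch (spec) {
            case "-": return tableDefault;
            case "s": return Storage.STRING;
            case "b": return Storage.BINARY;
            default: throw new IllegalArgumentException("Invalid storage spec: '" + spec + "'");
        }
    }

    /** One list per mapped column; map columns yield two entries (key, value). */
    static List<List<Storage>> resolve(String storageTypes, String tableDefault) {
        Storage def = parseDefault(tableDefault);
        List<List<Storage>> result = new ArrayList<>();
        for (String column : storageTypes.split(",")) {
            List<Storage> parts = new ArrayList<>();
            for (String part : column.split(":")) {
                parts.add(resolveOne(part.trim(), def));
            }
            result.add(parts);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(resolve("-,b,b,b,b,b,-,b,b", null));
        System.out.println(resolve("-,s:b", "binary"));
    }
}
```

A fuller version would also check that the number of specs matches the number of entries in "hbase.columns.mapping".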

-- 
This message is automatically generated by JIRA.