[ 
https://issues.apache.org/jira/browse/HIVE-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642511#comment-14642511
 ] 

Wojciech Indyk commented on HIVE-11329:
---------------------------------------

[~swarnim] Example: I define a map prefix "loc_" for the geographical 
locations of a row in HBase.
I create a table in Hive:
CREATE EXTERNAL TABLE xyz(id int, locations map<string,string>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,source:loc_.*")
Then I want to query:
select id from xyz where locations['California'] is not null
instead of
select id from xyz where locations['loc_California'] is not null

Moreover if I query:
select id, locations from xyz where locations['California'] is not null
I would like to have a result like:
1, {California:state, New York:state}
instead of
1, {loc_California:state, loc_New York:state}

In general: I don't want to receive the prefix on each key of the map in 
Hive. I already know what the prefix for the map is (it is defined in 
SERDEPROPERTIES). Prefixed keys are hard to use with other data sources, e.g. 
IP->geolocation libraries.

All in all, it's easier to integrate data without prefixes. IMO prefixes are 
an artificial structure (like a 'super-column') used to optimize queries and 
to be able to store a map in HBase. That's why I want to cut the prefixes. I 
know about the solution of mapping a column family to a Hive map, but HBase 
doesn't handle more than 2 column families well, and I need ~10 maps per row.

I think the idea of a flag is very good. IMO it could be a flag defined when 
creating the table.
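
A minimal sketch of the behaviour I have in mind, written in Python purely for 
illustration. The function name map_from_hbase_columns and the hide_prefix 
flag are hypothetical; they are not existing Hive options or APIs, and the 
real change would live in the HBase SerDe.

```python
def map_from_hbase_columns(columns, qualifier_mapping, hide_prefix=False):
    """Build the Hive-side map from HBase column qualifiers.

    columns           -- dict of column qualifier -> value for one row
    qualifier_mapping -- prefix mapping such as "loc_.*"
    hide_prefix       -- hypothetical flag: strip the prefix from map keys
    """
    if not qualifier_mapping.endswith(".*"):
        raise ValueError("expected a prefix mapping like 'loc_.*'")
    prefix = qualifier_mapping[:-2]  # "loc_.*" -> "loc_"

    result = {}
    for qualifier, value in columns.items():
        if not qualifier.startswith(prefix):
            continue  # column does not belong to this map
        key = qualifier[len(prefix):] if hide_prefix else qualifier
        result[key] = value
    return result


row = {"loc_California": "state", "loc_New York": "state", "other": "x"}

# Current behaviour: keys keep the prefix.
print(map_from_hbase_columns(row, "loc_.*"))
# {'loc_California': 'state', 'loc_New York': 'state'}

# Desired behaviour with the flag set: keys without the prefix.
print(map_from_hbase_columns(row, "loc_.*", hide_prefix=True))
# {'California': 'state', 'New York': 'state'}
```

With the flag off, nothing changes for existing tables; with it on, the map 
keys match what other data sources use.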

> Column prefix in key of hbase column prefix map
> -----------------------------------------------
>
>                 Key: HIVE-11329
>                 URL: https://issues.apache.org/jira/browse/HIVE-11329
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>    Affects Versions: 0.14.0
>            Reporter: Wojciech Indyk
>            Assignee: Wojciech Indyk
>            Priority: Minor
>         Attachments: HIVE-11329.1.patch
>
>
> When I create a table with an HBase column prefix mapping 
> (https://issues.apache.org/jira/browse/HIVE-3725), the prefix is kept in the 
> keys of the resulting map in Hive. 
> E.g. record in HBase
> rowkey: 123
> column: tag_one, value: 0.5
> column: tag_two, value: 0.5
> representation in Hive via column prefix mapping "tag_.*":
> column: tag map<string,string>
> key: tag_one, value: 0.5
> key: tag_two, value: 0.5
> should be:
> key: one, value: 0.5
> key: two, value: 0.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
