[
https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503522#comment-13503522
]
Eric Yang commented on PIG-1832:
--------------------------------
To load HBase data restricted to a timestamp range, the API could look like this:
{code}
a = load 'hbase://table1' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*',
'-loadKey -gt $START -caster Utf8StorageConverter -timeRange $startTs,$endTs');
{code}
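As a rough illustration of how the value of the proposed -timeRange option might be handled, here is a minimal sketch in plain Java. The class name TimeRangeOption and its parse method are hypothetical and not part of Pig. Treating the start as inclusive and the end as exclusive is an assumption, chosen to mirror how HBase's Scan.setTimeRange(minStamp, maxStamp) interprets its arguments.

```java
// Hypothetical helper, not part of Pig: parses the "start,end" value
// of the proposed -timeRange option into two long timestamps.
final class TimeRangeOption {
    final long startTs; // assumed inclusive
    final long endTs;   // assumed exclusive

    TimeRangeOption(long startTs, long endTs) {
        if (startTs < 0 || endTs < startTs) {
            throw new IllegalArgumentException(
                "invalid time range: " + startTs + "," + endTs);
        }
        this.startTs = startTs;
        this.endTs = endTs;
    }

    // Expects exactly two comma-separated numbers, e.g. "100,200".
    static TimeRangeOption parse(String value) {
        String[] parts = value.split(",", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                "expected <startTs>,<endTs> but got: " + value);
        }
        return new TimeRangeOption(
            Long.parseLong(parts[0].trim()),
            Long.parseLong(parts[1].trim()));
    }

    public static void main(String[] args) {
        TimeRangeOption range = TimeRangeOption.parse("100,200");
        System.out.println(range.startTs + " to " + range.endTs);
    }
}
```

The parsed pair could then be handed to the scan that HBaseStorage builds, so that only cells within the range are returned.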
For storing, I am inclined to suggest passing a new user-defined callback
function to HBaseStorage as a parameter. This would make it possible to extract
the timestamp from the row key and set it at the cell level. For example:
{code}
STORE table2 INTO 'table2' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:s1 cf2:s2',
'-cb org.apache.pig.backend.hadoop.hbase.TimestampExtractor("\\w+-\\d+-\\w+")');
{code}
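To make the callback idea more concrete, the extraction step could be sketched in plain Java as below. This is only an illustration: the class RowKeyTimestampExtractor is hypothetical, and it assumes the pattern carries one capture group around the digits that hold the timestamp (the pattern in the example above has no group, so the sketch adds one). It also assumes the captured digits are a timestamp in the same unit HBase uses for cells.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Pig: pulls a numeric timestamp
// out of a row key using the first capture group of a regex.
final class RowKeyTimestampExtractor {
    private final Pattern pattern;

    RowKeyTimestampExtractor(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Returns the timestamp embedded in the row key, or throws if the
    // key does not match or the pattern has no capture group.
    long extract(String rowKey) {
        Matcher m = pattern.matcher(rowKey);
        if (m.groupCount() < 1 || !m.matches()) {
            throw new IllegalArgumentException(
                "row key does not match pattern: " + rowKey);
        }
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        RowKeyTimestampExtractor extractor =
            new RowKeyTimestampExtractor("\\w+-(\\d+)-\\w+");
        // Prints 1353715200000
        System.out.println(extractor.extract("host1-1353715200000-cpu"));
    }
}
```

Row keys that do not match are rejected here rather than silently falling back to the insertion time. Whether that is the right default is open for discussion.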
The same callback option could also be used to stamp bulk-loaded data with a given timestamp:
{code}
STORE table2 INTO 'table2' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:s1 cf2:s2',
'-cb org.apache.pig.backend.hadoop.hbase.TimestampSetter($ts)');
{code}
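One possible shape for such a callback, sketched in plain Java under stated assumptions: the interface CellTimestampCallback and the class FixedTimestamp are hypothetical names, and the choice to pass only the row key to the callback is an assumption made for brevity. A fixed-value setter then becomes the simplest implementation of the same contract that a row-key-based extractor would satisfy.

```java
// Hypothetical callback contract, not part of Pig: given a row key,
// return the timestamp to write on that row's cells.
interface CellTimestampCallback {
    long timestampFor(String rowKey);
}

// Ignores the row key and returns one fixed timestamp, which is what
// a bulk load with a single supplied $ts value would need.
final class FixedTimestamp implements CellTimestampCallback {
    private final long ts;

    FixedTimestamp(long ts) {
        this.ts = ts;
    }

    @Override
    public long timestampFor(String rowKey) {
        return ts;
    }

    public static void main(String[] args) {
        CellTimestampCallback callback = new FixedTimestamp(1353715200000L);
        System.out.println(callback.timestampFor("any-row-key"));
    }
}
```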
Any thoughts?
> Support timestamp in HBaseStorage
> ---------------------------------
>
> Key: PIG-1832
> URL: https://issues.apache.org/jira/browse/PIG-1832
> Project: Pig
> Issue Type: Improvement
> Environment: Java 6, Mac OS X 10.6
> Reporter: Eric Yang
>
> When storing data into HBase using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage, the HBase timestamp field is
> set to the insertion time of the MapReduce job. It would be nice to have a
> way to populate the timestamp from user data.