[ https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503522#comment-13503522 ]

Eric Yang commented on PIG-1832:
--------------------------------

For loading HBase data within a timestamp range, the API could look like this:

{code}
a = load 'hbase://table1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*',
  '-loadKey -gt $START -caster Utf8StorageConverter -timeRange $startTs,$endTs');
{code}
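
To make the proposed option concrete, here is a rough sketch of how a `-timeRange` value of the form `start,end` might be parsed and validated. The class `TimeRangeOption` and its methods are hypothetical and not part of Pig or HBase. The half-open interval (start inclusive, end exclusive) is an assumption chosen to match the convention of HBase's `Scan.setTimeRange(minStamp, maxStamp)`, which is where the parsed bounds would most naturally be passed.

```java
/** Hypothetical holder for a parsed -timeRange value; not part of Pig or HBase. */
final class TimeRangeOption {
    final long startTs; // inclusive lower bound, in milliseconds (assumed)
    final long endTs;   // exclusive upper bound, in milliseconds (assumed)

    private TimeRangeOption(long startTs, long endTs) {
        this.startTs = startTs;
        this.endTs = endTs;
    }

    /** Parses a value such as "1350000000000,1360000000000". */
    static TimeRangeOption parse(String value) {
        String[] parts = value.trim().split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected start,end but got: " + value);
        }
        // Long.parseLong throws NumberFormatException (an IllegalArgumentException)
        // if either bound is not a valid number.
        long start = Long.parseLong(parts[0].trim());
        long end = Long.parseLong(parts[1].trim());
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid time range: " + value);
        }
        return new TimeRangeOption(start, end);
    }

    /** True if ts falls inside the half-open interval [startTs, endTs). */
    boolean contains(long ts) {
        return ts >= startTs && ts < endTs;
    }
}
```

Rejecting a reversed or malformed range at parse time would surface a bad `$startTs,$endTs` substitution when the script is set up rather than as an empty result later.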

For storing, I am inclined to suggest adding a callback user-defined function 
as an HBaseStorage parameter.  The callback would extract the timestamp from 
the row key and set it at the cell level.  For example:

{code}
STORE table2 INTO 'table2' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:s1 cf2:s2',
  '-cb org.apache.pig.backend.hadoop.hbase.TimestampExtractor("\\w+-\\d+-\\w+")');
{code}
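
A minimal sketch of what such an extractor callback might look like follows. Everything in it is an assumption: HBaseStorage defines no callback interface today, so the callback is modeled as a plain `java.util.function.ToLongFunction<String>` from row key to timestamp (Java 8, for brevity), and the class name `RowKeyTimestampExtractor` is hypothetical. The sketch also assumes the regular expression carries one capture group around the timestamp digits, e.g. `\w+-(\d+)-\w+`, which the pattern in the example above does not have.

```java
import java.util.function.ToLongFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical extractor callback: maps a row key to a cell timestamp
 * by reading the first capture group of a regular expression.
 */
final class RowKeyTimestampExtractor implements ToLongFunction<String> {
    private final Pattern pattern;

    RowKeyTimestampExtractor(String regex) {
        this.pattern = Pattern.compile(regex);
        // groupCount() reports the capturing groups declared in the pattern.
        if (pattern.matcher("").groupCount() < 1) {
            throw new IllegalArgumentException("regex needs a capture group: " + regex);
        }
    }

    @Override
    public long applyAsLong(String rowKey) {
        Matcher m = pattern.matcher(rowKey);
        if (!m.matches()) {
            throw new IllegalArgumentException("row key does not match: " + rowKey);
        }
        // Assumes the captured text is an epoch timestamp in milliseconds.
        return Long.parseLong(m.group(1));
    }
}
```

On the store path, the returned value would be supplied as the timestamp of each cell written for that row, in place of the default insertion time.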

The same mechanism could also be used to set a single, fixed timestamp on all 
cells in a bulk load:

{code}
STORE table2 INTO 'table2' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:s1 cf2:s2',
  '-cb org.apache.pig.backend.hadoop.hbase.TimestampSetter($ts)');
{code}
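
The fixed-timestamp variant could be sketched the same way. As before, this is only an illustration under assumptions: the callback is modeled as a `java.util.function.ToLongFunction<String>` from row key to timestamp (Java 8), and the class name `FixedTimestampSetter` is hypothetical rather than an existing Pig class.

```java
import java.util.function.ToLongFunction;

/**
 * Hypothetical setter callback: ignores the row key and returns the one
 * timestamp supplied when the callback was constructed (e.g. from $ts).
 */
final class FixedTimestampSetter implements ToLongFunction<String> {
    private final long ts;

    FixedTimestampSetter(long ts) {
        if (ts < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative: " + ts);
        }
        this.ts = ts;
    }

    @Override
    public long applyAsLong(String rowKey) {
        // Every cell in the load receives the same timestamp.
        return ts;
    }
}
```

Giving both variants the same callback shape would let HBaseStorage treat them interchangeably behind a single `-cb` option.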

Any thoughts?
                
> Support timestamp in HBaseStorage
> ---------------------------------
>
>                 Key: PIG-1832
>                 URL: https://issues.apache.org/jira/browse/PIG-1832
>             Project: Pig
>          Issue Type: Improvement
>         Environment: Java 6, Mac OS X 10.6
>            Reporter: Eric Yang
>
> When storing data into HBase using 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage, the HBase timestamp field 
> is set to the insertion time of the MapReduce job.  It would be nice to have 
> a way to populate the timestamp from user data.
