[ 
https://issues.apache.org/jira/browse/PIG-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988192#action_12988192
 ] 

Bill Graham commented on PIG-1782:
----------------------------------

I agree. Dmitriy, I like where you're going with new classes and deprecation, 
but maybe we could do this with just an enhanced (and backward compatible) 
HBaseStorage and a new AdvancedHBaseStorage.

* HBaseStorage
   * If you specify discrete columns, you get a tuple of values, as in the 
current behavior
   * If you specify one or more CFs (or possibly a CF with a wildcard column 
expression), you get back a tuple of maps
   * If you specify a mix, you get a tuple with values and maps. For example 
'cf2:foo c1: cf2:bar' would produce ( value, { col => value }, value )
   * This is backward compatible and seems easiest to grok from a user's 
perspective.
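
To make the mixed case concrete, here is a rough Python sketch of the output shape proposed above. It is only a model of the idea: `project_row` and the dict-based row are hypothetical names made up for illustration, not part of HBaseStorage or Pig, and wildcard column expressions are left out.

```python
def project_row(column_spec, row):
    """Model the proposed HBaseStorage output for a single row.

    `row` maps (family, qualifier) pairs to values. It is a hypothetical
    stand-in for an HBase row, not a real HBase or Pig type.
    """
    fields = []
    for token in column_spec.split():
        family, _, qualifier = token.partition(":")
        if qualifier == "":
            # Whole column family, e.g. 'c1:' -> map of qualifier => value
            fields.append({q: v for (f, q), v in row.items() if f == family})
        else:
            # Discrete column, e.g. 'cf2:foo' -> bare value (None if absent)
            fields.append(row.get((family, qualifier)))
    return tuple(fields)


example_row = {
    ("c1", "col"): "v",
    ("cf2", "foo"): "x",
    ("cf2", "bar"): "y",
}
mixed = project_row("cf2:foo c1: cf2:bar", example_row)
print(mixed)
```

With only discrete columns in the spec, the result is a flat tuple of values, which is why the existing behavior would be preserved.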

* AdvancedHBaseStorage
   * Somehow support multiple timestamps with a more complex data structure
   * One possibility is to use the data structure I suggested in my previous 
comment where everything is a map 
   * Another is to return something like the proposed HBaseStorage data 
structure, where each 'value' is replaced with ( (value, ts), ... )
   * We could hash out the specifics of AdvancedHBaseStorage in another JIRA if 
we decide to go this route
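
As a starting point for that discussion, the second option could look roughly like the Python sketch below, where each 'value' becomes a tuple of (value, ts) pairs. Everything here is an assumption for illustration: `project_row_versions`, `newest_first`, the dict-based row, and the newest-first ordering are not taken from any existing Pig or HBase API.

```python
def newest_first(cells):
    """Order (value, ts) pairs by timestamp, newest first (an assumed ordering)."""
    return tuple(sorted(cells, key=lambda cell: cell[1], reverse=True))


def project_row_versions(column_spec, row):
    """Model a possible AdvancedHBaseStorage output for a single row.

    `row` maps (family, qualifier) pairs to a list of (value, ts) pairs.
    It is a hypothetical stand-in for a multi-version HBase row.
    """
    fields = []
    for token in column_spec.split():
        family, _, qualifier = token.partition(":")
        if qualifier == "":
            # Whole column family -> map of qualifier => ((value, ts), ...)
            fields.append({
                q: newest_first(cells)
                for (f, q), cells in row.items()
                if f == family
            })
        else:
            # Discrete column -> ((value, ts), ...), empty tuple if absent
            fields.append(newest_first(row.get((family, qualifier), [])))
    return tuple(fields)


versioned_row = {
    ("cf2", "foo"): [("x1", 100), ("x2", 200)],
    ("c1", "col"): [("v", 150)],
}
versioned = project_row_versions("cf2:foo c1:", versioned_row)
print(versioned)
```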

> Add ability to load data by column family in HBaseStorage
> ---------------------------------------------------------
>
>                 Key: PIG-1782
>                 URL: https://issues.apache.org/jira/browse/PIG-1782
>             Project: Pig
>          Issue Type: New Feature
>         Environment: Java 6, Mac OS X 10.6
>            Reporter: Eric Yang
>            Assignee: Bill Graham
>
> It would be nice to load all columns in a column family using a shorthand 
> syntax like:
> {noformat}
> CpuMetrics = load 'hbase://SystemMetrics' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
> {noformat}
> Assuming the cpu column family contains the columns cpu:sys.0, cpu:sys.1, 
> cpu:user.0, and cpu:user.1, CpuMetrics would contain something like:
> {noformat}
> (rowKey, cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1)
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
