[jira] Updated: (PIG-1782) Add ability to load data by column family in HBaseStorage

Bill Graham (JIRA) Thu, 24 Feb 2011 15:21:05 -0800

     [ https://issues.apache.org/jira/browse/PIG-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Bill Graham updated PIG-1782:
-----------------------------

    Attachment: PIG_1782_2.patch

Attached is a second patch. This one is built to be applied on top of the 
PIG_1680.3.patch.

From the Javadocs:

An HBase implementation of LoadFunc and StoreFunc.

Below is an example showing how to load data from HBase:

{code}
raw = LOAD 'hbase://SampleTable'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
      'info:first_name info:last_name friends:* info:*', '-loadKey true -limit 5')
      AS (id:bytearray, first_name:chararray, last_name:chararray,
          friends_map:map[], info_map:map[]);
{code}
This example loads data redundantly from the info column family just to 
illustrate usage. Note that the row key is inserted first in the result schema. 
To load only the columns whose names start with a given prefix, specify the 
column prefix with a trailing \*. For example, passing {{friends:bob_*}} to the 
constructor in the above example would cause only columns that start with 
{{bob_}} to be loaded.
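
The three descriptor forms described above (an explicit column, a whole column family, and a column prefix) can be illustrated with a small Python sketch. This is not the HBaseStorage implementation: the names {{parse_descriptor}} and {{select_columns}} and the sample row are hypothetical, a row is modeled as a plain dict keyed by {{family:qualifier}}, and keying the resulting map by qualifier is an assumption of the sketch.

```python
def parse_descriptor(descriptor):
    """Split 'family:qualifier' into (family, name, is_prefix)."""
    family, _, qualifier = descriptor.partition(":")
    if qualifier.endswith("*"):
        # 'friends:*' yields an empty prefix, which matches the whole family.
        return family, qualifier[:-1], True
    return family, qualifier, False


def select_columns(row, descriptor):
    """Return a scalar for an explicit column, or a map for '*' descriptors."""
    family, name, is_prefix = parse_descriptor(descriptor)
    if not is_prefix:
        return row.get(f"{family}:{name}")
    result = {}
    for key, value in row.items():
        fam, _, qual = key.partition(":")
        if fam == family and qual.startswith(name):
            # Assumption: the map is keyed by the column qualifier.
            result[qual] = value
    return result


# Hypothetical sample row, keyed by 'family:qualifier'.
row = {
    "info:first_name": "Ann",
    "info:last_name": "Lee",
    "friends:bob_1": "Bob A",
    "friends:bob_2": "Bob B",
    "friends:carol": "Carol C",
}

first_name = select_columns(row, "info:first_name")   # scalar value
all_friends = select_columns(row, "friends:*")         # every friends column
bob_friends = select_columns(row, "friends:bob_*")     # only the bob_ columns
```

In this sketch {{friends:*}} is simply the prefix case with an empty prefix, which is why the two forms share one code path.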

Below is an example showing how to store data into HBase:
{code}
 STORE raw INTO 'hbase://SampleTableCopy'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       'info:first_name info:last_name buddies:* info:*');
{code}
Note that {{STORE}} will expect the first value in the tuple to be the row key. 
Scalar values need to map to an explicit column descriptor and maps need to map 
to a column family name. In the above examples, the {{friends}} column family 
data from {{SampleTable}} will be written to a {{buddies}} column family in the 
{{SampleTableCopy}} table.
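
The store-side mapping rule (row key first, scalars to explicit columns, maps to column families) can be sketched the same way. This is an illustration only, not the code from the patch: the function {{to_cells}} and the sample tuple are hypothetical, and the output is a plain dict rather than an HBase write.

```python
def to_cells(values, descriptors):
    """Map a tuple onto (row_key, {'family:qualifier': value}) cells."""
    row_key, fields = values[0], values[1:]  # the first value is the row key
    if len(fields) != len(descriptors):
        raise ValueError("expected one field per column descriptor")
    cells = {}
    for value, descriptor in zip(fields, descriptors):
        family, _, qualifier = descriptor.partition(":")
        if qualifier.endswith("*"):
            # A map is written to a column family: each map key
            # becomes a column qualifier in that family.
            for key, item in value.items():
                cells[f"{family}:{key}"] = item
        else:
            # A scalar is written to its explicit column descriptor.
            cells[f"{family}:{qualifier}"] = value
    return row_key, cells


# Hypothetical tuple: row key, two scalars, then a map loaded from 'friends'.
record = ("row1", "Ann", "Lee", {"bob_1": "Bob A", "carol": "Carol C"})
descriptors = "info:first_name info:last_name buddies:*".split()

row_key, cells = to_cells(record, descriptors)
```

Because the map is matched to its descriptor by position, data loaded from {{friends}} lands in {{buddies}} without any renaming step.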
 

> Add ability to load data by column family in HBaseStorage
> ---------------------------------------------------------
>
>                 Key: PIG-1782
>                 URL: https://issues.apache.org/jira/browse/PIG-1782
>             Project: Pig
>          Issue Type: New Feature
>         Environment: Java 6, Mac OS X 10.6
>            Reporter: Eric Yang
>            Assignee: Bill Graham
>         Attachments: PIG-1782_1.patch, PIG_1782_2.patch, 
> apply-PIG-1782-patch.sh
>
>
> It would be nice to load all columns in a column family by using shorthand 
> syntax like:
> {noformat}
> CpuMetrics = load 'hbase://SystemMetrics' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
> {noformat}
> Assuming the cpu column family contains the columns cpu:sys.0, cpu:sys.1, 
> cpu:user.0, and cpu:user.1, CpuMetrics would contain something like:
> {noformat}
> (rowKey, cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1)
> {noformat}

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
