[
https://issues.apache.org/jira/browse/PIG-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bill Graham updated PIG-1782:
-----------------------------
Attachment: PIG_1782_2.patch
Attached is a second patch. This one is built to be applied on top of the
PIG_1680.3.patch.
From the Javadocs:
An HBase implementation of LoadFunc and StoreFunc.
Below is an example showing how to load data from HBase:
{code}
raw = LOAD 'hbase://SampleTable'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'info:first_name info:last_name friends:* info:*', '-loadKey true -limit 5')
AS (id:bytearray, first_name:chararray, last_name:chararray,
friends_map:map[], info_map:map[]);
{code}
This example loads data redundantly from the {{info}} column family (once as
explicit columns and again via {{info:*}}) just to illustrate usage. Note that
the row key is inserted first in the result schema.
To load only column names that start with a given prefix, specify the column
prefix with a trailing \*. For example, passing {{friends:bob_*}} to the
constructor in the above example would cause only columns in the {{friends}}
family whose names start with _bob__ to be loaded.
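As a minimal sketch of the prefix form, the load below should return only the
{{friends}} columns whose names start with {{bob_}}, gathered into a single
map. The alias {{bobs}} and the field name {{bob_friends}} are hypothetical
names chosen for illustration; the table and the {{-loadKey}} option are
carried over from the example above.
{code}
-- Sketch only: load the row key plus the friends:bob_* columns as one map.
bobs = LOAD 'hbase://SampleTable'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'friends:bob_*', '-loadKey true')
AS (id:bytearray, bob_friends:map[]);
{code}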
Below is an example showing how to store data into HBase:
{code}
STORE raw INTO 'hbase://SampleTableCopy'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'info:first_name info:last_name buddies:* info:*');
{code}
Note that {{STORE}} will expect the first value in the tuple to be the row key.
Scalar values need to map to an explicit column descriptor and maps need to map
to a column family name. In the above examples, the {{friends}} column family
data from {{SampleTable}} will be written to a {{buddies}} column family in the
{{SampleTableCopy}} table.
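The row-key rule can be sketched by projecting the fields explicitly before
storing. This assumes the {{raw}} relation from the load example; the alias
{{to_copy}} is a hypothetical name used only for illustration.
{code}
-- Sketch only: row key first, then one field per column descriptor, in order.
to_copy = FOREACH raw GENERATE id, first_name, last_name, friends_map;
STORE to_copy INTO 'hbase://SampleTableCopy'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'info:first_name info:last_name buddies:*');
{code}
Here {{id}} would become the row key, the two scalars would go to their
explicit columns, and the entries of {{friends_map}} would be written under the
{{buddies}} column family.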
> Add ability to load data by column family in HBaseStorage
> ---------------------------------------------------------
>
> Key: PIG-1782
> URL: https://issues.apache.org/jira/browse/PIG-1782
> Project: Pig
> Issue Type: New Feature
> Environment: Java 6, Mac OS X 10.6
> Reporter: Eric Yang
> Assignee: Bill Graham
> Attachments: PIG-1782_1.patch, PIG_1782_2.patch,
> apply-PIG-1782-patch.sh
>
>
> It would be nice to load all columns in a column family by using shorthand
> syntax like:
> {noformat}
> CpuMetrics = load 'hbase://SystemMetrics' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
> {noformat}
> Assuming there are columns cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1 in
> the cpu column family.
> CpuMetrics would contain something like:
> {noformat}
> (rowKey, cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1)
> {noformat}
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira