When using Pig with HBase, Pig filters should utilize HBase indexes to limit the
working set.
-------------------------------------------------------------------------------------

                 Key: PIG-2107
                 URL: https://issues.apache.org/jira/browse/PIG-2107
             Project: Pig
          Issue Type: Improvement
            Reporter: Albert Sunwoo


The HBaseStorage load function accepts filter arguments that you can use to limit
the working set for an MR job, e.g.
    blah = LOAD 'hbase://test'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:field1', '-loadKey -gte foo1 -lte foo1')
        AS (key:chararray, field1:chararray);

It would be really great if the same could be done for FILTER statements within
Pig, so that a filter such as

    blah2 = FILTER blah BY key == 'foo1';

or

    blah2 = FILTER blah BY key > 'foo1' AND key < 'foo2';

would actually limit what is retrieved from HBase, giving Pig a smaller working
set to run the MR job on.
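To make the request concrete, here is a minimal Java sketch of the core step such an optimization would need: folding AND-ed comparisons on the row key, like the two FILTER examples above, into a single key range that could bound the HBase scan. This is not Pig's or HBaseStorage's actual code, and every name in it (KeyRangePushdown, Predicate, KeyRange, toRange, toHalfOpen) is hypothetical. It assumes keys are compared as Java Strings, which matches HBase's unsigned-byte row ordering only for ASCII keys, and it handles only conjunctions of simple comparisons against constants.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Pig or HBaseStorage code: fold AND-ed comparisons
// on the row key from a FILTER into one key range that could bound a scan.
// Keys are compared as Java Strings, which matches HBase's unsigned-byte
// ordering only for ASCII keys.
class KeyRangePushdown {

    enum Op { EQ, GT, GTE, LT, LTE }

    // One comparison "key <op> value" taken from the FILTER condition.
    record Predicate(Op op, String value) {}

    // A null bound means unbounded on that side.
    record KeyRange(String lower, boolean lowerInclusive,
                    String upper, boolean upperInclusive) {

        boolean contains(String key) {
            if (lower != null) {
                int c = key.compareTo(lower);
                if (c < 0 || (c == 0 && !lowerInclusive)) return false;
            }
            if (upper != null) {
                int c = key.compareTo(upper);
                if (c > 0 || (c == 0 && !upperInclusive)) return false;
            }
            return true;
        }
    }

    // Intersects the ranges implied by each conjunct; empty if they contradict.
    static Optional<KeyRange> toRange(List<Predicate> conjuncts) {
        String lo = null, hi = null;
        boolean loInc = true, hiInc = true;
        for (Predicate p : conjuncts) {
            Op op = p.op();
            if (op == Op.EQ || op == Op.GT || op == Op.GTE) {
                boolean inc = op != Op.GT;
                int c = (lo == null) ? 1 : p.value().compareTo(lo);
                // Keep the tighter lower bound; exclusive beats inclusive on a tie.
                if (c > 0 || (c == 0 && !inc)) { lo = p.value(); loInc = inc; }
            }
            if (op == Op.EQ || op == Op.LT || op == Op.LTE) {
                boolean inc = op != Op.LT;
                int c = (hi == null) ? -1 : p.value().compareTo(hi);
                // Keep the tighter upper bound; exclusive beats inclusive on a tie.
                if (c < 0 || (c == 0 && !inc)) { hi = p.value(); hiInc = inc; }
            }
        }
        if (lo != null && hi != null) {
            int c = lo.compareTo(hi);
            if (c > 0 || (c == 0 && !(loInc && hiInc))) return Optional.empty();
        }
        return Optional.of(new KeyRange(lo, loInc, hi, hiInc));
    }

    // Converts to a half-open [start, stop) pair in the style of an HBase scan:
    // start inclusive, stop exclusive, "" meaning unbounded. Appending "\0"
    // gives the smallest key greater than the given one.
    static String[] toHalfOpen(KeyRange r) {
        String start = (r.lower() == null) ? ""
                : (r.lowerInclusive() ? r.lower() : r.lower() + "\0");
        String stop = (r.upper() == null) ? ""
                : (r.upperInclusive() ? r.upper() + "\0" : r.upper());
        return new String[] { start, stop };
    }

    public static void main(String[] args) {
        // FILTER blah BY key == 'foo1';
        KeyRange eq = toRange(List.of(new Predicate(Op.EQ, "foo1"))).orElseThrow();
        // FILTER blah BY key > 'foo1' AND key < 'foo2';
        KeyRange between = toRange(List.of(
                new Predicate(Op.GT, "foo1"),
                new Predicate(Op.LT, "foo2"))).orElseThrow();
        System.out.println(eq);
        System.out.println(between);
        System.out.println(between.contains("foo1a") + " " + between.contains("foo2"));
    }
}
```

The half-open form mirrors how an HBase scan takes an inclusive start row and an exclusive stop row, with an empty value meaning unbounded; appending a zero byte yields the smallest key after a given key, which is how an inclusive upper bound becomes an exclusive stop row. Conditions that cannot be reduced to one range (OR, functions of the key, comparisons on other columns) would still have to be evaluated by the FILTER itself after the scan.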

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
