[ https://issues.apache.org/jira/browse/PIG-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Albert Sunwoo updated PIG-2107:
-------------------------------
    Summary: When using pig with HBaseStorage, pig filters should utilize hbase indexes to limit workset.  (was: When using pig with hbase, pig filters should utilize hbase indexes to limit workset.)

> When using pig with HBaseStorage, pig filters should utilize hbase indexes to limit workset.
> --------------------------------------------------------------------------------------------
>
>                 Key: PIG-2107
>                 URL: https://issues.apache.org/jira/browse/PIG-2107
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Albert Sunwoo
>
> The LOAD function using HBaseStorage has filter arguments you can use to limit the working set for an MR job, e.g.
>     blah = LOAD 'hbase://test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:field1', '-loadKey -gte foo1 -lte foo1');
> It would be really great if this could also be applied to filter statements within pig, so that a filter statement such as
>     blah2 = FILTER blah by key == 'foo1';
> or
>     blah2 = FILTER blah by key > 'foo1' and key < 'foo2';
> would actually limit what is retrieved from hbase, so pig has a smaller working set to perform MR on.
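
To make the request concrete, here is a minimal sketch of the translation being asked for: AND-ed comparisons on the row key are turned into the range options that HBaseStorage already accepts (-gt, -gte, -lt, -lte). The function name key_filter_to_options and the (operator, value) input format are hypothetical and exist only for illustration; this is not how Pig represents filter expressions internally, and it is not an existing Pig or HBaseStorage API.

```python
# Hypothetical helper for illustration only; not part of Pig or HBaseStorage.
# Maps Pig-style comparison operators to HBaseStorage's row-key range options.
_RANGE_FLAGS = {">": "-gt", ">=": "-gte", "<": "-lt", "<=": "-lte"}


def key_filter_to_options(conjuncts):
    """Translate AND-ed row-key comparisons into an HBaseStorage option string.

    conjuncts is a list of (operator, value) pairs that are implicitly
    AND-ed together, e.g. [(">", "foo1"), ("<", "foo2")].
    """
    parts = ["-loadKey"]
    for op, value in conjuncts:
        if op == "==":
            # Equality is a closed range with identical lower and upper bound.
            parts += ["-gte", value, "-lte", value]
        elif op in _RANGE_FLAGS:
            parts += [_RANGE_FLAGS[op], value]
        else:
            # Anything else (e.g. !=, matches) cannot be expressed as a key
            # range and would have to stay in the Pig FILTER.
            raise ValueError("cannot push down operator: %r" % op)
    return " ".join(parts)


if __name__ == "__main__":
    print(key_filter_to_options([("==", "foo1")]))
    print(key_filter_to_options([(">", "foo1"), ("<", "foo2")]))
```

Under these assumptions, the equality filter yields the same option string as the hand-written LOAD statement in the description, and the range filter yields '-loadKey -gt foo1 -lt foo2'. A real implementation would also have to leave any non-key predicates in the FILTER for Pig to evaluate.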