Mike Welch created PIG-3961:
-------------------------------
Summary: Adding HBaseStorage cell value filters
Key: PIG-3961
URL: https://issues.apache.org/jira/browse/PIG-3961
Project: Pig
Issue Type: New Feature
Reporter: Mike Welch
Priority: Minor
Adding three additional server-side filtering options when loading data with
HBaseStorage:
# specified cf:col does not exist
{{-null cf:col}}
# specified cf:col must exist
{{-notnull cf:col}}
# specified cf:col has the given value
{{-val cf:col=value}}
These are meant to replace the common Pig pattern of loading data and
immediately filtering for a specific condition. Filtering on the server
reduces the amount of data transferred out of HBase. For example:
data = load 'hbase://mytable' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*') as (cf:map[]) ;
data_with_value = filter data by cf#'col' == 'value' ;
Can be replaced with:
data_with_value = load 'hbase://mytable' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-val cf:col=value') as
(cf:map[]) ;
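The other two options would presumably follow the same pattern. The following is only a sketch, assuming the option syntax proposed in this issue: {{-null}} and {{-notnull}} are the flags described above, not options of a released HBaseStorage, and the table name and aliases are illustrative.

```pig
-- Existing approach: load every row, then filter in Pig.
data = load 'hbase://mytable' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*') as (cf:map[]);
rows_without_col = filter data by cf#'col' is null;
rows_with_col = filter data by cf#'col' is not null;

-- Proposed approach (assumed syntax from this issue): filter on the server.
rows_without_col_ss = load 'hbase://mytable' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-null cf:col')
    as (cf:map[]);
rows_with_col_ss = load 'hbase://mytable' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-notnull cf:col')
    as (cf:map[]);
```

In both variants the resulting relations should contain the same rows. The difference is that the proposed options would let HBase discard non-matching rows before they reach Pig.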