[ https://issues.apache.org/jira/browse/PIG-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Welch updated PIG-3961:
----------------------------
Description:
This patch adds three server-side filtering options for loading data with
HBaseStorage:
# the specified cf:col must not exist
{{-null cf:col}}
# the specified cf:col must exist
{{-notnull cf:col}}
# the specified cf:col contains the given value
{{-val cf:col=value}}
These options are meant to replace the common Pig pattern of loading data and
immediately filtering for a specific condition. Filtering on the server
reduces the amount of data transferred. For example:
data = load 'hbase://mytable' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*') as (cf:map[]) ;
data_with_value = filter data by cf#'col' == 'value' ;
can be replaced with:
data_with_value = load 'hbase://mytable' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-val cf:col=value')
as (cf:map[]) ;
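The other two options would be used the same way. The following is a minimal sketch, assuming {{-null}} and {{-notnull}} behave as described in this issue and are passed in the same options string as {{-val}}. The relation names and the client-side filters named in the comments are illustrative, not taken from the patch.

```pig
-- Rows where cf:col is absent; intended to stand in for loading with 'cf:*'
-- and then running: filter data by cf#'col' is null
data_without_col = load 'hbase://mytable' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-null cf:col')
    as (cf:map[]);

-- Rows where cf:col is present; intended to stand in for loading with 'cf:*'
-- and then running: filter data by cf#'col' is not null
data_with_col = load 'hbase://mytable' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-notnull cf:col')
    as (cf:map[]);
```

In both cases the condition would be evaluated on the server, so rows that fail it are never sent to the Pig job.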
was:
Adding three additional server side filtering options when loading data with
HBaseStorage:
# specified cf:col does not exist
{{-null cf:col}}
# specified cf:col must exist
{{-notnull cf:col}}
# specified cf:col contains the given value
{{-val cf:col=value}}
These are meant to replace (and optimize by reducing data transfer) the
frequent paradigm in pig of loading data and immediately filtering for a
specific condition. For example
data = load 'hbase://mytable' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*') as (cf:map[]) ;
data_with_value = filter data by cf#'col' = 'value' ;
Can be replaced with:
data_with_value = load 'hbase://mytable' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', 'cf:col=value') as
(cf:map[]) ;
> Adding HBaseStorage cell value filters
> --------------------------------------
>
> Key: PIG-3961
> URL: https://issues.apache.org/jira/browse/PIG-3961
> Project: Pig
> Issue Type: New Feature
> Reporter: Mike Welch
> Assignee: Mike Welch
> Priority: Minor
> Fix For: 0.14.0
>
> Attachments: filters-patch.diff