[
https://issues.apache.org/jira/browse/PIG-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bill Graham updated PIG-2934:
-----------------------------
Affects Version/s: 0.10.0
> HBaseStorage filter optimizations
> ---------------------------------
>
> Key: PIG-2934
> URL: https://issues.apache.org/jira/browse/PIG-2934
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.10.0
> Reporter: Bill Graham
> Assignee: Bill Graham
> Labels: hbase
> Attachments: PIG-2934.1.patch
>
>
> Our HBase pal/guru Gary Helmling was kind enough to do a code review of
> HBaseStorage. He suggested some good filter optimizations:
> * when using the "lt*" and "gt*" options, set the start/stop rows on the Scan
> instance in addition to the RowFilters. Without them you're doing a full
> table scan: the RowFilters are still evaluated against every row in the table.
> * when selecting specific columns or entire families to return, it would be
> more efficient to set the family + columns on the Scan object (addFamily(),
> addColumn()), instead of using a FilterList. I'm not familiar with the
> family:prefix handling you mention, but that would still seem to require
> filters. But if that's not being used, it would be better to avoid the
> FilterList for columns. At minimum, we should probably call Scan.addFamily()
> with the distinct families, so that column families that are not being used
> are skipped entirely. For a table with 4 CFs where, say, only 1 is being
> used, this could be a big gain.
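
To illustrate the first suggestion, here is a minimal sketch that does not use HBase at all. It models a table as a sorted in-memory map (a TreeMap standing in for HBase's row-key ordering) and counts how many rows each approach has to read. The class name ScanRangeSketch, the helper names, and the sample row keys are all made up for this example; nothing here is taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScanRangeSketch {

    /** Outcome of a toy scan: the matching row keys and the number of rows read. */
    public static final class ScanResult {
        public final List<String> rows;
        public final int rowsRead;

        public ScanResult(List<String> rows, int rowsRead) {
            this.rows = rows;
            this.rowsRead = rowsRead;
        }
    }

    /** Filter-only approach: every row is read, then tested against the bounds. */
    public static ScanResult filterOnly(NavigableMap<String, String> table,
                                        String gte, String lt) {
        List<String> hits = new ArrayList<>();
        int read = 0;
        for (String key : table.keySet()) {
            read++;
            if (key.compareTo(gte) >= 0 && key.compareTo(lt) < 0) {
                hits.add(key);
            }
        }
        return new ScanResult(hits, read);
    }

    /** Bounded approach: only rows in [gte, lt) are read (start inclusive, stop exclusive). */
    public static ScanResult bounded(NavigableMap<String, String> table,
                                     String gte, String lt) {
        List<String> hits = new ArrayList<>();
        int read = 0;
        for (String key : table.subMap(gte, true, lt, false).keySet()) {
            read++;
            hits.add(key);
        }
        return new ScanResult(hits, read);
    }

    /** Hypothetical sample data: 1000 rows keyed row0000 .. row0999. */
    public static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("row%04d", i), "v" + i);
        }
        return table;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        ScanResult a = filterOnly(table, "row0100", "row0110");
        ScanResult b = bounded(table, "row0100", "row0110");
        System.out.println("filter only: " + a.rows.size() + " hits, " + a.rowsRead + " rows read");
        System.out.println("bounded:     " + b.rows.size() + " hits, " + b.rowsRead + " rows read");
    }
}
```

Both approaches return the same rows, but only the bounded one avoids reading the whole table. In HBaseStorage the corresponding change would presumably be to call Scan.setStartRow() and Scan.setStopRow() alongside the existing RowFilters, and Scan.addFamily() / Scan.addColumn() for the column selection.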