[ https://issues.apache.org/jira/browse/PIG-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated PIG-2934:
-----------------------------

    Attachment: PIG-2934.1.patch

Attaching a patch that reduces the number of filters used and improves how range
scans are performed.
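For illustration, the range-scan half of this can be sketched in plain Java. This is a hypothetical helper, not the code in PIG-2934.1.patch. It only shows how -gt/-gte/-lt/-lte style bounds could map onto a scan's start row (inclusive) and stop row (exclusive), assuming HBase's unsigned lexicographic row-key ordering. The names ScanBounds, fromOptions, successor and contains are invented for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch (not HBaseStorage's actual code): derive scan start/stop
 * rows from -gt/-gte/-lt/-lte style row-key options. Assumes HBase Scan
 * semantics: start row inclusive, stop row exclusive, keys compared as
 * unsigned bytes, and an empty array meaning "unbounded". Keys passed in are
 * assumed to be non-empty.
 */
final class ScanBounds {
    static final byte[] UNBOUNDED = new byte[0];

    final byte[] startRow; // inclusive lower bound
    final byte[] stopRow;  // exclusive upper bound

    private ScanBounds(byte[] startRow, byte[] stopRow) {
        this.startRow = startRow;
        this.stopRow = stopRow;
    }

    /** Smallest row key strictly greater than key: the key plus one 0x00 byte. */
    static byte[] successor(byte[] key) {
        return Arrays.copyOf(key, key.length + 1); // copyOf pads with 0x00
    }

    /** Each argument is null when the corresponding option is absent. */
    static ScanBounds fromOptions(byte[] gt, byte[] gte, byte[] lt, byte[] lte) {
        byte[] start = UNBOUNDED;
        if (gte != null) start = gte;                                  // key >= gte
        if (gt != null) start = tighterStart(start, successor(gt));   // key > gt
        byte[] stop = UNBOUNDED;
        if (lt != null) stop = lt;                                     // key < lt
        if (lte != null) stop = tighterStop(stop, successor(lte));    // key <= lte
        return new ScanBounds(start, stop);
    }

    // The larger start row is the tighter lower bound (empty sorts first).
    private static byte[] tighterStart(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // The smaller stop row is the tighter upper bound; empty means unbounded.
    private static byte[] tighterStop(byte[] a, byte[] b) {
        if (a.length == 0) return b;
        if (b.length == 0) return a;
        return Arrays.compareUnsigned(a, b) <= 0 ? a : b;
    }

    /** True if row falls inside [startRow, stopRow). */
    boolean contains(byte[] row) {
        boolean aboveStart = Arrays.compareUnsigned(row, startRow) >= 0;
        boolean belowStop = stopRow.length == 0
                || Arrays.compareUnsigned(row, stopRow) < 0;
        return aboveStart && belowStop;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, fromOptions(bytes("row5"), null, bytes("row8"), null) yields a start row of "row5" followed by a 0x00 byte and a stop row of "row8", so "row5" itself is excluded while "row50" and "row7z" are included. The resulting arrays could be passed to Scan.setStartRow()/Scan.setStopRow() so region servers only read that slice, with the RowFilters kept alongside, as the review suggests.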
                
> HBaseStorage filter optimizations
> ---------------------------------
>
>                 Key: PIG-2934
>                 URL: https://issues.apache.org/jira/browse/PIG-2934
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>              Labels: hbase
>         Attachments: PIG-2934.1.patch
>
>
> Our HBase pal/guru Gary Helmling was kind enough to do a code review of 
> HBaseStorage. He suggested some good filter optimizations:
> * when using the "lt*" and "gt*" options (-lt/-lte, -gt/-gte), set the
> start/stop rows on the Scan instance, at least in addition to the RowFilters.
> Without them you're doing a full table scan, regardless of the RowFilters.
> * when selecting specific columns or entire families to return, it would be
> more efficient to set the families and columns on the Scan object
> (addFamily(), addColumn()) instead of using a FilterList. I'm not familiar
> with the family:prefix handling you mention, but that would still seem to
> require filters. When it isn't being used, though, it would be better to
> avoid the FilterList for columns. At minimum, we should probably call
> Scan.addFamily() with the distinct families, so we can skip entire column
> families that are not being used. For a table with 4 CFs where, say, only 1
> is being used, this could be a big gain.
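The column-selection suggestion can be sketched the same way. Again, this is a hypothetical illustration, not HBaseStorage's code: ColumnPlan and its methods are invented names, and the column spec syntax ("cf:col", "cf:*", "cf:prefix*") is an assumption based on the family:prefix handling mentioned above. It sorts the requested columns into families that could go to Scan.addFamily(), exact columns that could go to Scan.addColumn(), and prefixes that would still need a filter.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical planner (illustrative names, not HBaseStorage's API): split
 * column specs into what could be pushed onto the Scan directly and what
 * would still need a filter.
 */
final class ColumnPlan {
    // "cf:*": whole family, Scan.addFamily(cf), no filter needed.
    final SortedSet<String> wholeFamilies = new TreeSet<>();
    // "cf:q": exact column, Scan.addColumn(cf, q), no filter needed.
    final SortedMap<String, SortedSet<String>> exactColumns = new TreeMap<>();
    // "cf:p*": qualifier prefix; the family is fetched and a prefix filter
    // applied. If such a family also has exact columns, that filter would
    // have to accept those too (the filter itself is out of scope here).
    final SortedMap<String, SortedSet<String>> prefixes = new TreeMap<>();

    static ColumnPlan of(List<String> specs) {
        ColumnPlan plan = new ColumnPlan();
        for (String spec : specs) {
            int colon = spec.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected family:qualifier, got " + spec);
            }
            String family = spec.substring(0, colon);
            String qualifier = spec.substring(colon + 1);
            if (qualifier.equals("*")) {
                plan.wholeFamilies.add(family);
            } else if (qualifier.endsWith("*")) {
                plan.prefixes.computeIfAbsent(family, f -> new TreeSet<>())
                        .add(qualifier.substring(0, qualifier.length() - 1));
            } else {
                plan.exactColumns.computeIfAbsent(family, f -> new TreeSet<>()).add(qualifier);
            }
        }
        // "cf:*" subsumes every other spec in the same family.
        plan.exactColumns.keySet().removeAll(plan.wholeFamilies);
        plan.prefixes.keySet().removeAll(plan.wholeFamilies);
        return plan;
    }

    /** Families to add whole: "cf:*" plus any family needing a prefix filter. */
    SortedSet<String> familiesToAdd() {
        SortedSet<String> out = new TreeSet<>(wholeFamilies);
        out.addAll(prefixes.keySet());
        return out;
    }

    /** Exact columns to add; families already fetched whole are left out. */
    SortedMap<String, SortedSet<String>> columnsToAdd() {
        SortedMap<String, SortedSet<String>> out = new TreeMap<>(exactColumns);
        out.keySet().removeAll(familiesToAdd());
        return out;
    }

    /** Every family touched at all; any other family can be skipped. */
    SortedSet<String> distinctFamilies() {
        SortedSet<String> out = familiesToAdd();
        out.addAll(exactColumns.keySet());
        return out;
    }

    /** Only prefix specs still require a filter. */
    boolean needsFilter() {
        return !prefixes.isEmpty();
    }
}
```

For the specs info:name, info:age, stats:*, stats:hits and tags:env_*, familiesToAdd() returns [stats, tags], columnsToAdd() returns {info=[age, name]}, and only the tags family still needs a prefix filter. A fourth family in the same table would not appear in distinctFamilies() and so would not be read at all, which is the gain described above.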

