[ 
https://issues.apache.org/jira/browse/PIG-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493232#comment-13493232
 ] 

Christoph Bauer commented on PIG-2934:
--------------------------------------

I'm starting on a patch for HBaseStorage here at my company.

Regarding your first issue, you're totally right. It seems weird that it was 
implemented with filters at all.
The second issue is different. In HBaseStorage.setLocation, those families are 
added to the Scan object. I don't understand why it's done there, though.
                
> HBaseStorage filter optimizations
> ---------------------------------
>
>                 Key: PIG-2934
>                 URL: https://issues.apache.org/jira/browse/PIG-2934
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>              Labels: hbase
>
> Our HBase pal/guru Gary Helmling was kind enough to do a code review of 
> HBaseStorage. He suggested some good filter optimizations:
> * when using the "lt*" and "gt*" options, set the start/stop rows on the Scan 
> instance, at least in addition to the RowFilters. Without this you're doing a 
> full table scan, regardless of the RowFilters.
> * when selecting specific columns or entire families to return, it would be 
> more efficient to set the family + columns on the Scan object (addFamily(), 
> addColumn()), instead of using a FilterList. I'm not familiar with the 
> family:prefix handling you mention, but that would still seem to require 
> filters. But if that's not being used, it would be better to avoid the 
> FilterList for columns. At minimum, we should probably call Scan.addFamily() 
> with the distinct families, so we can skip entire column families that are 
> not being used. In the case of a table with 4 CFs, if, say, only 1 is being 
> used, this could be a big gain.
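
To make the two suggestions above concrete, here is a minimal sketch of the bookkeeping they imply. It is not taken from HBaseStorage or from any patch on this issue: the class ScanPlanSketch and its method names are hypothetical, and it is plain Java so that it runs without the HBase client on the classpath. It assumes the usual Scan semantics, where the start row is inclusive and the stop row is exclusive, and that column specs look like "family:qualifier".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of HBaseStorage: plain-Java logic only.
final class ScanPlanSketch {

    // Smallest row key that sorts strictly after `key` in lexicographic
    // byte order: the same key with a single 0x00 byte appended.
    static byte[] justAfter(byte[] key) {
        return Arrays.copyOf(key, key.length + 1); // copyOf pads with 0x00
    }

    // Start rows are inclusive: "gte" uses the key itself, "gt" the next key.
    static byte[] startRow(byte[] key, boolean inclusive) {
        return inclusive ? key.clone() : justAfter(key);
    }

    // Stop rows are exclusive: "lt" uses the key itself, "lte" the next key.
    static byte[] stopRow(byte[] key, boolean inclusive) {
        return inclusive ? justAfter(key) : key.clone();
    }

    // Distinct families from "family:qualifier" (or bare "family") specs,
    // kept in first-seen order.
    static List<String> distinctFamilies(List<String> columnSpecs) {
        Set<String> families = new LinkedHashSet<>();
        for (String spec : columnSpecs) {
            int colon = spec.indexOf(':');
            families.add(colon < 0 ? spec : spec.substring(0, colon));
        }
        return new ArrayList<>(families);
    }

    public static void main(String[] args) {
        // Example option values (assumed): gt "row100", lte "row200".
        byte[] start = startRow("row100".getBytes(StandardCharsets.UTF_8), false);
        byte[] stop = stopRow("row200".getBytes(StandardCharsets.UTF_8), true);
        List<String> families =
            distinctFamilies(Arrays.asList("info:name", "info:age", "stats:views"));

        // With the HBase client available, these would feed the Scan roughly as:
        //   scan.setStartRow(start);
        //   scan.setStopRow(stop);
        //   for (String f : families) scan.addFamily(Bytes.toBytes(f));
        System.out.println(Arrays.toString(start));
        System.out.println(Arrays.toString(stop));
        System.out.println(families);
    }
}
```

Newer HBase clients spell the row-bound calls withStartRow()/withStopRow(). The existing RowFilters can stay in place alongside the bounds, as the first suggestion allows; the bounds only keep the region servers from reading rows outside the requested range.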

