[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276517#comment-14276517 ]
Brian Johnson commented on HBASE-11144:
---------------------------------------

I'm surprised by the modest speed increase. We ended up using Phoenix to get a similar capability and saw a speedup of several orders of magnitude vs. a FilterList on a data set of similar size to the one in that test, but we were retrieving a much smaller subset of the data (thousands of records) from the ~100 ranges.

> Filter to support scanning multiple row key ranges
> --------------------------------------------------
>
>                 Key: HBASE-11144
>                 URL: https://issues.apache.org/jira/browse/HBASE-11144
>             Project: HBase
>          Issue Type: Improvement
>          Components: Filters
>            Reporter: Jiajia Li
>            Assignee: Jiajia Li
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: HBASE_11144_4.patch, HBASE_11144_V10.patch,
> HBASE_11144_V11.patch, HBASE_11144_V12.patch, HBASE_11144_V13.patch,
> HBASE_11144_V14.patch, HBASE_11144_V15.patch, HBASE_11144_V16.patch,
> HBASE_11144_V17.patch, HBASE_11144_V18.patch, HBASE_11144_V5.patch,
> HBASE_11144_V6.patch, HBASE_11144_V7.patch, HBASE_11144_V9.patch,
> MultiRowRangeFilter.patch, MultiRowRangeFilter2.patch,
> MultiRowRangeFilter3.patch, hbase_11144_V8.patch
>
>
> HBase is quite efficient when scanning only one small row key range. If a
> user needs to specify multiple row key ranges in one scan, the typical
> solutions are:
> 1. using a FilterList, which is a list of row key filters;
> 2. using a SQL layer over HBase, such as Hive or Phoenix, to join two tables.
> However, both solutions are inefficient. Neither can use the range
> information to fast-forward during the scan, so the scan is quite
> time-consuming. If the number of ranges is very large (e.g. millions), a
> join is a reasonable solution, although it is slow. However, there are cases
> where a user wants to scan only a small number of ranges (e.g. <1000).
> Neither solution provides satisfactory performance in such cases.
>
> We provide this filter (MultiRowRangeFilter) to support this use case
> (scanning multiple row key ranges). It constructs the row key ranges from a
> user-specified list and fast-forwards during the scan, so the scan is quite
> efficient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
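The fast-forwarding idea in the issue description (sort and merge the user's ranges, then jump over the gaps between them instead of testing every row) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code from the attached patches. The class `RowRangeSeeker`, its `Range` type, and the methods `contains` and `nextRangeStart` are hypothetical names, not the MultiRowRangeFilter API. The sketch assumes half-open `[start, stop)` ranges with non-empty bounds, compared as unsigned bytes (the order HBase uses for row keys).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not HBase's MultiRowRangeFilter): keeps sorted, merged
 * row-key ranges and answers "where should the scan jump next?" by binary search.
 * Assumes half-open ranges [start, stop) with non-empty bounds, compared as
 * unsigned bytes.
 */
public class RowRangeSeeker {
    static final class Range {
        final byte[] start; // inclusive
        final byte[] stop;  // exclusive
        Range(byte[] start, byte[] stop) { this.start = start; this.stop = stop; }
    }

    private final List<Range> ranges = new ArrayList<>();

    public RowRangeSeeker(List<Range> input) {
        List<Range> sorted = new ArrayList<>(input);
        sorted.sort((x, y) -> Arrays.compareUnsigned(x.start, y.start));
        for (Range r : sorted) {
            if (Arrays.compareUnsigned(r.start, r.stop) >= 0) continue; // skip empty ranges
            if (!ranges.isEmpty()) {
                Range last = ranges.get(ranges.size() - 1);
                // Overlapping or touching ranges are merged into one.
                if (Arrays.compareUnsigned(r.start, last.stop) <= 0) {
                    if (Arrays.compareUnsigned(r.stop, last.stop) > 0) {
                        ranges.set(ranges.size() - 1, new Range(last.start, r.stop));
                    }
                    continue;
                }
            }
            ranges.add(r);
        }
    }

    /** Index of the last merged range whose start <= row, or -1 if none. */
    private int floorIndex(byte[] row) {
        int lo = 0, hi = ranges.size() - 1, ans = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(ranges.get(mid).start, row) <= 0) {
                ans = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return ans;
    }

    /** True if row lies inside one of the merged ranges. */
    public boolean contains(byte[] row) {
        int i = floorIndex(row);
        return i >= 0 && Arrays.compareUnsigned(row, ranges.get(i).stop) < 0;
    }

    /**
     * For a row outside every range: the start key the scan can jump to next,
     * or null when no range remains (the scan can stop early).
     */
    public byte[] nextRangeStart(byte[] row) {
        int next = floorIndex(row) + 1;
        return next < ranges.size() ? ranges.get(next).start : null;
    }

    int size() { return ranges.size(); }

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        RowRangeSeeker s = new RowRangeSeeker(List.of(
                new Range(b("m"), b("p")),
                new Range(b("a"), b("c")),
                new Range(b("b"), b("d")))); // overlaps [a, c), merged into [a, d)
        System.out.println(s.size());                 // 2
        System.out.println(s.contains(b("bz")));      // true
        System.out.println(new String(s.nextRangeStart(b("e")), StandardCharsets.UTF_8)); // m
        System.out.println(s.nextRangeStart(b("q"))); // null
    }
}
```

Sorting and merging once up front means each lookup costs O(log n) in the number of ranges. A row in a gap is then answered with a single jump rather than being rejected one row at a time, which is the fast-forwarding that a FilterList of independent row filters does not provide. In HBase itself, a filter requests such a jump by returning `SEEK_NEXT_USING_HINT` and supplying the target cell from `getNextCellHint`.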