[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389849#comment-15389849 ]

Hudson commented on HBASE-11144:
--------------------------------

FAILURE: Integrated in HBase-0.98-matrix #375 (See [https://builds.apache.org/job/HBase-0.98-matrix/375/])
HBASE-11144 Filter to support scanning multiple row key ranges (Jiajia (apurtell: rev b2d883ddcf47833786d7ee0eeaa52bee60c00de5)
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestMultiRowRangeFilter.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterSerialization.java
* hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/model/ScannerModel.java
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/FilterProtos.java
* hbase-protocol/src/main/protobuf/Filter.proto
* hbase-client/src/main/java/org/apache/hadoop/hbase/filter/MultiRowRangeFilter.java

> Filter to support scanning multiple row key ranges
> --------------------------------------------------
>
>                 Key: HBASE-11144
>                 URL: https://issues.apache.org/jira/browse/HBASE-11144
>             Project: HBase
>          Issue Type: Improvement
>          Components: Filters
>            Reporter: Jiajia Li
>            Assignee: Jiajia Li
>             Fix For: 2.0.0, 1.1.0, 0.98.21
>
>         Attachments: HBASE_11144_4.patch, HBASE_11144_V10.patch,
> HBASE_11144_V11.patch, HBASE_11144_V12.patch, HBASE_11144_V13.patch,
> HBASE_11144_V14.patch, HBASE_11144_V15.patch, HBASE_11144_V16.patch,
> HBASE_11144_V17.patch, HBASE_11144_V18.patch, HBASE_11144_V5.patch,
> HBASE_11144_V6.patch, HBASE_11144_V7.patch, HBASE_11144_V9.patch,
> MultiRowRangeFilter.patch, MultiRowRangeFilter2.patch,
> MultiRowRangeFilter3.patch, hbase_11144_V8.patch
>
>
> HBase is quite efficient when scanning only one small row key range. If a
> user needs to specify multiple row key ranges in one scan, the typical
> solutions are: 1. a FilterList, which is a list of row key Filters; 2. using
> a SQL layer over HBase, such as Hive or Phoenix, to join two tables.
> However, both solutions are inefficient: neither can use the range
> information to fast-forward during the scan, so the scan is quite time
> consuming. If the number of ranges is quite large (e.g. millions), a join is
> a proper solution, though it is slow. However, there are cases where a user
> wants to scan a small number of ranges (e.g. <1000 ranges), and neither
> solution provides satisfactory performance in such cases.
> We provide this filter (MultiRowRangeFilter) to support this use case
> (scanning multiple row key ranges). It constructs the row key ranges from a
> user-specified list and fast-forwards during the scan, so the scan is quite
> efficient.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
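The issue description above says the filter builds its ranges from a user-specified list and fast-forwards between them. The following is a minimal sketch of that idea in plain Java, not the code from the attached patches. It assumes string row keys, half-open ranges, and an in-memory TreeSet standing in for the table; the names MultiRangeScanSketch, Range, sortAndMerge and scan are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class MultiRangeScanSketch {
    /** Half-open row key range [start, stop). Hypothetical stand-in, not an HBase class. */
    static final class Range {
        final String start;
        final String stop;
        Range(String start, String stop) { this.start = start; this.stop = stop; }
    }

    /** Sort ranges by start key and merge ranges that overlap or touch. */
    static List<Range> sortAndMerge(List<Range> input) {
        List<Range> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((Range r) -> r.start));
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                if (r.start.compareTo(last.stop) <= 0) {
                    // Overlapping or adjacent: extend the previous range instead of adding a new one.
                    String stop = last.stop.compareTo(r.stop) >= 0 ? last.stop : r.stop;
                    merged.set(merged.size() - 1, new Range(last.start, stop));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    /** Visit only keys inside the ranges; ceiling() models seeking straight to the next range start. */
    static List<String> scan(TreeSet<String> table, List<Range> ranges) {
        List<String> hits = new ArrayList<>();
        for (Range r : sortAndMerge(ranges)) {
            String key = table.ceiling(r.start);      // fast-forward over the gap before this range
            while (key != null && key.compareTo(r.stop) < 0) {
                hits.add(key);
                key = table.higher(key);              // next row in sorted order
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>();
        for (int i = 1; i <= 20; i++) {
            table.add(String.format("row%02d", i));
        }
        List<Range> ranges = Arrays.asList(
            new Range("row15", "row17"),
            new Range("row03", "row05"),
            new Range("row04", "row07"));
        // prints [row03, row04, row05, row06, row15, row16]
        System.out.println(scan(table, ranges));
    }
}
```

Sorting and merging first means the ranges can be walked in a single forward pass, which is what makes a seek to the next range start possible.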
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278363#comment-14278363 ]

stack commented on HBASE-11144:
-------------------------------

[~jiajia] Thank you for the great release note.
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277190#comment-14277190 ]

stack commented on HBASE-11144:
-------------------------------

[~br...@brianjohnson.cc] Thanks for the input. Let's keep the feature then.

[~jiajia] Any chance of your updating the release note to include some of the pros/cons and alternatives that have been discussed above? It will help clarify when this nice feature of yours should be used. Thank you.
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277123#comment-14277123 ]

Brian Johnson commented on HBASE-11144:
---------------------------------------

Even if you could do the same thing by issuing multiple scans, this filter has its uses. If you were using something like REST, Thrift or Pig to access the data, the filter might be your only practical solution that doesn't do a full table scan (as a FilterList would).
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276633#comment-14276633 ]

ramkrishna.s.vasudevan commented on HBASE-11144:
------------------------------------------------

When I went through the patch I had a similar thought. In the Phoenix case, when we have different ranges for the scan, we parallelize the scans anyway. But I think in those cases the ranges have to be consecutive? If the ranges are not consecutive, we go with the SKIP_SCAN filter, I suppose.
But for scans over different ranges, like an IN clause, made without Phoenix, wouldn't a filter of this sort that works on smaller ranges be more efficient than issuing multiple RPCs? The suggestion of testing this against multiple scans is worth a try.
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276500#comment-14276500 ]

Lars Hofhansl commented on HBASE-11144:
---------------------------------------

bq.
{noformat}
The test is done using MultiRowRangeFilter and the FilterList with a list of row key Filters on a 7-node cluster; each node uses 32 CPUs and 90GB memory. There are 4 rounds of the test, and each round scans 100 row key ranges in a table with 100 million records; the result count is 153437898. The test results follow; the average time is computed without the max and min values.

                     1        2        3        4        Avg
FilterList           8693479  8641336  8644194  8647838  8646016(ms)
MultiRowRangeFilter  1264502  1263921  1262744  1252947  126(ms)

Speed up to 6.84 times.
{noformat}

A bit late to this party, but have we compared this to issuing 100 individual scans with the proper start and stop keys set?
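The averaging rule stated in the benchmark quoted above (drop the maximum and the minimum, then average the rest) can be checked with a few lines of arithmetic. This is only a sketch that re-applies the stated rule to the four per-round timings copied from the quote; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

public class TrimmedMeanCheck {
    /** Mean of the values after dropping one minimum and one maximum. */
    static double trimmedMean(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        long sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return (double) sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        // Per-round timings in milliseconds, copied from the quoted benchmark.
        long[] filterList = {8693479L, 8641336L, 8644194L, 8647838L};
        long[] multiRowRange = {1264502L, 1263921L, 1262744L, 1252947L};

        double filterListAvg = trimmedMean(filterList);
        double multiRowRangeAvg = trimmedMean(multiRowRange);

        System.out.println(String.format(Locale.ROOT, "FilterList avg: %.1f ms", filterListAvg));
        System.out.println(String.format(Locale.ROOT, "MultiRowRangeFilter avg: %.1f ms", multiRowRangeAvg));
        System.out.println(String.format(Locale.ROOT, "Speed-up: %.2fx", filterListAvg / multiRowRangeAvg));
    }
}
```

Applied to the quoted timings, the rule gives 8646016 ms for FilterList, matching the table, and 1263332.5 ms for MultiRowRangeFilter, which is consistent with the reported 6.84 times speed-up.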
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276517#comment-14276517 ]

Brian Johnson commented on HBASE-11144:
---------------------------------------

I'm surprised by the modest speed increase. We ended up using Phoenix to get a similar capability and saw a speed-up of several orders of magnitude versus a FilterList, on a data set of similar size to the one in that test. However, we were retrieving a much smaller subset of the data from the ~100 ranges (thousands of records).
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276522#comment-14276522 ]

Jiajia Li commented on HBASE-11144:
-----------------------------------

I've only tested FilterList against MultiRowRangeFilter; the FilterList uses RowFilter.
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276529#comment-14276529 ]

stack commented on HBASE-11144:
-------------------------------

bq. A bit late to this party, but have we compared this to issuing 100 individual scans with the proper start and stop keys set?

Do we need to add this as a client feature, [~lhofhansl]?
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276585#comment-14276585 ]

Lars Hofhansl commented on HBASE-11144:
---------------------------------------

bq. We need to add this as a client feature[...]?

Maybe. It's not really that hard to issue a few scans. Finding a small sub-range out of a very large set of rows is precisely what HBase is good at, so I am a bit surprised we need this.

A filter like this implementing skip-scans is good for the equivalent of an IN (v1, v2, v3, v4, ...) query, i.e. many point queries (or Gets) that can now be executed in a single RPC. AFAIK that is what Phoenix uses its filter for. Maybe it'll work too if the individual ranges are small.
Once the retrieved ranges approach a certain size (maybe 1000's or 1's of rows) I doubt this will be better than multiple scan RPCs, especially when those are farmed out in parallel (as Phoenix does).

Note that Phoenix parallelizes scan requests (so some of the perf comes from using more resources of the cluster).
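The alternative raised in the comment above, one bounded scan per range with the scans farmed out in parallel, can be sketched as follows. This is a simulation under stated assumptions rather than HBase client code: a read-only in-memory TreeMap stands in for the table, each subMap call stands in for a scan with a start and stop row, and the names ParallelRangeScansSketch, scanRange and scanAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRangeScansSketch {
    /** One bounded "scan": all values with startRow <= key < stopRow (startRow must not exceed stopRow). */
    static List<String> scanRange(TreeMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).values());
    }

    /** Submit one scan per range to a thread pool and concatenate results in range order. */
    static List<String> scanAll(TreeMap<String, String> table, String[][] ranges, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String[] r : ranges) {
                // The table is only read here, so concurrent access to the TreeMap is safe.
                futures.add(pool.submit(() -> scanRange(table, r[0], r[1])));
            }
            List<String> out = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                out.addAll(f.get());
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            table.put(String.format("k%02d", i), "v" + i);
        }
        String[][] ranges = {{"k01", "k03"}, {"k07", "k09"}};
        // prints [v1, v2, v7, v8]
        System.out.println(scanAll(table, ranges, 2));
    }
}
```

In this shape every range costs a separate request, while a single filtered scan trades those requests for seeks inside one scan; which is faster depends on range size and available parallelism, as the comment notes.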
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275479#comment-14275479 ]

Hudson commented on HBASE-11144:
--------------------------------

FAILURE: Integrated in HBase-TRUNK #6017 (See [https://builds.apache.org/job/HBase-TRUNK/6017/])
HBASE-11144 Filter to support scanning multiple row key ranges (Jiajia Li) (tedyu: rev e5f3dd682fb8884a947b40b4348bd5d1386a6470)
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestMultiRowRangeFilter.java
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/FilterProtos.java
* hbase-protocol/src/main/protobuf/Filter.proto
* hbase-client/src/main/java/org/apache/hadoop/hbase/filter/MultiRowRangeFilter.java
* hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/model/ScannerModel.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterSerialization.java
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275499#comment-14275499 ]

stack commented on HBASE-11144:
-------------------------------

Needs release note.
[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges
[ https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275516#comment-14275516 ]

Hudson commented on HBASE-11144:
--------------------------------

FAILURE: Integrated in HBase-1.1 #78 (See [https://builds.apache.org/job/HBase-1.1/78/])
HBASE-11144 Filter to support scanning multiple row key ranges (Jiajia Li) (tedyu: rev b79dbedad6d92a668f8d9913e268ffe9fcccbb87)
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/FilterProtos.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/filter/MultiRowRangeFilter.java
* hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/model/ScannerModel.java
* hbase-protocol/src/main/protobuf/Filter.proto
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestMultiRowRangeFilter.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterSerialization.java