[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2016-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389849#comment-15389849
 ] 

Hudson commented on HBASE-11144:


FAILURE: Integrated in HBase-0.98-matrix #375 (See 
[https://builds.apache.org/job/HBase-0.98-matrix/375/])
HBASE-11144 Filter to support scanning multiple row key ranges (Jiajia Li) 
(apurtell: rev b2d883ddcf47833786d7ee0eeaa52bee60c00de5)
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestMultiRowRangeFilter.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterSerialization.java
* hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/model/ScannerModel.java
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/FilterProtos.java
* hbase-protocol/src/main/protobuf/Filter.proto
* hbase-client/src/main/java/org/apache/hadoop/hbase/filter/MultiRowRangeFilter.java


> Filter to support scanning multiple row key ranges
> --
>
> Key: HBASE-11144
> URL: https://issues.apache.org/jira/browse/HBASE-11144
> Project: HBase
>  Issue Type: Improvement
>  Components: Filters
>Reporter: Jiajia Li
>Assignee: Jiajia Li
> Fix For: 2.0.0, 1.1.0, 0.98.21
>
> Attachments: HBASE_11144_4.patch, HBASE_11144_V10.patch, 
> HBASE_11144_V11.patch, HBASE_11144_V12.patch, HBASE_11144_V13.patch, 
> HBASE_11144_V14.patch, HBASE_11144_V15.patch, HBASE_11144_V16.patch, 
> HBASE_11144_V17.patch, HBASE_11144_V18.patch, HBASE_11144_V5.patch, 
> HBASE_11144_V6.patch, HBASE_11144_V7.patch, HBASE_11144_V9.patch, 
> MultiRowRangeFilter.patch, MultiRowRangeFilter2.patch, 
> MultiRowRangeFilter3.patch, hbase_11144_V8.patch
>
>
> HBase is quite efficient when scanning only one small row key range. If a user 
> needs to specify multiple row key ranges in one scan, the typical solutions 
> are: 1. a FilterList containing a list of row key Filters, or 2. an SQL layer 
> over HBase, such as Hive or Phoenix, that joins two tables. 
> However, both solutions are inefficient: neither can use the range 
> information to fast-forward during the scan, so scanning is quite time consuming. If 
> the number of ranges is very large (e.g. millions), a join is the proper solution, 
> though it is slow. However, there are cases where the user wants to specify a 
> small number of ranges to scan (e.g. <1000 ranges), and neither solution can 
> provide satisfactory performance in such cases. 
> We provide this filter (MultiRowRangeFilter) to support this use case (scanning 
> multiple row key ranges). It constructs the row key ranges from a user-specified 
> list and fast-forwards during the scan, which makes the scan 
> efficient. 
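The fast-forwarding idea in the description can be illustrated with a small, self-contained Java sketch. It is not the MultiRowRangeFilter source: row keys are modeled as Strings rather than byte[], each range is assumed to be half-open [start, stop), a TreeMap stands in for a region's sorted rows, and the names MultiRangeSketch, sortAndMerge and scan are hypothetical. What it shows is that once the ranges are sorted and merged, a single pass can jump from the end of one range straight to the start of the next instead of evaluating every row in between.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class MultiRangeSketch {
  // Hypothetical half-open row range [start, stop); the real filter works on byte[] row keys.
  static final class Range {
    final String start;
    final String stop;

    Range(String start, String stop) {
      this.start = start;
      this.stop = stop;
    }
  }

  // Sort ranges by start key and merge those that overlap or touch.
  static List<Range> sortAndMerge(List<Range> input) {
    List<Range> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparing((Range r) -> r.start));
    List<Range> merged = new ArrayList<>();
    for (Range r : sorted) {
      Range last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && r.start.compareTo(last.stop) <= 0) {
        // Overlapping or adjacent: widen the previous range instead of adding a new one.
        String stop = r.stop.compareTo(last.stop) > 0 ? r.stop : last.stop;
        merged.set(merged.size() - 1, new Range(last.start, stop));
      } else {
        merged.add(r);
      }
    }
    return merged;
  }

  // One pass over the sorted rows: step row by row inside a range, and when a
  // range is exhausted, seek directly to the next range's start key.
  static List<String> scan(TreeMap<String, String> table, List<Range> ranges) {
    List<String> hits = new ArrayList<>();
    for (Range r : sortAndMerge(ranges)) {
      // ceilingKey acts like a seek hint: the first row >= r.start.
      String row = table.ceilingKey(r.start);
      while (row != null && row.compareTo(r.stop) < 0) {
        hits.add(row);
        row = table.higherKey(row);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = new TreeMap<>();
    for (char c = 'a'; c <= 'z'; c++) {
      table.put(String.valueOf(c), "value");
    }
    List<Range> ranges = List.of(new Range("m", "p"), new Range("b", "d"), new Range("c", "e"));
    System.out.println(scan(table, ranges)); // [b, c, d, m, n, o]
  }
}
```

In HBase's Filter API, a filter requests this kind of jump by returning ReturnCode.SEEK_NEXT_USING_HINT and supplying the target through getNextCellHint; ceilingKey plays that role in the sketch. A FilterList of RowFilters has no such hint, so every row between the ranges is still read and rejected.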



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278363#comment-14278363
 ] 

stack commented on HBASE-11144:
---

[~jiajia] Thank you for the great release note.



[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277190#comment-14277190
 ] 

stack commented on HBASE-11144:
---

[~br...@brianjohnson.cc] Thanks for the input. Let's keep the feature then.

[~jiajia] any chance of your updating the release note to include some of the 
pros/cons and alternatives that have been discussed above?  It will help 
clarify when this nice feature of yours should be used.  Thank you.



[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-14 Thread Brian Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277123#comment-14277123
 ] 

Brian Johnson commented on HBASE-11144:
---

Even if you could do the same thing by issuing multiple scans, this filter has 
its uses. If you were using something like REST, Thrift or Pig to access the 
data, the filter might be your only practical solution that doesn't do a full 
table scan (as a FilterList does).



[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-14 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276633#comment-14276633
 ] 

ramkrishna.s.vasudevan commented on HBASE-11144:


When I went through the patch I had a similar thought. In the Phoenix case, when 
we have different ranges for the scan, we parallelize the scans anyway. But I 
think in those cases the ranges have to be consecutive? If the ranges are not 
consecutive, I suppose we go with the SKIP_SCAN filter.
But for scans over different ranges, like an IN clause without Phoenix, 
wouldn't a filter of this sort that works on smaller ranges be more efficient 
than issuing multiple RPCs?
The suggestion of testing this against multiple scans is worth a try.



[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276500#comment-14276500
 ] 

Lars Hofhansl commented on HBASE-11144:
---

bq. {noformat}
The test compares MultiRowRangeFilter with a FilterList of row key Filters on 
a 7-node cluster; each node has 32 CPUs and 90GB of memory.
There were 4 rounds of the test. Each round scanned 100 row key ranges in a 
table with 100 million records, and the result count was 153437898. 
The test results follow; the average is computed after dropping the max and 
min values. All times are in ms.

                     Round 1   Round 2   Round 3   Round 4   Avg
FilterList           8693479   8641336   8644194   8647838   8646016
MultiRowRangeFilter  1264502   1263921   1262744   1252947   1263332.5

Speedup: 6.84 times.
{noformat}
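The averages in the benchmark drop the highest and lowest of the four rounds and average the remaining two. A hypothetical helper, TrimmedMean.trimmedMean, reproduces that arithmetic from the per-round numbers: for MultiRowRangeFilter the two middle values, 1262744 and 1263921, average to 1263332.5 ms, and dividing the FilterList average by it gives the quoted 6.84 speedup.

```java
import java.util.Arrays;
import java.util.Locale;

class TrimmedMean {
  // Average of the samples after discarding one maximum and one minimum value.
  static double trimmedMean(long... samples) {
    if (samples.length < 3) {
      throw new IllegalArgumentException("need at least 3 samples");
    }
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    long sum = 0;
    for (int i = 1; i < sorted.length - 1; i++) {
      sum += sorted[i];
    }
    return (double) sum / (sorted.length - 2);
  }

  public static void main(String[] args) {
    double filterList = trimmedMean(8693479, 8641336, 8644194, 8647838);
    double multiRange = trimmedMean(1264502, 1263921, 1262744, 1252947);
    System.out.println(filterList); // 8646016.0
    System.out.println(multiRange); // 1263332.5
    System.out.printf(Locale.ROOT, "speedup: %.2f%n", filterList / multiRange); // speedup: 6.84
  }
}
```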

A bit late to this party, but have we compared this to issuing 100 
individual scans with the proper start and stop keys set?
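The alternative raised here, one scan per range with explicit start and stop keys, can be sketched in the same in-memory style. Assumptions: a TreeMap replaces the table, each subMap call stands in for a separate Scan (and therefore separate RPCs on a real cluster), ranges are half-open [start, stop), and PerRangeScans/scanEach are hypothetical names. The thread pool mirrors the parallel fan-out mentioned for Phoenix; the sketch does not model network or RPC cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerRangeScans {
  // Runs one independent "scan" per [start, stop) range on a thread pool and
  // concatenates the results in the order the ranges were given.
  static List<String> scanEach(TreeMap<String, String> table,
                               List<Map.Entry<String, String>> ranges) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (Map.Entry<String, String> r : ranges) {
        // Stand-in for a Scan with start row r.getKey() (inclusive)
        // and stop row r.getValue() (exclusive).
        Callable<List<String>> task =
            () -> new ArrayList<>(table.subMap(r.getKey(), true, r.getValue(), false).keySet());
        futures.add(pool.submit(task));
      }
      List<String> out = new ArrayList<>();
      for (Future<List<String>> f : futures) {
        out.addAll(f.get());
      }
      return out;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    TreeMap<String, String> table = new TreeMap<>();
    for (char c = 'a'; c <= 'z'; c++) {
      table.put(String.valueOf(c), "value");
    }
    System.out.println(scanEach(table, List.of(Map.entry("m", "p"), Map.entry("b", "d"))));
    // [m, n, o, b, c]
  }
}
```

Because each range is independent, this approach parallelizes naturally; the trade-off debated in the thread is the per-scan RPC overhead versus a single filtered scan that seeks between ranges.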




[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-13 Thread Brian Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276517#comment-14276517
 ] 

Brian Johnson commented on HBASE-11144:
---

I'm surprised by the modest speed increase. We ended up using Phoenix to get a 
similar capability and saw a speed up of several orders of magnitude vs a 
filter list on a similar size data set to that test, but we were retrieving a 
much smaller subset of the data from the ~100 ranges (thousands of records). 



[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-13 Thread Jiajia Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276522#comment-14276522
 ] 

Jiajia Li commented on HBASE-11144:
---

I've only tested FilterList against MultiRowRangeFilter; the FilterList used 
RowFilters.



[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276529#comment-14276529
 ] 

stack commented on HBASE-11144:
---

bq. A bit late to this party, but have we compared this to issuing 100 
individual scans with the proper start and stop keys set?


We need to add this as a client feature [~lhofhansl]?



[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276585#comment-14276585
 ] 

Lars Hofhansl commented on HBASE-11144:
---

bq. We need to add this as a client feature[...]?

Maybe. It's not really that hard to issue a few scans.

Finding a small sub-range out of a very large set of rows is precisely what 
HBase is good at, so I am a bit surprised we need this.
A filter like this implementing skip-scans is good for the equivalent of an IN 
(v1, v2, v3, v4, ...) query, i.e. many point queries (or Gets) that can now be 
executed in a single RPC. AFAIK that is what Phoenix uses its filter for. Maybe 
it'll work too if the individual ranges are small.
Once the retrieved ranges approach a certain size (maybe 1000's of rows or more), 
I doubt this will be better than multiple scan RPCs, especially when 
those are farmed out in parallel (as Phoenix does).

Note that Phoenix parallelizes scan requests (so some of the perf comes from 
using more resources of the cluster).
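For the IN (v1, v2, ...) case described here, each value becomes a point range [v, v], and a skip-scan can leapfrog between the sorted list of wanted keys and the sorted rows, seeking forward on whichever side is behind. This is a generic illustration under the same in-memory assumptions as a TreeMap-backed table; it is not Phoenix's skip-scan filter or HBase code, and SkipScanSketch/skipScan are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

class SkipScanSketch {
  // Returns the IN-list keys that exist in the table, in key order.
  // Each IN value is treated as a point range [v, v].
  static List<String> skipScan(NavigableMap<String, String> table, Collection<String> inList) {
    NavigableSet<String> wanted = new TreeSet<>(inList); // sorted, duplicates removed
    List<String> hits = new ArrayList<>();
    String want = wanted.isEmpty() ? null : wanted.first();
    while (want != null) {
      String row = table.ceilingKey(want); // seek to the first row >= want
      if (row == null) {
        break; // past the last row: nothing else can match
      }
      if (row.equals(want)) {
        hits.add(row);
        want = wanted.higher(row); // next wanted key after this hit
      } else {
        want = wanted.ceiling(row); // skip wanted keys that have no row
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = new TreeMap<>();
    for (String k : new String[] {"a", "c", "e", "g"}) {
      table.put(k, "value");
    }
    System.out.println(skipScan(table, List.of("g", "b", "e", "e", "z"))); // [e, g]
  }
}
```

Each iteration strictly advances the wanted key, so the loop performs at most one seek per distinct IN value, which is why this pattern suits many point lookups in one request better than large contiguous ranges.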




[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275479#comment-14275479
 ] 

Hudson commented on HBASE-11144:


FAILURE: Integrated in HBase-TRUNK #6017 (See 
[https://builds.apache.org/job/HBase-TRUNK/6017/])
HBASE-11144 Filter to support scanning multiple row key ranges (Jiajia Li) 
(tedyu: rev e5f3dd682fb8884a947b40b4348bd5d1386a6470)
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestMultiRowRangeFilter.java
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/FilterProtos.java
* hbase-protocol/src/main/protobuf/Filter.proto
* hbase-client/src/main/java/org/apache/hadoop/hbase/filter/MultiRowRangeFilter.java
* hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/model/ScannerModel.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterSerialization.java




[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275499#comment-14275499
 ] 

stack commented on HBASE-11144:
---

Needs release note.



[jira] [Commented] (HBASE-11144) Filter to support scanning multiple row key ranges

2015-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275516#comment-14275516
 ] 

Hudson commented on HBASE-11144:


FAILURE: Integrated in HBase-1.1 #78 (See 
[https://builds.apache.org/job/HBase-1.1/78/])
HBASE-11144 Filter to support scanning multiple row key ranges (Jiajia Li) 
(tedyu: rev b79dbedad6d92a668f8d9913e268ffe9fcccbb87)
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/FilterProtos.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/filter/MultiRowRangeFilter.java
* hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/model/ScannerModel.java
* hbase-protocol/src/main/protobuf/Filter.proto
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestMultiRowRangeFilter.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterSerialization.java

