[jira] [Comment Edited] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806227#comment-13806227
 ] 

Lars Hofhansl edited comment on HBASE-9778 at 10/27/13 5:14 AM:


Note that this issue was brought up in HBASE-4433 already (the patch that 
introduced INCLUDE_AND_SEEK...).

It seems that for most scenarios we want to undo at least some part of that 
patch.


was (Author: lhofhansl):
Note that this issue was brought up in HBASE-4433 already (the patch that 
introduced INCLUDE_AND_SEEK...).

It seems that for most scenarios we want to undo and least least some part of 
that patch.

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.96.1, 0.94.14

 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 
 9778-0.94-v4.txt, 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt


 The issue of slow seeking in ExplicitColumnTracker was brought up by 
 [~vrodionov] on the dev list.
 My idea here is to avoid the seeking if we know that there aren't many 
 versions to skip.
 How do we know? We'll use the column family's VERSIONS setting as a hint. If 
 VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call 
 SKIP repeatedly.
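
 A minimal sketch of the decision described above, not the actual patch. The
 class, enum, and method names are hypothetical, and the cutoff of 10 is only
 an assumption taken from the "maybe some value < 10" remark:

```java
// Hypothetical sketch: none of these names are HBase's real API.
class SeekOrSkipSketch {
    /** What the scanner does once it has what it needs from the current column. */
    enum NextStep { SKIP, SEEK_NEXT_COL }

    // Assumed cutoff; the description only says "1 (or maybe some value < 10)".
    static final int SKIP_THRESHOLD = 10;

    /**
     * Uses the column family's VERSIONS setting as a hint: with few versions
     * per column, stepping over the remaining cells one at a time is assumed
     * to be cheaper than an explicit seek to the next column.
     */
    static NextStep afterColumnDone(int maxVersions) {
        return maxVersions < SKIP_THRESHOLD ? NextStep.SKIP : NextStep.SEEK_NEXT_COL;
    }

    public static void main(String[] args) {
        System.out.println(afterColumnDone(1));    // few versions: skip
        System.out.println(afterColumnDone(100));  // many versions: seek
    }
}
```

 The only input is the declared VERSIONS value, so no per-scan hint would be
 needed under this assumption.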
 HBASE-9769 has some initial numbers for this approach:
 Interestingly it depends on which column(s) is (are) selected.
 Some numbers: 4m rows, 5 cols each, 1 cf, 10-byte values, VERSIONS=1, 
 everything filtered at the server with a ValueFilter. Everything measured in 
 seconds.
 Without patch:
 ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
 |6.4|8.5|14.3|14.6|11.1|20.3|
 With patch:
 ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
 |6.4|8.4|8.9|9.9|6.4|10.0|
 Variation here was +- 0.2s.
 So with this patch scanning is 2x faster than without in some cases, and 
 never slower. No special hint needed, beyond declaring VERSIONS correctly.
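
 To make the "2x faster in some cases" claim concrete, the per-case speedup
 can be computed from the two tables above. This is only a sketch: the class
 and method names are made up for illustration, and the numbers are copied
 from the tables.

```java
// Hypothetical helper; the timings (seconds) are copied from the tables above.
class SpeedupSketch {
    static final String[] CASES = {"Wildcard", "Col 1", "Col 2", "Col 4", "Col 5", "Col 2+4"};
    static final double[] WITHOUT_PATCH = {6.4, 8.5, 14.3, 14.6, 11.1, 20.3};
    static final double[] WITH_PATCH = {6.4, 8.4, 8.9, 9.9, 6.4, 10.0};

    /** Ratio of the unpatched to the patched scan time for case i. */
    static double speedup(int i) {
        return WITHOUT_PATCH[i] / WITH_PATCH[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < CASES.length; i++) {
            System.out.printf("%-8s %.2fx%n", CASES[i], speedup(i));
        }
    }
}
```

 The largest ratio is the Col 2+4 case (20.3s vs 10.0s); the Wildcard case is
 unchanged. Keep the +- 0.2s variation in mind when reading the smaller
 ratios.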



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801462#comment-13801462
 ] 

Lars Hofhansl edited comment on HBASE-9778 at 10/22/13 5:01 AM:


Forget my previous comment, I was wrong. ExplicitColumnTracker *does* seek to 
the next column it is interested in, so even with many columns this patch would 
make it worse.



was (Author: lhofhansl):
Forgot my previous comment I was wrong. ExplicitColumnTracker *does* seek to 
the next column it is interested in, so even with many column this patch would 
make it worse.


 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 
 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt


 The issue of slow seeking in ExplicitColumnTracker was brought up by 
 [~vrodionov] on the dev list.
 My idea here is to avoid the seeking if we know that there aren't many 
 versions to skip.
 How do we know? We'll use the column family's VERSIONS setting as a hint. If 
 VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call 
 SKIP repeatedly.
 HBASE-9769 has some initial numbers for this approach:
 Interestingly it depends on which column(s) is (are) selected.
 Some numbers: 4m rows, 5 cols each, 1 cf, 10-byte values, VERSIONS=1, 
 everything filtered at the server with a ValueFilter. Everything measured in 
 seconds.
 Without patch:
 ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
 |6.4|8.5|14.3|14.6|11.1|20.3|
 With patch:
 ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
 |6.4|8.4|8.9|9.9|6.4|10.0|
 Variation here was +- 0.2s.
 So with this patch scanning is 2x faster than without in some cases, and 
 never slower. No special hint needed, beyond declaring VERSIONS correctly.


