[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341959#comment-14341959
 ] 

Lars Hofhansl commented on HBASE-13109:
---------------------------------------

Some more tests (similar to those in HBASE-9778, but this is a different machine, 
so don't compare the absolute values): 4m rows, 5 cols, 1 version.

Without patch:
||Wildcard||Col 2+4||
|3.9|7.27|

With patch:
||Wildcard||Col 2+4||
|3.9|5.1|
(selecting columns 2 and 4 is the worst case)

So this patch improves the ExplicitColumnTracker case by almost 1/3rd (7.27 
down to 5.1, about 30%), and the beauty of this change is that it will still 
work with very many versions, because it uses whether the seek would land in 
another block as the metric to decide whether to seek or not.
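
To illustrate the idea, here is a minimal sketch of such a block-boundary 
check. It is not the code from the attached patch: the class, the method 
names, and the use of plain Strings as keys are all hypothetical, and it 
assumes the scanner can cheaply learn the first key of the next block. If the 
key we would seek to sorts before that key, the target is still in the current 
block, so SKIPing is assumed to be cheaper than a SEEK.

```java
// Hypothetical sketch of a SEEK vs SKIP heuristic; not HBase API.
public class SeekOrSkipSketch {

    public enum Decision { SEEK, SKIP }

    /**
     * Decide whether to SEEK or SKIP.
     *
     * @param seekTarget        key we would seek to (illustrative String key)
     * @param nextBlockFirstKey first key of the next block, or null if unknown
     */
    public static Decision decide(String seekTarget, String nextBlockFirstKey) {
        if (nextBlockFirstKey == null) {
            // Assumption: without block information, keep the original SEEK.
            return Decision.SEEK;
        }
        // Target sorts before the next block's first key: it is in the
        // current block, so step through the cells instead of seeking.
        if (seekTarget.compareTo(nextBlockFirstKey) < 0) {
            return Decision.SKIP;
        }
        // Target is in a later block: a real seek pays off.
        return Decision.SEEK;
    }

    public static void main(String[] args) {
        // Target inside the current block.
        System.out.println(decide("row1/col4", "row7/col1")); // prints SKIP
        // Target beyond the current block.
        System.out.println(decide("row9/col2", "row7/col1")); // prints SEEK
    }
}
```

Because the check depends only on block boundaries and not on how many cells 
lie between the current position and the target, it behaves the same way with 
one version per column or with very many.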


> Make better SEEK vs SKIP decisions during scanning
> --------------------------------------------------
>
>                 Key: HBASE-13109
>                 URL: https://issues.apache.org/jira/browse/HBASE-13109
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: 13109-trunk.txt
>
>
> I'm re-purposing this issue to add a heuristic as to when to SEEK and when to 
> SKIP Cells. This has come up in various issues, and I think I have a way to 
> finally fix this now. HBASE-9778, HBASE-12311, and friends are related.
> --- Old description ---
> This is a continuation of HBASE-9778.
> We've seen a scenario of a very slow scan over a region using a timerange 
> that happens to fall after the ts of any Cell in the region.
> Turns out we spend a lot of time seeking.
> Tested with a 5 column table, and the scan is 5x faster when the timerange 
> falls before all Cells' ts.
> We can use the lookahead hint introduced in HBASE-9778 to do opportunistic 
> SKIPing before we actually seek.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
