[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341959#comment-14341959 ]
Lars Hofhansl commented on HBASE-13109:
---------------------------------------

Some more tests (similar to those in HBASE-9778, but this is a different machine, so don't compare the absolute values):
4m rows, 5 cols, 1 version.

Without patch:
||Wildcard||Col 2+4||
|3.9|7.27|

With patch:
||Wildcard||Col 2+4||
|3.9|5.1|

(selecting columns 2 and 4 is the worst case)

So this patch improves the ExplicitColumnTracker by almost 1/3rd (7.27 down to 5.1), and the beauty of this change is that it will still work with very many versions, because it uses whether the seek would land in another block as the metric for deciding whether to seek or not.

> Make better SEEK vs SKIP decisions during scanning
> --------------------------------------------------
>
>                 Key: HBASE-13109
>                 URL: https://issues.apache.org/jira/browse/HBASE-13109
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: 13109-trunk.txt
>
>
> I'm re-purposing this issue to add a heuristic as to when to SEEK and when to
> SKIP Cells. This has come up in various issues, and I think I have a way to
> finally fix this now. HBASE-9778, HBASE-12311, and friends are related.
> --- Old description ---
> This is a continuation of HBASE-9778.
> We've seen a scenario of a very slow scan over a region using a timerange
> that happens to fall after the ts of any Cell in the region.
> Turns out we spend a lot of time seeking.
> Tested with a 5 column table, and the scan is 5x faster when the timerange
> falls before all Cells' ts.
> We can use the lookahead hint introduced in HBASE-9778 to do opportunistic
> SKIPing before we actually seek.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
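
For readers of this thread, the block-boundary heuristic described in the comment can be illustrated with a minimal Java sketch. This is not the code in 13109-trunk.txt. The class SeekOrSkip, the method decide, and the use of plain strings as keys are all assumptions made for illustration, and real HBase keys are compared with its own cell comparators rather than String.compareTo. The idea shown: if the seek target sorts before the first key of the next block, the target must lie in the block already being read, so skipping forward is likely cheaper than a seek.

```java
public class SeekOrSkip {
    /** The two choices the scanner has when it wants to move ahead. */
    public enum Decision { SEEK, SKIP }

    /**
     * Hypothetical helper, not an HBase API.
     *
     * @param seekTarget        key the scanner wants to reach next
     * @param nextBlockFirstKey first key of the block after the current one,
     *                          or null if that key is not known
     */
    public static Decision decide(String seekTarget, String nextBlockFirstKey) {
        // Without knowledge of the next block boundary, keep the plain seek.
        if (nextBlockFirstKey == null) {
            return Decision.SEEK;
        }
        // Target sorts before the next block's first key: it is inside the
        // current block, so step through cells instead of seeking.
        if (seekTarget.compareTo(nextBlockFirstKey) < 0) {
            return Decision.SKIP;
        }
        // Target is in a later block: a seek can jump over whole blocks.
        return Decision.SEEK;
    }

    public static void main(String[] args) {
        System.out.println(decide("row1/col4", "row3/col1")); // SKIP
        System.out.println(decide("row7/col2", "row3/col1")); // SEEK
    }
}
```

Because the decision depends only on where the next block starts, and not on how many cells lie between the current position and the target, a rule of this shape keeps working when rows carry many versions, which matches the point made in the comment.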