[jira] [Updated] (HBASE-12311) Version stats in HFiles?

Lars Hofhansl (JIRA) Fri, 27 Feb 2015 22:02:06 -0800

     [ https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Lars Hofhansl updated HBASE-12311:
----------------------------------
    Attachment: 12311-indexed-0.98.txt

Here's a patch that illustrates the idea for 0.98. In the store scanner, when 
the SQM indicates we should seek, we check the nextIndexedKey (if available), 
and if we would seek to a key before it, we simply SKIP and let the SQM try again.
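
To make the check concrete, here is a rough sketch of the decision. It is not 
taken from the attached patch: the class SeekOrSkipSketch, the simplified Key 
type (strings instead of byte arrays) and the Action enum are all hypothetical 
names made up for illustration, and the strict "less than" comparison is an 
assumption about how the boundary case is handled.

```java
import java.util.Comparator;

/** Hypothetical, simplified illustration of the seek-vs-skip check; not HBase code. */
final class SeekOrSkipSketch {
    enum Action { SEEK, SKIP }

    /** Simplified index-style key: row, family, column only (no timestamp). */
    static final class Key {
        final String row;
        final String family;
        final String column;

        Key(String row, String family, String column) {
            this.row = row;
            this.family = family;
            this.column = column;
        }
    }

    /** Orders keys by row, then family, then column. */
    static final Comparator<Key> KEY_ORDER =
        Comparator.comparing((Key k) -> k.row)
                  .thenComparing(k -> k.family)
                  .thenComparing(k -> k.column);

    /**
     * Decide what to do when the matcher asks for a seek to seekTarget.
     * nextIndexedKey is the first key of the next block, or null if unknown.
     */
    static Action decide(Key seekTarget, Key nextIndexedKey) {
        if (nextIndexedKey == null) {
            return Action.SEEK; // no index information, fall back to a real seek
        }
        // Assumption: a target strictly before the next block's first key lies
        // in the current block, so stepping forward is cheaper than seeking.
        return KEY_ORDER.compare(seekTarget, nextIndexedKey) < 0
            ? Action.SKIP
            : Action.SEEK;
    }
}
```

With a next indexed key of (row5, f, a), a seek target of (row3, f, a) would 
yield SKIP, while (row7, f, a) would yield SEEK.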

The only annoying part is that we only have an indexed *key* (i.e. row, family, 
column), which we are trying to get rid of. 
HFileReaderV2.AbstractScannerV2.reseekTo performs the same check to decide 
whether to seek or to retry on the same block; this just pulls the check up. We 
can probably remove that optimization from the AbstractScannerV2 now (and save 
a few more compares).

> Version stats in HFiles?
> ------------------------
>
>                 Key: HBASE-12311
>                 URL: https://issues.apache.org/jira/browse/HBASE-12311
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>         Attachments: 12311-indexed-0.98.txt, 12311-v2.txt, 12311-v3.txt, 
> 12311.txt, CellStatTracker.java
>
>
> In HBASE-9778 I basically punted the decision on whether to do repeated 
> scanner.next() calls instead of issuing (re)seeks to the user.
> I think we can do better.
> One way to do that is to maintain simple stats on the maximum number of 
> versions we've seen for any row/col combination and store these in the 
> HFile's metadata (just like the timerange, oldest Put, etc).
> Then we can estimate fairly accurately whether we have to expect lots of 
> versions (i.e. seek between columns is better) or not (in which case we'd 
> issue repeated next()'s).
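
As an illustration of the stats idea in the description, the sketch below 
tracks the maximum number of versions seen for any row/col combination while 
cells arrive in sorted order, then uses that number to choose between seeking 
and repeated next() calls. It is not the attached CellStatTracker.java: the 
class MaxVersionsSketch, the string column key and the threshold parameter are 
hypothetical, and storing the value in the HFile's metadata is left out.

```java
/** Hypothetical sketch of per-file version stats; not the attached CellStatTracker. */
final class MaxVersionsSketch {
    private String lastColumnKey = null; // row/family/column of the previous cell
    private long currentRun = 0;         // versions seen so far for lastColumnKey
    private long maxVersions = 0;        // largest run seen in this file

    /**
     * Record one cell. Cells must arrive in sorted order, so all versions of
     * the same row/col combination are adjacent. columnKey is an assumed
     * string encoding of row + family + column.
     */
    void track(String columnKey) {
        if (columnKey.equals(lastColumnKey)) {
            currentRun++;
        } else {
            lastColumnKey = columnKey;
            currentRun = 1;
        }
        if (currentRun > maxVersions) {
            maxVersions = currentRun;
        }
    }

    /** The value that would be written alongside the other file-level stats. */
    long maxVersions() {
        return maxVersions;
    }

    /**
     * Assumed heuristic: with many versions per column, seeking to the next
     * column is likely cheaper; with few, repeated next() calls are.
     * The threshold is a made-up tuning knob.
     */
    static boolean preferSeek(long maxVersionsSeen, long threshold) {
        return maxVersionsSeen > threshold;
    }
}
```

For example, feeding three cells for "r1/f:a" and one for "r1/f:b" gives a 
maximum of 3; with an assumed threshold of 10 the scanner would then favor 
repeated next() calls over seeks.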



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
