[ https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl updated HBASE-12311:
----------------------------------
Attachment: 12311-indexed-0.98.txt
Here's a patch that illustrates the idea for 0.98. In StoreScanner, when the
SQM (ScanQueryMatcher) indicates that we should seek, we check the
nextIndexedKey (if available); if we would seek to a key before it, we simply
SKIP instead and let the SQM try again.
The only annoying part is that we only have an indexed *key* (i.e. row,
family, column), which we are trying to get rid of.
HFileReaderV2.AbstractScannerV2.reseekTo performs the same check to decide
whether to seek or to retry on the same block; this patch just pulls the check
up into StoreScanner. We can probably remove that optimization from
AbstractScannerV2 now (and save a few more compares).
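To make the check concrete, here is a minimal, hypothetical Java sketch (not the attached patch). `SeekOrSkip`, `IndexedKey`, `KEY_ORDER` and `shouldSkip` are illustrative names; keys are reduced to (row, family, column) strings compared lexicographically, timestamps are ignored because only the key part is indexed, and a missing nextIndexedKey is assumed to fall back to a real seek. Requires Java 16+ for records.

```java
import java.util.Comparator;

public final class SeekOrSkip {

    /** Hypothetical stand-in for the indexed key: row, family, column (no timestamp). */
    record IndexedKey(String row, String family, String qualifier) {}

    /** Lexicographic order on (row, family, qualifier), approximating HBase key order. */
    static final Comparator<IndexedKey> KEY_ORDER =
        Comparator.comparing(IndexedKey::row)
                  .thenComparing(IndexedKey::family)
                  .thenComparing(IndexedKey::qualifier);

    /**
     * Returns true if seeking to seekTarget would land before nextIndexedKey,
     * i.e. still inside the block we are positioned in, so a plain next() (SKIP)
     * is cheaper and the matcher can simply try again on the following cell.
     */
    static boolean shouldSkip(IndexedKey seekTarget, IndexedKey nextIndexedKey) {
        if (nextIndexedKey == null) {
            return false; // no index information: keep the original SEEK behaviour
        }
        return KEY_ORDER.compare(seekTarget, nextIndexedKey) < 0;
    }

    public static void main(String[] args) {
        IndexedKey nextIndexed = new IndexedKey("row2", "f", "a");
        // Target is in the current block: skip and let the matcher retry.
        System.out.println(shouldSkip(new IndexedKey("row1", "f", "c"), nextIndexed)); // true
        // Target is in a later block: a real seek is worthwhile.
        System.out.println(shouldSkip(new IndexedKey("row3", "f", "a"), nextIndexed)); // false
    }
}
```

A target equal to nextIndexedKey is the first key of the next block, so the sketch seeks rather than skips in that case.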
> Version stats in HFiles?
> ------------------------
>
> Key: HBASE-12311
> URL: https://issues.apache.org/jira/browse/HBASE-12311
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Lars Hofhansl
> Attachments: 12311-indexed-0.98.txt, 12311-v2.txt, 12311-v3.txt,
> 12311.txt, CellStatTracker.java
>
>
> In HBASE-9778 I basically punted to the user the decision on whether to do
> repeated scanner.next() calls instead of issuing (re)seeks.
> I think we can do better.
> One way to do that is to maintain simple stats of the maximum number of
> versions we've seen for any row/col combination and store these in the
> HFile's metadata (just like the timerange, oldest Put, etc.).
> Then we could estimate fairly accurately whether we have to expect lots of
> versions (in which case seeking between columns is better) or not (in which
> case we'd issue repeated next()'s).
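A minimal sketch of the proposed stat, under stated assumptions: cells arrive in sort order, as they would while an HFile is written, so all versions of a row/column are adjacent. `VersionStats`, `Cell`, `append` and `preferSeek` are hypothetical names (this is not the attached CellStatTracker.java), and `seekThreshold` is an assumed tuning knob rather than an existing HBase setting. Requires Java 16+ for records.

```java
import java.util.List;

public final class VersionStats {

    /** Hypothetical simplified cell: row, family, qualifier, timestamp (value omitted). */
    record Cell(String row, String family, String qualifier, long timestamp) {}

    private Cell previous;       // last cell appended
    private int currentVersions; // versions seen so far for the current row/column
    private int maxVersions;     // the value that would be stored in the file metadata

    /** Must be called in sort order so that all versions of a column are adjacent. */
    void append(Cell cell) {
        if (previous != null
                && previous.row().equals(cell.row())
                && previous.family().equals(cell.family())
                && previous.qualifier().equals(cell.qualifier())) {
            currentVersions++;
        } else {
            currentVersions = 1; // a new row/column combination starts
        }
        maxVersions = Math.max(maxVersions, currentVersions);
        previous = cell;
    }

    int maxVersions() { return maxVersions; }

    /**
     * Assumed heuristic: if no column in the file has more versions than
     * seekThreshold, a few repeated next() calls should beat a (re)seek.
     */
    static boolean preferSeek(int maxVersionsInFile, int seekThreshold) {
        return maxVersionsInFile > seekThreshold;
    }

    public static void main(String[] args) {
        VersionStats stats = new VersionStats();
        for (Cell c : List.of(
                new Cell("r1", "f", "a", 3), new Cell("r1", "f", "a", 2), new Cell("r1", "f", "a", 1),
                new Cell("r1", "f", "b", 5),
                new Cell("r2", "f", "a", 7), new Cell("r2", "f", "a", 6))) {
            stats.append(c);
        }
        System.out.println(stats.maxVersions());                 // 3
        System.out.println(preferSeek(stats.maxVersions(), 10)); // false: use next()
    }
}
```

Since the versions of one column can be spread over several HFiles (and the memstore), a reader would presumably have to combine the per-file values, for example by summing them to get a conservative upper bound.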