[ https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341363#comment-14341363 ]
Lars Hofhansl commented on HBASE-12311:
---------------------------------------
I've thought of another approach. StoreFileScanners have the notion of the
"next indexed key", that is, the next known key to seek to (i.e. the beginning
of a block). What if we took the next indexed key of the scanner that is on top
of the heap and only issued a seek if we would seek past that key? It's only a
heuristic and that check would not come for free, but assuming it is likely
that runs of consecutive Cells come from the same file, we'd have a fairly good
indicator of whether the seek will help. I have a 0.98 patch for that, and it
improves things. As an example, I've used a scan with a timerange. If the
timerange is before all Cells (except one, so that the file isn't ruled out) it
takes about 3.1s (we SKIP in that case); if the timerange falls after all Cells
(again except one) it takes 10.2s (we're seeking this time).
With the patch the first case is unchanged (3.1s), but the second case is
reduced to 4.5s, since we can avoid the unnecessary seek in many cases.
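A minimal sketch of the seek-vs-skip check described above, written in Java since that is HBase's language. It is not the 0.98 patch and not the HBase API: keys are assumed to be plain Strings compared lexicographically (real scanners compare Cells), and `FileScanner`, `decide` and `Action` are hypothetical names. A null next indexed key is treated as "unknown", in which case the sketch falls back to seeking.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch of the "next indexed key" heuristic. Keys are plain
 * Strings and FileScanner is a stand-in for a store file scanner; neither
 * is the real HBase API.
 */
public class SeekHeuristic {
    enum Action { SEEK, SKIP }

    /** Stand-in for one store file scanner. */
    static final class FileScanner {
        final String currentKey;     // key the scanner is positioned on
        final String nextIndexedKey; // first key of the next block, or null if unknown

        FileScanner(String currentKey, String nextIndexedKey) {
            this.currentKey = currentKey;
            this.nextIndexedKey = nextIndexedKey;
        }
    }

    /**
     * Seek only if the target lies past the next indexed key. Otherwise the
     * target is still within the current block, so repeated next() calls
     * (SKIP) are assumed to be cheaper. Unknown next indexed key: seek.
     */
    static Action decide(String seekTarget, String nextIndexedKey) {
        if (nextIndexedKey != null && nextIndexedKey.compareTo(seekTarget) >= 0) {
            return Action.SKIP;
        }
        return Action.SEEK;
    }

    /** Applies the check using the next indexed key of the scanner on top of the heap. */
    static Action decide(String seekTarget, PriorityQueue<FileScanner> heap) {
        FileScanner top = heap.peek();
        return decide(seekTarget, top == null ? null : top.nextIndexedKey);
    }

    public static void main(String[] args) {
        // Heap ordered by each scanner's current key, smallest on top.
        PriorityQueue<FileScanner> heap =
            new PriorityQueue<>(Comparator.comparing((FileScanner s) -> s.currentKey));
        heap.add(new FileScanner("row1/colA", "row5/colA"));
        heap.add(new FileScanner("row3/colA", "row9/colA"));

        System.out.println(decide("row1/colB", heap)); // SKIP: target is before the next block
        System.out.println(decide("row7/colA", heap)); // SEEK: target is past the next block
    }
}
```

A target equal to the next indexed key is treated as SKIP here, since seeking to it would not go past that key. Only the top scanner is consulted, which is exactly the "heuristic" part: another file's scanner may still hold Cells between the current position and the target.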
> Version stats in HFiles?
> ------------------------
>
> Key: HBASE-12311
> URL: https://issues.apache.org/jira/browse/HBASE-12311
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Lars Hofhansl
> Attachments: 12311-v2.txt, 12311-v3.txt, 12311.txt,
> CellStatTracker.java
>
>
> In HBASE-9778 I basically punted to the user the decision on whether to do
> repeated scanner.next() calls instead of issuing (re)seeks.
> I think we can do better.
> One way to do that is to maintain simple stats of the maximum number of
> versions we've seen for any row/col combination and store these in the
> HFile's metadata (just like the timerange, oldest Put, etc).
> Then we can estimate fairly accurately whether we have to expect lots of
> versions (i.e. seeking between columns is better) or not (in which case we'd
> issue repeated next()'s).
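A rough sketch of the proposed per-HFile stat, assuming Cells are fed in HFile sort order (row, column, newest timestamp first), so all versions of one row/column pair arrive back to back. `VersionStatsTracker`, `estimateMaxVersions` and the `threshold` parameter are hypothetical names chosen for illustration (this is not the attached CellStatTracker.java), and actually writing the value into the HFile's metadata is left out.

```java
/**
 * Hypothetical sketch of a max-versions stat collected while writing an
 * HFile. Not HBase API; rows and columns are plain Strings for simplicity.
 */
public class VersionStatsTracker {
    private String lastRow;
    private String lastColumn;
    private int currentRun;
    private int maxVersions;

    /** Records one Cell; only its row and column matter for counting versions. */
    public void track(String row, String column) {
        if (row.equals(lastRow) && column.equals(lastColumn)) {
            currentRun++;          // another version of the same row/column
        } else {
            lastRow = row;         // first version of a new row/column
            lastColumn = column;
            currentRun = 1;
        }
        maxVersions = Math.max(maxVersions, currentRun);
    }

    /** Value that would be stored in the HFile's metadata when the file is closed. */
    public int getMaxVersions() {
        return maxVersions;
    }

    /**
     * Upper bound on versions per row/column for a scan over several files:
     * one pair's versions may be spread across files, so sum the per-file maxima.
     */
    public static int estimateMaxVersions(int... perFileMax) {
        int total = 0;
        for (int m : perFileMax) {
            total += m;
        }
        return total;
    }

    /** Seek between columns only if many versions are expected (hypothetical threshold). */
    public static boolean preferSeek(int estimatedMaxVersions, int threshold) {
        return estimatedMaxVersions > threshold;
    }

    public static void main(String[] args) {
        VersionStatsTracker t = new VersionStatsTracker();
        String[][] cells = {
            {"r1", "c1"}, {"r1", "c1"}, {"r1", "c1"}, // 3 versions
            {"r1", "c2"},                             // 1 version
            {"r2", "c1"}, {"r2", "c1"}                // 2 versions
        };
        for (String[] c : cells) {
            t.track(c[0], c[1]);
        }
        System.out.println(t.getMaxVersions()); // 3
        int est = estimateMaxVersions(t.getMaxVersions(), 1);
        System.out.println(est + " -> seek? " + preferSeek(est, 2)); // 4 -> seek? true
    }
}
```

Summing the per-file maxima never underestimates, at the cost of being loose when many files are involved; taking the maximum across files instead would be tighter but could suggest next() calls where seeking would have been cheaper.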
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)