[
https://issues.apache.org/jira/browse/ACCUMULO-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233387#comment-13233387
]
Aaron Cordova commented on ACCUMULO-473:
----------------------------------------
Keith, when are you proposing doing the binary search? On the block after the
block has been read into memory or before?
I would advocate keeping block sizes small enough to be read into memory and if
a binary search is done, do it on the block in memory. This is in order to
restrict lookups and scans to one disk seek.
> Support binary search within RFile blocks
> -----------------------------------------
>
> Key: ACCUMULO-473
> URL: https://issues.apache.org/jira/browse/ACCUMULO-473
> Project: Accumulo
> Issue Type: Improvement
> Components: tserver
> Reporter: Keith Turner
> Assignee: Keith Turner
> Fix For: 1.5.0
>
>
> RFiles store blocks of key values using relative encoding. By default these
> blocks are small (100k). To find a key in an RFile the index is used to find
> the block, then the block is scanned for the key. It would be nice for
> iterators that do alot of seeking if a binary search were done instead of a
> scan.
> Relative encoding is a form of compression that serializes a key relative to
> the last key. For example if the row in a key is the same as the previous
> key, then the row is not stored again. This works well with sorted data.
> The current on disk format does not support random access within a block,
> because to read entry N all previous entries must be read. One option would
> be to deserialize the block into a form that supports random access and then
> do binary search. However this will consume memory and CPU. If the relative
> encoding format could be modified to support random access, then binary
> search could be supported in a memory and CPU efficient way.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira