Support binary search within RFile blocks
-----------------------------------------
Key: ACCUMULO-473
URL: https://issues.apache.org/jira/browse/ACCUMULO-473
Project: Accumulo
Issue Type: Improvement
Components: tserver
Reporter: Keith Turner
Assignee: Keith Turner
Fix For: 1.5.0
RFiles store blocks of key values using relative encoding. By default these
blocks are small (100k). To find a key in an RFile the index is used to find
the block, then the block is scanned for the key. It would be nice for
iterators that do alot of seeking if a binary search were done instead of a
scan.
Relative encoding is a form of compression that serializes a key relative to
the last key. For example if the row in a key is the same as the previous key,
then the row is not stored again. This works well with sorted data. The
current on disk format does not support random access within a block, because
to read entry N all previous entries must be read. One option would be to
deserialize the block into a form that supports random access and then do
binary search. However this will consume memory and CPU. If the relative
encoding format could be modified to support random access, then binary search
could be supported in a memory and CPU efficient way.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira