[ https://issues.apache.org/jira/browse/ACCUMULO-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keith Turner updated ACCUMULO-473: ---------------------------------- Attachment: ACCUMULO-473-2.txt Added important comment to patch > Support binary search within RFile blocks > ----------------------------------------- > > Key: ACCUMULO-473 > URL: https://issues.apache.org/jira/browse/ACCUMULO-473 > Project: Accumulo > Issue Type: Improvement > Components: tserver > Reporter: Keith Turner > Assignee: Keith Turner > Fix For: 1.5.0 > > Attachments: ACCUMULO-473-1.txt, ACCUMULO-473-2.txt, Test.java > > > RFiles store blocks of key values using relative encoding. By default these > blocks are small (100k). To find a key in an RFile the index is used to find > the block, then the block is scanned for the key. It would be nice for > iterators that do alot of seeking if a binary search were done instead of a > scan. > Relative encoding is a form of compression that serializes a key relative to > the last key. For example if the row in a key is the same as the previous > key, then the row is not stored again. This works well with sorted data. > The current on disk format does not support random access within a block, > because to read entry N all previous entries must be read. One option would > be to deserialize the block into a form that supports random access and then > do binary search. However this will consume memory and CPU. If the relative > encoding format could be modified to support random access, then binary > search could be supported in a memory and CPU efficient way. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira