[ 
https://issues.apache.org/jira/browse/ACCUMULO-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Turner updated ACCUMULO-473:
----------------------------------

    Status: Patch Available  (was: Open)
    
> Support binary search within RFile blocks
> -----------------------------------------
>
>                 Key: ACCUMULO-473
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-473
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.5.0
>
>
> RFiles store blocks of key values using relative encoding.  By default these 
> blocks are small (100k).  To find a key in an RFile the index is used to find 
> the block, then the block is scanned for the key.  It would be nice for 
> iterators that do alot of seeking if a binary search were done instead of a 
> scan.  
> Relative encoding is a form of compression that serializes a key relative to 
> the last key.  For example if the row in a key is the same as the previous 
> key, then the row is not stored again.  This works well with sorted data.   
> The current on disk format does not support random access within a block, 
> because to read entry N all previous entries must be read.   One option would 
> be to deserialize the block into a form that supports random access and then 
> do binary search.  However this will consume memory and CPU.  If the relative 
> encoding format could be modified to support random access, then binary 
> search could be supported in a memory and CPU efficient way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to