[ 
https://issues.apache.org/jira/browse/ACCUMULO-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs resolved ACCUMULO-790.
----------------------------------------

    Resolution: Fixed
    
> RFile should compress using common prefixes of key elements
> -----------------------------------------------------------
>
>                 Key: ACCUMULO-790
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-790
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Christopher Tubbs
>            Assignee: Christopher Tubbs
>              Labels: compression, file, hackathon, optimization, rfile
>             Fix For: 1.5.0
>
>
> Relative keys have proven themselves as a great way to compress within 
> dimensions of the key. However, we could probably do better. Since we know 
> that our data is sorted lexicographically, we can reasonably assume that we 
> will get better compression if we store only the fact that a key (or portion 
> of a key) has a common prefix with the previous key, even if it is not an 
> exact match.
> Currently, in RFile, unused bits from the delete flag byte are being used to 
> store flags that show whether an element of the key is exactly the same as 
> in the previous key, or if it is different. We can change the semantics of 
> these flags to store three states per element of the key: exact match with 
> the previous key, common prefix with the previous key, or no relative key 
> compression. If we don't want to add a byte to store 2 bits for 3 states per 
> element, we can just take the ordinal value of the unused 7 bits of the 
> delete flag field and map it to an enumeration of relative key flags.
> When the common prefix flag is set for a given element of the current key 
> while reading the RFile, we can interpret the first bytes of that element as 
> a VInt expressing the length of the prefix it shares with the same element 
> of the previous key. Because this will add at least one byte to the length 
> of that element, we will not want to use the common prefix compression if 
> the common prefix is less than 2 bytes. For less than 2 bytes in common (1 
> or 0 bytes in common), we'd select the no compression flag for that element.
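
The per-element decision described above could be sketched roughly as
follows. This is only an illustration, not the code committed for
ACCUMULO-790: the class, enum, and method names are hypothetical, the byte
layout (prefix length, then suffix length, then suffix bytes) is an
assumption, and a simple 7-bit varint stands in for the VInt encoding
mentioned in the description.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch; none of these names come from the Accumulo code base.
final class CommonPrefixSketch {

    /** The three per-element states proposed in the issue. */
    enum ElementState { SAME, COMMON_PREFIX, NONE }

    /** Below this many shared bytes, the extra length field is not worth it. */
    static final int MIN_PREFIX = 2;

    /** Counts the leading bytes that prev and cur have in common. */
    static int commonPrefixLength(byte[] prev, byte[] cur) {
        int max = Math.min(prev.length, cur.length);
        int i = 0;
        while (i < max && prev[i] == cur[i]) {
            i++;
        }
        return i;
    }

    /** Picks the state for one key element relative to the previous key. */
    static ElementState choose(byte[] prev, byte[] cur) {
        if (Arrays.equals(prev, cur)) {
            return ElementState.SAME;
        }
        return commonPrefixLength(prev, cur) >= MIN_PREFIX
                ? ElementState.COMMON_PREFIX
                : ElementState.NONE;
    }

    /** Unsigned 7-bit varint; an assumed stand-in for the real VInt format. */
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Encodes one element; the flag itself would be stored separately. */
    static byte[] encode(byte[] prev, byte[] cur) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        switch (choose(prev, cur)) {
            case SAME:
                break; // nothing to write, the reader reuses the previous element
            case COMMON_PREFIX:
                int prefix = commonPrefixLength(prev, cur);
                writeVarInt(out, prefix);              // shared prefix length
                writeVarInt(out, cur.length - prefix); // remaining suffix length
                out.write(cur, prefix, cur.length - prefix);
                break;
            case NONE:
                writeVarInt(out, cur.length);          // full element, uncompressed
                out.write(cur, 0, cur.length);
                break;
        }
        return out.toByteArray();
    }
}
```

Under these assumptions, a row "row_0002" following "row_0001" shares 7
bytes and is written as 3 bytes (prefix length, suffix length, one suffix
byte) instead of 9, while "a2" following "a1" shares only 1 byte and falls
back to the uncompressed form.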

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
