[ https://issues.apache.org/jira/browse/KUDU-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Berkeley reassigned KUDU-1398:
-----------------------------------

    Assignee: Will Berkeley

> CFile index blocks can store shortest separating prefix
> -------------------------------------------------------
>
>                 Key: KUDU-1398
>                 URL: https://issues.apache.org/jira/browse/KUDU-1398
>             Project: Kudu
>          Issue Type: Bug
>          Components: cfile, perf
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Will Berkeley
>
> Currently, the cfile value index blocks store the entire first value of 
> each data block. This is not actually necessary -- we only need to store the 
> shortest string that is greater than the last key of the previous block and 
> no greater than the first key of this block. For example:
> Data block 1: apple,banana,cardamom
> Data block 2: carrot,epazote,fennel
> Today we would store:
> Index block entries: ['apple' -> block 1, 'carrot' -> block 2]
> Minimally, we can store:
> Index block entries: ['' -> block 1, 'care' -> block 2]
> In this example only a few bytes are saved, but with longer key strings the 
> savings can be substantial. For example, if the key is a uniformly 
> distributed 36-byte UUID, and we have 1000 x 32KB data blocks in a 32MB 
> cfile, we can probably shorten the index entries to only 2-3 bytes on 
> average, a big savings.
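
The separator rule in the description can be sketched in a few lines of Python. This is only an illustration, not Kudu's cfile code: the names shortest_separator, build_index and find_block are hypothetical, keys are treated as raw byte strings compared lexicographically, and the separator is taken to be the shortest prefix of the next block's first key that sorts after the previous block's last key. For the example above this yields 'carr' rather than 'care'; both are four bytes and both are valid separators.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def shortest_separator(prev_last: Optional[bytes], next_first: bytes) -> bytes:
    """Shortest prefix s of next_first such that prev_last < s <= next_first."""
    if prev_last is None:
        # First block: nothing sorts before it, so the empty string suffices.
        return b""
    if not prev_last < next_first:
        raise ValueError("keys must be strictly increasing across blocks")
    # Length of the common prefix of the two keys.
    i = 0
    while i < len(prev_last) and prev_last[i] == next_first[i]:
        i += 1
    # One byte past the common prefix is enough to sort after prev_last.
    return next_first[: i + 1]


def build_index(blocks: Sequence[Sequence[bytes]]) -> List[Tuple[bytes, int]]:
    """Build (separator, block number) entries for sorted, non-empty blocks."""
    entries = []
    prev_last = None
    for number, block in enumerate(blocks, start=1):
        entries.append((shortest_separator(prev_last, block[0]), number))
        prev_last = block[-1]
    return entries


def find_block(index: List[Tuple[bytes, int]], key: bytes) -> int:
    """Return the block whose separator is the greatest one <= key."""
    separators = [sep for sep, _ in index]
    return index[bisect_right(separators, key) - 1][1]


blocks = [
    [b"apple", b"banana", b"cardamom"],
    [b"carrot", b"epazote", b"fennel"],
]
index = build_index(blocks)
print(index)
```

Lookups keep working with the shortened entries because each separator still sorts after every key in the preceding block and no later than every key in its own block, so find_block(index, b"cardamom") lands in block 1 and find_block(index, b"carrot") in block 2.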



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
