[
https://issues.apache.org/jira/browse/KUDU-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Will Berkeley reassigned KUDU-1398:
-----------------------------------
Assignee: Will Berkeley
> CFile index blocks can store shortest separating prefix
> -------------------------------------------------------
>
> Key: KUDU-1398
> URL: https://issues.apache.org/jira/browse/KUDU-1398
> Project: Kudu
> Issue Type: Bug
> Components: cfile, perf
> Affects Versions: 0.8.0
> Reporter: Todd Lipcon
> Assignee: Will Berkeley
>
> Currently, the cfile value index blocks store the entire first value of each
> data block. This is not necessary -- we only need to store the shortest
> string that is greater than the last key of the previous block and no greater
> than the first key of this block. For example:
> Data block 1: apple,banana,cardamom
> Data block 2: carrot,epazote,fennel
> Today we would store:
> Index block entries: ['apple' -> block 1, 'carrot' -> block 2]
> Minimally, we can store:
> Index block entries: ['' -> block 1, 'care' -> block 2]
> In this example only a few bytes are saved, but with longer key strings the
> savings can be substantial. For example, if the keys are uniformly
> distributed 36-byte UUIDs, and we have 1000x32KB data blocks in a 32MB
> cfile, we can probably shorten the index entries to only 2-3 bytes on
> average, a big savings.
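
The idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kudu's cfile code: the function names (shortest_separator, build_index, find_block) are hypothetical, keys are treated as byte strings compared lexicographically, and the separator is chosen as the shortest prefix of the next block's first key that sorts after the previous block's last key. For the example above this yields 'carr' rather than 'care'; both have the same length and both fall in the valid range.

```python
import bisect


def shortest_separator(prev_last: bytes, next_first: bytes) -> bytes:
    """Return the shortest prefix s of next_first with prev_last < s <= next_first."""
    if prev_last >= next_first:
        raise ValueError("keys must be strictly increasing across blocks")
    # Measure the common prefix of the two boundary keys.
    n = 0
    limit = min(len(prev_last), len(next_first))
    while n < limit and prev_last[n] == next_first[n]:
        n += 1
    # One byte past the common prefix is the first point at which a prefix
    # of next_first sorts strictly after prev_last.
    return next_first[: n + 1]


def build_index(blocks):
    """Build [(separator, block_number), ...] from sorted, non-empty blocks."""
    entries = []
    prev_last = None
    for block_no, keys in enumerate(blocks, start=1):
        # The first block needs no lower bound, so the empty string suffices.
        sep = b"" if prev_last is None else shortest_separator(prev_last, keys[0])
        entries.append((sep, block_no))
        prev_last = keys[-1]
    return entries


def find_block(entries, key: bytes) -> int:
    """Return the block whose index entry is the last one <= key."""
    seps = [sep for sep, _ in entries]
    return entries[bisect.bisect_right(seps, key) - 1][1]


blocks = [
    [b"apple", b"banana", b"cardamom"],
    [b"carrot", b"epazote", b"fennel"],
]
index = build_index(blocks)
```

With these two blocks, index holds an empty separator for block 1 and a four-byte separator for block 2, and find_block still routes every stored key to the block that contains it.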