[jira] [Updated] (LUCENE-7563) BKD index should compress unused leading bytes

Adrien Grand (JIRA) Mon, 05 Dec 2016 03:10:12 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-7563:
---------------------------------
    Attachment: LUCENE-7563-prefixlen-unary.patch

The change looks good and the drop in searcher heap is quite spectacular :-)
http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#searcher_heap

I think there is just a redundant arraycopy in {{clone()}}?

For the record, I played with another idea that leverages the fact that the 
prefix lengths on two consecutive levels are likely close to each other, and 
that the most common values for the deltas are 0, then 1, then -1. So we might 
be able to save more by encoding the delta between consecutive prefix lengths 
using unary coding on top of zig-zag encoding, which would encode 0 in 1 bit, 
1 in 2 bits, 2 in 3 bits, etc. However, it only saved 1% memory on IndexOSM 
and less than 1% on IndexTaxis. I'm attaching it here in case someone wants to 
have a look, but I don't think the gains are worth the complexity.
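
To make the idea concrete, here is a minimal sketch of unary coding on top of zig-zag encoded deltas. It is not the code from the attached patch: the class name {{PrefixLenDeltaCodec}}, the method names, and the use of a String of '0'/'1' characters instead of packed bits are all assumptions made for readability. It uses the standard zig-zag mapping (0 -> 0, -1 -> 1, 1 -> 2, ...), so a delta of -1 gets the 2-bit code here; a variant that favors +1 would negate the delta first.

```java
/** Hypothetical sketch only; not the code from LUCENE-7563-prefixlen-unary.patch. */
final class PrefixLenDeltaCodec {

  /** Standard zig-zag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ... */
  static int zigZagEncode(int v) {
    return (v << 1) ^ (v >> 31);
  }

  /** Inverse of zigZagEncode. */
  static int zigZagDecode(int z) {
    return (z >>> 1) ^ -(z & 1);
  }

  /**
   * Encodes each prefix length as the delta from the previous one (starting
   * from 0). Each zig-zag value n is written in unary: n one-bits followed by
   * a terminating zero-bit, so n costs n + 1 bits.
   */
  static String encode(int[] prefixLengths) {
    StringBuilder bits = new StringBuilder();
    int previous = 0;
    for (int len : prefixLengths) {
      int z = zigZagEncode(len - previous);
      for (int i = 0; i < z; i++) {
        bits.append('1');
      }
      bits.append('0');
      previous = len;
    }
    return bits.toString();
  }

  /** Reads back count prefix lengths written by encode. */
  static int[] decode(String bits, int count) {
    int[] out = new int[count];
    int pos = 0;
    int previous = 0;
    for (int i = 0; i < count; i++) {
      int z = 0;
      while (bits.charAt(pos++) == '1') {
        z++;
      }
      previous += zigZagDecode(z);
      out[i] = previous;
    }
    return out;
  }
}
```

For example, the prefix lengths 3, 3, 4, 3, 3 have deltas 3, 0, 1, -1, 0 and take 7 + 1 + 3 + 2 + 1 = 14 bits in this sketch. The saving only materializes when most deltas are 0, since larger deltas quickly cost more than a fixed-width encoding would.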

> BKD index should compress unused leading bytes
> ----------------------------------------------
>
>                 Key: LUCENE-7563
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7563
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: master (7.0), 6.4
>
>         Attachments: LUCENE-7563-prefixlen-unary.patch, LUCENE-7563.patch, 
> LUCENE-7563.patch, LUCENE-7563.patch, LUCENE-7563.patch
>
>
> Today the BKD (points) in-heap index always uses {{dimensionNumBytes}} per 
> dimension, but if e.g. you are indexing {{LongPoint}} yet only use the bottom 
> two bytes in a given segment, we shouldn't store all those leading 0s in the 
> index.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
