[
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-7563:
---------------------------------------
Attachment: LUCENE-7563.patch
New patch; I think it's ready.
This breaks out a private BKD implementation for {{SimpleText}}, which
is a nice cleanup for the core BKD implementation, e.g. {{BKDReader}}
is now final, its strange protected constructor is gone, and its
protected methods are now private.
This patch also implements [~jpountz]'s last compression idea: in many
cases a single byte is now enough to encode the prefix, splitDim and
first-byte-delta of the suffix, instead of the 2 bytes required in the
previous iterations.
This gives a further ~4-5% compression improvement:
* sparse-sorted -> 2.37 MB
* sparse -> 2.07 MB
* dense -> 2.00 MB
And for the OpenStreetMap geo benchmark:
* geo3d -> 1.75 MB
* LatLonPoint -> 1.72 MB
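To illustrate the single-byte encoding mentioned above, here is a minimal sketch of how three small values can be folded into one variable-length integer. The exact formula used in the patch is not shown in this message, so the mixed-radix packing below, the class name {{PackedSplitCode}} and its method names are all assumptions made for illustration. The point is only that when the combined code is below 128, a vInt needs a single byte.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch, not the patch itself: pack prefix, splitDim and first-byte delta into one vInt. */
final class PackedSplitCode {

  // Mixed-radix packing (assumed layout): splitDim is the lowest "digit",
  // then prefix (0..bytesPerDim), then the first-byte delta.
  static int pack(int firstByteDelta, int prefix, int splitDim, int bytesPerDim, int numDims) {
    return (firstByteDelta * (1 + bytesPerDim) + prefix) * numDims + splitDim;
  }

  // Inverse of pack: returns {firstByteDelta, prefix, splitDim}.
  static int[] unpack(int code, int bytesPerDim, int numDims) {
    int splitDim = code % numDims;
    code /= numDims;
    int prefix = code % (1 + bytesPerDim);
    int firstByteDelta = code / (1 + bytesPerDim);
    return new int[] {firstByteDelta, prefix, splitDim};
  }

  // Variable-length int: 7 payload bits per byte, high bit set when more bytes follow.
  static byte[] writeVInt(int value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    int bytesPerDim = 4, numDims = 2; // example dimensions, chosen arbitrarily
    int code = pack(3, 2, 1, bytesPerDim, numDims);
    System.out.println("code=" + code + ", encoded bytes=" + writeVInt(code).length);
  }
}
```

With few dimensions, short prefixes and small deltas the packed code tends to stay under 128, which is presumably why the single-byte case is common; larger deltas spill into a second byte.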
I'm running the 2B BKD and Points tests now ... if those pass, I plan
to push to master first and let this bake a bit before backporting.
> BKD index should compress unused leading bytes
> ----------------------------------------------
>
> Key: LUCENE-7563
> URL: https://issues.apache.org/jira/browse/LUCENE-7563
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7563.patch, LUCENE-7563.patch, LUCENE-7563.patch,
> LUCENE-7563.patch
>
>
> Today the BKD (points) in-heap index always uses {{dimensionNumBytes}} per
> dimension, but if e.g. you are indexing {{LongPoint}} yet only use the bottom
> two bytes in a given segment, we shouldn't store all those leading 0s in the
> index.
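The idea in the issue description can be sketched as follows: if every value in a segment shares the same leading bytes, only the length of that shared prefix and the differing suffixes need to be kept. This is a simplified illustration, not the patch. It uses plain big-endian longs, whereas Lucene's sortable encoding for {{LongPoint}} also flips the sign bit, and {{CommonPrefixSketch}} and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: find how many leading bytes all encoded values share. */
final class CommonPrefixSketch {

  // Plain big-endian encoding of a long (ByteBuffer is big-endian by default).
  static byte[] toBigEndian(long v) {
    return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
  }

  // Length of the byte prefix common to every value; values must be non-empty
  // and all of the same length.
  static int commonPrefixLength(byte[][] values) {
    int prefix = values[0].length;
    for (int i = 1; i < values.length; i++) {
      int j = 0;
      while (j < prefix && values[i][j] == values[0][j]) {
        j++;
      }
      prefix = j; // the shared prefix can only shrink as more values are seen
    }
    return prefix;
  }

  public static void main(String[] args) {
    long[] raw = {1L, 513L, 65535L}; // all fit in the bottom two bytes
    byte[][] encoded = new byte[raw.length][];
    for (int i = 0; i < raw.length; i++) {
      encoded[i] = toBigEndian(raw[i]);
    }
    int prefix = commonPrefixLength(encoded);
    System.out.println("shared leading bytes: " + prefix
        + ", bytes actually needed per value: " + (Long.BYTES - prefix));
  }
}
```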