[
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722090#comment-15722090
]
Michael McCandless commented on LUCENE-7563:
--------------------------------------------
bq. I think there is just a redundant arraycopy in clone()?
Thanks, I pushed a fix!
bq. For the record, I played with another idea leveraging the fact that the
prefix lengths on two consecutive levels are likely close to each other,
I like this idea! But I hit this test failure, which doesn't reproduce on trunk:
{noformat}
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBKD -Dtests.method=testWastedLeadingBytes -Dtests.seed=2E5F0E183BBA1098 -Dtests.locale=es-PR -Dtests.timezone=CST -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
[junit4] ERROR 0.90s J1 | TestBKD.testWastedLeadingBytes <<<
[junit4] > Throwable #1: java.lang.ArrayIndexOutOfBoundsException: -32
[junit4] > at __randomizedtesting.SeedInfo.seed([2E5F0E183BBA1098:ABD9D50B47794EFC]:0)
[junit4] > at org.apache.lucene.util.bkd.BKDReader$PackedIndexTree.readNodeData(BKDReader.java:442)
[junit4] > at org.apache.lucene.util.bkd.BKDReader$PackedIndexTree.<init>(BKDReader.java:343)
[junit4] > at org.apache.lucene.util.bkd.BKDReader.getIntersectState(BKDReader.java:526)
[junit4] > at org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:498)
[junit4] > at org.apache.lucene.util.bkd.TestBKD.testWastedLeadingBytes(TestBKD.java:1042)
[junit4] > at java.lang.Thread.run(Thread.java:745)
{noformat}
> BKD index should compress unused leading bytes
> ----------------------------------------------
>
> Key: LUCENE-7563
> URL: https://issues.apache.org/jira/browse/LUCENE-7563
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7563-prefixlen-unary.patch, LUCENE-7563.patch,
> LUCENE-7563.patch, LUCENE-7563.patch, LUCENE-7563.patch
>
>
> Today the BKD (points) in-heap index always uses {{dimensionNumBytes}} per
> dimension, but if e.g. you are indexing {{LongPoint}} yet only use the bottom
> two bytes in a given segment, we shouldn't store all those leading 0s in the
> index.
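
To make the idea in the description concrete, here is a minimal sketch that measures how many leading bytes are shared by every value in a set of fixed-width encoded values. The class and method names (LeadingBytePrefix, encode, commonPrefixLength) are hypothetical and are not Lucene APIs; the patches attached to this issue may take a different approach. The encoding is plain big-endian for non-negative values, which is a simplification of what LongPoint actually writes.

```java
import java.util.List;

// Hypothetical illustration only; none of these names exist in Lucene.
final class LeadingBytePrefix {

  // Encodes a long as 8 big-endian bytes. Assumption: values are
  // non-negative; the real LongPoint encoding also flips the sign bit
  // so that negative values sort correctly.
  static byte[] encode(long v) {
    byte[] out = new byte[Long.BYTES];
    for (int i = Long.BYTES - 1; i >= 0; i--) {
      out[i] = (byte) v;
      v >>>= 8;
    }
    return out;
  }

  // Returns the number of leading bytes that are identical across all
  // values. Only the remaining (bytesPerDim - prefix) bytes would need
  // to be stored per value.
  static int commonPrefixLength(List<byte[]> values, int bytesPerDim) {
    if (values.isEmpty()) {
      return 0;
    }
    byte[] first = values.get(0);
    int prefix = bytesPerDim;
    for (byte[] value : values) {
      int i = 0;
      while (i < prefix && value[i] == first[i]) {
        i++;
      }
      prefix = i; // can only shrink or stay the same
    }
    return prefix;
  }
}
```

For values that only occupy the bottom two bytes (for example 1, 258 and 65535), the shared prefix is 6 bytes, so only 2 of the 8 bytes per value differ.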