[
https://issues.apache.org/jira/browse/LUCENE-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694530#comment-16694530
]
Ignacio Vera commented on LUCENE-8562:
--------------------------------------
Removing that logic makes the tests unhappy. In particular, if we have the same
values in the index dimensions but different values in the data dimensions,
it fails when writing the block packed values:
{code:java}
int prefixLenSum = Arrays.stream(commonPrefixLengths).sum();
if (prefixLenSum == packedBytesLength) {
  out.writeByte((byte) -1);
} else {
  assert commonPrefixLengths[sortedDim] < bytesPerDim;
  ...
}
{code}
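The assert complains because commonPrefixLengths for the sorted dimension is
equal to bytesPerDim, while the differing data dimensions keep prefixLenSum
below packedBytesLength, so the else branch is taken.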
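
To make the failing case concrete, here is a minimal standalone sketch. It is
not the BKDWriter code: the class CommonPrefixSketch, its helper
commonPrefixLengths(), and the two-point leaf are hypothetical, and it assumes
dimension 0 is the index dimension picked as sortedDim and dimension 1 is a
data dimension. It only reproduces the arithmetic behind the assert.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and layout are assumptions, not Lucene's API.
class CommonPrefixSketch {

  /** Per-dimension length of the byte prefix shared by every packed point. */
  static int[] commonPrefixLengths(byte[][] points, int numDims, int bytesPerDim) {
    int[] lengths = new int[numDims];
    Arrays.fill(lengths, bytesPerDim);
    byte[] first = points[0];
    for (int i = 1; i < points.length; i++) {
      for (int dim = 0; dim < numDims; dim++) {
        int offset = dim * bytesPerDim;
        // Only compare up to the prefix length found so far for this dimension.
        for (int j = 0; j < lengths[dim]; j++) {
          if (first[offset + j] != points[i][offset + j]) {
            lengths[dim] = j;
            break;
          }
        }
      }
    }
    return lengths;
  }

  public static void main(String[] args) {
    int numDims = 2;      // assumed: dim 0 = index dimension, dim 1 = data dimension
    int bytesPerDim = 2;
    int packedBytesLength = numDims * bytesPerDim;
    int sortedDim = 0;    // assumed: the sorted dimension is the index dimension

    // Same index value in both points, different data values.
    byte[][] leaf = {
      {0, 1, 0, 5},
      {0, 1, 0, 9},
    };

    int[] lengths = commonPrefixLengths(leaf, numDims, bytesPerDim);
    int prefixLenSum = Arrays.stream(lengths).sum();

    // The points are not all equal, so the "all values identical" branch is skipped...
    System.out.println("all equal branch taken: " + (prefixLenSum == packedBytesLength));
    // ...but the sorted dimension has a full-length common prefix, which the assert rejects.
    System.out.println("assert condition holds: " + (lengths[sortedDim] < bytesPerDim));
  }
}
```

Under these assumptions both checks come out false: the leaf does not take the
"all values equal" branch, yet the sorted dimension shares all of its bytes,
which is the combination the assert does not expect.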
> Speed up merging segments of points with data dimensions
> --------------------------------------------------------
>
> Key: LUCENE-8562
> URL: https://issues.apache.org/jira/browse/LUCENE-8562
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Affects Versions: master (8.0), 7.7
> Reporter: Ignacio Vera
> Priority: Major
> Attachments: LUCENE-8562.patch, LUCENE-8562.patch
>
>
> Currently when merging segments of points with data dimensions, all
> dimensions are sorted and carried over down the tree even though only
> indexing dimensions are needed to build the BKD tree. This is needed so leaf
> node data can be compressed by common prefix.
> But when using _MutablePointValues_, this ordering is done at the leaf level,
> so we can use a similar approach for data dimensions and delay the sorting
> until the leaf level. This seems to speed up indexing as well as reduce the
> storage needed for building the index.
>
>
>