[
https://issues.apache.org/jira/browse/LUCENE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424581#comment-17424581
]
Julie Tibshirani edited comment on LUCENE-9695 at 10/5/21, 4:57 PM:
--------------------------------------------------------------------
I just noticed this commit is associated with a steep decrease in QPS on the
nightly benchmarks:
!Screen Shot 2021-10-05 at 9.50.53 AM.png|height=300!
Is this expected, or does it suggest there are performance improvements or fixes
to look into?
> Don't include deleted documents when merging vectors
> ----------------------------------------------------
>
> Key: LUCENE-9695
> URL: https://issues.apache.org/jira/browse/LUCENE-9695
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael Sokolov
> Priority: Major
> Attachments: Screen Shot 2021-10-05 at 9.50.53 AM.png
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> While testing HNSW searches with multi-segment indexes, all kinds of strange
> things were happening; recall performance was radically different for a
> force-merged multi-segment index than for the same index built as a single
> segment. Most testing I've done to date has been with single-segment indexes,
> shame on me.
> One issue is that when merging we iterate over all the vectors from 0 to
> size-1. But this size was being calculated without taking deletions into
> account, which caused deleted vectors to be included in the graph, leading
> to exceptions and weird inconsistencies.
> The other issue has to do with aliasing in the recently introduced
> diverse-neighbor selection heuristic for graph construction. Sometimes the
> vectors to be compared would be drawn from the same VectorValues, but this is
> a no-no since they are then the same vector (the first one is overwritten when
> the second one is fetched). This leads to poor results rather than errors per
> se, but the results also became unpredictable in a way that causes the test
> written to reproduce the first issue to fail. Thus I'll include both fixes
> together.
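
For readers unfamiliar with the two problems described in the issue, here is a minimal, self-contained sketch. It does not use Lucene's actual classes: ReusingVectors, liveOrdinals, and squaredDistance are hypothetical names invented for illustration, and the boolean array stands in for a live-docs bitset. It assumes only what the description states, namely that a vector reader may hand back one shared buffer that is overwritten on the next fetch, and that deleted documents should be skipped when collecting vectors to merge.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: these types are hypothetical stand-ins, not Lucene's API.
final class MergeSketch {

    /** A vector reader that reuses one scratch buffer for every fetch. */
    static final class ReusingVectors {
        private final float[][] stored;
        private final float[] scratch;

        ReusingVectors(float[][] stored) {
            this.stored = stored;
            this.scratch = new float[stored[0].length];
        }

        /** Returns the shared buffer; the previously returned vector is overwritten. */
        float[] vectorValue(int ord) {
            System.arraycopy(stored[ord], 0, scratch, 0, scratch.length);
            return scratch;
        }

        /** A second reader over the same data with its own scratch buffer. */
        ReusingVectors copy() {
            return new ReusingVectors(stored);
        }
    }

    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Collects ordinals of live documents only; live[i] == false marks a deletion. */
    static List<Integer> liveOrdinals(boolean[] live) {
        List<Integer> ords = new ArrayList<>();
        for (int i = 0; i < live.length; i++) {
            if (live[i]) {
                ords.add(i);
            }
        }
        return ords;
    }

    public static void main(String[] args) {
        float[][] data = {{0f, 0f}, {3f, 4f}, {6f, 8f}};
        boolean[] live = {true, false, true};

        // Aliasing: both fetches return the same buffer, so the distance collapses to zero.
        ReusingVectors v = new ReusingVectors(data);
        float aliased = squaredDistance(v.vectorValue(0), v.vectorValue(1));
        System.out.println("aliased distance: " + aliased); // prints 0.0

        // Using an independent reader for the second vector gives the real distance.
        ReusingVectors v2 = v.copy();
        float independent = squaredDistance(v.vectorValue(0), v2.vectorValue(1));
        System.out.println("independent distance: " + independent); // prints 25.0

        // Deletions: iterate over live ordinals instead of 0 to size-1.
        System.out.println("live ordinals: " + liveOrdinals(live)); // prints [0, 2]
    }
}
```

The aliased comparison silently returns a distance of zero instead of failing, which matches the "poor results, but not errors" behavior described above. Whether the actual fix uses a second reader or copies the first vector is not stated in this message.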