leng25 commented on code in PR #15958:
URL: https://github.com/apache/lucene/pull/15958#discussion_r3319930244
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##########
@@ -309,31 +312,26 @@ private int
findWorstNonDiverse(UpdateableRandomVectorScorer scorer) throws IOEx
}
private boolean isWorstNonDiverse(
- int candidateIndex, int[] uncheckedIndexes, int uncheckedCursor,
RandomVectorScorer scorer)
+ int candidateIndex,
+ int[] uncheckedIndexes,
+ int uncheckedCursor,
+ RandomVectorScorer scorer,
+ int[] bulkScoreNodes,
+ float[] bulkScores)
throws IOException {
float minAcceptedSimilarity = scores[candidateIndex];
if (candidateIndex == uncheckedIndexes[uncheckedCursor]) {
// the candidate itself is unchecked
- for (int i = candidateIndex - 1; i >= 0; i--) {
- float neighborSimilarity = scorer.score(nodes[i]);
- // candidate node is too similar to node i given its score relative to
the base node
- if (neighborSimilarity >= minAcceptedSimilarity) {
- return true;
- }
- }
- } else {
- // else we just need to make sure candidate does not violate diversity
with the (newly
- // inserted) unchecked nodes
- assert candidateIndex > uncheckedIndexes[uncheckedCursor];
- for (int i = uncheckedCursor; i >= 0; i--) {
- float neighborSimilarity = scorer.score(nodes[uncheckedIndexes[i]]);
- // candidate node is too similar to node i given its score relative to
the base node
- if (neighborSimilarity >= minAcceptedSimilarity) {
- return true;
- }
- }
+ return scorer.bulkScore(nodes, bulkScores, candidateIndex) >=
minAcceptedSimilarity;
Review Comment:
Hi @benwtrent , thanks for the review!
Not exactly, it depends on whether the candidate itself is checked or
unchecked.
Case 1 (candidate is unchecked): brand new node, never been
diversity-checked against anyone, so we score against all better neighbors
(0..candidateIndex-1) not just unchecked ones.
Case 2 (candidate is checked): already survived diversity checks against all
checked nodes, so we only score against the newly added unchecked nodes.
This is the same logic as before my change, I only replaced the per-node
scorer.score() loop with bulkScore.
That being said, looking at it again I made two small improvements: renamed
bulkScoreNodes to uncheckedNodes to make it clearer what it actually contains,
and moved the population of uncheckedNodes up into findWorstNonDiverse so it's
built once and reused across calls to isWorstNonDiverse rather than being
recreated every time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]