Re: [PR] Bulk score hnsw neighbor array [lucene]

via GitHub Thu, 28 May 2026 11:35:13 -0700


leng25 commented on code in PR #15958:
URL: https://github.com/apache/lucene/pull/15958#discussion_r3319930244



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##########
@@ -309,31 +312,26 @@ private int 
findWorstNonDiverse(UpdateableRandomVectorScorer scorer) throws IOEx
   }
 
   private boolean isWorstNonDiverse(
-      int candidateIndex, int[] uncheckedIndexes, int uncheckedCursor, 
RandomVectorScorer scorer)
+      int candidateIndex,
+      int[] uncheckedIndexes,
+      int uncheckedCursor,
+      RandomVectorScorer scorer,
+      int[] bulkScoreNodes,
+      float[] bulkScores)
       throws IOException {
     float minAcceptedSimilarity = scores[candidateIndex];
     if (candidateIndex == uncheckedIndexes[uncheckedCursor]) {
       // the candidate itself is unchecked
-      for (int i = candidateIndex - 1; i >= 0; i--) {
-        float neighborSimilarity = scorer.score(nodes[i]);
-        // candidate node is too similar to node i given its score relative to 
the base node
-        if (neighborSimilarity >= minAcceptedSimilarity) {
-          return true;
-        }
-      }
-    } else {
-      // else we just need to make sure candidate does not violate diversity 
with the (newly
-      // inserted) unchecked nodes
-      assert candidateIndex > uncheckedIndexes[uncheckedCursor];
-      for (int i = uncheckedCursor; i >= 0; i--) {
-        float neighborSimilarity = scorer.score(nodes[uncheckedIndexes[i]]);
-        // candidate node is too similar to node i given its score relative to 
the base node
-        if (neighborSimilarity >= minAcceptedSimilarity) {
-          return true;
-        }
-      }
+      return scorer.bulkScore(nodes, bulkScores, candidateIndex) >= 
minAcceptedSimilarity;

Review Comment:
   Hi @benwtrent , thanks for the review!
   Not exactly, it depends on whether the candidate itself is checked or 
unchecked.
   
   Case 1 (candidate is unchecked): brand new node, never been 
diversity-checked against anyone, so we score against all better neighbors 
(0..candidateIndex-1) not just unchecked ones.
   Case 2 (candidate is checked): already survived diversity checks against all 
checked nodes, so we only score against the newly added unchecked nodes.
   
   This is the same logic as before my change, I only replaced the per-node 
scorer.score() loop with bulkScore.
   
   That being said, looking at it again I made two small improvements: renamed 
bulkScoreNodes to uncheckedNodes to make it clearer what it actually contains, 
and moved the population of uncheckedNodes up into findWorstNonDiverse so it's 
built once and reused across calls to isWorstNonDiverse rather than being 
recreated every time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Bulk score hnsw neighbor array [lucene]

Reply via email to