benwtrent commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1310431863


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -165,11 +134,7 @@ private HnswGraphBuilder(
    * @param vectorsToAdd the vectors for which to build a nearest neighbors 
graph. Must be an
    *     independent accessor for the vectors
    */
-  public OnHeapHnswGraph build(RandomAccessVectorValues<T> vectorsToAdd) 
throws IOException {
-    if (vectorsToAdd == this.vectors) {
-      throw new IllegalArgumentException(
-          "Vectors to build must be independent of the source of vectors 
provided to HnswGraphBuilder()");
-    }
+  public OnHeapHnswGraph build(RandomAccessVectorValues<?> vectorsToAdd) 
throws IOException {

Review Comment:
   So, this actually doesn't use anything related to `RandomAccessVectorValues` 
other than checking the size. Is there a better interface here than passing in 
`RandomAccessVectorValues<?> vectorsToAdd`? Maybe `public OnHeapHnswGraph 
build(int maxOrd) throws IOException {` ?



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -366,38 +324,11 @@ private void popToScratch(GraphBuilderKnnCollector 
candidates) {
    * @param neighbors the neighbors selected so far
    * @return whether the candidate is diverse given the existing neighbors
    */
-  private boolean diversityCheck(int candidate, float score, NeighborArray 
neighbors)
-      throws IOException {
-    return isDiverse(candidate, neighbors, score);
-  }
-
-  private boolean isDiverse(int candidate, NeighborArray neighbors, float 
score)
-      throws IOException {
-    return switch (vectorEncoding) {
-      case BYTE -> isDiverse((byte[]) vectors.vectorValue(candidate), 
neighbors, score);
-      case FLOAT32 -> isDiverse((float[]) vectors.vectorValue(candidate), 
neighbors, score);
-    };
-  }
-
-  private boolean isDiverse(float[] candidate, NeighborArray neighbors, float 
score)
-      throws IOException {
-    for (int i = 0; i < neighbors.size(); i++) {
-      float neighborSimilarity =
-          similarityFunction.compare(
-              candidate, (float[]) vectorsCopy.vectorValue(neighbors.node[i]));
-      if (neighborSimilarity >= score) {
-        return false;
-      }
-    }
-    return true;
-  }
-
-  private boolean isDiverse(byte[] candidate, NeighborArray neighbors, float 
score)
+  private boolean diversityCheck(
+      RandomVectorScorer scorer, int candidate, float score, NeighborArray 
neighbors)
       throws IOException {
     for (int i = 0; i < neighbors.size(); i++) {
-      float neighborSimilarity =
-          similarityFunction.compare(
-              candidate, (byte[]) vectorsCopy.vectorValue(neighbors.node[i]));
+      float neighborSimilarity = scorer.symmetricScore(candidate, 
neighbors.node[i]);

Review Comment:
   My thought here is that `scorerProvider` creates a new scorer for 
`candidate` and iterates the neighbors. 
   
   But, I wonder how many short lived objects we will be creating doing that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to