benwtrent commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1311719842
##########
lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene94/Lucene94HnswVectorsWriter.java:
##########
@@ -630,7 +621,8 @@ private abstract static class FieldWriter<T> extends
KnnFieldVectorsWriter<T> {
private final int dim;
private final DocsWithFieldSet docsWithField;
private final List<T> vectors;
- private final HnswGraphBuilder<T> hnswGraphBuilder;
+ private final RandomAccessVectorValues<T> raVectors;
Review Comment:
This could be a local variable. It isn't used in anything other than the ctor
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##########
@@ -868,7 +859,8 @@ private abstract static class FieldWriter<T> extends
KnnFieldVectorsWriter<T> {
private final int dim;
private final DocsWithFieldSet docsWithField;
private final List<T> vectors;
- private final HnswGraphBuilder<T> hnswGraphBuilder;
+ private final RAVectorValues<T> raVectors;
Review Comment:
Can be a local variable. Its only used in the ctor.
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##########
@@ -482,35 +478,30 @@ public void mergeOneField(FieldInfo fieldInfo, MergeState
mergeState) throws IOE
}
}
- private <T> HnswGraphBuilder<T> createHnswGraphBuilder(
+ @SuppressWarnings("unchecked")
+ private HnswGraphBuilder createHnswGraphBuilder(
MergeState mergeState,
FieldInfo fieldInfo,
- RandomAccessVectorValues<T> floatVectorValues,
+ RandomAccessVectorValues<?> vectors,
int initializerIndex)
throws IOException {
+ RandomVectorScorerProvider scorerProvider =
Review Comment:
We should move this up into `mergeOneField`. We already switch on the
encoding and gather things of the correct type.
Then we can pass the constructed `RandomVectorScorerProvider scorerProvider`
to this private method instead of the `RandomAccessVectorValues<?> vectors,`
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##########
@@ -172,106 +83,36 @@ public static KnnCollector search(
* @return a set of collected vectors holding the nearest neighbors found
*/
public static KnnCollector search(
- byte[] query,
- int topK,
- RandomAccessVectorValues<byte[]> vectors,
- VectorEncoding vectorEncoding,
- VectorSimilarityFunction similarityFunction,
- HnswGraph graph,
- Bits acceptOrds,
- int visitedLimit)
- throws IOException {
- KnnCollector collector = new TopKnnCollector(topK, visitedLimit);
- search(query, collector, vectors, vectorEncoding, similarityFunction,
graph, acceptOrds);
- return collector;
- }
-
- /**
- * Searches HNSW graph for the nearest neighbors of a query vector.
- *
- * @param query search query vector
- * @param knnCollector a collector of top knn results to be returned
- * @param vectors the vector values
- * @param similarityFunction the similarity function to compare vectors
- * @param graph the graph values. May represent the entire graph, or a level
in a hierarchical
- * graph.
- * @param acceptOrds {@link Bits} that represents the allowed document
ordinals to match, or
- * {@code null} if they are all allowed to match.
- */
- public static void search(
- byte[] query,
- KnnCollector knnCollector,
- RandomAccessVectorValues<byte[]> vectors,
- VectorEncoding vectorEncoding,
- VectorSimilarityFunction similarityFunction,
- HnswGraph graph,
- Bits acceptOrds)
- throws IOException {
- if (query.length != vectors.dimension()) {
- throw new IllegalArgumentException(
- "vector query dimension: "
- + query.length
- + " differs from field dimension: "
- + vectors.dimension());
- }
- HnswGraphSearcher<byte[]> graphSearcher =
- new HnswGraphSearcher<>(
- vectorEncoding,
- similarityFunction,
- new NeighborQueue(knnCollector.k(), true),
- new SparseFixedBitSet(vectors.size()));
- search(query, knnCollector, vectors, graph, graphSearcher, acceptOrds);
- }
-
- /**
- * Search {@link OnHeapHnswGraph}, this method is thread safe, for
parameters please refer to
- * {@link #search(byte[], int, RandomAccessVectorValues, VectorEncoding,
VectorSimilarityFunction,
- * HnswGraph, Bits, int)}
- */
- public static KnnCollector search(
- byte[] query,
- int topK,
- RandomAccessVectorValues<byte[]> vectors,
- VectorEncoding vectorEncoding,
- VectorSimilarityFunction similarityFunction,
- OnHeapHnswGraph graph,
- Bits acceptOrds,
- int visitedLimit)
+ RandomVectorScorer scorer, int topK, OnHeapHnswGraph graph, Bits
acceptOrds, int visitedLimit)
Review Comment:
I think this breaks with the new change :(
Here we create a `new TopKnnCollector(topK, visitedLimit);`, but the
underlying vector storage could be sparse requiring
`OrdinalTranslatedKnnCollector`. We need to ensure that users that call for
`topK` directly like this correctly have their ordinals applied.
I am not 100% sure the best way to ensure this.
We could provide (again, sorry for suggesting otherwise) ordinal translation
in the `RandomVectorScorer`
Or we remove this and require callers to provide the collector directly. We
are already breaking the API by requiring a scorer now. So, it seems like
requiring that they provide a collector isn't any more breaking.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]