kaivalnp commented on code in PR #15979:
URL: https://github.com/apache/lucene/pull/15979#discussion_r3135429420


##########
lucene/core/src/resources/META-INF/services/org.apache.lucene.codecs.KnnVectorsFormat:
##########
@@ -16,3 +16,4 @@
 org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat
 org.apache.lucene.codecs.lucene104.Lucene104ScalarQuantizedVectorsFormat
 org.apache.lucene.codecs.lucene104.Lucene104HnswScalarQuantizedVectorsFormat
+org.apache.lucene.codecs.dedup.DedupFlatVectorsFormat

Review Comment:
   This entry is not required in the final change -- the raw vector format will be wrapped in its own
`Lucene*HnswVectorsReader`, which is what will be exposed here (keeping this for later).
   
   I added it here only so that the tests can use it directly as the KNN vector format and
demonstrate that they pass.



##########
lucene/core/src/java/org/apache/lucene/index/KnnVectorValues.java:
##########
@@ -68,6 +68,21 @@ public int getVectorByteLength() {
     return dimension() * getEncoding().byteSize;
   }
 
+  /**
+   * Returns the byte offset within the backing storage for the vector at the given ordinal. The
+   * default implementation assumes vectors are stored contiguously: {@code ord *
+   * getVectorByteLength()}.
+   *
+   * <p>Formats that use indirection (e.g., de-duplicating formats with an ordinal mapping) should
+   * override this to return the correct offset.
+   *
+   * @param ord the vector ordinal
+   * @return the byte offset for the vector
+   */
+  public long ordToOffset(int ord) {

Review Comment:
   IMO this is the biggest change with this format. Previously, the offset of a vector in
the raw file was simply `ord * byteSize`. With a de-duplicating format, stored vectors no
longer follow ordinal order and multiple ordinals can "point" to the same vector,
so we need an explicit function to resolve the offset.
   
   This is probably more suitable for an interface like `HasIndexSlice` though.
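
   To make the difference concrete, here is a minimal sketch of the two offset calculations. It is not the PR's implementation: the class `OrdToOffsetSketch`, the `ordToSlot` array, and both method names are hypothetical, and it assumes each unique vector occupies a fixed-size slot in the raw file.

```java
/** Hypothetical sketch of contiguous vs. de-duplicated offset resolution. */
public class OrdToOffsetSketch {

  /** Contiguous layout: the vector for ordinal {@code ord} starts at ord * vectorByteLength. */
  public static long contiguousOffset(int ord, int vectorByteLength) {
    // Widen to long before multiplying so large ordinals do not overflow int.
    return (long) ord * vectorByteLength;
  }

  /**
   * De-duplicated layout (assumed): ordToSlot maps every ordinal to the slot of its unique
   * stored vector, so several ordinals may resolve to the same offset.
   */
  public static long dedupOffset(int ord, int[] ordToSlot, int vectorByteLength) {
    return (long) ordToSlot[ord] * vectorByteLength;
  }

  public static void main(String[] args) {
    int vectorByteLength = 4 * Float.BYTES; // 4 float32 dimensions per vector
    int[] ordToSlot = {0, 1, 0, 2}; // ordinals 0 and 2 share one stored vector
    for (int ord = 0; ord < ordToSlot.length; ord++) {
      System.out.println(
          "ord=" + ord
              + " contiguous=" + contiguousOffset(ord, vectorByteLength)
              + " dedup=" + dedupOffset(ord, ordToSlot, vectorByteLength));
    }
  }
}
```

   In this sketch, ordinals 0 and 2 both resolve to offset 0 under the de-duplicated layout, which `ord * byteSize` alone cannot express.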



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

