jtibshirani commented on code in PR #1054:
URL: https://github.com/apache/lucene/pull/1054#discussion_r950631270


##########
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##########
@@ -213,4 +243,48 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  /**
+   * Dot product computed over signed bytes.
+   *
+   * @param a bytes containing a vector
+   * @param b bytes containing another vector, of the same dimension
+   * @return the value of the dot product of the two vectors
+   */
+  public static float dotProduct(BytesRef a, BytesRef b) {
+    assert a.length == b.length;
+    int total = 0;
+    int aOffset = a.offset, bOffset = b.offset;
+    for (int i = 0; i < a.length; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];
+    }
+    return total;
+  }
+
+  /**
+   * Dot product score computed over signed bytes, scaled to be in [0, 1].
+   *
+   * @param a bytes containing a vector
+   * @param b bytes containing another vector, of the same dimension
+   * @return the value of the similarity function applied to the two vectors
+   */
+  public static float dotProductScore(BytesRef a, BytesRef b) {
+    // divide by 2 * 2^14 (maximum absolute value of product of 2 signed bytes) * len
+    return (1 + dotProduct(a, b)) / (float) (a.length * (1 << 15));
+  }
+
+  /**
+   * Convert a floating point vector to an array of bytes using casting; the vector values should be
+   * in [-128,127]
+   *
+   * @param vector a vector
+   * @return a new BytesRef containing the vector's values cast to byte.
+   */
+  public static BytesRef toBytesRef(float[] vector) {
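To make the scaling arithmetic in the hunk concrete, here is a stand-alone sketch of a signed-byte dot product and a score mapped into [0, 1]. It assumes plain `byte[]` arrays in place of Lucene's `BytesRef`, and the class name `ByteDotProductSketch` is hypothetical; this is not the PR's code. Each per-dimension product lies in [-128 * 127, 128 * 128] = [-16256, 16384], so the raw dot product lies in [-16256 * len, 16384 * len]. Dividing by 2 * 2^14 * len = 2^15 * len maps it into roughly [-0.5, 0.5], and adding 0.5 shifts that range into [0, 1].

```java
/** Hypothetical sketch: byte[] stands in for Lucene's BytesRef (offsets omitted). */
final class ByteDotProductSketch {

  /** Dot product over signed bytes; each product is at most 2^14 in magnitude. */
  static int dotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + " != " + b.length);
    }
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i]; // bytes are promoted to int before multiplying
    }
    return total;
  }

  /** Divides by 2 * 2^14 * len, then shifts by 0.5 so the result lands in [0, 1]. */
  static float dotProductScore(byte[] a, byte[] b) {
    // Multiply in float so the denominator cannot overflow int for long vectors.
    float denom = (float) a.length * (1 << 15);
    return 0.5f + dotProduct(a, b) / denom;
  }

  public static void main(String[] args) {
    byte[] min = {-128, -128};
    byte[] max = {127, 127};
    // Largest product per dimension: (-128) * (-128) = 16384 = 2^14
    System.out.println(dotProduct(min, min)); // 32768
    System.out.println(dotProductScore(min, min)); // 1.0
    // Most negative product per dimension: (-128) * 127 = -16256, so the score is just above 0
    System.out.println(dotProductScore(min, max));
  }
}
```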

Review Comment:
   Then maybe we could do a check in `Lucene94HnswVectorsReader`? The main thing I had in mind was a user accidentally providing a query whose elements don't fall in the range [-128, 127]. We would silently cast the floats to bytes, which could produce incorrect search results.
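A minimal sketch of the kind of check suggested above: reject out-of-range elements instead of casting them. The helper name `toBytesChecked` and the class `ByteVectorCheckSketch` are hypothetical, and where such a check would live (`Lucene94HnswVectorsReader` or elsewhere) is left open as in the comment. A plain Java cast does not clamp: `(byte) 200f` first narrows to the int 200 and then wraps to -56.

```java
import java.util.Arrays;

/** Hypothetical sketch of a range-checked float-to-byte conversion. */
final class ByteVectorCheckSketch {

  /**
   * Converts each element to a signed byte, throwing if it is outside [-128, 127].
   * The negated range test also rejects NaN, which a plain cast would turn into 0.
   * Only the range is checked; fractional values are still truncated by the cast.
   */
  static byte[] toBytesChecked(float[] vector) {
    byte[] bytes = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      float v = vector[i];
      if (!(v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE)) {
        throw new IllegalArgumentException(
            "vector element " + i + " = " + v + " is outside [-128, 127]");
      }
      bytes[i] = (byte) v;
    }
    return bytes;
  }

  public static void main(String[] args) {
    // A plain cast silently wraps out-of-range values.
    System.out.println((byte) 200f); // -56
    System.out.println(Arrays.toString(toBytesChecked(new float[] {-128f, 0f, 127f}))); // [-128, 0, 127]
    try {
      toBytesChecked(new float[] {1f, 200f});
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // vector element 1 = 200.0 is outside [-128, 127]
    }
  }
}
```

Failing loudly at conversion time surfaces the bad query to the caller, rather than returning results computed from wrapped values.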



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

