benwtrent commented on code in PR #11946:
URL: https://github.com/apache/lucene/pull/11946#discussion_r1048856364


##########
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##########
@@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) {
    * @throws IllegalArgumentException if <code>k</code> is less than 1
    */
   public KnnVectorQuery(String field, float[] target, int k, Query filter) {
+    this(field, target, k, Float.NEGATIVE_INFINITY, filter);
+  }
+
+  /**
+   * Find the <code>k</code> nearest documents to the target vector according to the vectors in the
+   * given field. <code>target</code> vector.
+   *
+   * @param field a field that has been indexed as a {@link KnnVectorField}.
+   * @param target the target of the search
+   * @param k the number of documents to find (the upper bound)
+   * @param similarityThreshold the minimum acceptable value of similarity

Review Comment:
   @msokolov you haven't missed anything. I am specifically talking about users 
providing `similarityThreshold` to the query. If they have calculated that 
they want a specific `cosine` or `dotProduct` similarity, they would then need 
to adjust that value to match Lucene's scoring transformation.
   
   I think that `similarityThreshold` should mean vector similarities. We can 
transform it for the user to reflect the score that similarity represents 
(given vector encoding type and similarity function).
   
   
   An example here is `dotProduct`. The user knows they want `FLOAT32` vectors 
with a dotProduct of at least 0.7. With this API that ACTUALLY means they need 
to limit the scores to 0.85 (`(1 + dotProduct)/2`). How is the user supposed 
to know that?
   
   This seems really weird to me.
   
   This also doesn't take into account that the scoring methods differ between 
vector encoding types, which can get even more confusing.
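   
   To make the mismatch concrete, here is a minimal sketch of the conversion a 
user has to do by hand today. The class and method names are hypothetical and 
not part of Lucene; the only thing taken from the discussion is the 
`(1 + dotProduct)/2` mapping for `FLOAT32` dotProduct. It also assumes 
unit-length vectors, so the raw dot product lies in [-1, 1].

```java
/** Hypothetical helper, NOT part of Lucene: illustrates the FLOAT32 dotProduct mapping only. */
public final class SimilarityThresholdSketch {

  private SimilarityThresholdSketch() {}

  /**
   * Maps a raw dot product to the score it represents, using (1 + dotProduct) / 2.
   * Assumption: vectors are unit length, so dotProduct is in [-1, 1].
   */
  public static float dotProductToScore(float dotProduct) {
    return (1f + dotProduct) / 2f;
  }

  /** Inverse mapping: recovers the raw dot product that a given score represents. */
  public static float scoreToDotProduct(float score) {
    return 2f * score - 1f;
  }

  public static void main(String[] args) {
    // The similarity the user actually reasons about.
    float wantedDotProduct = 0.7f;
    // The value they would have to pass if the threshold is expressed as a score.
    float scoreThreshold = dotProductToScore(wantedDotProduct);
    System.out.println("dotProduct " + wantedDotProduct + " -> score threshold " + scoreThreshold);
    System.out.println("round trip -> " + scoreToDotProduct(scoreThreshold));
  }
}
```

   If `similarityThreshold` were defined in terms of vector similarity, a 
conversion like `dotProductToScore` could live inside the query, chosen per 
similarity function and vector encoding, rather than in user code.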



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

