benwtrent commented on code in PR #11946:
URL: https://github.com/apache/lucene/pull/11946#discussion_r1048856364
##########
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##########
@@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) {
    * @throws IllegalArgumentException if <code>k</code> is less than 1
    */
   public KnnVectorQuery(String field, float[] target, int k, Query filter) {
+    this(field, target, k, Float.NEGATIVE_INFINITY, filter);
+  }
+
+  /**
+   * Find the <code>k</code> nearest documents to the target vector according to the vectors in the
+   * given field. <code>target</code> vector.
+   *
+   * @param field a field that has been indexed as a {@link KnnVectorField}.
+   * @param target the target of the search
+   * @param k the number of documents to find (the upper bound)
+   * @param similarityThreshold the minimum acceptable value of similarity

Review Comment:
   @msokolov you haven't missed anything. I am specifically talking about users providing `similarityThreshold` to the query. If they have calculated that they want a specific `cosine` or `dotProduct` similarity, they would then need to adjust that value to match Lucene's scoring transformation.

   I think that `similarityThreshold` should mean vector similarity. We can transform it for the user into the score that this similarity represents (given the vector encoding type and similarity function).

   An example here is `dotProduct`. The user knows they want `FLOAT32` vectors within a dotProduct of 0.7. With this API that ACTUALLY means they want to limit the scores to 0.85 (`(1 + dotProduct)/2`). How is the user supposed to know that? This seems really weird to me.

   This also doesn't take into account the different scoring methods between vector types, which can get even more confusing.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
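To make the `dotProduct` example in the review comment concrete, here is a minimal sketch of the conversion a user would otherwise have to do by hand. It assumes only the `(1 + dotProduct)/2` transformation quoted in the comment for `FLOAT32` vectors. The class and method names are hypothetical and are not part of Lucene's API or of this PR.

```java
// Hypothetical illustration only: none of these names exist in Lucene.
// It applies the (1 + dotProduct) / 2 transformation mentioned in the
// review comment for FLOAT32 vectors, and its inverse.
final class SimilarityThresholdSketch {

    // Convert a raw dot-product similarity into the score range the
    // comment describes: (1 + dotProduct) / 2.
    static float dotProductToScore(float dotProduct) {
        return (1f + dotProduct) / 2f;
    }

    // Inverse of the transformation above: recover the raw dot product
    // that corresponds to a given score.
    static float scoreToDotProduct(float score) {
        return 2f * score - 1f;
    }

    public static void main(String[] args) {
        float similarityThreshold = 0.7f; // what the user has in mind
        float scoreThreshold = dotProductToScore(similarityThreshold);
        System.out.println("similarity threshold: " + similarityThreshold);
        System.out.println("equivalent score threshold: " + scoreThreshold);
    }
}
```

If the query accepted the raw similarity and applied a conversion like this internally, the caller would pass 0.7 and would not need to know about the 0.85 score cutoff. Other similarity functions and vector encodings would need their own conversions, which this sketch does not cover.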
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org