[GitHub] [lucene] rmuir commented on a diff in pull request #11946: add similarity threshold for hnsw

GitBox Sat, 17 Dec 2022 20:26:00 -0800


rmuir commented on code in PR #11946:
URL: https://github.com/apache/lucene/pull/11946#discussion_r1051525868



##########
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##########
@@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) {
    * @throws IllegalArgumentException if <code>k</code> is less than 1
    */
   public KnnVectorQuery(String field, float[] target, int k, Query filter) {
+    this(field, target, k, Float.NEGATIVE_INFINITY, filter);
+  }
+
+  /**
+   * Find the <code>k</code> nearest documents to the target vector according to the vectors in the
+   * given field. <code>target</code> vector.
+   *
+   * @param field a field that has been indexed as a {@link KnnVectorField}.
+   * @param target the target of the search
+   * @param k the number of documents to find (the upper bound)
+   * @param similarityThreshold the minimum acceptable value of similarity
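
For readers following the hunk, here is a minimal sketch of one reading of the Javadoc above: `k` is an upper bound on the number of hits, `similarityThreshold` is a score floor, and the `Float.NEGATIVE_INFINITY` passed by the existing constructor disables the floor. The `ThresholdSketch` class, the `Hit` type, and `topKAboveThreshold` are hypothetical names used only for illustration. They are not Lucene classes, and this is not how the PR applies the threshold inside the HNSW search.

```java
import java.util.ArrayList;
import java.util.List;

public class ThresholdSketch {
  /** Hypothetical stand-in for a scored hit; not a Lucene class. */
  public static final class Hit {
    public final int doc;
    public final float score;

    public Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /**
   * Keeps at most k hits and drops any hit scoring below similarityThreshold.
   * Assumes sortedHits is ordered by descending score.
   */
  public static List<Hit> topKAboveThreshold(
      List<Hit> sortedHits, int k, float similarityThreshold) {
    List<Hit> kept = new ArrayList<>();
    for (Hit hit : sortedHits) {
      if (kept.size() == k || hit.score < similarityThreshold) {
        break; // input is sorted, so no later hit can qualify
      }
      kept.add(hit);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Hit> hits = List.of(new Hit(7, 0.91f), new Hit(3, 0.74f), new Hit(9, 0.40f));
    // No floor: behaves like a plain top-k cut.
    System.out.println(topKAboveThreshold(hits, 2, Float.NEGATIVE_INFINITY).size());
    // With a floor of 0.5, fewer than k hits may come back.
    System.out.println(topKAboveThreshold(hits, 3, 0.5f).size());
  }
}
```

Under this reading, a query may return fewer than `k` documents, which is the behavior the review comment below questions.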

Review Comment:
   Still don't have any explanation here as to why we'd do this for the vector
   search query. We have avoided any such thresholds or normalization in all of
   Lucene's scoring for decades: if we hadn't, we would never have been able to
   implement block-max WAND or other algorithms, because such thresholds would
   be incompatible with them.
   
   please see:
   * https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-CanIfilterbyscore?
   * https://cwiki.apache.org/confluence/display/LUCENE/ScoresAsPercentages
   
   I don't mind being the bad guy blocking this change, because it seems like it
   has not been thought through.
   
   You must convince me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


