[jira] [Created] (LUCENE-10397) KnnVectorQuery doesn't tie break by doc ID

Adrien Grand (Jira) Mon, 31 Jan 2022 06:36:08 -0800

Adrien Grand created LUCENE-10397:
-------------------------------------

             Summary: KnnVectorQuery doesn't tie break by doc ID
                 Key: LUCENE-10397
                 URL: https://issues.apache.org/jira/browse/LUCENE-10397
             Project: Lucene - Core
          Issue Type: Task
            Reporter: Adrien Grand



I was expecting KnnVectorQuery to tie-break by doc ID, so that if multiple 
documents get the same score, the ones with the lowest doc IDs would 
get returned first, similar to how SortField.SCORE also tie-breaks by doc ID.

However, the following test fails, suggesting that this is not the case.

{code:java}
  public void testTieBreak() throws IOException {
    try (Directory d = newDirectory()) {
      try (IndexWriter w = new IndexWriter(d, new IndexWriterConfig())) {
        for (int j = 0; j < 5; j++) {
          Document doc = new Document();
          doc.add(
              new KnnVectorField(
                  "field", new float[] {0, 1}, VectorSimilarityFunction.DOT_PRODUCT));
          w.addDocument(doc);
        }
      }
      try (IndexReader reader = DirectoryReader.open(d)) {
        assertEquals(1, reader.leaves().size());
        IndexSearcher searcher = new IndexSearcher(reader);
        KnnVectorQuery query = new KnnVectorQuery("field", new float[] {2, 3}, 3);
        TopDocs topHits = searcher.search(query, 3);
        assertEquals(0, topHits.scoreDocs[0].doc);
        assertEquals(1, topHits.scoreDocs[1].doc);
        assertEquals(2, topHits.scoreDocs[2].doc);
      }
    }
  }
{code}
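
For reference, the ordering I would expect can be sketched without any Lucene classes. This is only an illustration of the expected contract, not how KnnVectorQuery collects hits internally: the Hit class, the SCORE_THEN_DOC comparator and the topDocIds helper are hypothetical names made up for this sketch, and the score value is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TieBreakSketch {
  /** Hypothetical stand-in for a scored hit: a doc ID plus its score. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Higher score first; among equal scores, lower doc ID first. */
  static final Comparator<Hit> SCORE_THEN_DOC =
      (a, b) -> {
        int cmp = Float.compare(b.score, a.score);
        return cmp != 0 ? cmp : Integer.compare(a.doc, b.doc);
      };

  /** Returns the doc IDs of the best k hits under SCORE_THEN_DOC. */
  static List<Integer> topDocIds(List<Hit> hits, int k) {
    List<Hit> sorted = new ArrayList<>(hits);
    sorted.sort(SCORE_THEN_DOC);
    List<Integer> docs = new ArrayList<>();
    for (int i = 0; i < Math.min(k, sorted.size()); i++) {
      docs.add(sorted.get(i).doc);
    }
    return docs;
  }

  public static void main(String[] args) {
    // Five hits with identical scores, deliberately listed out of doc ID order.
    List<Hit> hits =
        Arrays.asList(
            new Hit(3, 1f), new Hit(0, 1f), new Hit(4, 1f), new Hit(2, 1f), new Hit(1, 1f));
    System.out.println(topDocIds(hits, 3)); // prints [0, 1, 2]
  }
}
```

With all five scores equal, only the doc ID comparison decides the order, which is why the test above expects docs 0, 1 and 2.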



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
