[ 
https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570929#comment-17570929
 ] 

ASF subversion and git services commented on LUCENE-10592:
----------------------------------------------------------

Commit b15bcd11c333a96c043a3cc1e3498b8b09e7d6a2 in lucene's branch 
refs/heads/branch_9x from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b15bcd11c33 ]

LUCENE-10592 Strengthen 
TestHnswGraph::testSortedAndUnsortedIndicesReturnSameResults

This test occasionally fails if knn search returns only 1 document
in the index, as we have an assertion that returned doc IDs from
sorted and unsorted index must be different.

This patch ensures that we have many documents in the index, so
that knn search always returns enough results.


> Should we build HNSW graph on the fly during indexing
> -----------------------------------------------------
>
>                 Key: LUCENE-10592
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10592
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Mayya Sharipova
>            Assignee: Mayya Sharipova
>            Priority: Minor
>             Fix For: 9.4
>
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> Currently, when we index vectors for KnnVectorField, we buffer those vectors 
> in memory and on flush during a segment construction we build an HNSW graph.  
> As building an HNSW graph is very expensive, this makes flush operation take 
> a lot of time. This also makes overall indexing performance quite 
> unpredictable (as the number of flushes are defined by memory used, and the 
> presence of concurrent searches), e.g. some indexing operations return almost 
> instantly while others that trigger flush take a lot of time. 
> Building an HNSW graph on the fly as we index vectors allows to avoid this 
> problem, and spread a load of HNSW graph construction evenly during indexing.
> This will also supersede LUCENE-10194



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to