[ https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566853#comment-17566853 ]
Mayya Sharipova commented on LUCENE-10592:
------------------------------------------

[~julietibs] Thanks for studying this PR. Indeed, the sorting logic got complicated, and for now I could not find a better way. As an alternative, I considered completely rebuilding the graph from the sorted vector values (similar to the merging procedure), but I thought that would take more time than just remapping the ordinals.

> Should we build HNSW graph on the fly during indexing
> -----------------------------------------------------
>
>                 Key: LUCENE-10592
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10592
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Mayya Sharipova
>            Assignee: Mayya Sharipova
>            Priority: Minor
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently, when we index vectors for KnnVectorField, we buffer those vectors
> in memory, and on flush, during segment construction, we build an HNSW graph.
> As building an HNSW graph is very expensive, this makes the flush operation
> take a lot of time. It also makes overall indexing performance quite
> unpredictable (the number of flushes is determined by memory usage and the
> presence of concurrent searches), e.g. some indexing operations return almost
> instantly while others that trigger a flush take a lot of time.
> Building the HNSW graph on the fly as we index vectors avoids this
> problem and spreads the load of HNSW graph construction evenly over indexing.
> This will also supersede LUCENE-10194.
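
To illustrate the "remapping the ordinals" option mentioned in the comment, here is a minimal Java sketch. It is not the code from the PR: the class name OrdinalRemap, the method remap, the plain int[][] adjacency representation, and the single-level graph are all assumptions made for illustration. The idea is that once sorting assigns each vector a new ordinal, the existing neighbor lists can be translated through an old-to-new map instead of rebuilding the graph from scratch.

```java
import java.util.Arrays;

public class OrdinalRemap {
    /**
     * Hypothetical helper, not a Lucene API.
     * neighbors[oldOrd] lists the old ordinals adjacent to oldOrd.
     * oldToNew[oldOrd] is the ordinal the same vector receives after sorting.
     */
    public static int[][] remap(int[][] neighbors, int[] oldToNew) {
        int[][] remapped = new int[neighbors.length][];
        for (int oldOrd = 0; oldOrd < neighbors.length; oldOrd++) {
            int[] translated = new int[neighbors[oldOrd].length];
            for (int i = 0; i < translated.length; i++) {
                // Translate each neighbor reference to its new ordinal.
                translated[i] = oldToNew[neighbors[oldOrd][i]];
            }
            // Assumption: neighbor lists are kept in ascending ordinal order.
            Arrays.sort(translated);
            // Store the list under the node's own new ordinal.
            remapped[oldToNew[oldOrd]] = translated;
        }
        return remapped;
    }

    public static void main(String[] args) {
        int[][] neighbors = { {1, 2}, {0}, {0} }; // node 0 is linked to nodes 1 and 2
        int[] oldToNew = {2, 0, 1};               // sorting moves old 0 to 2, old 1 to 0, old 2 to 1
        System.out.println(Arrays.deepToString(remap(neighbors, oldToNew)));
        // prints [[2], [2], [0, 1]]
    }
}
```

The translation touches each edge once, whereas rebuilding would repeat a graph search for every inserted vector, which is consistent with the expectation in the comment that remapping is the cheaper option.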