[
https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566853#comment-17566853
]
Mayya Sharipova commented on LUCENE-10592:
------------------------------------------
[~julietibs] Thanks for studying this PR. Indeed, the sorting logic got
complicated, and so far I have not found a better way. As an alternative, I
considered completely rebuilding the graph from the sorted vector values
(similar to the merging procedure), but I thought it would take more time than
just remapping the ordinals.
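For readers following the thread, the ordinal remapping idea can be sketched
roughly as below. This is a minimal illustration and not the code from the PR:
the class name OrdinalRemapSketch, the int[][] adjacency layout, and the
oldToNew array are all assumptions made for the example, and only a single
graph level is shown.

```java
import java.util.Arrays;

// Hypothetical sketch: none of these names come from Lucene or from the PR.
final class OrdinalRemapSketch {

  /**
   * Rewrites a graph whose nodes are identified by pre-sort ordinals so that
   * it uses post-sort ordinals instead.
   *
   * @param neighbors neighbors[oldOrd] lists the old ordinals adjacent to oldOrd
   * @param oldToNew  oldToNew[oldOrd] is the ordinal of that node after sorting
   *                  (assumed to be a permutation of 0..n-1)
   */
  static int[][] remap(int[][] neighbors, int[] oldToNew) {
    int[][] remapped = new int[neighbors.length][];
    for (int oldOrd = 0; oldOrd < neighbors.length; oldOrd++) {
      int[] mapped = new int[neighbors[oldOrd].length];
      for (int i = 0; i < mapped.length; i++) {
        // Translate each neighbor reference into the new ordinal space.
        mapped[i] = oldToNew[neighbors[oldOrd][i]];
      }
      // Sorting is an assumption of this sketch, for a deterministic layout.
      Arrays.sort(mapped);
      // The node itself also moves to its new position.
      remapped[oldToNew[oldOrd]] = mapped;
    }
    return remapped;
  }
}
```

The sketch touches each edge once and computes no vector distances, which is
the reason remapping is expected to be cheaper than rebuilding the graph from
the sorted vectors.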
> Should we build HNSW graph on the fly during indexing
> -----------------------------------------------------
>
> Key: LUCENE-10592
> URL: https://issues.apache.org/jira/browse/LUCENE-10592
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Mayya Sharipova
> Assignee: Mayya Sharipova
> Priority: Minor
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Currently, when we index vectors for KnnVectorField, we buffer those vectors
> in memory, and on flush, during segment construction, we build an HNSW graph.
> As building an HNSW graph is very expensive, this makes the flush operation
> take a lot of time. This also makes overall indexing performance quite
> unpredictable (as the number of flushes is determined by the memory used and
> the presence of concurrent searches), e.g. some indexing operations return
> almost instantly while others that trigger a flush take a lot of time.
> Building an HNSW graph on the fly as we index vectors allows us to avoid this
> problem and spreads the load of HNSW graph construction evenly during
> indexing.
> This will also supersede LUCENE-10194.