jmazanec15 opened a new pull request, #12050: URL: https://github.com/apache/lucene/pull/12050
### Description

Related to #11354 (performance metrics can be found here). I also started a draft PR in #11719, but decided to refactor into a new PR.

This PR adds the ability to initialize a merged segment's HNSW graph from the largest HNSW graph among the segments being merged. The selected graph must not contain any dead documents. If no suitable initializer graph is found, the merge falls back to creating the graph from scratch.

To support this, a couple of changes to the current graph construction process were needed. `OnHeapHnswGraph` had to support out-of-order insertion, because the mapped ordinals of the nodes in the graph used for initialization are not necessarily the first X ordinals in the new graph.

I also removed the implicit addition of the first node into the graph. Implicitly adding the first node created a lot of complexity for initialization. In #11719, I got it to work without changing this, but thought it was cleaner to require the first node to be added explicitly.

In addition, graphs produced by merging two segments are no longer necessarily equivalent to the graph produced by indexing one segment directly. This is caused both by differences in the assigned random values and by insertion order dictating which neighbors are selected for which nodes.
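
To illustrate the selection rule and the out-of-order insertion requirement described above, here is a minimal, self-contained sketch. It is not the code in this PR: `SegmentGraph`, `selectInitializer`, and `ordinalsToInsert` are hypothetical names invented for the example, and the per-segment summary is reduced to a node count and a deleted-document count.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class InitializerGraphSketch {

    /** Hypothetical per-segment summary; this is not a Lucene class. */
    static final class SegmentGraph {
        final String name;
        final int nodeCount;
        final int deletedDocs;

        SegmentGraph(String name, int nodeCount, int deletedDocs) {
            this.name = name;
            this.nodeCount = nodeCount;
            this.deletedDocs = deletedDocs;
        }
    }

    /**
     * Returns the largest graph that has no deleted documents,
     * or null when the caller should build the merged graph from scratch.
     */
    static SegmentGraph selectInitializer(List<SegmentGraph> segments) {
        SegmentGraph best = null;
        for (SegmentGraph s : segments) {
            if (s.deletedDocs > 0 || s.nodeCount == 0) {
                continue; // graphs with dead documents (or no nodes) are not eligible
            }
            if (best == null || s.nodeCount > best.nodeCount) {
                best = s;
            }
        }
        return best;
    }

    /**
     * Given the merged-graph ordinals that the initializer's nodes map to,
     * lists the remaining ordinals that still have to be inserted.
     * The seeded ordinals need not be the first X ordinals of the merged graph.
     */
    static List<Integer> ordinalsToInsert(int mergedSize, int[] oldToNewOrdinal) {
        Set<Integer> seeded = new HashSet<>();
        for (int newOrd : oldToNewOrdinal) {
            seeded.add(newOrd);
        }
        List<Integer> remaining = new ArrayList<>();
        for (int ord = 0; ord < mergedSize; ord++) {
            if (!seeded.contains(ord)) {
                remaining.add(ord);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<SegmentGraph> segments = List.of(
            new SegmentGraph("a", 1000, 3), // largest, but has deletions
            new SegmentGraph("b", 400, 0),
            new SegmentGraph("c", 250, 0));

        SegmentGraph init = selectInitializer(segments);
        System.out.println(init == null ? "build from scratch" : "initialize from " + init.name);

        // The initializer's three nodes land at non-contiguous merged ordinals.
        System.out.println(ordinalsToInsert(6, new int[] {1, 3, 4}));
    }
}
```

In this toy run, segment `a` is skipped despite being the largest because it has deleted documents, so `b` is chosen. The second call shows why the graph needs out-of-order insertion: the seeded nodes occupy ordinals 1, 3, and 4, leaving 0, 2, and 5 to be added afterwards.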