jmazanec15 opened a new pull request, #12050:
URL: https://github.com/apache/lucene/pull/12050

   ### Description
   
   Related to #11354, where performance metrics can be found. I also started a 
draft PR in #11719, but decided to refactor the work into a new PR.
   
   This PR adds the ability to initialize a merged segment's HNSW graph from 
the largest HNSW graph among the segments being merged. The selected graph 
must not contain any dead documents. If no suitable initializer graph is found, 
the merge falls back to creating the graph from scratch.
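
   The selection rule above can be sketched as follows. This is a minimal 
illustration, not the code in this PR: `SegmentGraphInfo` and 
`InitializerSelector` are hypothetical names, and the sketch assumes one reading 
of the rule, namely "pick the largest graph among segments that have no deleted 
documents".

```java
import java.util.List;

// Hypothetical stand-in for the per-segment state a merge would inspect.
// These names are illustrative and are not Lucene APIs.
final class SegmentGraphInfo {
  final String name;
  final int graphSize;   // number of nodes in the segment's HNSW graph
  final int deletedDocs; // number of dead (deleted) documents in the segment

  SegmentGraphInfo(String name, int graphSize, int deletedDocs) {
    this.name = name;
    this.graphSize = graphSize;
    this.deletedDocs = deletedDocs;
  }
}

final class InitializerSelector {
  /**
   * Returns the index of the largest graph whose segment has no deletions,
   * or -1 to signal that the merged graph should be built from scratch.
   */
  static int selectInitializer(List<SegmentGraphInfo> segments) {
    int best = -1;
    int bestSize = 0;
    for (int i = 0; i < segments.size(); i++) {
      SegmentGraphInfo s = segments.get(i);
      // Segments with dead documents are never eligible as initializers.
      if (s.deletedDocs == 0 && s.graphSize > bestSize) {
        best = i;
        bestSize = s.graphSize;
      }
    }
    return best;
  }
}
```

   In this sketch a segment with 500 nodes and 3 deletions loses to a clean 
segment with 200 nodes, and a merge in which every segment has deletions gets 
-1, meaning the from-scratch path.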
   
   To support this functionality, a couple of changes to the current graph 
construction process were needed. OnHeapHnswGraph had to support out-of-order 
insertion, because the mapped ordinals of the nodes in the graph used for 
initialization are not necessarily the first X ordinals in the new graph.
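
   To illustrate why out-of-order insertion is needed, here is a simplified 
single-level sketch. `SparseLevelGraph` and the `oldToNew` mapping are 
hypothetical and do not reflect the actual OnHeapHnswGraph layout; the point is 
only that the remapped ordinals can land anywhere in the new ordinal space.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-level adjacency structure that accepts nodes in any
// ordinal order. It is not the real OnHeapHnswGraph.
final class SparseLevelGraph {
  private final Map<Integer, List<Integer>> neighbors = new TreeMap<>();

  void addNode(int ordinal) {
    neighbors.putIfAbsent(ordinal, new ArrayList<>());
  }

  void connect(int a, int b) {
    addNode(a);
    addNode(b);
    neighbors.get(a).add(b);
    neighbors.get(b).add(a);
  }

  boolean contains(int ordinal) {
    return neighbors.containsKey(ordinal);
  }

  List<Integer> neighborsOf(int ordinal) {
    List<Integer> list = neighbors.get(ordinal);
    return list == null ? Collections.<Integer>emptyList() : list;
  }

  int size() {
    return neighbors.size();
  }

  /** Copies source into a new graph, translating every ordinal via oldToNew. */
  static SparseLevelGraph remap(SparseLevelGraph source, int[] oldToNew) {
    SparseLevelGraph out = new SparseLevelGraph();
    for (Map.Entry<Integer, List<Integer>> e : source.neighbors.entrySet()) {
      int newOrd = oldToNew[e.getKey()];
      out.addNode(newOrd); // may be far from 0; gaps are filled in later
      for (int nb : e.getValue()) {
        out.neighbors.get(newOrd).add(oldToNew[nb]);
      }
    }
    return out;
  }
}
```

   With a mapping such as `{4, 1, 7}`, the three nodes of the initializer graph 
occupy ordinals 4, 1, and 7 in the new graph, so ordinals 0, 2, 3, 5, and 6 are 
still missing and must be inserted afterwards.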
   
   I also removed the implicit addition of the first node into the graph. 
Implicitly adding the first node created a lot of complexity for 
initialization. In #11719, I got it to work without changing this, but thought 
it was cleaner to require the first node to be added explicitly.
   
   In addition, graphs produced by merging two segments are no longer 
necessarily equivalent to indexing one segment directly. This is caused both by 
differences in assigned random values and by insertion order dictating which 
neighbors are selected for which nodes.
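
   As a rough illustration of the first cause, the sketch below uses the 
standard HNSW level draw, floor(-ln(U) * ml) with ml = 1 / ln(M). It assumes 
that the "assigned random values" are these per-node levels; `LevelAssignment`, 
`seed`, and `maxConn` are hypothetical names, not the builder's API.

```java
import java.util.Random;

// Hypothetical helper showing the standard HNSW level draw.
final class LevelAssignment {
  /** Draws a level as floor(-ln(U) * ml) with U uniform in (0, 1). */
  static int randomLevel(Random random, double ml) {
    double u;
    do {
      u = random.nextDouble(); // in [0, 1); reject 0 to avoid log(0)
    } while (u == 0.0);
    return (int) (-Math.log(u) * ml);
  }

  /** Assigns a level to each of count nodes, in insertion order. Requires maxConn > 1. */
  static int[] assignLevels(int count, long seed, int maxConn) {
    Random random = new Random(seed);
    double ml = 1.0 / Math.log(maxConn);
    int[] levels = new int[count];
    for (int i = 0; i < count; i++) {
      levels[i] = randomLevel(random, ml);
    }
    return levels;
  }
}
```

   Because each node's level depends on its position in the sequence of draws, 
nodes copied from an initializer graph keep the levels drawn when their segment 
was built, while a single-segment build would draw them in a different 
sequence. The resulting graphs can therefore differ even for identical vectors.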
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

