Hi folks,

I talked with Mike Sokolov during ApacheCon and learned a lot about KNN from him (thank you!). One thing I learned is that our KNN implementation suffers from long merge times, because we currently rebuild the HNSW graph from scratch every time we merge. I noticed there is an ongoing effort to save part of that time by reusing the graph from one of the merged segments: https://github.com/apache/lucene/issues/11354.
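To put rough numbers on the merge cost, here is a hypothetical back-of-the-envelope model (the class and method names are illustrative, not Lucene APIs). It counts how many vectors have to be inserted into an HNSW graph during a merge. Rebuilding from scratch inserts every vector. Reusing existing graphs saves the vectors that are already in them. The model assumes deletions are ignored, that the largest input graphs are the ones reused, and that `maxGraphs` is the number of graphs allowed to survive in the merged segment (`maxGraphs = 1` is the single-graph reuse case).

```java
import java.util.Arrays;

// Hypothetical cost model, not Lucene code: counts vector insertions during an HNSW merge.
public class HnswMergeCost {

    // Every vector from every input segment is inserted into a brand-new graph.
    static long rebuildFromScratch(long[] segmentSizes) {
        return Arrays.stream(segmentSizes).sum();
    }

    // Assumption: keep the largest maxGraphs input graphs as-is and insert the vectors of
    // all remaining segments into them. Which kept graph receives which segment does not
    // change the total number of insertions.
    static long reuseLargestGraphs(long[] segmentSizes, int maxGraphs) {
        long[] sorted = segmentSizes.clone();
        Arrays.sort(sorted); // ascending order
        long reused = 0;
        for (int i = 0; i < Math.min(maxGraphs, sorted.length); i++) {
            reused += sorted[sorted.length - 1 - i]; // vectors already in a kept graph
        }
        return rebuildFromScratch(segmentSizes) - reused;
    }

    public static void main(String[] args) {
        long[] eightEqual = new long[8];
        Arrays.fill(eightEqual, 1_000_000L); // 8 equal-sized segments
        System.out.println("rebuild from scratch: " + rebuildFromScratch(eightEqual));
        System.out.println("reuse 1 graph:        " + reuseLargestGraphs(eightEqual, 1));
        System.out.println("keep up to 4 graphs:  " + reuseLargestGraphs(eightEqual, 4));
    }
}
```

For 8 equal segments of 1,000,000 vectors, this prints 8000000, 7000000 and 4000000: reusing a single graph saves only 1/8 of the insertions, while keeping up to 4 graphs halves them.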
But I wonder whether it makes sense to go a step further: delay the HNSW graph merge, or do only a partial merge, and allow multiple HNSW graphs to live in one segment. For example, if we are merging 8 equal-sized segments and can tolerate up to 4 HNSW graphs per segment, then (once we are able to reuse old graphs) we only need to re-insert half of the documents, since each of the 4 resulting graphs can reuse one existing graph and absorb the vectors of one other segment. With K graphs in a segment, this could slow down search within that segment by a factor of log K, but it could save a lot of merge time, especially when the merge policy is aggressive.

Just wanted to throw this idea out; please feel free to comment!

Best,
Patrick