Hi folks,

I talked with Mike Sokolov during ApacheCon and learnt a lot about KNN from
him (thank you!). One thing I learnt is that our KNN implementation suffers
from long merge times, because we currently rebuild the graph from scratch
on every merge. I noticed there's an effort to reuse the graph from one of
the segments to save part of that time:
https://github.com/apache/lucene/issues/11354.

But I wonder whether it makes sense for us to go a step further: delay the
HNSW graph merge, or only do a partial merge, and allow multiple HNSW graphs
to stay in one segment? For example, if we're merging 8 equal-sized segments
and can tolerate up to 4 HNSW graphs, then we only need to re-insert half of
the documents (once we're able to reuse old graphs). This could slow down
search within the segment by a factor of logK, but could potentially save a
lot of merging time, especially when the merge policy is aggressive.
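
To make the arithmetic in that example concrete, here is a rough
back-of-the-envelope sketch in Python. It is not Lucene code: the function
name reinserted_fraction and the model behind it are my own assumptions. It
assumes graph reuse works as hoped, that we keep the graphs of the largest
segments as-is (up to the number of graphs we tolerate), and that every
document from the remaining segments has to be re-inserted into one of the
kept graphs.

```python
def reinserted_fraction(segment_sizes, max_graphs):
    """Estimate the fraction of documents re-inserted during a merge.

    Hypothetical model (not Lucene's actual merge logic): the graphs of the
    `max_graphs` largest segments are reused unchanged, and all documents
    from the other segments are re-inserted into one of those graphs.
    """
    if max_graphs < 1:
        raise ValueError("max_graphs must be at least 1")
    sizes = sorted(segment_sizes, reverse=True)
    total = sum(sizes)
    if total == 0:
        return 0.0
    reused = sum(sizes[:max_graphs])  # documents whose graph is kept as-is
    return (total - reused) / total


if __name__ == "__main__":
    segments = [100] * 8  # 8 equal-sized segments, sizes are made up
    # Tolerating 4 graphs: half of the documents are re-inserted.
    print(reinserted_fraction(segments, 4))
    # Single merged graph, reusing one old graph: 7 of 8 segments re-inserted.
    print(reinserted_fraction(segments, 1))
```

Under this model, a single merged graph that reuses one old graph still
re-inserts 7/8 of the documents, while tolerating 4 graphs brings that down
to 1/2. The search-side cost of having several graphs per segment is not
modelled here.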

Just wanted to throw this idea out there. Please feel free to comment!

Best
Patrick
