Weird HNSW merge performance result

Patrick Zhai Tue, 10 Oct 2023 20:07:40 -0700

Hi folks,
I was running the HNSW benchmark today and found some weird results. I
want to share them here and see whether people have any ideas.


The setup is:
the 384-dimension vectors available in luceneutil, 100k documents,
on the Lucene main branch.
max_conn=64, fanout=0, beam_width=250

I first tried the default setting, which uses a 1994MB writer buffer, so
with 100k documents no merge happens and I end up with 1 segment.
This gives me 0.755 recall and an index build time of 101113ms.
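
For readers unfamiliar with the metric, here is a minimal sketch of how
recall is usually computed for approximate nearest-neighbor search: the
fraction of the exact top-k neighbors that the approximate search also
returns, averaged over queries. It assumes the benchmark uses this common
definition; the function and argument names are made up for illustration
and are not luceneutil's.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Average overlap between approximate and exact top-k results.

    approx_ids / exact_ids: one list of document ids per query
    (hypothetical inputs, not luceneutil's data format).
    """
    hits = 0
    for approx, exact in zip(approx_ids, exact_ids):
        # Count the ids that appear in both top-k lists for this query.
        hits += len(set(approx[:k]) & set(exact[:k]))
    return hits / (k * len(exact_ids))


# One query, k=4: two of the four true neighbors were found.
example = recall_at_k([[1, 2, 3, 4]], [[1, 2, 5, 6]], k=4)
```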

Then I tried a 50MB writer buffer followed by a force merge at the end.
With 100k documents I get several segments before the merge (the final
index is around 300MB, so I guess 5 or 6), and then merge them into 1.
This gives me 0.692 recall, but indexing took only 81562ms (including
34394ms spent on the merge).
I have also tried disabling the initialize-from-graph feature (so that a
merge always rebuilds the whole graph) and changing the random seed, but
I still get similar results.
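
The two runs can be compared with a bit of arithmetic. This is only a
sketch over the figures reported above: the variable names are
illustrative, and the segment estimate assumes each flush produces a
segment of roughly the writer-buffer size, which need not hold because
the in-memory footprint differs from the on-disk size.

```python
import math

# Figures reported in this message.
single_segment_ms = 101_113   # 1994MB buffer, no merge
multi_segment_ms = 81_562     # 50MB buffer, includes the force merge
force_merge_ms = 34_394
recall_single = 0.755
recall_merged = 0.692
index_size_mb = 300           # approximate final index size
writer_buffer_mb = 50

# Time spent indexing and flushing before the force merge started.
pre_merge_ms = multi_segment_ms - force_merge_ms

# Wall-clock time saved by the small-buffer run overall.
saved_ms = single_segment_ms - multi_segment_ms

# Absolute recall difference between the two runs.
recall_drop = recall_single - recall_merged

# Rough segment count (assumption: one segment per full buffer).
estimated_segments = math.ceil(index_size_mb / writer_buffer_mb)

print(pre_merge_ms, saved_ms, round(recall_drop, 3), estimated_segments)
```

So the small-buffer run spends 47168ms before the merge and finishes
19551ms sooner overall, at a cost of about 0.063 recall.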

I'm wondering:
1. Why does recall drop that much in the latter setup?
2. Why is the index time so much better? I think we still need to rebuild
the whole graph, or maybe it's just because we use more off-heap memory
(and less heap) during the merge (do we?).

Best
Patrick
