Greetings,

We are experiencing slow HNSW creation times during index merge. Specifically, 
we have noticed that HNSW graph creation becomes progressively slower after 
reaching a certain size.

Our indexing workflow creates around 60 indices, each containing approximately 
500k vectors. Each vector has 768 float dimensions. We then merge all these 
small indices into a single large index, force-merging down to 1 segment. During 
the merge step, the HNSW graph creation starts off with good performance, 
taking about 15 seconds to process 10k documents. However, once the graph 
reaches around 7.5m documents, the performance starts to degrade significantly. 
10k documents now take about 30 minutes to process, and the processing time 
continues to increase as the graph becomes larger. We have observed similar 
performance issues with different settings: M=16 with a beam width of 100, and 
M=32 with a beam width of 50.
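
For scale, here is a rough sketch of the raw vector data size implied by the 
numbers above. It assumes 4-byte floats and ignores graph links and any other 
index overhead, so it is only a lower bound. The class and method names are 
illustrative and not part of Lucene.

```java
public class VectorFootprint {
    // Raw size in bytes of `count` float32 vectors with `dims` dimensions each.
    // Assumption: 4 bytes per float, no compression, no per-vector overhead.
    static long rawBytes(long count, int dims) {
        return count * dims * Float.BYTES;
    }

    // Convert bytes to decimal gigabytes (10^9 bytes).
    static double toGB(long bytes) {
        return bytes / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        int dims = 768;
        long totalVectors = 60L * 500_000L;   // about 60 indices of about 500k vectors
        long slowdownPoint = 7_500_000L;      // where the slowdown becomes visible

        System.out.printf("All vectors:      %.2f GB%n",
                toGB(rawBytes(totalVectors, dims)));
        System.out.printf("At slowdown point: %.2f GB%n",
                toGB(rawBytes(slowdownPoint, dims)));
    }
}
```

By this estimate the raw vectors come to roughly 92 GB in total, and roughly 
23 GB at the 7.5m document mark.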

We are using Lucene version 9.8.0 and Java version `openjdk 17.0.3`. Our Java 
heap is set to 30GB, and we do not use any data compression for the vectors. 
Additionally, we have not observed any long or continuous Garbage Collection 
pauses.

Greatly appreciate any pointers or thoughts on how to further debug this issue 
or improve the performance.

Thanks
Kannan Krishnamurthy.
