navneet1v commented on issue #15612: URL: https://github.com/apache/lucene/issues/15612#issuecomment-3865037707
Hi @vigyasharma, @atris, thanks for adding this idea. The discussion on this thread has focused mainly on indexing, so I would like to add some thoughts on search as well.

1. As in the current HNSW search implementation, we should add the concept of a minimum similarity score, so that within a segment we can prune more and more centroids, or possibly skip a segment entirely.
2. This PR in Lucene, https://github.com/apache/lucene/pull/15676, provides the capability to share minCompetitive scores across shards (i.e. Lucene indices) in distributed vector search environments. I think the cluster-based approach can benefit from it and prune more centroids during search.

Some questions on search quality:

1. With filtering or deleted docs, do we think the centroid approach can still achieve high recall? For filters such as date range queries, the query vector does not necessarily encode the meaning of the filter, so the selected centroids may yield zero or fewer than K results.
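To make the centroid-pruning idea a bit more concrete, here is a minimal sketch of how a minimum similarity score could rule out whole clusters. It is not Lucene API: the class and method names are hypothetical, it assumes Euclidean scoring of the form `1 / (1 + d^2)`, and it assumes the index keeps a per-cluster radius (the largest distance from a centroid to any of its members), which nothing in this thread has confirmed. By the triangle inequality, no vector in a cluster can be closer to the query than `distance(query, centroid) - radius`, so a cluster whose lower bound already exceeds the distance implied by the minimum similarity cannot contribute a competitive hit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of centroid pruning by minimum similarity; not a Lucene class. */
final class CentroidPruningSketch {

  /** Euclidean distance between two vectors of equal length. */
  static double distance(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /** Inverts score = 1 / (1 + d^2) to get the largest distance that is still competitive. */
  static double maxDistanceFor(float minSimilarity) {
    return Math.sqrt(1.0 / minSimilarity - 1.0);
  }

  /**
   * Returns the indices of centroids whose clusters may still hold a vector scoring at
   * least minSimilarity. radii[c] is an assumed per-cluster bound on member distance.
   */
  static List<Integer> survivingCentroids(
      float[] query, float[][] centroids, float[] radii, float minSimilarity) {
    double maxDistance = maxDistanceFor(minSimilarity);
    List<Integer> kept = new ArrayList<>();
    for (int c = 0; c < centroids.length; c++) {
      // Triangle inequality: every member is at least this far from the query.
      double lowerBound = Math.max(0.0, distance(query, centroids[c]) - radii[c]);
      if (lowerBound <= maxDistance) {
        kept.add(c);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    float[] query = {0f, 0f};
    float[][] centroids = {{1f, 0f}, {10f, 0f}, {3f, 0f}};
    float[] radii = {0.5f, 1f, 2.5f};
    System.out.println(survivingCentroids(query, centroids, radii, 0.5f));
  }
}
```

If the returned list is empty, the whole segment could be skipped. A minimum similarity shared across shards would simply be passed in as `minSimilarity`, so a tighter global bound prunes more clusters. Note that this bound says nothing about filters or deleted docs: a surviving cluster may still contain no matching documents, which is the recall concern raised in the question above.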
