Re: [I] Cluster Based ANN Vector Search for Lucene [lucene]

via GitHub Sat, 07 Feb 2026 10:22:21 -0800


navneet1v commented on issue #15612:
URL: https://github.com/apache/lucene/issues/15612#issuecomment-3865037707


   Hi @vigyasharma, @atris,
   Thanks for proposing this idea. So far this thread has mostly discussed 
indexing, so I would like to add some thoughts on search as well.
   
   1. Similar to the current HNSW search implementation, we should add the 
concept of a minimum similarity score (minSimilarity) so that, within a segment, 
we can prune more and more centroids as the search progresses, or possibly skip 
an entire segment.
   2. I see that PR https://github.com/apache/lucene/pull/15676 adds to Lucene 
the ability to share min competitive scores across different shards (i.e., 
separate Lucene indices) in distributed vector search environments. I think the 
cluster-based approach could benefit from this too and prune more centroids 
during search.
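   To make points 1 and 2 concrete, here is a minimal Java sketch under stated 
assumptions; it is not Lucene code. Cluster, SearchResult, the per-cluster 
radius and the triangle-inequality bound are all hypothetical, and the score 
mimics Lucene's EUCLIDEAN similarity, 1 / (1 + d^2). A shared 
java.util.concurrent.atomic.DoubleAccumulator stands in for whatever mechanism 
#15676 actually uses: each segment or shard publishes its local k-th best score, 
and any cluster whose best possible score is already below that value is skipped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.DoubleAccumulator;

class CentroidPruningSketch {

    /** Hypothetical cluster layout: centroid, radius = max member distance to the centroid, members. */
    record Cluster(float[] centroid, double radius, List<float[]> members) {}

    /** Hypothetical result holder: top-k scores (best first) and how many clusters were scanned. */
    record SearchResult(List<Double> scores, int clustersVisited) {}

    static double distance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Score in the style of Lucene's EUCLIDEAN similarity: 1 / (1 + d^2), higher is better. */
    static double score(double dist) {
        return 1.0 / (1.0 + dist * dist);
    }

    /** minCompetitive is shared (assumption) by all segments/shards answering the same query. */
    static SearchResult search(float[] query, List<Cluster> clusters, int k, DoubleAccumulator minCompetitive) {
        List<Cluster> ordered = new ArrayList<>(clusters);
        ordered.sort(Comparator.comparingDouble(c -> distance(query, c.centroid())));
        PriorityQueue<Double> topK = new PriorityQueue<>(); // min-heap holding the best k scores
        int visited = 0;
        for (Cluster c : ordered) {
            // Triangle inequality: no member is closer to the query than dist(q, centroid) - radius.
            double bestPossible = score(Math.max(0.0, distance(query, c.centroid()) - c.radius()));
            if (bestPossible < minCompetitive.get()) {
                continue; // prune: nothing in this cluster can enter the top k
            }
            visited++;
            for (float[] v : c.members()) {
                topK.offer(score(distance(query, v)));
                if (topK.size() > k) {
                    topK.poll(); // drop the current worst score
                }
            }
            if (topK.size() == k) {
                // Publish the local k-th best score so other segments/shards can prune with it.
                minCompetitive.accumulate(topK.peek());
            }
        }
        List<Double> scores = new ArrayList<>(topK);
        scores.sort(Comparator.reverseOrder());
        return new SearchResult(scores, visited);
    }
}
```

   Usage: create one accumulator per query, e.g. 
new DoubleAccumulator(Double::max, 0.0), and pass it to every shard. Publishing a 
local k-th best score is safe because the global k-th best can never be lower 
than any single shard's local k-th best; a higher shared value only lets more 
clusters be skipped.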
   
   Some questions on search quality:
   1. With filtering or deleted docs, do we think the centroid approach will 
still achieve high recall? For filters such as date range queries, the query 
vector does not necessarily encode the meaning of the filter, so we may select 
centroids whose clusters yield zero or fewer than K results.
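   The failure mode in this question can be reproduced with a small 
hypothetical Java sketch; it is not Lucene code, and Cluster, nprobe and 
expandUntilK are illustrative names. Clusters are probed in order of centroid 
distance and only documents passing the filter are collected. When the filter 
(for example a date range) is uncorrelated with the query vector, a fixed nprobe 
can return fewer than K hits. Continuing to probe until K filtered hits are 
found is shown only as one naive fallback, at the cost of scanning more clusters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

class FilteredClusterSearchSketch {

    /** Hypothetical cluster: centroid plus parallel arrays of doc IDs and their vectors. */
    record Cluster(float[] centroid, int[] docIds, float[][] vectors) {}

    private record Hit(int docId, double distance) {}

    static double squareDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Probes clusters closest-centroid first and returns up to k filtered doc IDs, nearest first.
     * With expandUntilK == false it stops after nprobe clusters, even if fewer than k docs passed the filter.
     */
    static List<Integer> search(float[] query, List<Cluster> clusters, int k, int nprobe,
                                IntPredicate filter, boolean expandUntilK) {
        List<Cluster> ordered = new ArrayList<>(clusters);
        ordered.sort(Comparator.comparingDouble(c -> squareDistance(query, c.centroid())));
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < ordered.size(); i++) {
            boolean probedEnough = i >= nprobe;
            if (probedEnough && (!expandUntilK || hits.size() >= k)) {
                break;
            }
            Cluster c = ordered.get(i);
            for (int j = 0; j < c.docIds().length; j++) {
                // The filter is only evaluated inside probed clusters.
                if (filter.test(c.docIds()[j])) {
                    hits.add(new Hit(c.docIds()[j], squareDistance(query, c.vectors()[j])));
                }
            }
        }
        hits.sort(Comparator.comparingDouble(Hit::distance));
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, hits.size()); i++) {
            result.add(hits.get(i).docId());
        }
        return result;
    }
}
```

   With a query next to a cluster whose docs all fail the filter and nprobe = 1, 
the fixed variant returns nothing, while the expanding variant reaches the next 
cluster and fills K.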

