naveentatikonda commented on issue #13350:
URL: https://github.com/apache/lucene/issues/13350#issuecomment-2110683420

   > OK, I ran it again, on my index where the flush was set at 28MB & force 
merged. This time I ran it over all 10k queries (previously it was just 1k, as 
calculating the true nearest neighbors takes significant time).
   > 
   > Recall@100 is a steady `0.738`.
   
   @benwtrent Sorry for the delay in my response. Can you share more details 
about the dataset you used to get this recall? Is it a subset of the 
[Cohere-wikipedia-22-12-en-embeddings](https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings/tree/main/data)
 dataset? That is, are the first million vectors the training data and the 
next 10k vectors the query data? If possible, could you please share your 
dataset and the ground truth you generated through GitHub or Hugging Face?
   
   Also, have you force merged to 1 segment before running the search queries?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

