naveentatikonda opened a new issue, #13350:
URL: https://github.com/apache/lucene/issues/13350

   ### Description
   
   While running benchmarks with 
[opensearch-benchmark](https://github.com/opensearch-project/opensearch-benchmark)
 on int8 scalar quantization against several standard datasets, I observed 
a significant drop in recall for the max inner product space type 
compared with the other space types. 
   
   Here are some of the results:
   
   Beam Width | Max Connections
   -- | --
   100 | 16
    
   S. No. | Dataset | Vector Dimension | Train Data Size | Query Data Size | Space Type | fp32 hnsw Recall@100 | sq int8 Recall@100
   -- | -- | -- | -- | -- | -- | -- | --
   1 | cohere-768-l2 | 768 | 1M | 10K | L2 | 0.94 | 0.87
   2 | cohere-768-IP | 768 | 1M | 10K | Inner Product | **0.94** | **0.36**
   3 | lastfm-64-dot | 65 | 292,385 | 50K | Inner Product | **0.95** | **0.16**
   4 | glove-200-angular | 200 | 1,183,514 | 10K | cosine | 0.74 | 0.60
   5 | glove-100-angular | 100 | 1,183,514 | 10K | cosine | 0.80 | 0.59
   6 | glove-50-angular | 50 | 1,183,514 | 10K | cosine | 0.89 | 0.56
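
For readers comparing against these numbers, here is a minimal sketch of how Recall@k is commonly computed: the average, over all queries, of the fraction of true top-k neighbors that appear in the returned top-k. The function name `recall_at_k` and the toy IDs are illustrative assumptions; this is not the opensearch-benchmark implementation.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Average fraction of the true top-k neighbors found in the retrieved top-k.

    retrieved / ground_truth: one list of document IDs per query, best first.
    Assumes every ground-truth list is non-empty.
    """
    if len(retrieved) != len(ground_truth):
        raise ValueError("need exactly one result list per query")
    total = 0.0
    for found, truth in zip(retrieved, ground_truth):
        true_ids = set(truth[:k])
        hits = len(true_ids.intersection(found[:k]))
        total += hits / len(true_ids)
    return total / len(retrieved)


# Toy data (hypothetical IDs): two queries, k = 3.
retrieved = [[1, 2, 3], [4, 9, 8]]
ground_truth = [[1, 2, 5], [4, 5, 6]]
score = recall_at_k(retrieved, ground_truth, 3)
print(score)
```

With this definition, a Recall@100 of 0.36 means that on average only 36 of the 100 exact nearest neighbors were returned per query.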
   
   The `cohere-768-IP` dataset can be downloaded from this 
[link](https://dbyiw3u3rf9yr.cloudfront.net/corpora/vectorsearch/cohere-wikipedia-22-12-en-embeddings/documents-1m.hdf5.bz2).
 The L2 version of the cohere dataset was generated from the same vectors by 
recomputing the ground truth. All other datasets were downloaded from 
[here](https://github.com/erikbern/ann-benchmarks?tab=readme-ov-file#data-sets).
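
To illustrate what "recomputing ground truth" involves, below is a brute-force sketch that ranks the same vectors under either L2 distance or inner product. The helper `exact_top_k` and the toy vectors are assumptions made for illustration; the script actually used for the cohere dataset is not shown in this issue.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, docs, k, space):
    """Return indices of the k best documents by exhaustive search."""
    if space == "l2":
        # Smaller distance is better, so negate it to rank with nlargest.
        scored = [(-squared_l2(query, d), i) for i, d in enumerate(docs)]
    elif space == "ip":
        # Larger inner product is better.
        scored = [(dot(query, d), i) for i, d in enumerate(docs)]
    else:
        raise ValueError("space must be 'l2' or 'ip'")
    return [i for _, i in heapq.nlargest(k, scored)]


# Toy vectors: the same data gives different neighbors per space type.
docs = [[1.0, 0.0], [3.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
l2_truth = exact_top_k(query, docs, 2, "l2")
ip_truth = exact_top_k(query, docs, 2, "ip")
print(l2_truth, ip_truth)
```

Because inner product rewards vector magnitude while L2 does not, the two ground-truth sets can differ on unnormalized data, which is why the L2 variant needs its own ground truth.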
   
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

