[ 
https://issues.apache.org/jira/browse/LUCENE-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431363#comment-17431363
 ] 

ASF subversion and git services commented on LUCENE-10146:
----------------------------------------------------------

Commit 6bb2bbcd6ab2e07a646c17351437ea5210b08004 in lucene's branch 
refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6bb2bbc ]

LUCENE-10146: Add note that dot product is preferred over cosine (#400)

While VectorSimilarityFunction#COSINE is helpful when you need to preserve the
original vectors, it is significantly slower than DOT_PRODUCT. This commit adds
javadocs to COSINE explaining that dot product is the fastest option.

> Add VectorSimilarityFunction.COSINE
> -----------------------------------
>
>                 Key: LUCENE-10146
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10146
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Priority: Major
>             Fix For: main (9.0)
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> To perform ANN search with cosine similarity, users are expected to normalize 
> the document and query vectors to unit length, then use 
> {{VectorSimilarityFunction.DOT_PRODUCT}}. I think it would be good to also 
> support cosine similarity directly through 
> {{VectorSimilarityFunction.COSINE}}. This would allow users to perform ANN 
> based on cosine similarity, while retaining access to the original vectors 
> through {{VectorValues}}. That way they can use the original vectors in a 
> reranking step or return them to the application for further processing.
> It looks like nmslib and hnswlib support cosine similarity. On the other 
> hand, FAISS only supports dot product and suggests users normalize the 
> vectors to perform cosine similarity 
> (https://github.com/facebookresearch/faiss/issues/95). To me, adding this one 
> additional similarity is worth it for what it lets users accomplish.
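
For readers of the archive, the relationship the description relies on can be sketched in plain Java. This is an illustration only, not Lucene's implementation: the class name CosineSketch and its methods are hypothetical. It shows that the cosine of the original vectors equals the dot product of their unit-length copies, and that the direct route leaves the originals untouched at the cost of computing both norms on every comparison.

```java
// Illustrative sketch only; not Lucene's implementation.
// CosineSketch and its method names are hypothetical.
final class CosineSketch {

  /** Plain dot product of two equal-length vectors. */
  static float dotProduct(float[] a, float[] b) {
    checkSameLength(a, b);
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Cosine similarity computed directly; both norms are recomputed per call. */
  static float cosine(float[] a, float[] b) {
    checkSameLength(a, b);
    float dot = 0f;
    float normA = 0f;
    float normB = 0f;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0f || normB == 0f) {
      throw new IllegalArgumentException("cosine is undefined for a zero vector");
    }
    return (float) (dot / Math.sqrt((double) normA * normB));
  }

  /** Returns a unit-length copy; the input array is not modified. */
  static float[] normalize(float[] v) {
    double squared = 0.0;
    for (float x : v) {
      squared += x * x;
    }
    if (squared == 0.0) {
      throw new IllegalArgumentException("cannot normalize a zero vector");
    }
    float norm = (float) Math.sqrt(squared);
    float[] unit = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      unit[i] = v[i] / norm;
    }
    return unit;
  }

  private static void checkSameLength(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
  }
}
```

With a = (3, 4) and b = (4, 3), both cosine(a, b) and dotProduct(normalize(a), normalize(b)) come out at approximately 0.96. The normalized route pays for the norms once per vector up front, which is consistent with the commit note above that DOT_PRODUCT is the faster option when the original vectors do not need to be kept.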



