[ https://issues.apache.org/jira/browse/LUCENE-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431363#comment-17431363 ]
ASF subversion and git services commented on LUCENE-10146:
----------------------------------------------------------

Commit 6bb2bbcd6ab2e07a646c17351437ea5210b08004 in lucene's branch refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6bb2bbc ]

LUCENE-10146: Add note that dot product is preferred over cosine (#400)

While VectorSimilarityFunction#COSINE is helpful when you need to preserve the
original vectors, it is significantly slower than DOT_PRODUCT. This commit adds
javadocs to COSINE explaining that dot product is the fastest option.

> Add VectorSimilarityFunction.COSINE
> -----------------------------------
>
>                 Key: LUCENE-10146
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10146
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Priority: Major
>             Fix For: main (9.0)
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> To perform ANN search with cosine similarity, users are expected to normalize
> the document and query vectors to unit length, then use
> {{VectorSimilarityFunction.DOT_PRODUCT}}. I think it would be good to also
> support cosine similarity directly through
> {{VectorSimilarityFunction.COSINE}}. This would allow users to perform ANN
> based on cosine similarity, while retaining access to the original vectors
> through {{VectorValues}}. That way they can use the original vectors in a
> reranking step or return them to the application for further processing.
> It looks like nmslib and hnswlib support cosine similarity. On the other
> hand, FAISS only supports dot product and suggests users normalize the
> vectors to perform cosine similarity
> (https://github.com/facebookresearch/faiss/issues/95). To me adding this one
> additional similarity is worth it in terms of what it lets users accomplish.
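
The equivalence the issue description relies on (cosine similarity equals the dot product of unit-length vectors) can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Lucene, and the class and method names (CosineViaDotProduct, dotProduct, normalize, cosine) are hypothetical, not Lucene APIs. It also shows the trade-off the commit message mentions: normalizing ahead of time makes each comparison a plain dot product, but the stored vector is no longer the original one.

```java
// Illustrative sketch only: plain Java with no Lucene dependency.
// All names here are hypothetical and chosen for this example.
final class CosineViaDotProduct {

    /** Dot product of two equal-length vectors. */
    static float dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Returns a unit-length copy of v; the input array is left untouched. */
    static float[] normalize(float[] v) {
        double norm = Math.sqrt(dotProduct(v, v));
        if (norm == 0.0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float[] unit = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = (float) (v[i] / norm);
        }
        return unit;
    }

    /** Cosine similarity computed directly on the original vectors. */
    static float cosine(float[] a, float[] b) {
        double norms = Math.sqrt((double) dotProduct(a, a) * dotProduct(b, b));
        return (float) (dotProduct(a, b) / norms);
    }

    public static void main(String[] args) {
        float[] doc = {3f, 4f};
        float[] query = {4f, 3f};

        // Option 1: cosine on the original vectors (norms computed per comparison).
        float direct = cosine(doc, query);

        // Option 2: normalize once up front, then use only the dot product.
        float viaDot = dotProduct(normalize(doc), normalize(query));

        // The two scores agree up to floating-point rounding.
        System.out.println("cosine = " + direct + ", normalized dot product = " + viaDot);
    }
}
```

In the second option the per-comparison work is a single dot product, which is in line with the commit's note that DOT_PRODUCT is the faster choice; the first option keeps the original vectors available for reranking or for returning to the application.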