[ https://issues.apache.org/jira/browse/LUCENE-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431364#comment-17431364 ]
Julie Tibshirani commented on LUCENE-10147:
-------------------------------------------

Thanks [~sokolov], that makes sense.

> KnnVectorQuery can produce negative scores
> ------------------------------------------
>
>                 Key: LUCENE-10147
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10147
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Julie Tibshirani
>            Priority: Blocker
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The cosine similarity of two vectors falls in the range [-1, 1]. So currently
> with cosine similarity, {{KnnVectorQuery}} can produce negative scores. Maybe
> we should just adjust the scores in this case by adding 1, shifting them to
> the range [0, 2].
> As a side note, this made me notice that
> {{VectorSimilarityFunction.DOT_PRODUCT}} is really quite "expert"! Users need
> to know to normalize all document and query vectors to unit length when using
> this similarity. Otherwise the output is unbounded and difficult to handle in
> scoring. Also dot product is not a true metric: for example, it doesn't obey
> the triangle inequality. So many ANN algorithms have trouble supporting it.
> As part of this issue, we could improve the documentation on
> {{VectorSimilarityFunction.DOT_PRODUCT}} to clarify that normalization is
> required.
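
For illustration only, here is a minimal sketch of the two points in the issue description: shifting cosine similarity by 1 so that scores land in [0, 2], and normalizing vectors to unit length so that the dot product stays bounded. It is plain Java and does not use or reproduce Lucene's code; the class name CosineScoreSketch and all of its methods are hypothetical names chosen for this example.

```java
// Hypothetical standalone sketch; none of these names come from Lucene.
public class CosineScoreSketch {

    // Dot product, accumulated in double to limit rounding error.
    static double dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity, clamped to [-1, 1] to guard against rounding drift.
    static double cosine(float[] a, float[] b) {
        double norms = Math.sqrt(dot(a, a) * dot(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return Math.max(-1.0, Math.min(1.0, dot(a, b) / norms));
    }

    // The adjustment floated in the issue: add 1 so the score lies in [0, 2].
    static double shiftedCosineScore(float[] a, float[] b) {
        return 1.0 + cosine(a, b);
    }

    // Returns a unit-length copy of v; the input array is left unchanged.
    static float[] normalize(float[] v) {
        double length = Math.sqrt(dot(v, v));
        if (length == 0.0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float[] unit = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = (float) (v[i] / length);
        }
        return unit;
    }

    public static void main(String[] args) {
        // Vectors pointing in opposite directions give the most negative cosine.
        float[] query = {1f, 0f};
        float[] opposite = {-2f, 0f};
        System.out.println("cosine        = " + cosine(query, opposite));
        System.out.println("shifted score = " + shiftedCosineScore(query, opposite));

        // Without normalization the dot product grows with vector length.
        float[] a = {3f, 4f};
        float[] b = {4f, 3f};
        System.out.println("raw dot       = " + dot(a, b));
        System.out.println("unit dot      = " + dot(normalize(a), normalize(b)));
        System.out.println("cosine        = " + cosine(a, b));
    }
}
```

On unit-length vectors the dot product coincides with cosine similarity, which is why normalization keeps it within [-1, 1]; on raw vectors it scales with the vector lengths and has no fixed bound.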