[ https://issues.apache.org/jira/browse/LUCENE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431160#comment-17431160 ]
Robert Muir commented on LUCENE-10191:
--------------------------------------

Do we really need these slower functions? IMO the dot product is already slow enough in Java! As a lower-level library that has to support backwards compatibility for a long time, I'd like us to keep this stuff to a minimum. Precomputing values to support these functions seems like the wrong direction to me; I think they should be removed, and users should just use the dot product.

> Optimize vector functions by precomputing magnitudes
> ----------------------------------------------------
>
>                 Key: LUCENE-10191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10191
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Priority: Minor
>
> Both Euclidean distance (L2 norm) and cosine similarity can be expressed in
> terms of the dot product and the vector magnitudes:
> * l2_norm(a, b) = ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)
> * cosine(a, b) = a . b / (||a|| ||b||)
> We could compute and store each vector's magnitude upfront while indexing,
> and compute the query vector's magnitude once per query. Then we'd calculate
> the distance using our (very optimized) dot product method plus the
> precomputed values.
> This is an exploratory issue: I haven't tested this out yet, so I'm not sure
> how much it would help. I would at least expect it to help with cosine
> similarity -- several months ago we tried out similar ideas in Elasticsearch
> and were able to get a nice boost in cosine performance.
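As a sketch of the decomposition described in the issue (this is not Lucene's actual API; the class and method names below are hypothetical), both similarities reduce to a single dot product once the magnitudes ||a|| and ||b|| are known:

```java
// Hypothetical sketch: derive cosine similarity and Euclidean (L2) distance
// from one dot product plus precomputed magnitudes, and check the results
// against the direct definitions.
public class PrecomputedSimilarity {

    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static float magnitude(float[] v) {
        return (float) Math.sqrt(dotProduct(v, v));
    }

    // cosine(a, b) = a . b / (||a|| ||b||), with magnitudes precomputed
    static float cosine(float dot, float magA, float magB) {
        return dot / (magA * magB);
    }

    // l2_norm(a, b) = sqrt(||a||^2 - 2(a . b) + ||b||^2)
    static float l2Distance(float dot, float magA, float magB) {
        return (float) Math.sqrt(magA * magA - 2 * dot + magB * magB);
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        float magA = magnitude(a);      // would be stored at index time
        float magB = magnitude(b);      // computed once per query
        float dot = dotProduct(a, b);   // the only per-pair work

        // Direct L2 distance, for comparison with the derived value.
        float directL2 = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            directL2 += d * d;
        }
        directL2 = (float) Math.sqrt(directL2);

        System.out.println(Math.abs(l2Distance(dot, magA, magB) - directL2) < 1e-4f);
    }
}
```

The per-pair cost at search time is then just the dot-product loop (the part Lucene already optimizes), with the `sqrt` and magnitude arithmetic amortized across the index and the query.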