[ 
https://issues.apache.org/jira/browse/LUCENE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431166#comment-17431166
 ] 

Robert Muir commented on LUCENE-10191:
--------------------------------------

Also, talking about storing stuff differently makes it obvious that these 
slower functions should go.

Instead, slower functions needing a different representation should really be 
different codecs. We can still reuse code, but it allows us to e.g. support 
different functions without signing up for backwards compatibility. Otherwise, 
I'm personally gonna feel the need to push back every single time on all these 
functions, because I think we've already attempted to sign up for too much. 
Trying to support these functions the way we do now is the wrong approach and 
will lead to hairballs.

{{VectorSimilarityFunction}} must be removed, and support for this stuff placed 
in lucene/codecs.

> Optimize vector functions by precomputing magnitudes
> ----------------------------------------------------
>
>                 Key: LUCENE-10191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10191
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Priority: Minor
>
> Both euclidean distance (L2 norm) and cosine similarity can be expressed in 
> terms of dot product and vector magnitudes:
>  * l2_norm(a, b) = ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)
>  * cosine(a, b) = (a . b) / (||a|| ||b||)
> We could compute and store each vector's magnitude upfront while indexing, 
> and compute the query vector's magnitude once per query. Then we'd calculate 
> the distance using our (very optimized) dot product method, plus the 
> precomputed values.
> This is an exploratory issue: I haven't tested this out yet, so I'm not sure 
> how much it would help. I would at least expect it to help with cosine 
> similarity – several months ago we tried out similar ideas in Elasticsearch 
> and were able to get a nice boost in cosine performance.
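
For illustration, the two identities in the description could be sketched as 
below. This is a minimal sketch, not Lucene code: the class MagnitudeSketch and 
all of its method names are hypothetical, and it assumes each vector's squared 
magnitude was computed once (at index time for documents, once per query for 
the query vector) and is passed in alongside the vector.

```java
// Hypothetical sketch: none of these names are Lucene APIs.
final class MagnitudeSketch {

  // Plain dot product; stands in for the optimized dot product method.
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // ||v||^2, the value that would be precomputed and stored per vector.
  static float squaredMagnitude(float[] v) {
    return dot(v, v);
  }

  // ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)
  static float l2(float[] a, float sqMagA, float[] b, float sqMagB) {
    float squared = sqMagA - 2f * dot(a, b) + sqMagB;
    // Rounding can push the sum slightly below zero for near-identical vectors.
    return (float) Math.sqrt(Math.max(0f, squared));
  }

  // cosine(a, b) = (a . b) / (||a|| ||b||); NaN if either magnitude is zero.
  static float cosine(float[] a, float sqMagA, float[] b, float sqMagB) {
    return (float) (dot(a, b) / Math.sqrt((double) sqMagA * sqMagB));
  }
}
```

With this shape, each comparison costs one dot product plus a few scalar 
operations, at the price of storing one extra float per vector.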



