[ 
https://issues.apache.org/jira/browse/LUCENE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459633#comment-17459633
 ] 

Julie Tibshirani commented on LUCENE-10191:
-------------------------------------------

I've been pondering this more. It definitely seems possible to move 
VectorSimilarityFunction to Lucene90HnswVectorsFormat. Maybe we could start 
with something really simple, like a new method 
{{VectorValues#computeDistance(float[] query)}} that uses the configured 
distance function. I guess {{computeDistance}} could expose a simple interface 
but do something fancy internally if it wanted to, since the format knows 
exactly how its vectors are represented.
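
To make the idea concrete, here is a rough sketch of what such a method might look like. Everything in it is hypothetical: {{SketchVectorValues}} and {{SquaredEuclideanVectorValues}} are made-up names, not the real {{VectorValues}} API, and squared euclidean distance just stands in for "the configured distance function".

```java
/** Hypothetical stand-in for VectorValues; not the real Lucene class. */
interface SketchVectorValues {
  /** Advances to the next stored vector; returns false when exhausted. */
  boolean next();

  /** Distance between the query and the current vector. */
  float computeDistance(float[] query);
}

/**
 * Assumed implementation that keeps raw float vectors on heap. A real format
 * could override computeDistance to work directly on its own encoding.
 */
final class SquaredEuclideanVectorValues implements SketchVectorValues {
  private final float[][] vectors;
  private int current = -1;

  SquaredEuclideanVectorValues(float[][] vectors) {
    this.vectors = vectors;
  }

  @Override
  public boolean next() {
    if (current + 1 >= vectors.length) {
      return false;
    }
    current++;
    return true;
  }

  @Override
  public float computeDistance(float[] query) {
    if (current < 0) {
      throw new IllegalStateException("call next() before computeDistance()");
    }
    float[] v = vectors[current];
    if (v.length != query.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    // Squared euclidean distance: sum of squared component differences.
    float sum = 0f;
    for (int i = 0; i < v.length; i++) {
      float diff = v[i] - query[i];
      sum += diff * diff;
    }
    return sum;
  }
}
```

The caller only ever sees {{computeDistance}}, so the implementation is free to change how the vectors are stored without touching the search code.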

My main hesitation is that VectorSimilarityFunction is a cross-cutting concept 
that makes sense across format implementations. In fact, I would expect every 
KnnVectorsFormat implementation to support dot product, euclidean, and cosine. Could there be 
drift across different formats (maybe a vector function is missing, or is named 
something different) in a way that hurts users?

> Optimize vector functions by precomputing magnitudes
> ----------------------------------------------------
>
>                 Key: LUCENE-10191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10191
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Priority: Minor
>
> Both euclidean distance (L2 norm) and cosine similarity can be expressed in 
> terms of dot product and vector magnitudes:
>  * l2_norm(a, b) = ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)
>  * cosine(a, b) = (a . b) / (||a|| ||b||)
> We could compute and store each vector's magnitude upfront while indexing, 
> and compute the query vector's magnitude once per query. Then we'd calculate 
> the distance using our (very optimized) dot product method, plus the 
> precomputed values.
> This is an exploratory issue: I haven't tested this out yet, so I'm not sure 
> how much it would help. I would at least expect it to help with cosine 
> similarity – several months ago we tried out similar ideas in Elasticsearch 
> and were able to get a nice boost in cosine performance.
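
For reference, the two identities in the issue description can be sketched roughly as below. This is only an illustration under assumptions: {{PrecomputedNormVectors}} is a made-up class, the scalar {{dot}} loop stands in for the optimized dot product, and storing squared norms in a plain array is a simplification of whatever the index format would actually do.

```java
/** Hypothetical holder for vectors plus their precomputed squared norms. */
final class PrecomputedNormVectors {
  private final float[][] vectors;
  private final float[] squaredNorms; // ||v||^2, computed once at "index time"

  PrecomputedNormVectors(float[][] vectors) {
    this.vectors = vectors;
    this.squaredNorms = new float[vectors.length];
    for (int i = 0; i < vectors.length; i++) {
      squaredNorms[i] = dot(vectors[i], vectors[i]);
    }
  }

  /** Plain scalar dot product; stands in for the optimized version. */
  static float dot(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2). */
  float l2(int ord, float[] query, float querySquaredNorm) {
    float squared =
        squaredNorms[ord] - 2f * dot(vectors[ord], query) + querySquaredNorm;
    // Rounding can push the value slightly below zero, so clamp before sqrt.
    return (float) Math.sqrt(Math.max(0f, squared));
  }

  /** (a . b) / (||a|| ||b||); undefined (NaN) if either vector is all zeros. */
  float cosine(int ord, float[] query, float querySquaredNorm) {
    double norms = Math.sqrt((double) squaredNorms[ord] * querySquaredNorm);
    return (float) (dot(vectors[ord], query) / norms);
  }
}
```

The query's squared norm would be computed once per query with {{dot(query, query)}} and passed to every call, so each comparison costs one dot product plus a few scalar operations.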



