[ 
https://issues.apache.org/jira/browse/LUCENE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435228#comment-17435228
 ] 

Adrien Grand commented on LUCENE-10191:
---------------------------------------

One downside of having distance functions in FieldType/FieldInfo is that every 
possible KnnVectorsFormat must support all possible distances in order to be a 
legal format.

On the other hand, if we made the distance an implementation detail of 
KnnVectorsFormat, then we could no longer do things like fall back to 
scanning the matching documents when a very selective filter is applied to 
the index (similar to what IndexOrDocValuesQuery does).

I wonder if a middle ground could be to remove vectorSimilarityFunction from 
FieldType/FieldInfo and add new APIs to {{KnnVectorsReader}} / {{LeafReader}} 
that would expose how distances are computed. Maybe something like 
{{DoubleValues computeDistancesFrom(String field, float[] queryVector)}}. And 
Lucene90HnswFormat could take an additional distance function in its 
constructor, which is how the distance function could be configured.
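
A minimal sketch of what this middle ground might look like, written against 
simplified stand-in types rather than the real Lucene classes. Everything 
except the {{computeDistancesFrom}} signature is an assumption made for 
illustration: {{DocDistances}} is a cut-down substitute for {{DoubleValues}}, 
and {{Distance}}, {{VectorDistanceSource}}, {{InMemoryReader}} and 
{{nearestAmong}} are hypothetical names. It shows the distance being chosen 
at construction time and a caller doing an exact scan over filtered 
documents without knowing which distance is in use.

```java
import java.util.Map;

final class DistanceApiSketch {
    /** Simplified stand-in for Lucene's DoubleValues (the real class throws IOException). */
    interface DocDistances {
        boolean advanceExact(int doc);
        double doubleValue();
    }

    /** Hypothetical pluggable distance, supplied to the format's constructor. */
    interface Distance {
        double compute(float[] a, float[] b);
    }

    /** Hypothetical reader-side API exposing how distances are computed. */
    interface VectorDistanceSource {
        DocDistances computeDistancesFrom(String field, float[] queryVector);
    }

    static final Distance EUCLIDEAN = (a, b) -> {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    };

    /** Toy in-memory reader; the distance is fixed when it is constructed. */
    static final class InMemoryReader implements VectorDistanceSource {
        private final Map<String, float[][]> vectorsByField;
        private final Distance distance;

        InMemoryReader(Map<String, float[][]> vectorsByField, Distance distance) {
            this.vectorsByField = vectorsByField;
            this.distance = distance;
        }

        @Override
        public DocDistances computeDistancesFrom(String field, float[] queryVector) {
            final float[][] vectors = vectorsByField.get(field);
            return new DocDistances() {
                private int current = -1;

                @Override
                public boolean advanceExact(int doc) {
                    current = doc;
                    // A doc has a value only if the field exists and the doc id is in range.
                    return vectors != null && doc >= 0 && doc < vectors.length;
                }

                @Override
                public double doubleValue() {
                    return distance.compute(queryVector, vectors[current]);
                }
            };
        }
    }

    /** Exact fallback: score only the documents that passed a selective filter. */
    static int nearestAmong(
            VectorDistanceSource reader, String field, float[] query, int[] filteredDocs) {
        DocDistances distances = reader.computeDistancesFrom(field, query);
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int doc : filteredDocs) {
            if (distances.advanceExact(doc)) {
                double d = distances.doubleValue();
                if (d < bestDistance) {
                    bestDistance = d;
                    best = doc;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[][] vectors = {{0f, 0f}, {1f, 1f}, {5f, 5f}};
        InMemoryReader reader = new InMemoryReader(Map.of("vec", vectors), EUCLIDEAN);
        // Only docs 0 and 1 pass the (hypothetical) filter.
        System.out.println(nearestAmong(reader, "vec", new float[] {4f, 4f}, new int[] {0, 1}));
    }
}
```

In this sketch the caller never sees the distance function, so a format 
would only need to support the distance it was constructed with, while the 
filtered scan remains possible.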

> Optimize vector functions by precomputing magnitudes
> ----------------------------------------------------
>
>                 Key: LUCENE-10191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10191
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Priority: Minor
>
> Both euclidean distance (L2 norm) and cosine similarity can be expressed in 
> terms of dot product and vector magnitudes:
>  * l2_norm(a, b) = ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)
>  * cosine(a, b) = (a . b) / (||a|| ||b||)
> We could compute and store each vector's magnitude upfront while indexing, 
> and compute the query vector's magnitude once per query. Then we'd calculate 
> the distance using our (very optimized) dot product method, plus the 
> precomputed values.
> This is an exploratory issue: I haven't tested this out yet, so I'm not sure 
> how much it would help. I would at least expect it to help with cosine 
> similarity – several months ago we tried out similar ideas in Elasticsearch 
> and were able to get a nice boost in cosine performance.
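
The two identities in the issue description can be sketched as follows. This 
is an illustration only, not code from Lucene or Elasticsearch: the class and 
method names are hypothetical, the plain loop in {{dot}} stands in for an 
optimized dot product, and storing the squared magnitude (rather than the 
magnitude itself) is an assumption made to avoid a square root per vector.

```java
final class PrecomputedMagnitudes {
    /** Plain dot product; stands in for an optimized implementation. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Squared magnitude ||v||^2, meant to be computed once per vector and stored. */
    static float squaredMagnitude(float[] v) {
        return dot(v, v);
    }

    /** ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2), using precomputed squares. */
    static double l2Distance(float[] a, float aSq, float[] b, float bSq) {
        double squared = aSq - 2.0 * dot(a, b) + bSq;
        // Rounding can push the expanded form slightly below zero for near-identical vectors.
        return Math.sqrt(Math.max(0.0, squared));
    }

    /** cosine(a, b) = (a . b) / (||a|| ||b||), using precomputed squares. */
    static double cosine(float[] a, float aSq, float[] b, float bSq) {
        return dot(a, b) / Math.sqrt((double) aSq * bSq);
    }

    public static void main(String[] args) {
        float[] doc = {3f, 4f};
        float[] query = {0f, 5f};
        float docSq = squaredMagnitude(doc);     // would be stored at index time
        float querySq = squaredMagnitude(query); // computed once per query
        System.out.println(l2Distance(query, querySq, doc, docSq));
        System.out.println(cosine(query, querySq, doc, docSq));
    }
}
```

With the magnitudes available, each comparison costs a single dot product 
plus a few scalar operations. The cosine method does not guard against 
zero-length vectors.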


