adamjq commented on code in PR #4532:
URL: https://github.com/apache/solr/pull/4532#discussion_r3494011849


##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -814,7 +814,37 @@ Some use cases where `includeTags` and/or `excludeTags` may be more useful then
 
 
 
-=== Usage in Re-Ranking Query
+[[vector-reranking]]
+== Usage in Re-Ranking Query
+
+Dense vector similarity scores can be used to xref:query-guide:query-re-ranking.adoc[re-rank] first pass query results.
+Possible use cases include:
+
+* Re-ranking approximate results from a quantized vector field using full fidelity float vectors.
+* Re-ranking lexical search results with dense vector similarity scores.
+
+Details about using the ReRank Query Parser can be found in the xref:query-guide:query-re-ranking.adoc[Query Re-Ranking] section.
+
+=== Re-Ranking with vectorSimilarity Function Query
+
+The xref:query-guide:function-queries.adoc#vectorsimilarity-function[vectorSimilarity()] function can be used with the `{!func}` query parser to re-rank by vector similarity.
+When used as a function query, `vectorSimilarity()` computes the exact similarity for only the candidate documents selected for re-ranking, without traversing the index graph.
+
+Here is an example of re-ranking a lexical query using a `DenseVectorField` named `vector`:
+
+[source,text]
+?q=title:phone&rq={!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=1}&rqq={!func}vectorSimilarity(vector,[1.0,2.0,3.0,4.0])
+
+NOTE: The default `reRankOperator` is `add`, which sums the first-pass score and the vector similarity score.
+Since these scores may differ in magnitude, you can adjust `reRankWeight` to control the balance between them, or use `reRankOperator=replace` to score re-ranked documents by vector similarity alone.
+
+When using a quantized vector field type (such as `ScalarQuantizedDenseVectorField`), the KNN first pass scores are computed on the quantized vectors.
+Here is an example of re-ranking those results with exact float similarity scores, where `topK` matches `reRankDocs`:
+
+[source,text]
+?q={!knn f=vector topK=100}[1.0,2.0,3.0,4.0]&rq={!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=1 reRankOperator=replace}&rqq={!func}vectorSimilarity(vector,[1.0,2.0,3.0,4.0])

Review Comment:
   It's interesting that this was flagged by Copilot, because it's a source of confusion in the existing documentation, and I believe the claim made is incorrect. It would be true if the field used BYTE encoding (meaning int8 vectors are supplied externally), but not when Solr itself is quantizing the vectors.
   
   `ScalarQuantizedDenseVectorField` builds the HNSW graphs using the scalar-quantized representations of the embeddings. The original float embeddings remain accessible via `FloatVectorValues`, and the full-precision vectors are used for re-quantization during segment merges.
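
To make the distinction concrete, here is a toy sketch of scalar quantization. It is not Lucene's quantizer: the `quantize` and `dequantize` helpers, the fixed `[-1.0, 1.0]` range, and the 7-bit width are all assumptions chosen for illustration. It only shows that the quantized codes are a lossy view of the vector, so an exact score has to be computed from the retained floats.

```python
def quantize(vec, lo, hi, bits=7):
    """Map each float, clamped to [lo, hi], to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    scale = levels / (hi - lo)
    return [round((min(max(x, lo), hi) - lo) * scale) for x in vec]


def dequantize(codes, lo, hi, bits=7):
    """Map integer codes back to the float grid points they represent."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + c * step for c in codes]


# Hypothetical embedding; in this toy model the original floats are kept
# alongside the quantized codes rather than being discarded.
original = [0.12, -0.87, 0.5, 0.333]
codes = quantize(original, -1.0, 1.0)
approx = dequantize(codes, -1.0, 1.0)

# Round-trip error: non-zero, but bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(original, approx))
print(codes, max_error)
```

The round trip does not recover the original values, which is why a score computed from the quantized view can differ from one computed from the stored floats.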
   
   The code path for `{!func}vectorSimilarity` is:
   1. [VectorSimilaritySourceParser](https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/VectorSimilaritySourceParser.java#L108) - calls `getValueSource`
   2. [DenseVectorField.getValueSource](https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/schema/DenseVectorField.java#L487) returns `new FloatKnnVectorFieldSource(field.getName())`
   3. [(Lucene) FloatKnnVectorFieldSource](https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/FloatKnnVectorFieldSource.java#L46) - reads `getFloatVectorValues(fieldName);`
   
   I was able to prove this locally via unit tests that do the following:
   - Test 1: KNN vs KNN + rerank on DenseVectorField. Scores are identical as both paths use exact floats.
   - Test 2: KNN vs KNN + rerank on ScalarQuantizedDenseVectorField. Scores differ as KNN uses quantized values and the rerank uses exact floats.

   I can add the unit tests to the PR if that would be useful.
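
The expected outcome of the two tests can be illustrated with a small simulation. This is not the Solr unit test code: `quantized_view`, `knn_scores`, and `rerank_scores` are hypothetical helpers, the raw dot product stands in for Solr's similarity functions, and only the document vectors (not the query) are quantized.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quantized_view(vec, lo=-1.0, hi=1.0, levels=127):
    """Toy stand-in for a scalar-quantized vector: snap each component to a grid."""
    step = (hi - lo) / levels
    return [lo + round((x - lo) / step) * step for x in vec]


def knn_scores(docs, query, quantized):
    """First-pass scores: computed on the quantized view only for a quantized field."""
    view = quantized_view if quantized else (lambda v: v)
    return {doc_id: dot(view(vec), query) for doc_id, vec in docs.items()}


def rerank_scores(docs, query):
    """Re-rank scores (as with reRankOperator=replace): always read the stored floats."""
    return {doc_id: dot(vec, query) for doc_id, vec in docs.items()}


# Hypothetical documents and query vector.
docs = {"a": [0.12, -0.87, 0.5, 0.333], "b": [0.9, 0.05, -0.41, 0.27]}
query = [0.3, -0.2, 0.7, 0.1]

# Test 1 analogue: plain float field, KNN and re-rank both use exact floats.
test1_same = knn_scores(docs, query, quantized=False) == rerank_scores(docs, query)
# Test 2 analogue: quantized field, KNN sees quantized values, re-rank sees floats.
test2_same = knn_scores(docs, query, quantized=True) == rerank_scores(docs, query)
print(test1_same, test2_same)
```

Under these assumptions the first comparison yields identical scores and the second does not, mirroring the behavior described for the two tests.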



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


