(solr) branch branch_10_0 updated: Add seeded and earlyTermination examples in tutorial-vectors doc (#3797)

abenedetti Wed, 22 Oct 2025 08:30:14 -0700

This is an automated email from the ASF dual-hosted git repository.

abenedetti pushed a commit to branch branch_10_0
in repository https://gitbox.apache.org/repos/asf/solr.git



The following commit(s) were added to refs/heads/branch_10_0 by this push:
     new bbe788db6dd Add seeded and earlyTermination examples in tutorial-vectors doc (#3797)
bbe788db6dd is described below

commit bbe788db6dd1e56361dc17072aab110ddb22c101
Author: Ilaria Petreti <[email protected]>
AuthorDate: Wed Oct 22 17:25:36 2025 +0200

    Add seeded and earlyTermination examples in tutorial-vectors doc (#3797)
    
    (cherry picked from commit c163923212229fa322de54dae9909e19659628fb)
---
 .../getting-started/pages/tutorial-vectors.adoc    | 59 +++++++++++++---------
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc b/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc
index 133927b74c1..97d06a91a4b 100644
--- a/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc
+++ b/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc
@@ -67,7 +67,7 @@ $ curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:applicat
         "type":"pdate",
         "stored":true
       }
-    ]  
+    ]
 }'
 ----
 
@@ -81,22 +81,22 @@ $ bin/solr post -c films example/films/films.json
 ----
 
 === Let's do some Vector searches
-Before making the queries, we define an example target vector, simulating a 
person that 
-watched 3 movies: _Finding Nemo_, _Bee Movie_, and _Harry Potter and the 
Chamber of Secrets_. 
-We get the vector of each movie, then calculate the resulting average vector, 
which will 
+Before making the queries, we define an example target vector, simulating a 
person that
+watched 3 movies: _Finding Nemo_, _Bee Movie_, and _Harry Potter and the 
Chamber of Secrets_.
+We get the vector of each movie, then calculate the resulting average vector, 
which will
 be used as the input vector for all the following example queries.
-        
+
 ```
 [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 0.0859, 
-0.1789]
 ```
 
 [NOTE]
 ====
-Interested in calculating the vector using Solr's 
xref:query-guide:streaming-expressions.adoc[streaming capability]?   
+Interested in calculating the vector using Solr's 
xref:query-guide:streaming-expressions.adoc[streaming capability]?
 Here is an example of a streaming expression that you can run via the 
xref:query-guide:stream-screen.adoc[Solr Admin Stream UI]:
 ```
 let(
-  a=select(      
+  a=select(
         search(films,
           qt="/select",
           q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and 
the Chamber of Secrets"",
@@ -141,43 +141,54 @@ The output is:
 
 // Solr URL examples below all have [ and ] characters which, when used with 
Curl, causes encoding issues so just putting plain http links
 
-Search for the top 10 movies most similar to the target vector that we 
previously calculated (KNN Query for recommendation):
+**KNN Query for recommendation** - Search for the top 10 movies most similar 
to the target vector that we previously calculated:
+
+       http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
+
+- Notice that among the results, there are some animation family movies, such 
as _Curious George_ and _Bambi_, which makes sense, since the target vector was 
created with two other animation family movies (_Finding Nemo_ and _Bee Movie_).
+- We also notice that among the results there are two movies that the person 
already watched. In the next example we will filter them out.
+
+**KNN query with Filter Query** - Search for the top 10 movies most similar to 
the resulting vector, excluding the movies already watched:
+
+       http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq=-id:("/en/finding_nemo" "/en/bee_movie" "/en/harry_potter_and_the_chamber_of_secrets_2002")
+
+**KNN as Filter Query** - Search for movies with "cinderella" in the name 
among the top 50 movies most similar to the target vector:
+
+       http://localhost:8983/solr/films/query?q=name:cinderella&fq={!knn f=film_vector topK=50}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
 
-       'http://localhost:8983/solr/films/query?q={%21knn%20f=film_vector%20topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]'
+- There are 3 "cinderella" movies in the index, but only 1 is among the top 50 
most similar to the target vector (_Cinderella III: A Twist in Time_).
 
-* Notice that among the results, there are some animation family movies, such 
as _Curious George_ and _Bambi_, which makes sense, since the target vector was 
created with two other animation family movies (_Finding Nemo_ and _Bee Movie_).
-* We also notice that among the results there are two movies that the person 
already watched. In the next example we will filter them out.
+*KNN with SeededQuery* - Search for the top 10 movies most similar to the 
target vector, guided by a seed lexical query on the `genre` field, which 
provides the initial entry points in the vector graph search:
 
-Search for the top 10 movies most similar to the resulting vector, excluding 
the movies already watched (KNN query with Filter Query):
+         http://localhost:8983/solr/films/query?seedQuery=genre:Family&q={!knn f=film_vector topK=10 seedQuery=$seedQuery}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
 
-       http://localhost:8983/solr/films/query?q={!knn%20f=film_vector%20topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq=-id:("%2Fen%2Ffinding_nemo"%20"%2Fen%2Fbee_movie"%20"%2Fen%2Fharry_potter_and_the_chamber_of_secrets_2002")
+- This allows the KNN algorithm to start the similarity exploration from 
documents that already match the lexical criteria, potentially improving 
relevance and reducing search time.
 
-  - Search for movies with "cinderella" in the name among the top 50 movies 
most similar to the target vector (KNN as Filter Query):
+*KNN with EarlyTermination* - Search for the top 10 movies most similar to the 
target vector, allowing the KNN search to stop early for lower latency:
 
-       http://localhost:8983/solr/films/query?q=name:cinderella&fq={!knn%20f=film_vector%20topK=50}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
+         http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10 earlyTermination=true}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
 
-       * There are 3 "cinderella" movies in the index, but only 1 is among the 
top 50 most similar to the target vector (_Cinderella III: A Twist in Time_).
+- This allows Solr to return results faster by stopping the graph search once 
a good enough set of neighbors is found, instead of exploring all nodes in the 
vector index.
 
-     - Search for movies with "animation" in the genre, and rerank the top 5 
documents by combining (sum) the original query score with twice (2x) the 
similarity to the target vector (KNN with ReRanking):
+**KNN with ReRanking** - Search for movies with "animation" in the genre, and 
rerank the top 5 documents by combining (sum) the original query score with 
twice (2x) the similarity to the target vector:
 
-       http://localhost:8983/solr/films/query?q=genre:animation&rqq={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=5%20reRankWeight=2}
+       http://localhost:8983/solr/films/query?q=genre:animation&rqq={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&rq={!rerank reRankQuery=$rqq reRankDocs=5 reRankWeight=2}
 
-       * To guarantee we calculate the vector similarity score for all the 
movies, we set `topK=10000`, a number higher than the total number of documents 
(`1100`).
+- To guarantee we calculate the vector similarity score for all the movies, we 
set `topK=10000`, a number higher than the total number of documents (`1100`).
 
-   * It's possible to combine the vector similarity scores with other scores, 
by using Sub-query, 
-     xref:query-guide:function-queries.adoc[Function Queries] and 
xref:query-guide:local-params.adoc#parameter-dereferencing[Parameter 
Dereferencing] Solr features:
+It's possible to combine the vector similarity scores with other scores, by 
using Sub-query, xref:query-guide:function-queries.adoc[Function Queries] and 
xref:query-guide:local-params.adoc#parameter-dereferencing[Parameter 
Dereferencing] Solr features:
 
      - Search for "harry potter" movies, ranking the results by the similarity 
to the target vector instead of the lexical query score. Beside the `q` 
parameter, we define a "sub-query" named `q_vector`, that will calculate the 
similarity score between all the movies (since we set `topK=10000`). Then we 
use the sub-query parameter name as input for the `sort`, specifying that we 
want to rank descending according to the vector similarity score 
(`sort=$q_vector desc`):
 
-       http://localhost:8983/solr/films/query?q=name:"harry%20potter"&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&sort=$q_vector%20desc
+       http://localhost:8983/solr/films/query?q=name:"harry potter"&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&sort=$q_vector desc
 
      - Search for movies with "the" in the name, keeping the original lexical 
query ranking, but returning only movies with similarity to the target vector 
of 0.8 or higher. Like previously, we define the sub-query `q_vector`, but this 
time we use it as input for the `frange` filter, specifying that we want 
documents with at least 0.8 of vector similarity score:
 
-       http://localhost:8983/solr/films/query?q=name:the&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq={!frange%20l=0.8}$q_vector
+       http://localhost:8983/solr/films/query?q=name:the&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq={!frange l=0.8}$q_vector
 
      - Search for "batman" movies, ranking the results by combining 70% of the 
original lexical query score and 30% of the similarity to the target vector. 
Besides the `q` main query and the `q_vector` sub-query, we also specify the 
`q_lexical` query, which will hold the lexical score of the main `q` query. 
Then we specify a parameter variable called `score_combined`, which scales the 
lexical and similarity scores, applies the 0.7 and 0.3 weights, then sum the 
result. We set the `sort` p [...]
 
-       http://localhost:8983/solr/films/query?q=name:batman&q_lexical={!edismax%20v=$q}&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&score_combined=sum(mul(scale($q_lexical,0,1),0.7),mul(scale($q_vector,0,1),0.3))&sort=$score_combined%20desc&fl=name,score,$q_lexical,$q_vector,$score_combined
+       http://localhost:8983/solr/films/query?q=name:batman&q_lexical={!edismax v=$q}&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&score_combined=sum(mul(scale($q_lexical,0,1),0.7),mul(scale($q_vector,0,1),0.3))&sort=$score_combined desc&fl=name,score,$q_lexical,$q_vector,$score_combined
 
 
 === Exercise 5 Wrap Up
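
Reader's note, not part of the commit: the URLs added in this diff are written without percent-encoding, so they contain spaces, `{`, `[` and `$`. A client that sends them programmatically will probably need to encode each parameter value first. The following is a minimal Python sketch of one way to do that, assuming a local Solr with the `films` collection from the tutorial. The helper names `knn_query` and `build_url` are hypothetical and not part of Solr; the `seedQuery` and `earlyTermination` local parameters are taken as written in the diff.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Endpoint and target vector copied from the tutorial examples in the diff.
BASE_URL = "http://localhost:8983/solr/films/query"
TARGET_VECTOR = [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148,
                 -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]


def knn_query(vector, field="film_vector", top_k=10, **local_params):
    """Build an un-encoded {!knn ...}[v1,v2,...] query string.

    Extra keyword arguments are appended as additional local params,
    for example seedQuery="$seedQuery" or earlyTermination="true".
    """
    extras = "".join(f" {name}={value}" for name, value in local_params.items())
    values = ",".join(str(v) for v in vector)
    return f"{{!knn f={field} topK={top_k}{extras}}}[{values}]"


def build_url(**params):
    """Percent-encode every parameter value and append it to BASE_URL."""
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    # Seeded KNN: the lexical seed query is passed as its own request parameter
    # and dereferenced inside the knn local params.
    seeded = build_url(
        seedQuery="genre:Family",
        q=knn_query(TARGET_VECTOR, seedQuery="$seedQuery"),
    )
    # KNN with early termination enabled.
    early = build_url(q=knn_query(TARGET_VECTOR, earlyTermination="true"))
    print(seeded)
    print(early)

    # Decoding the query string should give back the plain query from the diff.
    print(parse_qs(urlsplit(early).query)["q"][0])
```

Encoding the values with `quote` rather than the default `quote_plus` turns spaces into `%20`, which matches the style of the URLs this commit replaced.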
