This is an automated email from the ASF dual-hosted git repository.
abenedetti pushed a commit to branch branch_10_0
in repository https://gitbox.apache.org/repos/asf/solr.git
The following commit(s) were added to refs/heads/branch_10_0 by this push:
new bbe788db6dd Add seeded and earlyTermination examples in tutorial-vectors doc (#3797)
bbe788db6dd is described below
commit bbe788db6dd1e56361dc17072aab110ddb22c101
Author: Ilaria Petreti <[email protected]>
AuthorDate: Wed Oct 22 17:25:36 2025 +0200
Add seeded and earlyTermination examples in tutorial-vectors doc (#3797)
(cherry picked from commit c163923212229fa322de54dae9909e19659628fb)
---
.../getting-started/pages/tutorial-vectors.adoc | 59 +++++++++++++---------
1 file changed, 35 insertions(+), 24 deletions(-)
diff --git a/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc b/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc
index 133927b74c1..97d06a91a4b 100644
--- a/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc
+++ b/solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc
@@ -67,7 +67,7 @@ $ curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:applicat
"type":"pdate",
"stored":true
}
- ]
+ ]
}'
----
@@ -81,22 +81,22 @@ $ bin/solr post -c films example/films/films.json
----
=== Let's do some Vector searches
-Before making the queries, we define an example target vector, simulating a person that
-watched 3 movies: _Finding Nemo_, _Bee Movie_, and _Harry Potter and the Chamber of Secrets_.
-We get the vector of each movie, then calculate the resulting average vector, which will
+Before making the queries, we define an example target vector, simulating a person that
+watched 3 movies: _Finding Nemo_, _Bee Movie_, and _Harry Potter and the Chamber of Secrets_.
+We get the vector of each movie, then calculate the resulting average vector, which will
be used as the input vector for all the following example queries.
-
+
```
[-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]
```
[NOTE]
====
-Interested in calculating the vector using Solr's xref:query-guide:streaming-expressions.adoc[streaming capability]?
+Interested in calculating the vector using Solr's xref:query-guide:streaming-expressions.adoc[streaming capability]?
Here is an example of a streaming expression that you can run via the xref:query-guide:stream-screen.adoc[Solr Admin Stream UI]:
```
let(
- a=select(
+ a=select(
search(films,
qt="/select",
q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"",
@@ -141,43 +141,54 @@ The output is:
// Solr URL examples below all have [ and ] characters which, when used with Curl, causes encoding issues so just putting plain http links
-Search for the top 10 movies most similar to the target vector that we previously calculated (KNN Query for recommendation):
+**KNN Query for recommendation** - Search for the top 10 movies most similar to the target vector that we previously calculated:
+
+ http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
+
+- Notice that among the results, there are some animation family movies, such as _Curious George_ and _Bambi_, which makes sense, since the target vector was created with two other animation family movies (_Finding Nemo_ and _Bee Movie_).
+- We also notice that among the results there are two movies that the person already watched. In the next example we will filter them out.
+
+**KNN query with Filter Query** - Search for the top 10 movies most similar to the resulting vector, excluding the movies already watched:
+
+ http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq=-id:("/en/finding_nemo" "/en/bee_movie" "/en/harry_potter_and_the_chamber_of_secrets_2002")
+
+**KNN as Filter Query** - Search for movies with "cinderella" in the name among the top 50 movies most similar to the target vector:
+
+ http://localhost:8983/solr/films/query?q=name:cinderella&fq={!knn f=film_vector topK=50}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
- 'http://localhost:8983/solr/films/query?q={%21knn%20f=film_vector%20topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]'
+- There are 3 "cinderella" movies in the index, but only 1 is among the top 50 most similar to the target vector (_Cinderella III: A Twist in Time_).
-* Notice that among the results, there are some animation family movies, such as _Curious George_ and _Bambi_, which makes sense, since the target vector was created with two other animation family movies (_Finding Nemo_ and _Bee Movie_).
-* We also notice that among the results there are two movies that the person already watched. In the next example we will filter them out.
+*KNN with SeededQuery* - Search for the top 10 movies most similar to the target vector, guided by a seed lexical query on the `genre` field, which provides the initial entry points in the vector graph search:
-Search for the top 10 movies most similar to the resulting vector, excluding the movies already watched (KNN query with Filter Query):
+ http://localhost:8983/solr/films/query?seedQuery=genre:Family&q={!knn f=film_vector topK=10 seedQuery=$seedQuery}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
- http://localhost:8983/solr/films/query?q={!knn%20f=film_vector%20topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq=-id:("%2Fen%2Ffinding_nemo"%20"%2Fen%2Fbee_movie"%20"%2Fen%2Fharry_potter_and_the_chamber_of_secrets_2002")
+- This allows the KNN algorithm to start the similarity exploration from documents that already match the lexical criteria, potentially improving relevance and reducing search time.
- - Search for movies with "cinderella" in the name among the top 50 movies most similar to the target vector (KNN as Filter Query):
+*KNN with EarlyTermination* - Search for the top 10 movies most similar to the target vector, allowing the KNN search to stop early for lower latency:
- http://localhost:8983/solr/films/query?q=name:cinderella&fq={!knn%20f=film_vector%20topK=50}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
+ http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10 earlyTermination=true}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
- * There are 3 "cinderella" movies in the index, but only 1 is among the top 50 most similar to the target vector (_Cinderella III: A Twist in Time_).
+- This allows Solr to return results faster by stopping the graph search once a good enough set of neighbors is found, instead of exploring all nodes in the vector index.
- - Search for movies with "animation" in the genre, and rerank the top 5 documents by combining (sum) the original query score with twice (2x) the similarity to the target vector (KNN with ReRanking):
+**KNN with ReRanking** - Search for movies with "animation" in the genre, and rerank the top 5 documents by combining (sum) the original query score with twice (2x) the similarity to the target vector:
- http://localhost:8983/solr/films/query?q=genre:animation&rqq={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=5%20reRankWeight=2}
+ http://localhost:8983/solr/films/query?q=genre:animation&rqq={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&rq={!rerank reRankQuery=$rqq reRankDocs=5 reRankWeight=2}
- * To guarantee we calculate the vector similarity score for all the movies, we set `topK=10000`, a number higher than the total number of documents (`1100`).
+- To guarantee we calculate the vector similarity score for all the movies, we set `topK=10000`, a number higher than the total number of documents (`1100`).
- * It's possible to combine the vector similarity scores with other scores, by using Sub-query,
- xref:query-guide:function-queries.adoc[Function Queries] and xref:query-guide:local-params.adoc#parameter-dereferencing[Parameter Dereferencing] Solr features:
+It's possible to combine the vector similarity scores with other scores, by using Sub-query, xref:query-guide:function-queries.adoc[Function Queries] and xref:query-guide:local-params.adoc#parameter-dereferencing[Parameter Dereferencing] Solr features:
- Search for "harry potter" movies, ranking the results by the similarity to the target vector instead of the lexical query score. Beside the `q` parameter, we define a "sub-query" named `q_vector`, that will calculate the similarity score between all the movies (since we set `topK=10000`). Then we use the sub-query parameter name as input for the `sort`, specifying that we want to rank descending according to the vector similarity score (`sort=$q_vector desc`):
- http://localhost:8983/solr/films/query?q=name:"harry%20potter"&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&sort=$q_vector%20desc
+ http://localhost:8983/solr/films/query?q=name:"harry potter"&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&sort=$q_vector desc
- Search for movies with "the" in the name, keeping the original lexical query ranking, but returning only movies with similarity to the target vector of 0.8 or higher. Like previously, we define the sub-query `q_vector`, but this time we use it as input for the `frange` filter, specifying that we want documents with at least 0.8 of vector similarity score:
- http://localhost:8983/solr/films/query?q=name:the&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq={!frange%20l=0.8}$q_vector
+ http://localhost:8983/solr/films/query?q=name:the&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq={!frange l=0.8}$q_vector
- Search for "batman" movies, ranking the results by combining 70% of the original lexical query score and 30% of the similarity to the target vector. Besides the `q` main query and the `q_vector` sub-query, we also specify the `q_lexical` query, which will hold the lexical score of the main `q` query. Then we specify a parameter variable called `score_combined`, which scales the lexical and similarity scores, applies the 0.7 and 0.3 weights, then sum the result. We set the `sort` p [...]
- http://localhost:8983/solr/films/query?q=name:batman&q_lexical={!edismax%20v=$q}&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&score_combined=sum(mul(scale($q_lexical,0,1),0.7),mul(scale($q_vector,0,1),0.3))&sort=$score_combined%20desc&fl=name,score,$q_lexical,$q_vector,$score_combined
+ http://localhost:8983/solr/films/query?q=name:batman&q_lexical={!edismax v=$q}&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&score_combined=sum(mul(scale($q_lexical,0,1),0.7),mul(scale($q_vector,0,1),0.3))&sort=$score_combined desc&fl=name,score,$q_lexical,$q_vector,$score_combined
=== Exercise 5 Wrap Up
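
The comment in the diff notes that the `[` and `]` characters in these URLs cause encoding issues with Curl, and the commit replaces the hand-encoded `%20` links with plain ones. As a rough sketch of one way to get a Curl-safe URL from the plain form, the Python below percent-encodes the KNN and seeded-query examples. The helper names `knn_query` and `build_url` are hypothetical and are not part of Solr or the tutorial. The base URL, field name, vector, and local params are copied from the diff. `urlencode` writes spaces as `+`, which is assumed here to be decoded as a space on the server side.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Base URL and target vector are taken from the tutorial text in the diff.
BASE_URL = "http://localhost:8983/solr/films/query"
TARGET_VECTOR = [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148,
                 -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]


def knn_query(vector, field="film_vector", top_k=10, **local_params):
    """Build a '{!knn ...}[v1,v2,...]' query string (hypothetical helper)."""
    extras = "".join(f" {name}={value}" for name, value in local_params.items())
    values = ",".join(str(x) for x in vector)
    return f"{{!knn f={field} topK={top_k}{extras}}}[{values}]"


def build_url(params):
    """Percent-encode every parameter so {, }, [, ], $ and spaces are URL-safe."""
    return f"{BASE_URL}?{urlencode(params)}"


# Plain KNN query, as in the "KNN Query for recommendation" example.
plain_url = build_url({"q": knn_query(TARGET_VECTOR)})

# Seeded KNN query: the seed is passed as a separate request parameter and
# dereferenced inside the local params, as in the "KNN with SeededQuery" example.
seeded_url = build_url({
    "seedQuery": "genre:Family",
    "q": knn_query(TARGET_VECTOR, seedQuery="$seedQuery"),
})

print(plain_url)
print(seeded_url)
```

The resulting strings can be passed to Curl inside single quotes. Decoding them with `parse_qs` gives back the plain parameter values shown in the updated tutorial.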