alessandrobenedetti commented on code in PR #2157:
URL: https://github.com/apache/solr/pull/2157#discussion_r1478016732
##########
solr/core/src/test/org/apache/solr/search/QueryEqualityTest.java:
##########
@@ -1364,12 +1420,12 @@ public void testQueryKNN() throws Exception {
* for coverage sanity checking
*
* @see #testParserCoverage
- * @see #assertQueryEquals
+ * @see #assertQueryEqualsAndReturn
Review Comment:
If I am reading it right, these are 4 different test cases but currently run
as one?
I would prefer to see them separate, because right now if I want to run and
debug the last one I'm forced to run all of them?
I may be wrong, so feel free to elaborate!
##########
solr/core/src/test/org/apache/solr/search/neural/KnnQParserTest.java:
##########
@@ -453,52 +455,397 @@ public void
knnQueryUsedInFilter_shouldFilterResultsBeforeTheQueryExecution() {
@Test
public void
knnQueryUsedInFilters_shouldFilterResultsBeforeTheQueryExecution() {
String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
+
+ // topK=4 -> 1,4,2,10
assertQ(
req(
CommonParams.Q,
"id:(3 4 9 2)",
"fq",
"{!knn f=vector topK=4}" + vectorToSearch,
"fq",
- "id:(4 20)",
+ "id:(4 20 9)",
Review Comment:
I am trying to understand why this test needs to be changed now?
##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -264,45 +264,113 @@ The `DenseVectorField` to search in.
+
How many k-nearest results to return.
-Here's how to run a KNN search:
+`preFilter`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: Depends on usage, see below.
+|===
++
+Specifies an explicit list of Pre-Filter query strings to use.
-[source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+`includeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that only `fq` filters with the specified `tag` should be considered
for implicit Pre-Filtering. May not be combind with `preFilter`.
-The search results retrieved are the k-nearest to the vector in input `[1.0,
2.0, 3.0, 4.0]`, ranked by the similarityFunction configured at indexing time.
-==== Usage with Filter Queries
-The `knn` query parser can be used in filter queries:
+`excludeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that `fq` filters with the specified `tag` should be excluded from
consideration for implicit Pre-Filtering. May not be combind with `preFilter`.
+
+
+Here's how to run a simple KNN search:
+
[source,text]
-&q=id:(1 2 3)&fq={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+?q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+
+The search results retrieved are the k=10 nearest documents to the vector in
input `[1.0, 2.0, 3.0, 4.0]`, ranked by the `similarityFunction` configured at
indexing time.
+
+
+==== Explicit KNN Pre-Filtering
+
+The `knn` query parser's `preFilter` parameter can be specified to reduce the
number of candidate documents evaluated for the k-nearest distance calculation:
-The `knn` query parser can be used with filter queries:
[source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]&fq=id:(1 2 3)
+?q={!knn f=vector topK=10 preFilter=inStock:true}[1.0, 2.0, 3.0, 4.0]
-[IMPORTANT]
-====
-Filter queries are executed as pre-filters: the main query refines the sub-set
of search results derived from the application of all the filter queries
combined as 'MUST' clauses(boolean AND).
+In the above example, only documents matching the Pre-Filter `inStock:true`
will be candidates for consideration when evaluating the k-nearest search
against the specified vector.
+
+The `preFilter` parameter may be blank (ex: `preFilter=""`) to indicate that
no Pre-Filtering should be performed; or it may be multi-valued -- either
through repetition, or via duplicated
xref:local-params.adoc#parameter-dereferencing[Parameter References].
+
+These two examples are equivilent:
+
+[source,text]
+?q={!knn f=vector topK=10 preFilter=category:AAA preFilter=inStock:true}[1.0,
2.0, 3.0, 4.0]
-This means that in
[source,text]
-&q=id:(1 2 3)&fq={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+----
+?q={!knn f=vector topK=10 preFilter=$knnPreFilter}[1.0, 2.0, 3.0, 4.0]
+&knnPreFilter=category:AAA
+&knnPreFilter=inStock:true
+----
-The results are prefiltered by the topK knn retrieval and then only the
documents from this subset, matching the query 'q=id:(1 2 3)' are returned.
+==== Implicit KNN Pre-Filtering
+
+While the `preFilter` parameter may be explicitly specified on *_any_* usage
of the `knn` query parser, the default Pre-Filtering behavior (when no
`preFilter` parameter is specified) will vary based on how the `knn` query
parser is used:
+
+* When used as the main `q` param: `fq` filters in the request (that are not
xref:common-query-parameters.adoc#cache-local-parameter[Solr Post Filters])
will be combined to form an implicit KNN Pre-Filter.
+** This default behavior optimizes the number of vector distance calculations
considered, eliminating documents that would eventually be excluded by an `fq`
filter anyway.
+** `includeTags` and `excludeTags` may be used to limit the set of `fq`
filters used in the Pre-Filter.
+* When used as an `fq` param, or as a subquery clause in a larger query: No
implicit Pre-Filter is used.
+** `includeTags` and `excludeTags` may not be used in these situations.
Review Comment:
same consideration for 'may
##########
solr/core/src/test/org/apache/solr/search/neural/KnnQParserTest.java:
##########
@@ -453,52 +455,397 @@ public void
knnQueryUsedInFilter_shouldFilterResultsBeforeTheQueryExecution() {
@Test
public void
knnQueryUsedInFilters_shouldFilterResultsBeforeTheQueryExecution() {
String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
+
+ // topK=4 -> 1,4,2,10
assertQ(
req(
CommonParams.Q,
"id:(3 4 9 2)",
"fq",
"{!knn f=vector topK=4}" + vectorToSearch,
"fq",
- "id:(4 20)",
+ "id:(4 20 9)",
"fl",
"id"),
"//result[@numFound='1']",
"//result/doc[1]/str[@name='id'][.='4']");
- }
-
- @Test
- public void
knnQueryWithFilterQuery_shouldPerformKnnSearchInPreFilteredResults() {
- String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
+ // topK=4 w/localparam preFilter -> 1,4,7,9
assertQ(
req(
CommonParams.Q,
- "{!knn f=vector topK=10}" + vectorToSearch,
+ "id:(3 4 9 2)",
+ "fq",
+ "{!knn f=vector topK=4 preFilter='id:(1 4 7 8 9)'}" +
vectorToSearch,
"fq",
- "id:(1 2 7 20)",
+ "id:(4 20 9)",
"fl",
"id"),
+ "//result[@numFound='2']",
+ "//result/doc[1]/str[@name='id'][.='4']",
+ "//result/doc[2]/str[@name='id'][.='9']");
+
+ for (String fq :
+ Arrays.asList(
+ "{!knn f=vector topK=5 includeTags=xxx}" + vectorToSearch,
+ "{!knn f=vector topK=5 excludeTags=xxx}" + vectorToSearch)) {
+ assertQEx(
+ "fq={!knn...} incompatible with include/exclude localparams",
+ "used as a filter does not support",
+ req("q", "*:*", "fq", fq),
+ SolrException.ErrorCode.BAD_REQUEST);
+ }
+ }
+
+ @Test
+ public void knnQueryAsSubQuery() {
+ final SolrParams common = params("fl", "id", "vec", "[1.0, 2.0, 3.0,
4.0]");
+ final String filt = "id:(2 4 7 9 8 20 3)";
+
+ // When knn parser is a subquery, it should not pre-filter on any global
fq params
+ // topK -> 1,4,2,10,3 -> fq -> 4,2,3
+ assertQ(
+ req(common, "fq", filt, "q", "*:* AND {!knn f=vector topK=5 v=$vec}"),
"//result[@numFound='3']",
- "//result/doc[1]/str[@name='id'][.='1']",
+ "//result/doc[1]/str[@name='id'][.='4']",
+ "//result/doc[2]/str[@name='id'][.='2']",
+ "//result/doc[3]/str[@name='id'][.='3']");
+ // topK -> 1,4,2,10,3 + '8' -> fq -> 4,2,3,8
+ assertQ(
+ req(common, "fq", filt, "q", "id:8^=0.01 OR {!knn f=vector topK=5
v=$vec}"),
+ "//result[@numFound='4']",
+ "//result/doc[1]/str[@name='id'][.='4']",
"//result/doc[2]/str[@name='id'][.='2']",
- "//result/doc[3]/str[@name='id'][.='7']");
+ "//result/doc[3]/str[@name='id'][.='3']",
+ "//result/doc[4]/str[@name='id'][.='8']");
+ // knn subquery should still accept `preFilter` local param
+ // filt -> topK -> 4,2,3,7,9
+ assertQ(
+ req(common, "q", "*:* AND {!knn f=vector topK=5 preFilter='" + filt +
"' v=$vec}"),
+ "//result[@numFound='5']",
+ "//result/doc[1]/str[@name='id'][.='4']",
+ "//result/doc[2]/str[@name='id'][.='2']",
+ "//result/doc[3]/str[@name='id'][.='3']",
+ "//result/doc[4]/str[@name='id'][.='7']",
+ "//result/doc[5]/str[@name='id'][.='9']");
+
+ // knn subquery should still accept `preFilter` local param, and not
pre-filter on any global fq
+ // params
+ // filt -> topK -> 4,2,3,7,9 -> fq -> 3,9
assertQ(
req(
- CommonParams.Q,
- "{!knn f=vector topK=4}" + vectorToSearch,
+ common,
"fq",
- "id:(3 4 9 2)",
- "fl",
- "id"),
+ "id:(1 9 20 3 5 6 8)",
+ "q",
+ "*:* AND {!knn f=vector topK=5 preFilter='" + filt + "' v=$vec}"),
+ "//result[@numFound='2']",
+ "//result/doc[1]/str[@name='id'][.='3']",
+ "//result/doc[2]/str[@name='id'][.='9']");
+ // filt -> topK -> 4,2,3,7,9 + '8' -> fq -> 8,3,9
+ assertQ(
+ req(
+ common,
+ "fq",
+ "id:(1 9 20 3 5 6 8)",
+ "q",
+ "id:8^=100 OR {!knn f=vector topK=5 preFilter='" + filt + "'
v=$vec}"),
+ "//result[@numFound='3']",
+ "//result/doc[1]/str[@name='id'][.='8']",
+ "//result/doc[2]/str[@name='id'][.='3']",
+ "//result/doc[3]/str[@name='id'][.='9']");
+
+ for (String knn :
+ Arrays.asList(
+ "{!knn f=vector topK=5 includeTags=xxx v=$vec}",
+ "{!knn f=vector topK=5 excludeTags=xxx v=$vec}")) {
+ assertQEx(
+ "knn as subquery incompatible with include/exclude localparams",
+ "used as a sub-query does not support",
+ req(common, "q", "*:* OR " + knn),
+ SolrException.ErrorCode.BAD_REQUEST);
+ }
+ }
+
+ @Test
+ public void
knnQueryWithFilterQuery_shouldPerformKnnSearchInPreFilteredResults() {
Review Comment:
also here, ideally having split tests would help
##########
solr/core/src/test/org/apache/solr/search/neural/KnnQParserTest.java:
##########
@@ -453,52 +455,397 @@ public void
knnQueryUsedInFilter_shouldFilterResultsBeforeTheQueryExecution() {
@Test
public void
knnQueryUsedInFilters_shouldFilterResultsBeforeTheQueryExecution() {
String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
+
+ // topK=4 -> 1,4,2,10
assertQ(
req(
CommonParams.Q,
"id:(3 4 9 2)",
"fq",
"{!knn f=vector topK=4}" + vectorToSearch,
"fq",
- "id:(4 20)",
+ "id:(4 20 9)",
"fl",
"id"),
"//result[@numFound='1']",
"//result/doc[1]/str[@name='id'][.='4']");
- }
-
- @Test
- public void
knnQueryWithFilterQuery_shouldPerformKnnSearchInPreFilteredResults() {
- String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
+ // topK=4 w/localparam preFilter -> 1,4,7,9
assertQ(
req(
CommonParams.Q,
- "{!knn f=vector topK=10}" + vectorToSearch,
+ "id:(3 4 9 2)",
+ "fq",
+ "{!knn f=vector topK=4 preFilter='id:(1 4 7 8 9)'}" +
vectorToSearch,
"fq",
- "id:(1 2 7 20)",
+ "id:(4 20 9)",
"fl",
"id"),
+ "//result[@numFound='2']",
+ "//result/doc[1]/str[@name='id'][.='4']",
+ "//result/doc[2]/str[@name='id'][.='9']");
+
+ for (String fq :
+ Arrays.asList(
+ "{!knn f=vector topK=5 includeTags=xxx}" + vectorToSearch,
+ "{!knn f=vector topK=5 excludeTags=xxx}" + vectorToSearch)) {
+ assertQEx(
Review Comment:
Also in this case I would prefer to see these as separate test cases, that
can run independently
##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -240,7 +240,7 @@ client.add(Arrays.asList(d1, d2));
This is the Apache Solr query approach designed to support dense vector search:
=== knn Query Parser
-The `knn` k-nearest neighbors query parser allows to find the k-nearest
documents to the target vector according to indexed dense vectors in the given
field.
+The `knn` k-nearest neighbors query parser allows to find the k-nearest
documents to the target vector according to indexed dense vectors in the given
field. The set of documents can be Pre-Riltered to reduce the number of vector
distance calculations that must be computed, and ensure the best `topK` are
returned.
Review Comment:
The set of documents can be Pre-Filtered to reduce the number of vector
distance calculations that must be computed, and ensure the best `topK` are
returned.
##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -264,45 +264,113 @@ The `DenseVectorField` to search in.
+
How many k-nearest results to return.
-Here's how to run a KNN search:
+`preFilter`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: Depends on usage, see below.
+|===
++
+Specifies an explicit list of Pre-Filter query strings to use.
-[source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+`includeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that only `fq` filters with the specified `tag` should be considered
for implicit Pre-Filtering. May not be combind with `preFilter`.
Review Comment:
Please ignore this suggestion in case it's wrong, I'm not a native speaker:
May not be combined -> Must not be combined?
May suggests 'optional'
Must implies a 'mandatory' condition? (because here it's not optional but
mandatory I think)
##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -264,45 +264,113 @@ The `DenseVectorField` to search in.
+
How many k-nearest results to return.
-Here's how to run a KNN search:
+`preFilter`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: Depends on usage, see below.
+|===
++
+Specifies an explicit list of Pre-Filter query strings to use.
-[source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+`includeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that only `fq` filters with the specified `tag` should be considered
for implicit Pre-Filtering. May not be combind with `preFilter`.
-The search results retrieved are the k-nearest to the vector in input `[1.0,
2.0, 3.0, 4.0]`, ranked by the similarityFunction configured at indexing time.
-==== Usage with Filter Queries
-The `knn` query parser can be used in filter queries:
+`excludeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that `fq` filters with the specified `tag` should be excluded from
consideration for implicit Pre-Filtering. May not be combind with `preFilter`.
+
+
+Here's how to run a simple KNN search:
+
[source,text]
-&q=id:(1 2 3)&fq={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+?q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+
+The search results retrieved are the k=10 nearest documents to the vector in
input `[1.0, 2.0, 3.0, 4.0]`, ranked by the `similarityFunction` configured at
indexing time.
+
+
+==== Explicit KNN Pre-Filtering
+
+The `knn` query parser's `preFilter` parameter can be specified to reduce the
number of candidate documents evaluated for the k-nearest distance calculation:
-The `knn` query parser can be used with filter queries:
[source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]&fq=id:(1 2 3)
+?q={!knn f=vector topK=10 preFilter=inStock:true}[1.0, 2.0, 3.0, 4.0]
-[IMPORTANT]
-====
-Filter queries are executed as pre-filters: the main query refines the sub-set
of search results derived from the application of all the filter queries
combined as 'MUST' clauses(boolean AND).
+In the above example, only documents matching the Pre-Filter `inStock:true`
will be candidates for consideration when evaluating the k-nearest search
against the specified vector.
+
+The `preFilter` parameter may be blank (ex: `preFilter=""`) to indicate that
no Pre-Filtering should be performed; or it may be multi-valued -- either
through repetition, or via duplicated
xref:local-params.adoc#parameter-dereferencing[Parameter References].
+
+These two examples are equivilent:
Review Comment:
equivalent?
##########
solr/core/src/test/org/apache/solr/search/neural/KnnQParserTest.java:
##########
@@ -453,52 +455,397 @@ public void
knnQueryUsedInFilter_shouldFilterResultsBeforeTheQueryExecution() {
@Test
public void
knnQueryUsedInFilters_shouldFilterResultsBeforeTheQueryExecution() {
String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
+
+ // topK=4 -> 1,4,2,10
assertQ(
req(
CommonParams.Q,
"id:(3 4 9 2)",
"fq",
"{!knn f=vector topK=4}" + vectorToSearch,
"fq",
- "id:(4 20)",
+ "id:(4 20 9)",
"fl",
"id"),
"//result[@numFound='1']",
"//result/doc[1]/str[@name='id'][.='4']");
- }
-
- @Test
- public void
knnQueryWithFilterQuery_shouldPerformKnnSearchInPreFilteredResults() {
- String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
+ // topK=4 w/localparam preFilter -> 1,4,7,9
assertQ(
req(
CommonParams.Q,
- "{!knn f=vector topK=10}" + vectorToSearch,
+ "id:(3 4 9 2)",
+ "fq",
+ "{!knn f=vector topK=4 preFilter='id:(1 4 7 8 9)'}" +
vectorToSearch,
"fq",
- "id:(1 2 7 20)",
+ "id:(4 20 9)",
"fl",
"id"),
+ "//result[@numFound='2']",
+ "//result/doc[1]/str[@name='id'][.='4']",
+ "//result/doc[2]/str[@name='id'][.='9']");
+
+ for (String fq :
+ Arrays.asList(
+ "{!knn f=vector topK=5 includeTags=xxx}" + vectorToSearch,
+ "{!knn f=vector topK=5 excludeTags=xxx}" + vectorToSearch)) {
+ assertQEx(
+ "fq={!knn...} incompatible with include/exclude localparams",
+ "used as a filter does not support",
+ req("q", "*:*", "fq", fq),
+ SolrException.ErrorCode.BAD_REQUEST);
+ }
+ }
+
+ @Test
+ public void knnQueryAsSubQuery() {
Review Comment:
also here, possibly split the tests?
##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -264,45 +264,113 @@ The `DenseVectorField` to search in.
+
How many k-nearest results to return.
-Here's how to run a KNN search:
+`preFilter`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: Depends on usage, see below.
+|===
++
+Specifies an explicit list of Pre-Filter query strings to use.
-[source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+`includeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that only `fq` filters with the specified `tag` should be considered
for implicit Pre-Filtering. May not be combind with `preFilter`.
-The search results retrieved are the k-nearest to the vector in input `[1.0,
2.0, 3.0, 4.0]`, ranked by the similarityFunction configured at indexing time.
-==== Usage with Filter Queries
-The `knn` query parser can be used in filter queries:
+`excludeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that `fq` filters with the specified `tag` should be excluded from
consideration for implicit Pre-Filtering. May not be combind with `preFilter`.
Review Comment:
Same consideration for 'May'
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]