alessandrobenedetti commented on code in PR #2157:
URL: https://github.com/apache/solr/pull/2157#discussion_r1478016732


##########
solr/core/src/test/org/apache/solr/search/QueryEqualityTest.java:
##########
@@ -1364,12 +1420,12 @@ public void testQueryKNN() throws Exception {
    * for coverage sanity checking
    *
    * @see #testParserCoverage
-   * @see #assertQueryEquals
+   * @see #assertQueryEqualsAndReturn

Review Comment:
   If I am reading it right, these are 4 different test cases but currently run 
as one?
   I would prefer to see them separate, because right now if I want to run and 
debug the last one I'm forced to run all of them?
   I may be wrong, so feel free to elaborate!



##########
solr/core/src/test/org/apache/solr/search/neural/KnnQParserTest.java:
##########
@@ -453,52 +455,397 @@ public void 
knnQueryUsedInFilter_shouldFilterResultsBeforeTheQueryExecution() {
   @Test
   public void 
knnQueryUsedInFilters_shouldFilterResultsBeforeTheQueryExecution() {
     String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
+
+    // topK=4 -> 1,4,2,10
     assertQ(
         req(
             CommonParams.Q,
             "id:(3 4 9 2)",
             "fq",
             "{!knn f=vector topK=4}" + vectorToSearch,
             "fq",
-            "id:(4 20)",
+            "id:(4 20 9)",

Review Comment:
   I am trying to understand why this test needs to be changed now?



##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -264,45 +264,113 @@ The `DenseVectorField` to search in.
 +
 How many k-nearest results to return.
 
-Here's how to run a KNN search:
+`preFilter`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: Depends on usage, see below.
+|===
++
+Specifies an explicit list of Pre-Filter query strings to use.
 
-[source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+`includeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that only `fq` filters with the specified `tag` should be considered 
for implicit Pre-Filtering.  May not be combind with `preFilter`.
 
-The search results retrieved are the k-nearest to the vector in input `[1.0, 
2.0, 3.0, 4.0]`, ranked by the similarityFunction configured at indexing time.
 
-==== Usage with Filter Queries
-The `knn` query parser can be used in filter queries:
+`excludeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that `fq` filters with the specified `tag` should be excluded from 
consideration for implicit Pre-Filtering.  May not be combind with `preFilter`.
+
+
+Here's how to run a simple KNN search:
+
 [source,text]
-&q=id:(1 2 3)&fq={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+?q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+
+The search results retrieved are the k=10 nearest documents to the vector in 
input `[1.0, 2.0, 3.0, 4.0]`, ranked by the `similarityFunction` configured at 
indexing time.
+
+
+==== Explicit KNN Pre-Filtering
+
+The `knn` query parser's `preFilter` parameter can be specified to reduce the 
number of candidate documents evaluated for the k-nearest distance calculation:
 
-The `knn` query parser can be used with filter queries:
 [source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]&fq=id:(1 2 3)
+?q={!knn f=vector topK=10 preFilter=inStock:true}[1.0, 2.0, 3.0, 4.0]
 
-[IMPORTANT]
-====
-Filter queries are executed as pre-filters: the main query refines the sub-set 
of search results derived from the application of all the filter queries 
combined as 'MUST' clauses(boolean AND).
+In the above example, only documents matching the Pre-Filter `inStock:true` 
will be candidates for consideration when evaluating the k-nearest search 
against the specified vector.
+
+The `preFilter` parameter may be blank (ex: `preFilter=""`) to indicate that 
no Pre-Filtering should be performed; or it may be multi-valued -- either 
through repetition, or via duplicated 
xref:local-params.adoc#parameter-dereferencing[Parameter References].
+
+These two examples are equivilent:
+
+[source,text]
+?q={!knn f=vector topK=10 preFilter=category:AAA preFilter=inStock:true}[1.0, 
2.0, 3.0, 4.0]
 
-This means that in
 [source,text]
-&q=id:(1 2 3)&fq={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+----
+?q={!knn f=vector topK=10 preFilter=$knnPreFilter}[1.0, 2.0, 3.0, 4.0]
+&knnPreFilter=category:AAA
+&knnPreFilter=inStock:true
+----
 
-The results are prefiltered by the topK knn retrieval and then only the 
documents from this subset, matching the query 'q=id:(1 2 3)' are returned.
+==== Implicit KNN Pre-Filtering
+
+While the `preFilter` parameter may be explicitly specified on *_any_* usage 
of the `knn` query parser, the default Pre-Filtering behavior (when no 
`preFilter` parameter is specified) will vary based on how the `knn` query 
parser is used:
+
+* When used as the main `q` param: `fq` filters in the request (that are not 
xref:common-query-parameters.adoc#cache-local-parameter[Solr Post Filters]) 
will be combined to form an implicit KNN Pre-Filter.
+** This default behavior optimizes the number of vector distance calculations 
considered, eliminating documents that would eventually be excluded by an `fq` 
filter anyway.
+** `includeTags` and `excludeTags` may be used to limit the set of `fq` 
filters used in the Pre-Filter.
+* When used as an `fq` param, or as a subquery clause in a larger query: No 
implicit Pre-Filter is used.
+** `includeTags` and `excludeTags` may not be used in these situations.

Review Comment:
   same consideration for 'may



##########
solr/core/src/test/org/apache/solr/search/neural/KnnQParserTest.java:
##########
@@ -453,52 +455,397 @@ public void 
knnQueryUsedInFilter_shouldFilterResultsBeforeTheQueryExecution() {
   @Test
   public void 
knnQueryUsedInFilters_shouldFilterResultsBeforeTheQueryExecution() {
     String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
+
+    // topK=4 -> 1,4,2,10
     assertQ(
         req(
             CommonParams.Q,
             "id:(3 4 9 2)",
             "fq",
             "{!knn f=vector topK=4}" + vectorToSearch,
             "fq",
-            "id:(4 20)",
+            "id:(4 20 9)",
             "fl",
             "id"),
         "//result[@numFound='1']",
         "//result/doc[1]/str[@name='id'][.='4']");
-  }
-
-  @Test
-  public void 
knnQueryWithFilterQuery_shouldPerformKnnSearchInPreFilteredResults() {
-    String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
 
+    // topK=4 w/localparam preFilter -> 1,4,7,9
     assertQ(
         req(
             CommonParams.Q,
-            "{!knn f=vector topK=10}" + vectorToSearch,
+            "id:(3 4 9 2)",
+            "fq",
+            "{!knn f=vector topK=4 preFilter='id:(1 4 7 8 9)'}" + 
vectorToSearch,
             "fq",
-            "id:(1 2 7 20)",
+            "id:(4 20 9)",
             "fl",
             "id"),
+        "//result[@numFound='2']",
+        "//result/doc[1]/str[@name='id'][.='4']",
+        "//result/doc[2]/str[@name='id'][.='9']");
+
+    for (String fq :
+        Arrays.asList(
+            "{!knn f=vector topK=5 includeTags=xxx}" + vectorToSearch,
+            "{!knn f=vector topK=5 excludeTags=xxx}" + vectorToSearch)) {
+      assertQEx(
+          "fq={!knn...} incompatible with include/exclude localparams",
+          "used as a filter does not support",
+          req("q", "*:*", "fq", fq),
+          SolrException.ErrorCode.BAD_REQUEST);
+    }
+  }
+
+  @Test
+  public void knnQueryAsSubQuery() {
+    final SolrParams common = params("fl", "id", "vec", "[1.0, 2.0, 3.0, 
4.0]");
+    final String filt = "id:(2 4 7 9 8 20 3)";
+
+    // When knn parser is a subquery, it should not pre-filter on any global 
fq params
+    // topK -> 1,4,2,10,3 -> fq -> 4,2,3
+    assertQ(
+        req(common, "fq", filt, "q", "*:* AND {!knn f=vector topK=5 v=$vec}"),
         "//result[@numFound='3']",
-        "//result/doc[1]/str[@name='id'][.='1']",
+        "//result/doc[1]/str[@name='id'][.='4']",
+        "//result/doc[2]/str[@name='id'][.='2']",
+        "//result/doc[3]/str[@name='id'][.='3']");
+    // topK -> 1,4,2,10,3 + '8' -> fq -> 4,2,3,8
+    assertQ(
+        req(common, "fq", filt, "q", "id:8^=0.01 OR {!knn f=vector topK=5 
v=$vec}"),
+        "//result[@numFound='4']",
+        "//result/doc[1]/str[@name='id'][.='4']",
         "//result/doc[2]/str[@name='id'][.='2']",
-        "//result/doc[3]/str[@name='id'][.='7']");
+        "//result/doc[3]/str[@name='id'][.='3']",
+        "//result/doc[4]/str[@name='id'][.='8']");
 
+    // knn subquery should still accept `preFilter` local param
+    // filt -> topK -> 4,2,3,7,9
+    assertQ(
+        req(common, "q", "*:* AND {!knn f=vector topK=5 preFilter='" + filt + 
"' v=$vec}"),
+        "//result[@numFound='5']",
+        "//result/doc[1]/str[@name='id'][.='4']",
+        "//result/doc[2]/str[@name='id'][.='2']",
+        "//result/doc[3]/str[@name='id'][.='3']",
+        "//result/doc[4]/str[@name='id'][.='7']",
+        "//result/doc[5]/str[@name='id'][.='9']");
+
+    // knn subquery should still accept `preFilter` local param, and not 
pre-filter on any global fq
+    // params
+    // filt -> topK -> 4,2,3,7,9 -> fq -> 3,9
     assertQ(
         req(
-            CommonParams.Q,
-            "{!knn f=vector topK=4}" + vectorToSearch,
+            common,
             "fq",
-            "id:(3 4 9 2)",
-            "fl",
-            "id"),
+            "id:(1 9 20 3 5 6 8)",
+            "q",
+            "*:* AND {!knn f=vector topK=5 preFilter='" + filt + "' v=$vec}"),
+        "//result[@numFound='2']",
+        "//result/doc[1]/str[@name='id'][.='3']",
+        "//result/doc[2]/str[@name='id'][.='9']");
+    // filt -> topK -> 4,2,3,7,9 + '8' -> fq -> 8,3,9
+    assertQ(
+        req(
+            common,
+            "fq",
+            "id:(1 9 20 3 5 6 8)",
+            "q",
+            "id:8^=100 OR {!knn f=vector topK=5 preFilter='" + filt + "' 
v=$vec}"),
+        "//result[@numFound='3']",
+        "//result/doc[1]/str[@name='id'][.='8']",
+        "//result/doc[2]/str[@name='id'][.='3']",
+        "//result/doc[3]/str[@name='id'][.='9']");
+
+    for (String knn :
+        Arrays.asList(
+            "{!knn f=vector topK=5 includeTags=xxx v=$vec}",
+            "{!knn f=vector topK=5 excludeTags=xxx v=$vec}")) {
+      assertQEx(
+          "knn as subquery incompatible with include/exclude localparams",
+          "used as a sub-query does not support",
+          req(common, "q", "*:* OR " + knn),
+          SolrException.ErrorCode.BAD_REQUEST);
+    }
+  }
+
+  @Test
+  public void 
knnQueryWithFilterQuery_shouldPerformKnnSearchInPreFilteredResults() {

Review Comment:
   also here, ideally having split tests would help



##########
solr/core/src/test/org/apache/solr/search/neural/KnnQParserTest.java:
##########
@@ -453,52 +455,397 @@ public void 
knnQueryUsedInFilter_shouldFilterResultsBeforeTheQueryExecution() {
   @Test
   public void 
knnQueryUsedInFilters_shouldFilterResultsBeforeTheQueryExecution() {
     String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
+
+    // topK=4 -> 1,4,2,10
     assertQ(
         req(
             CommonParams.Q,
             "id:(3 4 9 2)",
             "fq",
             "{!knn f=vector topK=4}" + vectorToSearch,
             "fq",
-            "id:(4 20)",
+            "id:(4 20 9)",
             "fl",
             "id"),
         "//result[@numFound='1']",
         "//result/doc[1]/str[@name='id'][.='4']");
-  }
-
-  @Test
-  public void 
knnQueryWithFilterQuery_shouldPerformKnnSearchInPreFilteredResults() {
-    String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
 
+    // topK=4 w/localparam preFilter -> 1,4,7,9
     assertQ(
         req(
             CommonParams.Q,
-            "{!knn f=vector topK=10}" + vectorToSearch,
+            "id:(3 4 9 2)",
+            "fq",
+            "{!knn f=vector topK=4 preFilter='id:(1 4 7 8 9)'}" + 
vectorToSearch,
             "fq",
-            "id:(1 2 7 20)",
+            "id:(4 20 9)",
             "fl",
             "id"),
+        "//result[@numFound='2']",
+        "//result/doc[1]/str[@name='id'][.='4']",
+        "//result/doc[2]/str[@name='id'][.='9']");
+
+    for (String fq :
+        Arrays.asList(
+            "{!knn f=vector topK=5 includeTags=xxx}" + vectorToSearch,
+            "{!knn f=vector topK=5 excludeTags=xxx}" + vectorToSearch)) {
+      assertQEx(

Review Comment:
   Also in this case I would prefer to see these as separate test cases, that 
can run independently



##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -240,7 +240,7 @@ client.add(Arrays.asList(d1, d2));
 This is the Apache Solr query approach designed to support dense vector search:
 
 === knn Query Parser
-The `knn` k-nearest neighbors query parser allows to find the k-nearest 
documents to the target vector according to indexed dense vectors in the given 
field.
+The `knn` k-nearest neighbors query parser allows to find the k-nearest 
documents to the target vector according to indexed dense vectors in the given 
field.  The set of documents can be Pre-Riltered to reduce the number of vector 
distance calculations that must be computed, and ensure the best `topK` are 
returned.

Review Comment:
   The set of documents can be Pre-Filtered to reduce the number of vector 
distance calculations that must be computed, and ensure the best `topK` are 
returned.



##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -264,45 +264,113 @@ The `DenseVectorField` to search in.
 +
 How many k-nearest results to return.
 
-Here's how to run a KNN search:
+`preFilter`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: Depends on usage, see below.
+|===
++
+Specifies an explicit list of Pre-Filter query strings to use.
 
-[source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+`includeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that only `fq` filters with the specified `tag` should be considered 
for implicit Pre-Filtering.  May not be combind with `preFilter`.

Review Comment:
   Please ignore this suggestion in case it's wrong, I'm not a native speaker:
   May not be combined -> Must not be combined?
   May suggests 'optional'
   Must implies a 'mandatory' condition? (because here it's not optional but 
mandatory I think)



##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -264,45 +264,113 @@ The `DenseVectorField` to search in.
 +
 How many k-nearest results to return.
 
-Here's how to run a KNN search:
+`preFilter`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: Depends on usage, see below.
+|===
++
+Specifies an explicit list of Pre-Filter query strings to use.
 
-[source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+`includeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that only `fq` filters with the specified `tag` should be considered 
for implicit Pre-Filtering.  May not be combind with `preFilter`.
 
-The search results retrieved are the k-nearest to the vector in input `[1.0, 
2.0, 3.0, 4.0]`, ranked by the similarityFunction configured at indexing time.
 
-==== Usage with Filter Queries
-The `knn` query parser can be used in filter queries:
+`excludeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that `fq` filters with the specified `tag` should be excluded from 
consideration for implicit Pre-Filtering.  May not be combind with `preFilter`.
+
+
+Here's how to run a simple KNN search:
+
 [source,text]
-&q=id:(1 2 3)&fq={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+?q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+
+The search results retrieved are the k=10 nearest documents to the vector in 
input `[1.0, 2.0, 3.0, 4.0]`, ranked by the `similarityFunction` configured at 
indexing time.
+
+
+==== Explicit KNN Pre-Filtering
+
+The `knn` query parser's `preFilter` parameter can be specified to reduce the 
number of candidate documents evaluated for the k-nearest distance calculation:
 
-The `knn` query parser can be used with filter queries:
 [source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]&fq=id:(1 2 3)
+?q={!knn f=vector topK=10 preFilter=inStock:true}[1.0, 2.0, 3.0, 4.0]
 
-[IMPORTANT]
-====
-Filter queries are executed as pre-filters: the main query refines the sub-set 
of search results derived from the application of all the filter queries 
combined as 'MUST' clauses(boolean AND).
+In the above example, only documents matching the Pre-Filter `inStock:true` 
will be candidates for consideration when evaluating the k-nearest search 
against the specified vector.
+
+The `preFilter` parameter may be blank (ex: `preFilter=""`) to indicate that 
no Pre-Filtering should be performed; or it may be multi-valued -- either 
through repetition, or via duplicated 
xref:local-params.adoc#parameter-dereferencing[Parameter References].
+
+These two examples are equivilent:

Review Comment:
   equivalent?



##########
solr/core/src/test/org/apache/solr/search/neural/KnnQParserTest.java:
##########
@@ -453,52 +455,397 @@ public void 
knnQueryUsedInFilter_shouldFilterResultsBeforeTheQueryExecution() {
   @Test
   public void 
knnQueryUsedInFilters_shouldFilterResultsBeforeTheQueryExecution() {
     String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
+
+    // topK=4 -> 1,4,2,10
     assertQ(
         req(
             CommonParams.Q,
             "id:(3 4 9 2)",
             "fq",
             "{!knn f=vector topK=4}" + vectorToSearch,
             "fq",
-            "id:(4 20)",
+            "id:(4 20 9)",
             "fl",
             "id"),
         "//result[@numFound='1']",
         "//result/doc[1]/str[@name='id'][.='4']");
-  }
-
-  @Test
-  public void 
knnQueryWithFilterQuery_shouldPerformKnnSearchInPreFilteredResults() {
-    String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
 
+    // topK=4 w/localparam preFilter -> 1,4,7,9
     assertQ(
         req(
             CommonParams.Q,
-            "{!knn f=vector topK=10}" + vectorToSearch,
+            "id:(3 4 9 2)",
+            "fq",
+            "{!knn f=vector topK=4 preFilter='id:(1 4 7 8 9)'}" + 
vectorToSearch,
             "fq",
-            "id:(1 2 7 20)",
+            "id:(4 20 9)",
             "fl",
             "id"),
+        "//result[@numFound='2']",
+        "//result/doc[1]/str[@name='id'][.='4']",
+        "//result/doc[2]/str[@name='id'][.='9']");
+
+    for (String fq :
+        Arrays.asList(
+            "{!knn f=vector topK=5 includeTags=xxx}" + vectorToSearch,
+            "{!knn f=vector topK=5 excludeTags=xxx}" + vectorToSearch)) {
+      assertQEx(
+          "fq={!knn...} incompatible with include/exclude localparams",
+          "used as a filter does not support",
+          req("q", "*:*", "fq", fq),
+          SolrException.ErrorCode.BAD_REQUEST);
+    }
+  }
+
+  @Test
+  public void knnQueryAsSubQuery() {

Review Comment:
   also here, possibly split the tests?



##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -264,45 +264,113 @@ The `DenseVectorField` to search in.
 +
 How many k-nearest results to return.
 
-Here's how to run a KNN search:
+`preFilter`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: Depends on usage, see below.
+|===
++
+Specifies an explicit list of Pre-Filter query strings to use.
 
-[source,text]
-&q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
+`includeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that only `fq` filters with the specified `tag` should be considered 
for implicit Pre-Filtering.  May not be combind with `preFilter`.
 
-The search results retrieved are the k-nearest to the vector in input `[1.0, 
2.0, 3.0, 4.0]`, ranked by the similarityFunction configured at indexing time.
 
-==== Usage with Filter Queries
-The `knn` query parser can be used in filter queries:
+`excludeTags`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Indicates that `fq` filters with the specified `tag` should be excluded from 
consideration for implicit Pre-Filtering.  May not be combind with `preFilter`.

Review Comment:
   Same consideration for 'May'



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to