[jira] [Commented] (SOLR-16858) Allow KnnQParser to selectively apply filters

Chris M. Hostetter (Jira) Mon, 18 Dec 2023 10:59:04 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-16858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798321#comment-17798321
 ]


Chris M. Hostetter commented on SOLR-16858:
-------------------------------------------

{quote}...let me explain here and if that old pull request is not relevant 
anymore, I'll proceed with a deep review of this PR...
{quote}
Yes please – I definitely think you should give a more in-depth read of some of 
the examples I posted above, and the test cases added in my PR – what you're 
talking about really seems to be orthogonal to the flexibility I'm trying to 
add?
{quote}In this way, any filter should work as it works for any other Solr query 
(where you can decide if doing a prefilter of postfilter based on the cache and 
cost >100 of the filter).
Also, include/exclude should work as usual.
{quote}
I think we're having a disconnect in concepts, so I'd like to clarify 
terminology....

Historically, before any notion of knn, Solr has had 2 types of {{fq}} "filters":
 * Plain old "regular" filter queries – that can be evaluated completely 
independently of each other or the main query
 ** These are typically cached, so that they can be re-used in other requests
 ** If they aren't cached some small optimizations are available
 * {{PostFilter}} which allows the ability to defer "expensive" match/scoring 
computation until we _know_ that the document matches all other parts of the 
query (including "regular filter queries")
 ** Only a small handful of built in {{QParser}} s can be used as a {{PostFilter}}
 ** Never cached, because they are entirely dependent on the situation
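
As a rough illustration of those two kinds of {{fq}} (the field names are placeholders, not taken from any real schema): the first filter below is a plain cached filter, the second is a regular filter that opts out of the cache, and the third asks {{frange}} to run as a {{PostFilter}} by combining {{cache=false}} with a {{cost}} of 100 or more.
{noformat}
q=name:foo
fq=inStock:true
fq={!cache=false}category:legal
fq={!frange l=10 u=100 cache=false cost=200}mul(popularity,price)
{noformat}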

KNN queries really have their own kind of filtering:
 * "pre-filter" the set of documents considered per segment when identifying 
the topK
 ** The {{KnnVectorQuery}} classes are essentially _wrappers_ around another 
"inner" query (like {{ConstantScoreQuery}}, {{BoostQuery}}, etc...)
 ** The KNN score calculations only consider documents that match the inner 
query

So conceptually, we've got

1. Solr's "regular" filters
2. Solr's {{PostFilter}}
3. KNN's pre-filter

When the {{KnnQParser}} was added to Solr, the assumption/implementation was/is:
 * (A) when {{KnnQParser}} is used as the main query:
 ** *All* of Solr's "regular" filter queries should be used as the KNN 
"pre-filter"
 * (B) when {{KnnQParser}} is itself used as a "regular" Solr filter, or as a 
subquery:
 ** There should be *no* KNN pre-filter at all

Essentially: the design of the {{KnnQParser}} assumes a tight, "all or nothing" 
coupling between the KNN pre-filter and Solr's "regular" filters.

If we go back to what you described regarding your PR...
{quote}In this way, any filter should work as it works for any other Solr query 
(where you can decide if doing a prefilter of postfilter based on the cache and 
cost >100 of the filter).
Also, include/exclude should work as usual.
{quote}
... that sounds like you're saying there is a bug in "(A)" you want to address, 
such that it's not just wrapping the "regular" filters, it's also wrapping 
any/all {{PostFilter}} s as well (which I hadn't noticed before, but thinking 
about how the code works it makes sense that bug would exist) and you have an 
approach in your PR that you were planning to use to tackle that.

As I said, I have not considered the {{PostFilter}} bug case you are describing 
– What I'm focused on is giving users the ability to decouple that "all or 
nothing" assumption:
 * Give users the ability to control what "pre-filtering" is used when 
{{KnnQParser}} is the main query
 ** Really important for a lot of faceting use cases, when you want to be able 
to add "regular filters" for facet drill down (to narrow the set of documents 
returned) that should not become part of the KNN "pre-filter"
 * Give users the ability to specify _some_ "pre-filtering" even when 
{{KnnQParser}} is *NOT* the main query
 ** Example: pre-filter on {{inStock:true}} to get the best {{topK}} possible, 
even when {{KnnQParser}} is a subquery...
{noformat}
q=(name:foo AND (category:hot-reviews OR {!knn f=vfield topK=100 v=$vec fq='inStock:true'}))
{noformat}

 ** Example: Multiple {{KnnQParser}} instances in the same request that want to 
pre-filter on different things...
{noformat}
q=({!knn f=vfield topK=100 v=$vec fq='category:legal'}^3 OR {!knn f=vfield topK=100 v=$vec fq='category:hr'}^7)
vec=...
{noformat}
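
To make the difference concrete, here is a minimal sketch in plain Java of the decision being discussed. It is a toy model, not Solr code: the class and method names ({{KnnPreFilterSketch}}, {{currentPreFilter}}, {{proposedPreFilter}}) are made up for illustration, filters are plain strings, and the only proposed behavior it models is "an explicit local {{fq}} overrides the implicit one". {{excludeTag}}/{{includeTag}} are left out.

```java
import java.util.List;

public class KnnPreFilterSketch {

    /**
     * Models the current "all or nothing" coupling: every request-level fq
     * becomes the KNN pre-filter, but only when the knn parser is the main
     * query (neither a filter nor a subquery). Otherwise there is none.
     */
    public static List<String> currentPreFilter(
            boolean isFilter, boolean isSubQuery, List<String> requestFqs) {
        if (!isFilter && !isSubQuery) {
            return List.copyOf(requestFqs);
        }
        return List.of();
    }

    /**
     * Hypothetical model of the proposal: if the knn parser was given its own
     * local fq values, use exactly those as the pre-filter, wherever the
     * parser appears. A null localFqs means "not specified", which falls back
     * to the current behavior.
     */
    public static List<String> proposedPreFilter(
            boolean isFilter, boolean isSubQuery,
            List<String> requestFqs, List<String> localFqs) {
        if (localFqs != null) {
            return List.copyOf(localFqs);
        }
        return currentPreFilter(isFilter, isSubQuery, requestFqs);
    }

    public static void main(String[] args) {
        List<String> requestFqs = List.of("category:legal", "inStock:true");

        // Main query, no local fq: all request-level filters are used.
        System.out.println(currentPreFilter(false, false, requestFqs));

        // Subquery, no local fq: no pre-filter at all.
        System.out.println(currentPreFilter(false, true, requestFqs));

        // Subquery with a local fq: only the local filter is used.
        System.out.println(
            proposedPreFilter(false, true, requestFqs, List.of("inStock:true")));
    }
}
```

In this model the facet drill-down case is just a main-query knn with a local {{fq}} that leaves out the drill-down filters, so they still narrow the result set as regular filters without shrinking the topK candidates.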

Does that make sense?
----
{quote}Do we have other query parsers that have a local FQ?
{quote}
No, but we also don't have any other query parsers that _implicitly_ slurp up 
all other "regular" filters and use those to change their internal behavior. My 
goal in adding an {{fq}} (and {{excludeTag}}/{{includeTag}}) local 
params is to give users the ability to override that very special implicit 
behavior.

We _DO_ have lots of other query parsers that are designed to "wrap" other 
queries – and IMO {{KnnQParser}} probably should have been implemented that way 
from the beginning – similar to how the {{boost}} or {{join}} QParsers work. 
i.e. no implicit slurping of {{fq}} params, the vector to score with is a local 
param, and the body of the parser is the query to wrap for the "knn 
pre-filter"...
{noformat}
q={!knn f=myfield topK=10 v='[1,2,3,4]'}inStock:true
fq=foo:"nothing special happens to this filter"
{noformat}
...but we can't go back in time now :)

So instead I'm trying to provide ways to move forward with additional use cases 
that aren't supported by the current design.

> Allow KnnQParser to selectively apply filters
> ---------------------------------------------
>
>                 Key: SOLR-16858
>                 URL: https://issues.apache.org/jira/browse/SOLR-16858
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>            Assignee: Chris M. Hostetter
>            Priority: Major
>              Labels: hybrid-search
>         Attachments: SOLR-16858-1.patch, SOLR-16858.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The KnnQParser is parsing the filter query which limits the rows considered 
> by the vector query with the following method:
> {code:java}
> private Query getFilterQuery() throws SolrException, SyntaxError {
>     boolean isSubQuery = recurseCount != 0;
>     if (!isFilter() && !isSubQuery) {
>       String[] filterQueries = req.getParams().getParams(CommonParams.FQ);
>       if (filterQueries != null && filterQueries.length != 0) {
>         try {
>           List<Query> filters = QueryUtils.parseFilterQueries(req);
>           SolrIndexSearcher.ProcessedFilter processedFilter =
>               req.getSearcher().getProcessedFilter(filters);
>           return processedFilter.filter;
>         } catch (IOException e) {
>           throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
>         }
>       }
>     }
>     return null;
>   }
> {code}
> This is pulling all filter queries from the main query parameters and using 
> them to limit the vector query. This is the automatic behavior of the 
> KnnQParser.
> There are cases where you may want to selectively apply different filters. 
> One such case is SOLR-16857 which involves reRanking a collapsed query.
> Overriding the default filter behavior could be done by adding an "fq" local 
> parameter to the KnnQParser which would override the default filtering 
> behavior.
> {code:java}
> {!knn f=vector topK=10 fq=$kfq}[...]&kfq=myquery
> {code}



