[jira] [Commented] (SOLR-14166) Use TwoPhaseIterator for non-cached filter queries

David Smiley (Jira) Sun, 05 Jan 2020 12:06:26 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-14166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008410#comment-17008410
 ]


David Smiley commented on SOLR-14166:
-------------------------------------

The PR has the code details, but I want to mention some of the bigger picture here.

I have this as a sub-task of "Remove/refactor Filter" because it reduces the 
use of the old Filter abstraction.  SolrIndexSearcher.ProcessedFilter.filter is 
now declared as a Query, and SolrIndexSearcher no longer has FilterImpl.  With 
pf.filter being a Query, SolrIndexSearcher.getDocSet(List<Query> fqs) became 
simpler, and I was able to remove the similar getDocSetScore.

So how is TwoPhaseIterator used efficiently, you may ask?  BooleanQuery's 
FILTER clauses use it internally via ConjunctionDISI.  I modified 
SolrIndexSearcher.getProcessedFilter to create a BooleanQuery with FILTER 
clauses for the non-cached queries.
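
To illustrate why a TwoPhaseIterator-aware conjunction helps, here is a 
minimal sketch in plain Java.  It is a simplified model, not the Lucene API: 
the Filter class, its fields, and intersect() are hypothetical names made up 
for this example, and the linear scan over doc ids stands in for the 
leapfrogging that a real conjunction does.  The idea it shows is that every 
cheap approximation must accept a doc before any costly matches() check runs, 
and that the confirmations then run in ascending matchCost order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

/** Simplified model of a two-phase conjunction; NOT the Lucene API. */
public class TwoPhaseConjunctionSketch {

    /** Hypothetical filter: a cheap approximation plus a costly confirmation. */
    static final class Filter {
        final String name;
        final int[] approxDocs;        // sorted doc ids accepted by the approximation
        final IntPredicate verifier;   // the expensive per-document check
        final float matchCost;         // estimated cost of one matches() call
        int matchesCalls = 0;          // counts how often the expensive check ran

        Filter(String name, int[] approxDocs, IntPredicate verifier, float matchCost) {
            this.name = name;
            this.approxDocs = approxDocs;
            this.verifier = verifier;
            this.matchCost = matchCost;
        }

        /** Phase 1: cheap membership test against the approximation. */
        boolean approximates(int doc) {
            return Arrays.binarySearch(approxDocs, doc) >= 0;
        }

        /** Phase 2: expensive confirmation. */
        boolean matches(int doc) {
            matchesCalls++;
            return verifier.test(doc);
        }
    }

    /** Returns doc ids in [0, maxDoc) accepted by every filter. */
    static List<Integer> intersect(List<Filter> filters, int maxDoc) {
        // Confirm in ascending matchCost order so cheap checks can reject first.
        List<Filter> byCost = new ArrayList<>(filters);
        byCost.sort(Comparator.comparingDouble((Filter f) -> f.matchCost));

        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            // Phase 1: all approximations must agree before any matches() call.
            boolean candidate = true;
            for (Filter f : byCost) {
                if (!f.approximates(doc)) {
                    candidate = false;
                    break;
                }
            }
            if (!candidate) {
                continue;
            }
            // Phase 2: confirm, stopping at the first rejection.
            boolean confirmed = true;
            for (Filter f : byCost) {
                if (!f.matches(doc)) {
                    confirmed = false;
                    break;
                }
            }
            if (confirmed) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

In this model, a doc rejected by a low-cost filter never reaches the 
higher-cost filter's matches(), which is the behavior the old cost-ordered 
DocIdSetIterator chain could not guarantee.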

Unfortunately, the "cost" param on these non-cached filter queries loses its 
meaning.  Instead, the Queries themselves, and any TPIs they may have, ought 
to report suitable costs; these are not externally configurable.  Maybe we 
could make a wrapping query that overrides the underlying TPI.matchCost with 
the user's value... or just not bother, and let the queries compute an 
internal cost that is perhaps better than whatever the user supplies.  I lean 
toward the latter; less complexity.  Unfortunately, ValueSourceScorer's TPI 
matchCost is a constant 100 instead of varying based on the particular 
FunctionValues implementation.  That should be its own issue to address.
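
For reference, the "wrapping" option could look roughly like the sketch 
below.  This is plain Java, not Lucene code: the TwoPhase interface is a 
hypothetical stand-in that keeps only the matches() and matchCost() methods, 
and CostOverride and constantCost() are names invented for this example.  
It only shows the delegation pattern, assuming the user-supplied cost would 
simply replace whatever the wrapped iterator reports.

```java
import java.util.function.IntPredicate;

/** Sketch of overriding a reported match cost by delegation; NOT the Lucene API. */
public class MatchCostOverrideSketch {

    /** Hypothetical stand-in for the confirmation phase of a two-phase iterator. */
    interface TwoPhase {
        boolean matches(int doc);
        float matchCost();
    }

    /** Delegates matching to the wrapped instance but reports a caller-supplied cost. */
    static final class CostOverride implements TwoPhase {
        private final TwoPhase delegate;
        private final float cost;

        CostOverride(TwoPhase delegate, float cost) {
            this.delegate = delegate;
            this.cost = cost;
        }

        @Override
        public boolean matches(int doc) {
            return delegate.matches(doc);   // matching behavior is unchanged
        }

        @Override
        public float matchCost() {
            return cost;                    // only the reported cost differs
        }
    }

    /** Helper for the example: a TwoPhase that always reports the same cost. */
    static TwoPhase constantCost(IntPredicate predicate, float cost) {
        return new TwoPhase() {
            @Override
            public boolean matches(int doc) {
                return predicate.test(doc);
            }

            @Override
            public float matchCost() {
                return cost;
            }
        };
    }
}
```

The trade-off is the one described above: the wrapper adds a layer purely 
to carry a user's guess, whereas letting each query report its own cost 
needs no extra class.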

> Use TwoPhaseIterator for non-cached filter queries
> --------------------------------------------------
>
>                 Key: SOLR-14166
>                 URL: https://issues.apache.org/jira/browse/SOLR-14166
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> "fq" filter queries that have cache=false and aren't processed as a 
> PostFilter (either because they aren't a PostFilter or because they have a 
> cost < 100) are processed in SolrIndexSearcher using a custom Filter that 
> walks a cost-ordered series of DocIdSetIterators.  This is not 
> TwoPhaseIterator aware, and thus the matches() method may be called on docs 
> that ideally would have been filtered out by lower-cost filter queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

