[ https://issues.apache.org/jira/browse/LUCENE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907521#comment-13907521 ]

Michael McCandless commented on LUCENE-5460:
--------------------------------------------

I think the particular case that advanceExact (or something like it) would
speed up is a costly query matching many documents, combined with a Filter that
also matches many documents, where that Filter cannot be random-access but is
"relatively" low cost to iterate (e.g. a TermFilter).

In this case, today, we do the leap-frog thing and ask the query's scorer to
.advance, which is essentially an .advanceExact followed by a .nextDoc. That
.nextDoc is wasted cost, which a separate .advanceExact would let us avoid.
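A minimal sketch of the leap-frog loop described above. It assumes a hypothetical array-backed `ArrayIterator` standing in for Lucene's DocIdSetIterator (it is not Lucene's class), and it counts how many docs the query iterator positions on as a rough proxy for query cost; the class, `leapFrog`, and the `landings` counter are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class LeapFrogSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a DocIdSetIterator over sorted doc IDs. */
    static final class ArrayIterator {
        private final int[] docs;  // sorted, distinct doc IDs
        private int idx = -1;      // -1 = not yet positioned
        int landings = 0;          // proxy for cost: docs this iterator positioned on

        ArrayIterator(int... docs) { this.docs = docs; }

        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() {
            idx++;
            return land();
        }

        /** Positions on the first doc >= target; requires target > docID(). */
        int advance(int target) {
            do { idx++; } while (idx < docs.length && docs[idx] < target);
            return land();
        }

        private int land() {
            if (idx < docs.length) { landings++; return docs[idx]; }
            return NO_MORE_DOCS;
        }
    }

    /** Leap-frog: the filter and the query take turns calling advance(). */
    static List<Integer> leapFrog(ArrayIterator filter, ArrayIterator query) {
        List<Integer> hits = new ArrayList<>();
        int doc = filter.nextDoc();
        while (doc != NO_MORE_DOCS) {
            // advance() may overshoot doc, positioning the query on a later match
            int q = query.docID() < doc ? query.advance(doc) : query.docID();
            if (q == doc) {
                hits.add(doc);
                doc = filter.nextDoc();
            } else {
                doc = filter.advance(q);  // q > doc; NO_MORE_DOCS ends the loop
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ArrayIterator filter = new ArrayIterator(3, 50, 97);
        ArrayIterator query = new ArrayIterator(3, 4, 10, 51, 52, 97);
        List<Integer> hits = leapFrog(filter, query);
        // Prints "[3, 97] query landings=3"; the landing on doc 51 was wasted work
        System.out.println(hits + " query landings=" + query.landings);
    }
}
```

Here the query positions on doc 51 while answering advance(50), even though the filter never accepts 51; that is the kind of step a separate advanceExact is meant to skip.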

It's not clear how often this really arises in practice.  Do most apps cache
all their filters, like Solr does?  If so, the filter will likely be random
access (provide Bits) and would (if its estimated density is > 1%) be pushed
"down low".
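A rough sketch of the dense, cached case, assuming `java.util.BitSet` as a stand-in for Lucene's Bits and a hypothetical `pushDown` helper that applies the > 1% density estimate (the helper, the threshold constant, and `queryDriven` are illustrative, not Lucene API). In Lucene the bits would be passed down into the scorer; here they are simply checked per query match, which is enough to show that no leap-frogging is needed.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class RandomAccessFilterSketch {
    // Hypothetical threshold taken from the "> 1%" estimate; not a Lucene constant.
    static final double PUSH_DOWN_DENSITY = 0.01;

    /** True if a cached, random-access filter is dense enough to push down. */
    static boolean pushDown(BitSet acceptDocs, int maxDoc) {
        return (double) acceptDocs.cardinality() / maxDoc > PUSH_DOWN_DENSITY;
    }

    /**
     * "Down low": the query enumerates its own matches and each one is checked
     * with a constant-time bit lookup instead of leap-frogging with the filter.
     */
    static List<Integer> queryDriven(int[] queryDocs, BitSet acceptDocs) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : queryDocs) {
            if (acceptDocs.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        BitSet dense = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc += 50) {
            dense.set(doc);  // 20 of 1000 docs: 2% density
        }
        BitSet sparse = new BitSet(maxDoc);
        sparse.set(500);     // 1 of 1000 docs: 0.1% density

        System.out.println(pushDown(dense, maxDoc));   // true
        System.out.println(pushDown(sparse, maxDoc));  // false
        System.out.println(queryDriven(new int[] {3, 50, 97, 100, 101}, dense));  // [50, 100]
    }
}
```

The sparse filter falls below the threshold, which is exactly the situation the issue is about: the filter stays iterator-driven and the question becomes how cheaply the query can be probed at each filter doc.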

> Allow driving a query by sparse filters
> ---------------------------------------
>
>                 Key: LUCENE-5460
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5460
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Shai Erera
>
> Today if a filter is very sparse we execute the query in sort of a leap-frog 
> manner between the query and filter. If the query is very expensive to 
> compute, and/or also matches only a few docs, calling scorer.advance(doc) 
> only to discover that the doc it landed on isn't accepted by the filter is a 
> waste of time. Since the Filter is always the "final ruler", I wonder whether, 
> if we had something like {{boolean DISI.advanceExact(doc)}}, we could use it 
> instead in some cases.
> There are many combinations in which I think we'd want to use/not-use this 
> API, and they depend on: Filter's complexity, Filter.cost(), Scorer.cost(), 
> query complexity (span-near, many clauses) etc.
> I'm opening this issue so we can discuss. DISI.advanceExact(doc) is just a 
> preliminary proposal, to get an API we could experiment with. The default 
> implementation should be fairly easy and straightforward, and we could 
> override it where we can offer a more optimized impl.
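One way the proposed default could look, sketched over a hypothetical `DocIterator` base class (`DocIterator`, `ArrayIterator`, `ExactArrayIterator`, `filterDriven` and the `landings` counter are illustrative names, not Lucene API). The default advanceExact is built on advance(), so it is correct but still overshoots on a miss; an optimized override (here, an assumed binary search over a sorted array) can answer the membership question without positioning on a later doc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AdvanceExactSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical forward-only iterator carrying the proposed advanceExact. */
    abstract static class DocIterator {
        abstract int docID();

        /** Positions on the first doc >= target; requires target > docID(). */
        abstract int advance(int target);

        /**
         * Default: true iff target matches. Built on advance(), so on a miss the
         * iterator may be parked on a later doc (the same overshoot as leap-frog).
         * Callers must pass strictly increasing targets.
         */
        boolean advanceExact(int target) {
            int doc = docID();
            if (doc < target) {
                doc = advance(target);
            }
            return doc == target;
        }
    }

    /** Array-backed iterator; landings counts docs it positioned on. */
    static class ArrayIterator extends DocIterator {
        final int[] docs;  // sorted, distinct doc IDs
        int idx = -1;
        int landings = 0;

        ArrayIterator(int... docs) { this.docs = docs; }

        @Override int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        @Override int advance(int target) {
            do { idx++; } while (idx < docs.length && docs[idx] < target);
            if (idx < docs.length) { landings++; return docs[idx]; }
            return NO_MORE_DOCS;
        }
    }

    /** Hypothetical optimized override: checks membership without overshooting. */
    static class ExactArrayIterator extends ArrayIterator {
        ExactArrayIterator(int... docs) { super(docs); }

        @Override boolean advanceExact(int target) {
            int i = Arrays.binarySearch(docs, target);
            if (i >= 0) { idx = i; landings++; return true; }
            idx = -i - 2;  // park on the last doc < target (or -1), never past it
            return false;
        }
    }

    /** The filter drives; the query is only asked "does this exact doc match?". */
    static List<Integer> filterDriven(ArrayIterator filter, DocIterator query) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = filter.advance(0); doc != NO_MORE_DOCS; doc = filter.advance(doc + 1)) {
            if (query.advanceExact(doc)) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        ArrayIterator plain = new ArrayIterator(3, 4, 10, 51, 52, 97);
        ExactArrayIterator exact = new ExactArrayIterator(3, 4, 10, 51, 52, 97);
        // Prints "[3, 97] landings=3", then "[3, 97] landings=2"
        System.out.println(filterDriven(new ArrayIterator(3, 50, 97), plain) + " landings=" + plain.landings);
        System.out.println(filterDriven(new ArrayIterator(3, 50, 97), exact) + " landings=" + exact.landings);
    }
}
```

Both runs return the same hits; the default still positions on doc 51 while the override never does, which matches the idea of a simple default plus targeted overrides where a cheaper exact check exists.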



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
