[jira] [Commented] (LUCENE-6276) Add matchCost() api to TwoPhaseDocIdSetIterator

Robert Muir (JIRA) Mon, 12 Oct 2015 06:32:36 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953114#comment-14953114 ]


Robert Muir commented on LUCENE-6276:
-------------------------------------

{quote}
As to TwoPhaseIterator or DocIdSetIterator, I think this boils down to whether 
the leading iterator in ConjunctionDISI should be chosen using the expected 
number of matching docs only, or also using the totalTermFreq's somehow. This 
is for more complex queries, for example a conjunction with at least one phrase 
or SpanNearQuery.

But for the more complex queries two phase approximation is already in place, 
so having matchCost() only in the two phase code could be enough even for these 
queries.
{quote}

Yes, to keep things simple, I imagined this API would just be the cost of 
calling {{matches()}} itself, so I think the two-phase API is the correct place 
to put it (like in your patch).

We already have a {{cost()}} API on DISI for doing things like conjunctions 
(yes, it's purely based on density, and maybe that is imperfect), but I think we 
should try to narrow the scope of this issue to just the cost of the 
{{matches()}} operation, which can vary wildly depending on query type or 
document size.
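
To make the narrowed scope concrete, here is a minimal sketch of how a conjunction could use a per-call match cost. Everything in it is an assumption for illustration: {{TwoPhaseCheck}}, {{sortByMatchCost}}, {{allMatch}} and {{fixed}} are hypothetical names, not Lucene's classes and not the code in the patch, and the cost is just a comparable float rather than a calibrated unit.

```java
import java.util.ArrayList;
import java.util.List;

class MatchCostSketch {

    /** Hypothetical stand-in for a two-phase iterator; not Lucene's actual class. */
    interface TwoPhaseCheck {
        /** Confirms whether the current approximate match is a real match. */
        boolean matches();

        /** Estimated cost of one matches() call, in arbitrary comparable units. */
        float matchCost();
    }

    /** Returns a copy of the checks ordered from cheapest to most expensive. */
    static List<TwoPhaseCheck> sortByMatchCost(List<TwoPhaseCheck> checks) {
        List<TwoPhaseCheck> sorted = new ArrayList<>(checks);
        sorted.sort((a, b) -> Float.compare(a.matchCost(), b.matchCost()));
        return sorted;
    }

    /** Runs the checks in cost order and stops at the first one that fails. */
    static boolean allMatch(List<TwoPhaseCheck> checks) {
        for (TwoPhaseCheck check : sortByMatchCost(checks)) {
            if (!check.matches()) {
                return false;
            }
        }
        return true;
    }

    /** Builds a fixed-result check that records its label each time it is invoked. */
    static TwoPhaseCheck fixed(String label, float cost, boolean result, List<String> log) {
        return new TwoPhaseCheck() {
            @Override
            public boolean matches() {
                log.add(label);
                return result;
            }

            @Override
            public float matchCost() {
                return cost;
            }
        };
    }
}
```

With a cheap check that fails and an expensive one that would pass, {{allMatch}} never invokes the expensive one, which is the saving this issue is after. In a real scorer the sort would presumably happen once at construction time rather than per document.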

What Adrien says about "likelihood of match" is also interesting, but I think we 
want to defer that too. To me that is just a matter of having a more accurate 
{{cost()}}, and it may not be easy or feasible to improve...


> Add matchCost() api to TwoPhaseDocIdSetIterator
> -----------------------------------------------
>
>                 Key: LUCENE-6276
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6276
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6276-ExactPhraseOnly.patch
>
>
> We could add a method like TwoPhaseDISI.matchCost() defined as something like 
> an estimate of nanoseconds or similar.
> ConjunctionScorer could use this method to sort its 'twoPhaseIterators' array 
> so that cheaper ones are called first. Today it has no idea if one scorer is 
> a simple phrase scorer on a short field vs another that might do some geo 
> calculation or more expensive stuff.
> PhraseScorers could implement this based on index statistics (e.g. 
> totalTermFreq/maxDoc)
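
The statistics-based estimate mentioned in the description could look roughly like the sketch below. This is an assumption, not what the attached patch does: {{PhraseCostEstimate}} and its methods are hypothetical names, and summing the per-term averages is only one plausible way to combine them. The inputs are plain numbers, so the sketch does not depend on any Lucene API.

```java
class PhraseCostEstimate {

    /** Average number of positions per document for one term: totalTermFreq / maxDoc. */
    static float avgPositionsPerDoc(long totalTermFreq, int maxDoc) {
        if (maxDoc <= 0) {
            return 0f; // empty index: nothing to read
        }
        return (float) totalTermFreq / maxDoc;
    }

    /**
     * Rough proxy for the work of one phrase matches() call: the sum of the
     * per-term averages, assuming every term's positions are read once.
     */
    static float phraseMatchCost(long[] totalTermFreqs, int maxDoc) {
        float cost = 0f;
        for (long totalTermFreq : totalTermFreqs) {
            cost += avgPositionsPerDoc(totalTermFreq, maxDoc);
        }
        return cost;
    }
}
```

Dividing by maxDoc rather than by the term's docFreq understates the cost for rare terms with many positions per matching document, so treat the result as a relative ordering hint only.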



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

