[jira] [Commented] (LUCENE-6276) Add matchCost() api to TwoPhaseDocIdSetIterator

Adrien Grand (JIRA) Mon, 12 Oct 2015 04:04:56 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952926#comment-14952926
 ]


Adrien Grand commented on LUCENE-6276:
--------------------------------------

I think it would make more sense to sum up {{totalTermFreq/docFreq}} for each 
term instead of {{totalTermFreq/conjunctionDISI.cost()}}, so that we get the 
average number of positions per document. Otherwise I think you got the 
intention right. One more thing to be careful with: 
{{TermStatistics.totalTermFreq()}} may return -1 when the statistic is not 
available, so we need a fallback for that case. Maybe we could just assume 1 
position per document?
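
To make the suggestion concrete, here is a minimal sketch in plain Java. It 
does not use any Lucene classes: the class name, the method names and the 
{{long[]{docFreq, totalTermFreq}}} layout are all hypothetical, and the 
fallback of 1 position per document is only the assumption floated above.

```java
/** Hypothetical helper, not part of Lucene; statistics are passed as plain longs. */
final class AvgPositionsSketch {

    /** Assumed fallback when totalTermFreq is unavailable (-1): 1 position per document. */
    static final float FALLBACK_POSITIONS_PER_DOC = 1f;

    /** Average number of positions per matching document for a single term. */
    static float avgPositionsPerDoc(long docFreq, long totalTermFreq) {
        if (totalTermFreq < 0 || docFreq <= 0) {
            // Statistic missing (or no documents): fall back instead of dividing.
            return FALLBACK_POSITIONS_PER_DOC;
        }
        return (float) totalTermFreq / docFreq;
    }

    /** Sums the per-term averages; each entry is {docFreq, totalTermFreq}. */
    static float sumAvgPositions(long[][] termStats) {
        float sum = 0f;
        for (long[] stats : termStats) {
            sum += avgPositionsPerDoc(stats[0], stats[1]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two terms: one with full statistics, one with totalTermFreq == -1.
        long[][] phraseTerms = { {100, 250}, {40, -1} };
        System.out.println(sumAvgPositions(phraseTerms)); // 2.5 + 1.0 = 3.5
    }
}
```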

A related question is how we should define {{matchCost()}}. The patch does 
not run into this yet since it only deals with phrase queries, but eventually 
we should be able to compare the cost of e.g. a phrase query against a doc 
values range query, even though they perform very different computations. 
Maybe the javadocs of {{matchCost()}} could suggest a scale of operation costs 
that implementors could use to compute the cost of matching the two-phase 
iterator. It could be something like 1 for {{nextDoc()}}, {{nextPosition()}}, 
comparisons and basic arithmetic operations, and e.g. 10 for {{advance()}}?
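
As a rough illustration of such a scale, the sketch below turns the numbers 
above into constants and derives two costs from them. Everything here is 
hypothetical: the class, the constant names, and in particular the per-query 
formulas (one {{nextPosition()}} plus one comparison per position for an exact 
phrase, one {{advance()}} plus two comparisons for a doc values range check) 
are assumptions for the sake of the example, not what the patch does.

```java
/** Hypothetical cost scale, not part of Lucene. */
final class MatchCostScaleSketch {

    // Unit costs following the scale suggested in the comment.
    static final float NEXT_DOC = 1f;
    static final float NEXT_POSITION = 1f;
    static final float COMPARISON = 1f;
    static final float ADVANCE = 10f;

    /**
     * Assumed cost of verifying one candidate document for an exact phrase:
     * every position of every term is read once and compared once.
     */
    static float exactPhraseMatchCost(float[] avgPositionsPerDoc) {
        float cost = 0f;
        for (float positions : avgPositionsPerDoc) {
            cost += positions * (NEXT_POSITION + COMPARISON);
        }
        return cost;
    }

    /** Assumed cost of a doc values range check: one advance, two bound comparisons. */
    static float docValuesRangeMatchCost() {
        return ADVANCE + 2 * COMPARISON;
    }

    public static void main(String[] args) {
        float phrase = exactPhraseMatchCost(new float[] {2.5f, 1.0f});
        float range = docValuesRangeMatchCost();
        System.out.println(phrase); // 7.0
        System.out.println(range);  // 12.0
    }
}
```

With a shared scale like this, the two numbers become comparable even though 
the underlying computations have nothing in common.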

> Add matchCost() api to TwoPhaseDocIdSetIterator
> -----------------------------------------------
>
>                 Key: LUCENE-6276
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6276
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6276-ExactPhraseOnly.patch
>
>
> We could add a method like TwoPhaseDISI.matchCost(), defined as something 
> like an estimate of nanoseconds or similar.
> ConjunctionScorer could use this method to sort its 'twoPhaseIterators' array 
> so that cheaper ones are called first. Today it has no idea whether one 
> scorer is a simple phrase scorer on a short field while another might do some 
> geo calculation or other more expensive work.
> PhraseScorers could implement this based on index statistics (e.g. 
> totalTermFreq/maxDoc).
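
The cheapest-first ordering described in the issue can be sketched as follows. 
This is plain Java with hypothetical types ({{TwoPhaseCheck}}, {{FixedCheck}}, 
{{matchesAll}}); it only mimics the idea and is not how ConjunctionScorer is 
actually written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of cheapest-first verification, not Lucene code. */
final class CheapestFirstSketch {

    /** Stand-in for a two-phase iterator: an estimated cost plus a verification step. */
    interface TwoPhaseCheck {
        float matchCost();
        boolean matches();
    }

    /** Test double with a fixed cost and result that records when it is evaluated. */
    static final class FixedCheck implements TwoPhaseCheck {
        private final String name;
        private final float cost;
        private final boolean result;
        private final List<String> calls;

        FixedCheck(String name, float cost, boolean result, List<String> calls) {
            this.name = name;
            this.cost = cost;
            this.result = result;
            this.calls = calls;
        }

        @Override public float matchCost() { return cost; }

        @Override public boolean matches() {
            calls.add(name);
            return result;
        }
    }

    /** A conjunction matches only if every check matches; cheaper checks run first. */
    static boolean matchesAll(List<? extends TwoPhaseCheck> checks) {
        List<TwoPhaseCheck> sorted = new ArrayList<>(checks);
        sorted.sort((a, b) -> Float.compare(a.matchCost(), b.matchCost()));
        for (TwoPhaseCheck check : sorted) {
            if (!check.matches()) {
                return false; // short-circuit: costlier checks are never evaluated
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        List<FixedCheck> checks = Arrays.asList(
            new FixedCheck("geo", 50f, true, calls),
            new FixedCheck("phrase", 7f, false, calls));
        System.out.println(matchesAll(checks)); // false
        System.out.println(calls);              // [phrase]
    }
}
```

Because the cheap phrase check fails first, the expensive geo check is skipped 
entirely, which is the saving the issue is after.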



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
