[ https://issues.apache.org/jira/browse/LUCENE-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952926#comment-14952926 ]
Adrien Grand commented on LUCENE-6276:
--------------------------------------

I think it would make more sense to sum up {{totalTermFreq/docFreq}} for each term instead of {{totalTermFreq/conjunctionDISI.cost()}}, so that we get the average number of positions per document? But otherwise I think you got the intention right. Something else to be careful with is that {{TermStatistics.totalTermFreq()}} may return -1, so we need a fallback for that case. Maybe we could just assume 1 position per document?

A related question is what definition we should give to {{matchCost()}}. The patch does not have the issue yet since it only deals with phrase queries, but eventually we should be able to compare the cost of e.g. a phrase query against a doc values range query even though they perform very different computations. Maybe the javadocs of matchCost() could suggest a scale of costs of operations that implementors of matchCost() could use in order to compute the cost of matching the two-phase iterator. It could be something like 1 for nextDoc(), nextPosition(), comparisons and basic arithmetic operations, and e.g. 10 for advance()?

> Add matchCost() api to TwoPhaseDocIdSetIterator
> -----------------------------------------------
>
>                 Key: LUCENE-6276
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6276
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6276-ExactPhraseOnly.patch
>
>
> We could add a method like TwoPhaseDISI.matchCost() defined as something like
> estimate of nanoseconds or similar.
> ConjunctionScorer could use this method to sort its 'twoPhaseIterators' array
> so that cheaper ones are called first. Today it has no idea if one scorer is
> a simple phrase scorer on a short field vs another that might do some geo
> calculation or more expensive stuff.
> PhraseScorers could implement this based on index statistics (e.g.
> totalTermFreq/maxDoc)
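The suggestions in this thread can be illustrated with a small, self-contained Java sketch. Everything in it is hypothetical and is not Lucene's API. `TermStats` stands in for `TermStatistics`, and `CostedIterator` stands in for a two-phase iterator that exposes `matchCost()`. The per-position charge (one `nextPosition()` plus one comparison, each costing 1 on the proposed scale) is an assumption, not what the attached patch does.

The sketch computes the average number of positions per document as totalTermFreq/docFreq. When totalTermFreq is -1, it falls back to 1 position per document. It then sorts iterators by cost, the way ConjunctionScorer could order its 'twoPhaseIterators'.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the matchCost() ideas in LUCENE-6276; not Lucene's actual API.
class MatchCostSketch {

    // Assumed unit costs on the scale proposed in the comment:
    // 1 for nextPosition() and 1 for a comparison (advance() would be ~10, unused here).
    static final float NEXT_POSITION_COST = 1f;
    static final float COMPARISON_COST = 1f;

    // Stand-in for TermStatistics: totalTermFreq may be -1 when it is not recorded.
    static final class TermStats {
        final long docFreq;
        final long totalTermFreq;

        TermStats(long docFreq, long totalTermFreq) {
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    // Average positions per matching document (totalTermFreq/docFreq).
    // Falls back to 1 position per document when the statistic is unavailable.
    static float avgPositionsPerDoc(TermStats stats) {
        if (stats.totalTermFreq < 0 || stats.docFreq <= 0) {
            return 1f;
        }
        return (float) stats.totalTermFreq / stats.docFreq;
    }

    // Hypothetical exact-phrase cost: for every expected position of every term,
    // charge one nextPosition() call plus one comparison.
    static float phraseMatchCost(TermStats... terms) {
        float cost = 0f;
        for (TermStats term : terms) {
            cost += avgPositionsPerDoc(term) * (NEXT_POSITION_COST + COMPARISON_COST);
        }
        return cost;
    }

    // Stand-in for a two-phase iterator that reports its matchCost().
    static final class CostedIterator {
        final String name;
        final float matchCost;

        CostedIterator(String name, float matchCost) {
            this.name = name;
            this.matchCost = matchCost;
        }
    }

    // Returns a copy ordered by ascending matchCost, so cheaper checks run first.
    static List<CostedIterator> sortByMatchCost(List<CostedIterator> iterators) {
        List<CostedIterator> sorted = new ArrayList<>(iterators);
        sorted.sort(Comparator.comparingDouble((CostedIterator it) -> it.matchCost));
        return sorted;
    }

    public static void main(String[] args) {
        float phraseCost = phraseMatchCost(
                new TermStats(100, 300), // 3 positions per document on average
                new TermStats(50, -1));  // totalTermFreq unknown: assume 1
        List<CostedIterator> iterators = new ArrayList<>();
        iterators.add(new CostedIterator("geo", 50f));           // made-up cost
        iterators.add(new CostedIterator("phrase", phraseCost));
        iterators.add(new CostedIterator("docValuesRange", 2f)); // made-up cost
        for (CostedIterator it : sortByMatchCost(iterators)) {
            System.out.println(it.name + " " + it.matchCost);
        }
    }
}
```

The costs of 50 and 2 for the geo and doc values iterators are invented purely to show the ordering. Sorting in ascending order lets a cheap check reject a candidate document before the more expensive verifications run.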