[ https://issues.apache.org/jira/browse/LUCENE-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953114#comment-14953114 ]

Robert Muir commented on LUCENE-6276:
-------------------------------------

{quote}
As to TwoPhaseIterator or DocIdSetIterator, I think this boils down to whether the leading iterator in ConjunctionDISI should be chosen using the expected number of matching docs only, or also using the totalTermFreq's somehow. This is for more complex queries, for example a conjunction with at least one phrase or SpanNearQuery.
But for the more complex queries two-phase approximation is already in place, so having matchCost() only in the two-phase code could be enough even for these queries.
{quote}

Yes. To keep things simple, I imagined this API would just be the cost of calling {{matches()}} itself, so I think the two-phase API is the correct place to put it (as in your patch).

We already have a {{cost()}} API on DISI for things like conjunctions (yes, it's purely based on density, and maybe that is imperfect), but I think we should narrow the scope of this issue to just the cost of the {{matches()}} operation, which can vary wildly depending on query type or document size.

What Adrien says about "likelihood of match" is also interesting, but I think we want to defer that too. To me that is just a matter of having a more accurate {{cost()}}, and it may not be easy or feasible to improve...

> Add matchCost() api to TwoPhaseDocIdSetIterator
> -----------------------------------------------
>
>                 Key: LUCENE-6276
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6276
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6276-ExactPhraseOnly.patch
>
>
> We could add a method like TwoPhaseDISI.matchCost() defined as something like
> estimate of nanoseconds or similar.
> ConjunctionScorer could use this method to sort its 'twoPhaseIterators' array
> so that cheaper ones are called first.
> Today it has no idea if one scorer is
> a simple phrase scorer on a short field vs another that might do some geo
> calculation or more expensive stuff.
> PhraseScorers could implement this based on index statistics (e.g.
> totalTermFreq/maxDoc)
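
To make the proposal concrete, here is a minimal, self-contained sketch of sorting two-phase iterators by match cost so that cheaper ones are called first. It does not use Lucene classes: the TwoPhase interface, the MatchCostSketch class and all method names are hypothetical stand-ins, and the phrase heuristic (summing totalTermFreq/maxDoc over the phrase terms) is only one assumed reading of the suggestion in the issue description, not what any patch actually does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these types are Lucene classes.
final class MatchCostSketch {

    /** Hypothetical two-phase iterator, reduced to the part that matters for ordering. */
    interface TwoPhase {
        /** Estimated cost of one matches() call, in arbitrary but comparable units. */
        float matchCost();

        /** Expensive confirmation step for the current candidate document. */
        boolean matches();
    }

    /** Returns a copy of the input ordered so that the cheapest matches() comes first. */
    static List<TwoPhase> sortByMatchCost(List<TwoPhase> iterators) {
        List<TwoPhase> sorted = new ArrayList<>(iterators);
        sorted.sort(Comparator.comparingDouble(TwoPhase::matchCost));
        return sorted;
    }

    /** Confirms a candidate document, stopping at the first iterator that rejects it. */
    static boolean allMatch(List<TwoPhase> sortedIterators) {
        for (TwoPhase it : sortedIterators) {
            if (!it.matches()) {
                return false; // costlier iterators after this one are never consulted
            }
        }
        return true;
    }

    /**
     * Assumed heuristic for a phrase: the sum over its terms of the average
     * number of occurrences per document (totalTermFreq / maxDoc).
     */
    static float phraseMatchCost(long[] totalTermFreqs, int maxDoc) {
        if (maxDoc <= 0) {
            throw new IllegalArgumentException("maxDoc must be positive");
        }
        float cost = 0f;
        for (long ttf : totalTermFreqs) {
            cost += (float) ttf / maxDoc;
        }
        return cost;
    }
}
```

With this ordering, a cheap phrase check that rejects a candidate document spares the call to a costlier check (a geo calculation, say) on the same document. The units of matchCost() only need to be comparable across iterators for the sort to be useful.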