[ https://issues.apache.org/jira/browse/LUCENE-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997422#comment-14997422 ]
Paul Elschot commented on LUCENE-6276:
--------------------------------------

I went over the patch and the earlier posts to get an overview of the open points, TODOs, etc. There are quite a lot of them, so we'll need to prioritize them and/or move or defer some to other issues.

lucene core:
ConjunctionDISI matchCost(): give the lower matchCosts a higher weight.
PhraseQuery: TERM_POSNS_SEEK_OPS_PER_DOC = 128 (guess), PHRASE_TO_SPAN_TERM_POSITIONS_COST = 4 (guess).
TwoPhaseIterator: should matchCost() return long instead of float?
RandomAccessWeight matchCost(): 10 for now; use the cost of matchingDocs.get().
ReqExclScorer matchCost(): also use the cost of exclApproximation.advance().
SpanTermQuery: termPositionsCost is a copy of PhraseQuery termPositionsCost.
SpanOrQuery: add the cost of balancing the priority queues for positions?

facet module (defer to other issue):
DoubleRange matchCost(): 100 for now; use the cost of range.accept().
LongRange matchCost(): 100 for now; use the cost of range.accept().

join module (defer to other issue?):
GlobalOrdinals(WithScore)Query matchCost(): 100 for now; use the cost of values.getOrd() and foundOrds.get().
GlobalOrdinals(WithScore)Query 2nd matchCost(): 100 for now; use the cost of values.getOrd() and foundOrds.get().

queries module (defer to other issue):
ValueSourceScorer matchCost(): 100 for now; use the cost of ValueSourceScorer.this.matches().

spatial module (defer to other issue):
CompositeVerifyQuery matchCost(): 100 for now; use the cost of predFuncValues.boolVal().
IntersectsRPTVerifyQuery matchCost(): 100 for now; use the cost of exactIterator.advance() and predFuncValues.boolVal().

test-framework module:
RandomApproximationQuery randomMatchCost: between 0 and 200, ok?

solr core:
Filter matchCost(): 10 for now; use the cost of bits.get()?

At this issue:
Performance test based on Wikipedia to estimate the guessed values.
Tests for matchCost()?
Check the result of ConjunctionSpans.asTwoPhaseIterator: should it be more similar to TwoPhaseConjunctionDISI?
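The ConjunctionDISI point above ("give the lower matchCosts a higher weight") can be illustrated with a small standalone sketch. This is not Lucene code: the `Check` record, its `passProbability` field, and the assumption that sub-checks match independently with a known probability are all hypothetical. When the checks are sorted cheapest first (as the issue description proposes for ConjunctionScorer), the i-th check only runs if all earlier checks matched, so a plain sum of the sub matchCosts overestimates the per-document cost; weighting each cost by the probability of reaching it gives the cheaper, always-evaluated checks a relatively higher weight.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch, not Lucene API: per-document match cost of a conjunction of two-phase checks. */
public class ConjunctionMatchCostSketch {

  /** A hypothetical two-phase check: cost per call and an assumed, known probability that it matches. */
  record Check(String name, float matchCost, float passProbability) {}

  /** Sorts the checks so that the cheapest one is evaluated first. */
  static List<Check> cheapestFirst(List<Check> checks) {
    List<Check> sorted = new ArrayList<>(checks);
    sorted.sort(Comparator.comparingDouble(Check::matchCost));
    return sorted;
  }

  /** Plain sum of the sub matchCosts: an upper bound, because later checks do not always run. */
  static float summedMatchCost(List<Check> checks) {
    float total = 0;
    for (Check c : checks) {
      total += c.matchCost();
    }
    return total;
  }

  /**
   * Expected cost when the checks run cheapest first and evaluation stops at the first failing check:
   * c1 + p1*c2 + p1*p2*c3 + ..., assuming the checks match independently of each other.
   */
  static float expectedMatchCost(List<Check> checks) {
    float total = 0;
    float reachProbability = 1; // probability that all earlier checks matched
    for (Check c : cheapestFirst(checks)) {
      total += reachProbability * c.matchCost();
      reachProbability *= c.passProbability();
    }
    return total;
  }

  public static void main(String[] args) {
    List<Check> checks = List.of(
        new Check("geo", 200f, 0.5f),
        new Check("phrase", 20f, 0.1f),
        new Check("range", 100f, 0.5f));
    System.out.println(cheapestFirst(checks).get(0).name()); // phrase
    System.out.println(summedMatchCost(checks));             // 320.0
    System.out.println(expectedMatchCost(checks));           // 20 + 0.1*100 + 0.05*200 = 40.0
  }
}
```

In this made-up example the summed cost (320) is eight times the expected cost (40), which is the kind of overestimate the TODO points at. In practice the pass probabilities are not known per sub-iterator, so any real weighting would have to approximate them, for example from cost() values.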
For other issues:
At LUCENE-6871, remove the copy of SpanTermQuery.termPositionsCost().
SpanOrQuery is getting too big; split off DisjunctionSpans.
The cost() implementations of conjunctions and disjunctions could be improved by using an independence assumption. The result of cost() is used here for weighting, so it should be as good as possible.

> Add matchCost() api to TwoPhaseDocIdSetIterator
> -----------------------------------------------
>
>                 Key: LUCENE-6276
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6276
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6276-ExactPhraseOnly.patch,
> LUCENE-6276-NoSpans.patch, LUCENE-6276-NoSpans2.patch, LUCENE-6276.patch,
> LUCENE-6276.patch, LUCENE-6276.patch, LUCENE-6276.patch, LUCENE-6276.patch,
> LUCENE-6276.patch, LUCENE-6276.patch
>
>
> We could add a method like TwoPhaseDISI.matchCost() defined as something like
> estimate of nanoseconds or similar.
> ConjunctionScorer could use this method to sort its 'twoPhaseIterators' array
> so that cheaper ones are called first. Today it has no idea if one scorer is
> a simple phrase scorer on a short field vs another that might do some geo
> calculation or more expensive stuff.
> PhraseScorers could implement this based on index statistics (e.g.
> totalTermFreq/maxDoc)
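The note about using an independence assumption in cost() for conjunctions and disjunctions can be sketched as follows. This is a standalone illustration, not Lucene code; the class and method names and the `maxDoc` parameter (assumed > 0) are hypothetical. If each sub-iterator's cost is read as roughly the number of documents it matches, a document matches sub-iterator i with probability cost_i / maxDoc. Assuming independence, a conjunction then matches about maxDoc * product(cost_i / maxDoc) documents and a disjunction about maxDoc * (1 - product(1 - cost_i / maxDoc)). Both estimates are at or below the simple bounds: the minimum of the costs for a conjunction, and the sum of the costs (capped at maxDoc) for a disjunction.

```java
/** Hypothetical sketch, not Lucene API: cost() estimates for conjunctions and disjunctions. */
public class IndependentCostSketch {

  /** Simple conjunction bound: a conjunction matches no more documents than its cheapest clause. */
  static long minCost(long[] costs) {
    long min = Long.MAX_VALUE;
    for (long c : costs) {
      min = Math.min(min, c);
    }
    return min;
  }

  /** Simple disjunction bound: the sum of the clause costs, capped at maxDoc. */
  static long sumCost(long[] costs, long maxDoc) {
    long sum = 0;
    for (long c : costs) {
      sum += c;
    }
    return Math.min(sum, maxDoc);
  }

  /** Conjunction estimate assuming independent clauses: maxDoc * prod(cost_i / maxDoc). */
  static long independentConjunctionCost(long[] costs, long maxDoc) {
    double fraction = 1.0; // fraction of documents matching all clauses
    for (long c : costs) {
      fraction *= (double) c / maxDoc;
    }
    return Math.round(fraction * maxDoc);
  }

  /** Disjunction estimate assuming independent clauses: maxDoc * (1 - prod(1 - cost_i / maxDoc)). */
  static long independentDisjunctionCost(long[] costs, long maxDoc) {
    double missFraction = 1.0; // fraction of documents matching none of the clauses
    for (long c : costs) {
      missFraction *= 1.0 - (double) c / maxDoc;
    }
    return Math.round((1.0 - missFraction) * maxDoc);
  }

  public static void main(String[] args) {
    long maxDoc = 1_000_000;
    long[] costs = {100_000, 200_000};
    System.out.println(minCost(costs));                            // 100000
    System.out.println(independentConjunctionCost(costs, maxDoc)); // 20000
    System.out.println(sumCost(costs, maxDoc));                    // 300000
    System.out.println(independentDisjunctionCost(costs, maxDoc)); // 280000
  }
}
```

The gap is largest for conjunctions of several moderately selective clauses, which is where a better cost() would change the weighting most. Real term occurrences are often correlated, so the independence estimates can err in either direction relative to the true match counts.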