[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Da Huang updated LUCENE-4396:
-----------------------------
    Attachment: LUCENE-4396.patch

This is a first attempt at merging the scorers in order to improve the performance of boolean retrieval.
I created a new class named "BooleanMixedScorerDecider" to choose the best scorer.
The rules for choosing still need to be improved; I am working on an elegant way to define them.

{code}
Task                QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighAndSomeLowNot          11.53   (7.3%)           10.75  (10.1%)   -6.8% ( -22% -   11%)
HighAndTonsLowNot           4.87   (4.0%)            4.64   (6.0%)   -4.9% ( -14% -    5%)
LowAndSomeLowOr           306.20   (2.2%)          299.06   (2.8%)   -2.3% (  -7% -    2%)
HighAndSomeLowOr           13.67   (9.4%)           13.38   (2.7%)   -2.1% ( -13% -   11%)
HighAndTonsLowOr            4.04   (6.4%)            3.96   (1.9%)   -1.9% (  -9% -    6%)
LowAndSomeLowNot          215.18   (1.9%)          211.14   (2.2%)   -1.9% (  -5% -    2%)
PKLookup                   96.26   (2.3%)           94.56   (2.8%)   -1.8% (  -6% -    3%)
HighAndTonsHighNot          0.06   (2.3%)            0.06   (2.6%)   -1.0% (  -5% -    4%)
HighAndTonsHighOr           0.06   (0.6%)            0.06   (1.3%)    0.9% (   0% -    2%)
HighAndSomeHighNot          1.59   (2.2%)            1.62   (2.9%)    1.7% (  -3% -    6%)
LowAndSomeHighNot          66.33   (2.1%)           68.77   (2.1%)    3.7% (   0% -    8%)
LowAndSomeHighOr           53.75   (1.6%)           56.86   (2.1%)    5.8% (   1% -    9%)
LowAndTonsLowNot           14.00   (1.7%)           14.84   (1.5%)    6.1% (   2% -    9%)
HighAndSomeHighOr           2.39   (2.2%)            2.68   (3.5%)   12.4% (   6% -   18%)
LowAndTonsLowOr            17.69   (0.9%)           21.64   (1.7%)   22.3% (  19% -   25%)
LowAndTonsHighOr            1.83   (1.3%)            2.33   (2.4%)   27.2% (  23% -   31%)
LowAndTonsHighNot           1.15   (1.5%)            1.51   (3.1%)   30.9% (  25% -   36%)
{code}

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch,
> SIZE.perf, all.perf, luceneutil-score-equal.patch,
> luceneutil-score-equal.patch, stat.cpp, stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared
> to the other clauses, that BooleanScorer would perform better than
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!
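
For illustration, the .advance() idea from the issue description could look roughly like the sketch below. This is a minimal, self-contained sketch and not code from any of the attached patches: the names SkipSketch, Cursor, countShouldMatches and skipThreshold are all hypothetical, and Cursor only mimics the docID()/nextDoc()/advance() shape of Lucene's DocIdSetIterator over a sorted int array. The MUST clause drives iteration; each SHOULD cursor steps linearly while the gap is small and calls advance() when the MUST clause jumps far ahead.

```java
import java.util.Arrays;

/** Hypothetical sketch: none of these names come from Lucene or the attached patches. */
class SkipSketch {

    /** Stand-in for a postings iterator: walks a sorted array of unique doc IDs. */
    static final class Cursor {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int pos = -1; // -1 means "not positioned yet"

        Cursor(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Steps to the next doc ID, one entry at a time. */
        int nextDoc() {
            if (pos < docs.length) pos++;
            return docID();
        }

        /** Jumps to the first doc ID >= target, using binary search over the unread tail. */
        int advance(int target) {
            if (docID() >= target) return docID();
            int idx = Arrays.binarySearch(docs, pos + 1, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
            return docID();
        }
    }

    /**
     * For every doc of the single MUST clause, counts how many SHOULD clauses also match.
     * skipThreshold is an assumed tuning knob: gaps larger than it trigger advance().
     */
    static int[] countShouldMatches(int[] mustDocs, int[][] shouldDocs, int skipThreshold) {
        Cursor[] subs = new Cursor[shouldDocs.length];
        for (int i = 0; i < subs.length; i++) subs[i] = new Cursor(shouldDocs[i]);

        int[] counts = new int[mustDocs.length];
        for (int i = 0; i < mustDocs.length; i++) {
            int doc = mustDocs[i];
            for (Cursor sub : subs) {
                if (sub.docID() < doc) {
                    if ((long) doc - sub.docID() > skipThreshold) {
                        sub.advance(doc); // MUST clause jumped far: skip instead of stepping
                    } else {
                        while (sub.docID() < doc) sub.nextDoc(); // small gap: step linearly
                    }
                }
                if (sub.docID() == doc) counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] must = {5, 1_000_000, 1_000_003};
        int[][] shoulds = {{1, 5, 7, 1_000_000}, {2, 5, 1_000_003}};
        System.out.println(Arrays.toString(countShouldMatches(must, shoulds, 16))); // prints [2, 1, 1]
    }
}
```

Both strategies return the same counts; the threshold only changes how the SHOULD cursors catch up, so any real value for it would have to come from benchmarks like the ones above.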