[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Da Huang updated LUCENE-4396:
-----------------------------
    Attachment: all.perf
                stat.cpp

I have retested the previously explored methods and computed statistics on their performance.
The file all.perf contains the raw perf data; stat.cpp computes the statistics from all.perf.
{code}
g++ -std=c++0x stat.cpp -o stat
./stat < all.perf
{code}

The perf statistics are shown below.
{code}
Task                 ArrayNotDel  BS      BitSet  ll      llbs    size5   size8   size9
HighAndSomeHighNot   0.7          15.3*   7.4     8.9     2.0     6.6     10.0    3.4
HighAndSomeHighOr    13.3         24.5*   7.8     9.1     10.9    17.3+   18.3+   21.3+
HighAndSomeLowNot    -45.1        -53.9   -55.0   -57.3   -45.5   -47.8   -42.2   -41.5
HighAndSomeLowOr     -44.7        -55.4   -51.2   -58.1   -54.5   -47.9   -39.7   -44.9
HighAndTonsHighNot   475.7+       472.7+  507.0+  552.9+  627.9*  149.1   144.7   143.7
HighAndTonsHighOr    141.0+       135.4+  162.4+  153.4+  169.7*  154.0+  150.0+  149.1+
HighAndTonsLowNot    -49.9        -66.2   -46.8   -76.9   -30.3   -73.7   -28.6   -15.6
HighAndTonsLowOr     -22.4        -69.4   -30.2   -67.5   -41.9   -63.8   -24.4   -13.9
LowAndSomeHighNot    3.7          -2.6    -9.0    -7.3    -6.2    4.5+    6.2*    4.7+
LowAndSomeHighOr     1.5          -14.0   -15.5   -10.8   -12.0   6.8*    5.8+    6.6+
LowAndSomeLowNot     -26.4        -43.7   -56.5   -47.3   -43.7   3.7*    -2.3    -4.0
LowAndSomeLowOr      -23.2        -41.8   -60.5   -46.2   -43.4   2.2*    -2.3    -8.8
LowAndTonsHighNot    380.6+       171.5   118.4   248.3   381.8*  22.5    23.8    26.5
LowAndTonsHighOr     29.8*        5.2     -1.1    10.7    5.4     24.2+   27.5+   28.2+
LowAndTonsLowNot     28.9         9.1     -39.3   5.3     1.3     39.1+   47.2*   44.3+
LowAndTonsLowOr      30.9+        7.2     -38.1   0.5     9.0     29.9+   40.9*   38.1+

Task                 Good Method
HighAndSomeHighNot   BS
HighAndSomeHighOr    BS, size9, size8, size5
HighAndSomeLowNot    (none)
HighAndSomeLowOr     (none)
HighAndTonsHighNot   llbs, ll, BitSet, ArrayNotDel, BS
HighAndTonsHighOr    llbs, BitSet, size5, ll, size8, size9, ArrayNotDel, BS
HighAndTonsLowNot    (none)
HighAndTonsLowOr     (none)
LowAndSomeHighNot    size8, size9, size5
LowAndSomeHighOr     size5, size9, size8
LowAndSomeLowNot     size5
LowAndSomeLowOr      size5
LowAndTonsHighNot    llbs, ArrayNotDel
LowAndTonsHighOr     ArrayNotDel, size9, size8, size5
LowAndTonsLowNot     size8, size9, size5
LowAndTonsLowOr      size8, size9, ArrayNotDel, size5
{code}

Here 'll' is the linked-list docs method and 'llbs' is the linked list with a bitset.
The character '*' marks the best result for each task, and '+' marks results that are roughly as good as the best.

I have been merging these methods. I decided to move the scorer-selection logic into a new class, but I ran into a bug there and am working on it now.

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> SIZE.perf, all.perf, luceneutil-score-equal.patch,
> luceneutil-score-equal.patch, stat.cpp, stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there are one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have a very low hit count compared
> to the other clauses, BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, e.g. if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!
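
A note on the '*' / '+' markers in the comment: the message does not say how stat.cpp decides that a result is roughly as good as the best, so the following is only a minimal sketch under stated assumptions, not the code from the attachment. The function name markRow and the ratio parameter are hypothetical. It assumes '+' means "at least a given fraction of the best value" and that rows whose best value is not positive get no markers. A ratio of 0.7 happens to be consistent with every row of the table, but other nearby values would be too.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not taken from stat.cpp): returns one marker per value.
//   "*"  the best value in the row
//   "+"  a value of at least ratio * best (assumed meaning of "roughly as good")
//   ""   everything else
// Rows whose best value is not positive get no markers, which matches the
// all-negative rows of the table.
std::vector<std::string> markRow(const std::vector<double>& gains,
                                 double ratio = 0.7) {
    assert(ratio >= 0.0 && ratio <= 1.0);
    std::vector<std::string> marks(gains.size());
    if (gains.empty()) return marks;

    const auto bestIt = std::max_element(gains.begin(), gains.end());
    const double best = *bestIt;
    if (best <= 0.0) return marks;  // no method helps on this task

    for (std::size_t i = 0; i < gains.size(); ++i) {
        if (gains.begin() + i == bestIt) {
            marks[i] = "*";
        } else if (gains[i] >= ratio * best) {
            marks[i] = "+";
        }
    }
    return marks;
}
```

Calling markRow on the HighAndSomeHighOr row (13.3, 24.5, 7.8, 9.1, 10.9, 17.3, 18.3, 21.3) marks BS with '*' and size5, size8 and size9 with '+', as in the table. The "Good Method" list for a task is then just the marked columns sorted by descending value.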