[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Da Huang updated LUCENE-4396:
-----------------------------

    Attachment: SIZE.perf
                stat.cpp

I have run tests with different SIZE values for the bucketTable.
'SIZE.perf' contains the raw test results.
'stat.cpp' is a C++ program that computes statistics over *.perf files.
You can compile it with 'g++ stat.cpp -std=c++0x -o stat' and run it with './stat < SIZE.perf'.
The statistics for SIZE.perf should look like this:

{code}
Task                 size10  size11   size5   size6   size7   size8   size9
HighAndSomeHighNot    -14.5     4.0     6.6    -3.0     5.2    10.0*    3.4
HighAndSomeHighOr       2.4    10.9    17.3    17.4    12.9    18.3    21.3*
HighAndSomeLowNot     -36.8   -37.3   -47.8   -47.8   -40.2   -42.2   -41.5
HighAndSomeLowOr      -45.1   -46.4   -47.9   -46.2   -38.7   -39.7   -44.9
HighAndTonsHighNot    162.4*  145.1   149.1   130.1   142.9   144.7   143.7
HighAndTonsHighOr     154.8*  146.5   154.0   137.8   144.9   150.0   149.1
HighAndTonsLowNot     -27.0   -17.4   -73.7   -49.6   -40.1   -28.6   -15.6
HighAndTonsLowOr      -28.7   -14.3   -63.8   -44.8   -33.0   -24.4   -13.9
LowAndSomeHighNot       3.0     0.2     4.5     6.2*    5.7     6.2*    4.7
LowAndSomeHighOr        5.3     1.4     6.8*    6.7     7.7     5.8     6.6
LowAndSomeLowNot       -6.3   -24.4     3.7*    0.8     1.7    -2.3    -4.0
LowAndSomeLowOr       -10.3   -22.7     2.2*    2.0     1.7    -2.3    -8.8
LowAndTonsHighNot      27.3*   21.4    22.5    21.5    21.0    23.8    26.5
LowAndTonsHighOr       23.1    28.2    24.2    23.9    29.1*   27.5    28.2
LowAndTonsLowNot       33.0    46.5    39.1    33.4    30.0    47.2*   44.3
LowAndTonsLowOr        45.7*   34.6    29.9    36.8    45.3    40.9    38.1
{code}

size7 means the bucketTable's size is 1 << 7.
It seems that we can get better results on *SOME* tasks if we combine size9 with size5.
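To make the SIZE parameter concrete, here is a hypothetical, heavily simplified sketch of bucket-table scoring for a pure disjunction. It assumes each clause is a sorted, duplicate-free int[] of doc IDs standing in for a real iterator, and it uses the number of matching clauses as the "score". The class and method names are made up for illustration, and Lucene's actual BooleanScorer differs in many details (for example, this sketch rescans the whole table after every window purely for simplicity). Docs are processed in windows of 1 << sizeBits; inside one window, doc & (size - 1) gives every doc its own bucket.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified sketch of bucket-table scoring for a pure
 * disjunction (SHOULD clauses only). Each clause is a sorted int[] of doc IDs
 * standing in for a real iterator, and the "score" of a doc is simply the
 * number of clauses that matched it. Not Lucene's implementation.
 */
final class BucketTableSketch {

  /** Returns {docId, matchCount} pairs in increasing docId order. */
  static List<int[]> score(int[][] clauses, int sizeBits) {
    final int size = 1 << sizeBits;  // e.g. sizeBits = 7 -> 128 buckets ("size7")
    final int mask = size - 1;
    final int[] docs = new int[size];           // doc currently held by each bucket
    final int[] counts = new int[size];         // matching clauses per bucket
    final boolean[] used = new boolean[size];
    final int[] pos = new int[clauses.length];  // next unread posting per clause
    final List<int[]> hits = new ArrayList<>();

    int windowEnd = 0;
    boolean more = true;
    while (more) {
      windowEnd += size;
      more = false;
      // Phase 1: drain every clause up to windowEnd into the table. All docs in
      // one window share the same high bits, so doc & mask never collides.
      for (int c = 0; c < clauses.length; c++) {
        int[] postings = clauses[c];
        while (pos[c] < postings.length && postings[pos[c]] < windowEnd) {
          int doc = postings[pos[c]++];
          int slot = doc & mask;
          if (!used[slot]) {
            used[slot] = true;
            docs[slot] = doc;
            counts[slot] = 0;
          }
          counts[slot]++;
        }
        if (pos[c] < postings.length) {
          more = true;  // this clause still has docs beyond the window
        }
      }
      // Phase 2: emit this window's hits (slot order == doc order) and reset.
      for (int slot = 0; slot < size; slot++) {
        if (used[slot]) {
          hits.add(new int[] {docs[slot], counts[slot]});
          used[slot] = false;
        }
      }
    }
    return hits;
  }
}
```

For example, score(new int[][] {{1, 5, 200}, {5, 130, 200}}, 7) yields the pairs {1, 1}, {5, 2}, {130, 1}, {200, 2}. With sizeBits = 1 it returns the same hits, just over many more (2-doc) windows: the table size changes how the work is batched, not which docs match.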
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf,
> luceneutil-score-equal.patch, luceneutil-score-equal.patch, stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there are one or more MUST clauses, we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have a very low hit count compared
> to the other clauses, BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST, so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, e.g. if the MUST clause suddenly skips 1000000 docs, then you
> want to .advance() all the SHOULD clauses.
> I won't have near-term time to work on this, so feel free to take it if you
> are inspired!
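The two heuristics mentioned in the issue (firstDocID as a proxy for hit count, and .advance()-ing the SHOULD clauses when the MUST clause jumps far ahead) could be sketched roughly as follows. Everything here is an assumption made for illustration: the method names, the minRatio threshold, and the "about one hit per (firstDocID + 1) docs" density estimate are not taken from Lucene or from the attached patches. The only real Lucene fact relied on is that an exhausted DocIdSetIterator reports NO_MORE_DOCS, which equals Integer.MAX_VALUE.

```java
/**
 * Hypothetical helpers sketching the heuristics described in the issue.
 * None of these names, parameters or thresholds exist in Lucene.
 */
final class ScorerChoiceSketch {

  /**
   * Rough total-hit estimate from a clause's first matching doc: if the first
   * hit is at firstDocID, assume roughly one match per (firstDocID + 1) docs.
   * An exhausted clause (firstDocID == Integer.MAX_VALUE, the value of
   * DocIdSetIterator.NO_MORE_DOCS) therefore estimates to 0 for any int maxDoc.
   */
  static long estimateHits(int firstDocID, int maxDoc) {
    return maxDoc / ((long) firstDocID + 1);
  }

  /**
   * Use the bucket-table scorer unless the sparsest MUST clause looks much
   * sparser than the densest SHOULD clause. minRatio is a made-up tuning knob
   * (e.g. 0.01); with no MUST clauses this always returns true.
   */
  static boolean useBooleanScorer(int[] mustFirstDocs, int[] shouldFirstDocs,
                                  int maxDoc, double minRatio) {
    long sparsestMust = Long.MAX_VALUE;
    for (int doc : mustFirstDocs) {
      sparsestMust = Math.min(sparsestMust, estimateHits(doc, maxDoc));
    }
    long densestShould = 0;
    for (int doc : shouldFirstDocs) {
      densestShould = Math.max(densestShould, estimateHits(doc, maxDoc));
    }
    return sparsestMust >= minRatio * densestShould;
  }

  /**
   * Start of the next window to score; windowSize must be a power of two
   * (1 << SIZE). If the MUST clause's next doc lies past the window that would
   * normally come next, jump straight to the window containing it. The SHOULD
   * clauses would then be advance()d to the returned doc instead of being
   * stepped through the empty windows in between.
   */
  static int nextWindowStart(int windowEnd, int mustNextDoc, int windowSize) {
    int alignedStart = mustNextDoc & -windowSize;  // round down to a window boundary
    return Math.max(windowEnd, alignedStart);
  }
}
```

For example, with maxDoc = 1,000,000 and minRatio = 0.01, a MUST clause whose first hit is doc 9 (about 100,000 estimated hits) keeps BooleanScorer, while one whose first hit is doc 99,999 (about 10 estimated hits) falls back to BooleanScorer2. Likewise, nextWindowStart(128, 1_000_000, 128) returns 999,936, so with 128-doc windows the SHOULD clauses would be advanced in one step instead of walking roughly 7,800 empty windows.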