[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Da Huang updated LUCENE-4396:
-----------------------------
    Attachment: LUCENE-4396.patch

This is a patch based on git mirror commit d707f783ab068b70752a3f9cfdc0dabb7f4fbadf.

In this patch, I tried to fix the .getChildren() problem in BAS and BLS.

I tried to make .bulkScorer() choose DAAT when scoreDocsInOrder is true. However, I discovered that I have to copy the scorer-choosing logic into .scoreDocsOutOfOrder() to make things work correctly.

I also tried to implement the .getChildren() method for BAS and BLS, but the TAAT strategy exhausts the sub-scorers at the very beginning. In the end, I simply throw UnsupportedOperationException in BAS.getChildren() and BLS.getChildren().

Besides, I have run more tests to make sure everything is right. As you can see, the performance of the HighAnd.*Low.* cases shown in merge.png is not good. Therefore, I ran the HighAnd.*Low.* cases with luceneutil's pattern filter, and the result is as follows.

{code}
Task                    QPS baseline  StdDev  QPS my_version  StdDev  Pct diff
HighAnd6LowOr                   9.44  (6.4%)            9.19  (4.8%)     -2.6%  ( -12% - 9%)
HighAnd5LowOr                   9.00  (8.8%)            8.85  (7.4%)     -1.6%  ( -16% - 16%)
HighAnd3LowOr                  11.89  (8.9%)           11.71  (7.8%)     -1.6%  ( -16% - 16%)
HighAnd4LowOr                  10.78  (7.4%)           10.61  (6.3%)     -1.5%  ( -14% - 13%)
HighAnd7LowOr                   9.08  (7.2%)            8.94  (5.8%)     -1.5%  ( -13% - 12%)
HighAnd8LowOr                   6.32  (8.6%)            6.23  (6.9%)     -1.4%  ( -15% - 15%)
HighAnd9LowOr                   5.71  (5.7%)            5.65  (4.5%)     -1.1%  ( -10% - 9%)
PKLookup                       98.95  (4.5%)           98.38  (2.4%)     -0.6%  ( -7% - 6%)
HighAnd9LowNot                  7.49  (3.7%)            7.46  (3.2%)     -0.4%  ( -7% - 6%)
HighAnd4LowNot                 10.33  (6.4%)           10.31  (6.1%)     -0.2%  ( -11% - 13%)
HighAnd8LowNot                  6.69  (5.3%)            6.70  (4.9%)      0.1%  ( -9% - 10%)
HighAnd7LowNot                  6.82  (5.1%)            6.84  (5.0%)      0.3%  ( -9% - 10%)
HighAnd6LowNot                  9.45  (5.5%)            9.48  (4.7%)      0.3%  ( -9% - 11%)
HighAnd3LowNot                 10.80  (6.7%)           10.87  (6.1%)      0.6%  ( -11% - 14%)
HighAnd5LowNot                  4.28  (7.4%)            4.32  (7.1%)      1.0%  ( -12% - 16%)
{code}

Everything looks right. I have also run tests for more complicated tasks.
{code}
Task                    QPS baseline  StdDev  QPS my_version  StdDev  Pct diff
LowAnd6LowOr6LowNot            31.59  (1.0%)           28.52  (2.4%)     -9.7%  ( -12% - -6%)
HighAnd6LowOr6LowNot            6.10  (2.7%)            5.76  (4.0%)     -5.6%  ( -11% - 1%)
MedAnd6LowOr6LowNot             7.33  (2.3%)            7.03  (3.1%)     -4.0%  ( -9% - 1%)
HighAnd6MedOr6LowNot            3.51  (1.5%)            3.49  (2.6%)     -0.6%  ( -4% - 3%)
PKLookup                       95.99  (5.1%)           95.48  (4.9%)     -0.5%  ( -10% - 9%)
HighAnd6MedOr6MedNot            1.96  (1.3%)            1.97  (2.5%)      0.4%  ( -3% - 4%)
MedAnd6MedOr6MedNot             2.34  (1.2%)            2.35  (2.3%)      0.5%  ( -2% - 4%)
HighAnd6LowOr6HighNot           1.31  (1.1%)            1.33  (2.4%)      0.9%  ( -2% - 4%)
HighAnd6LowOr6MedNot            3.08  (1.5%)            3.12  (2.7%)      1.2%  ( -2% - 5%)
MedAnd6LowOr6MedNot             3.72  (1.4%)            3.89  (2.6%)      4.8%  ( 0% - 8%)
HighAnd6MedOr6HighNot           1.40  (1.0%)            1.53  (2.4%)      9.3%  ( 5% - 12%)
LowAnd6LowOr6MedNot             9.23  (2.1%)           10.19  (2.7%)     10.4%  ( 5% - 15%)
LowAnd6LowOr6HighNot            6.04  (2.5%)            6.74  (2.9%)     11.6%  ( 6% - 17%)
LowAnd6HighOr6HighNot           4.15  (3.4%)            4.72  (4.2%)     13.8%  ( 5% - 22%)
MedAnd6MedOr6HighNot            1.65  (1.2%)            1.91  (2.2%)     15.7%  ( 12% - 19%)
MedAnd6LowOr6HighNot            2.42  (1.7%)            2.80  (2.7%)     16.0%  ( 11% - 20%)
LowAnd6HighOr6LowNot            4.69  (2.9%)            5.45  (3.7%)     16.1%  ( 9% - 23%)
MedAnd6MedOr6LowNot             3.45  (1.2%)            4.04  (2.1%)     17.1%  ( 13% - 20%)
LowAnd6MedOr6LowNot             8.77  (1.6%)           10.38  (2.4%)     18.4%  ( 14% - 22%)
LowAnd6MedOr6MedNot             6.36  (2.6%)            7.55  (3.5%)     18.6%  ( 12% - 25%)
LowAnd6MedOr6HighNot            5.48  (3.1%)            6.51  (3.9%)     18.8%  ( 11% - 26%)
LowAnd6HighOr6MedNot            5.77  (3.1%)            6.86  (4.3%)     18.9%  ( 11% - 27%)
MedAnd6HighOr6HighNot           1.22  (1.0%)            1.46  (2.0%)     19.8%  ( 16% - 23%)
HighAnd6HighOr6MedNot           1.32  (1.1%)            1.59  (2.0%)     20.7%  ( 17% - 24%)
MedAnd6HighOr6MedNot            1.72  (1.5%)            2.09  (2.2%)     21.3%  ( 17% - 25%)
HighAnd6HighOr6HighNot          1.26  (1.2%)            1.56  (2.1%)     24.0%  ( 20% - 27%)
HighAnd6HighOr6LowNot           1.54  (1.3%)            1.92  (2.0%)     24.7%  ( 21% - 28%)
MedAnd6HighOr6LowNot            2.26  (1.5%)            2.85  (1.9%)     26.3%  ( 22% - 30%)
{code}

Everything looks good. If there are no other problems, I will begin cleaning up the unused logic in the code, such as BLS, and refine the javadoc.
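
To illustrate why a term-at-a-time (TAAT) strategy leaves nothing useful to expose through .getChildren(), here is a deliberately simplified toy model. It is not the BAS/BLS code from the patch and it does not use Lucene classes: TaatSketch, ToySub and scoreTaat are hypothetical names invented for this sketch, and the sub-scorers are plain sorted arrays of doc IDs.

```java
import java.util.Arrays;

final class TaatSketch {
    // Sentinel for an exhausted iterator (same idea as Lucene's NO_MORE_DOCS).
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical toy sub-scorer: iterates a sorted array of doc IDs. */
    static final class ToySub {
        private final int[] docs;
        private int idx = -1;

        ToySub(int... sortedDocs) {
            this.docs = sortedDocs;
        }

        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() {
            idx++;
            return docID();
        }
    }

    /** Term-at-a-time: drain each sub completely, counting matches per doc in [0, maxDoc). */
    static int[] scoreTaat(int maxDoc, ToySub... subs) {
        int[] matches = new int[maxDoc];
        for (ToySub sub : subs) {
            for (int doc = sub.nextDoc(); doc != NO_MORE_DOCS; doc = sub.nextDoc()) {
                matches[doc]++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        ToySub a = new ToySub(1, 3);
        ToySub b = new ToySub(3, 4);
        System.out.println(Arrays.toString(scoreTaat(5, a, b)));
        // After scoring, every sub is already exhausted, so a caller inspecting
        // the children would not see them positioned on the current hit.
        System.out.println(a.docID() == NO_MORE_DOCS);
    }
}
```

In this toy model the match counts are only available after every sub has been consumed, which is the situation described above: by the time a collector could ask for the children, they no longer point at the document being collected.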
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf, all.perf,
> luceneutil-score-equal.patch, luceneutil-score-equal.patch, merge.perf,
> merge.png, perf.png, stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared
> to the other clauses, that BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!

--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
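
The .advance() idea from the quoted issue description can be sketched as follows. This is a minimal toy model under stated assumptions, not Lucene code and not the approach taken in any of the attached patches: ToyIterator and MustDrivenSketch are hypothetical names, postings are plain sorted int arrays, and advance() here is a linear scan where a real postings list would presumably use skip data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy doc-ID iterator over a sorted int array. */
final class ToyIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = -1;

    ToyIterator(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int nextDoc() {
        idx++;
        return docID();
    }

    /** Moves to the first doc >= target (linear scan in this toy version). */
    int advance(int target) {
        if (idx < 0) idx = 0;
        while (idx < docs.length && docs[idx] < target) idx++;
        return docID();
    }
}

final class MustDrivenSketch {
    /** Returns "doc:matchedShouldCount" for every doc that matches the MUST clause. */
    static List<String> collect(ToyIterator must, ToyIterator... shoulds) {
        List<String> hits = new ArrayList<>();
        for (int doc = must.nextDoc(); doc != ToyIterator.NO_MORE_DOCS; doc = must.nextDoc()) {
            int matched = 0;
            for (ToyIterator should : shoulds) {
                // When the MUST clause jumps ahead, skip the SHOULD clause to the
                // same doc instead of stepping through every doc in between.
                if (should.docID() < doc) should.advance(doc);
                if (should.docID() == doc) matched++;
            }
            hits.add(doc + ":" + matched);
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIterator must = new ToyIterator(5, 1000000);
        ToyIterator s1 = new ToyIterator(1, 2, 3, 5, 7, 1000000);
        ToyIterator s2 = new ToyIterator(4, 999999);
        System.out.println(collect(must, s1, s2)); // prints [5:1, 1000000:1]
    }
}
```

The MUST iterator drives the loop, so SHOULD clauses are only ever positioned on documents that can actually match; whether and when this beats bucket-style scoring is exactly the heuristic question the issue raises.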