[ https://issues.apache.org/jira/browse/LUCENE-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048947#comment-15048947 ]
Adrien Grand commented on LUCENE-6919:
--------------------------------------

Woops, here is the actual patch I wanted to upload.

> Change the Scorer API to expose an iterator instead of extending DocIdSetIterator
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-6919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6919
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6919.patch, LUCENE-6919.patch, LUCENE-6919.patch
>
>
> I was working on trying to address the performance regression on LUCENE-6815
> but this is hard to do without introducing specialization of
> DisjunctionScorer which I'd like to avoid at all costs.
>
> I think the performance regression would be easy to address without
> specialization if Scorers were changed to return an iterator instead of
> extending DocIdSetIterator. So conceptually the API would move from
> {code}
> class Scorer extends DocIdSetIterator {
> }
> {code}
> to
> {code}
> class Scorer {
>   DocIdSetIterator iterator();
> }
> {code}
> This would help me because then if none of the sub clauses support two-phase
> iteration, DisjunctionScorer could directly return the approximation as an
> iterator instead of having to check if twoPhase == null at every iteration.
>
> Such an approach could also help remove some method calls. For instance
> TermScorer.nextDoc calls PostingsEnum.nextDoc but with this change
> TermScorer.iterator() could return the PostingsEnum and TermScorer would not
> even appear in stack traces when scoring.
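
[Editorial illustration, not part of the original message or patch.] The delegation point above can be sketched in plain Java. This is a minimal sketch under stated assumptions: `DocIterator` and `ArrayPostings` are hypothetical stand-ins for Lucene's DocIdSetIterator and PostingsEnum, and this `Scorer`/`TermScorer` pair only mirrors the shape proposed in the issue; none of it is the code from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

public class ScorerIteratorSketch {

    /** Hypothetical stand-in for Lucene's DocIdSetIterator. */
    interface DocIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;

        int docID();

        int nextDoc();
    }

    /** Hypothetical stand-in for a postings list (PostingsEnum in the issue). */
    static final class ArrayPostings implements DocIterator {
        private final int[] docs; // assumed sorted, no duplicates
        private int pos = -1;

        ArrayPostings(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        @Override
        public int docID() {
            if (pos < 0) {
                return -1; // not positioned yet
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        @Override
        public int nextDoc() {
            if (pos < docs.length) {
                pos++;
            }
            return docID();
        }
    }

    /** Proposed shape: the scorer exposes an iterator instead of being one. */
    abstract static class Scorer {
        abstract DocIterator iterator();

        abstract float score();
    }

    /** Hands out its postings directly, so iteration never passes through the scorer. */
    static final class TermScorer extends Scorer {
        private final ArrayPostings postings;
        private final float weight; // constant score, purely for illustration

        TermScorer(int[] sortedDocs, float weight) {
            this.postings = new ArrayPostings(sortedDocs);
            this.weight = weight;
        }

        @Override
        DocIterator iterator() {
            return postings; // no wrapper object, no forwarding nextDoc()
        }

        @Override
        float score() {
            return weight;
        }
    }

    /** Drains the scorer's iterator and returns the matching doc IDs. */
    static List<Integer> collect(Scorer scorer) {
        DocIterator it = scorer.iterator();
        List<Integer> hits = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(collect(new TermScorer(new int[] {2, 5, 9}, 1.0f)));
    }
}
```

In this sketch the loop in `collect` calls `nextDoc()` on the postings object itself, which is the effect the description is after: the scorer is consulted once for its iterator and then stays out of the per-document path.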
> I hacked a patch to see how much that would help and luceneutil seems to
> like the change:
> {noformat}
> Task             QPS baseline   StdDev  QPS patch   StdDev  Pct diff
> Fuzzy1                  88.54  (15.7%)      86.73  (16.6%)  -2.0% (-29% - 35%)
> AndHighLow             698.98   (4.1%)     691.11   (5.1%)  -1.1% ( -9% -  8%)
> Fuzzy2                  26.47  (11.2%)      26.28  (10.3%)  -0.7% (-19% - 23%)
> MedSpanNear            141.03   (3.3%)     140.51   (3.2%)  -0.4% ( -6% -  6%)
> HighPhrase              60.66   (2.6%)      60.48   (3.3%)  -0.3% ( -5% -  5%)
> LowSpanNear             29.25   (2.4%)      29.21   (2.1%)  -0.1% ( -4% -  4%)
> MedPhrase               28.32   (1.9%)      28.28   (2.0%)  -0.1% ( -3% -  3%)
> LowPhrase               17.31   (2.1%)      17.29   (2.6%)  -0.1% ( -4% -  4%)
> HighSloppyPhrase        10.93   (6.0%)      10.92   (6.0%)  -0.1% (-11% - 12%)
> MedSloppyPhrase         72.21   (2.2%)      72.27   (1.8%)   0.1% ( -3% -  4%)
> Respell                 57.35   (3.2%)      57.41   (3.4%)   0.1% ( -6% -  6%)
> HighSpanNear            26.71   (3.0%)      26.75   (2.5%)   0.1% ( -5% -  5%)
> OrNotHighLow           803.46   (3.4%)     807.03   (4.2%)   0.4% ( -6% -  8%)
> LowSloppyPhrase         88.02   (3.4%)      88.77   (2.5%)   0.8% ( -4% -  7%)
> OrNotHighMed           200.45   (2.7%)     203.83   (2.5%)   1.7% ( -3% -  7%)
> OrHighHigh              38.98   (7.9%)      40.30   (6.6%)   3.4% (-10% - 19%)
> HighTerm                92.53   (5.3%)      95.94   (5.8%)   3.7% ( -7% - 15%)
> OrHighMed               53.80   (7.7%)      55.79   (6.6%)   3.7% ( -9% - 19%)
> AndHighMed             266.69   (1.7%)     277.15   (2.5%)   3.9% (  0% -  8%)
> Prefix3                 44.68   (5.4%)      46.60   (7.0%)   4.3% ( -7% - 17%)
> MedTerm                261.52   (4.9%)     273.52   (5.4%)   4.6% ( -5% - 15%)
> Wildcard                42.39   (6.1%)      44.35   (7.8%)   4.6% ( -8% - 19%)
> IntNRQ                  10.46   (7.0%)      10.99   (9.5%)   5.0% (-10% - 23%)
> OrNotHighHigh           67.15   (4.6%)      70.65   (4.5%)   5.2% ( -3% - 15%)
> OrHighNotHigh           43.07   (5.1%)      45.36   (5.4%)   5.3% ( -4% - 16%)
> OrHighLow               64.19   (6.4%)      67.72   (5.5%)   5.5% ( -6% - 18%)
> AndHighHigh             64.17   (2.3%)      67.87   (2.1%)   5.8% (  1% - 10%)
> LowTerm                642.94  (10.9%)     681.48   (8.5%)   6.0% (-12% - 28%)
> OrHighNotMed            12.68   (6.9%)      13.51   (6.6%)   6.5% ( -6% - 21%)
> OrHighNotLow            54.69   (6.8%)      58.25   (7.0%)   6.5% ( -6% - 21%)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org