[ https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836467#comment-16836467 ]
Luca Cavanna edited comment on LUCENE-8796 at 5/9/19 3:22 PM: -------------------------------------------------------------- I have updated the PR after applying Yonik's suggestion and re-run benchmarks a few times. The run with the least noise had these results (note that I disabled the bitset optimization on both sides): {noformat} Report after iter 19: TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff HighTerm 1575.07 (5.9%) 1541.27 (6.9%) -2.1% ( -14% - 11%) MedTerm 1363.22 (6.5%) 1337.03 (7.0%) -1.9% ( -14% - 12%) LowTerm 1441.86 (4.2%) 1420.77 (5.2%) -1.5% ( -10% - 8%) IntNRQConjMedTerm 280.55 (4.0%) 277.64 (4.1%) -1.0% ( -8% - 7%) MedPhrase 153.84 (3.5%) 152.44 (3.3%) -0.9% ( -7% - 6%) Prefix3 224.92 (4.0%) 223.13 (3.7%) -0.8% ( -8% - 7%) HighSloppyPhrase 19.70 (3.7%) 19.56 (4.5%) -0.7% ( -8% - 7%) MedSloppyPhrase 18.23 (4.3%) 18.11 (4.7%) -0.7% ( -9% - 8%) OrNotHighMed 586.33 (3.4%) 582.47 (4.9%) -0.7% ( -8% - 7%) LowSloppyPhrase 18.56 (3.6%) 18.46 (3.9%) -0.5% ( -7% - 7%) HighPhrase 22.64 (2.7%) 22.54 (3.0%) -0.4% ( -6% - 5%) LowPhrase 144.10 (3.8%) 143.55 (3.3%) -0.4% ( -7% - 6%) AndHighLow 539.26 (3.7%) 537.25 (3.2%) -0.4% ( -7% - 6%) PKLookup 132.96 (3.0%) 132.48 (4.6%) -0.4% ( -7% - 7%) OrHighMed 115.79 (2.7%) 115.49 (3.5%) -0.3% ( -6% - 6%) PrefixConjHighTerm 36.98 (2.8%) 36.93 (3.4%) -0.1% ( -6% - 6%) WildcardConjHighTerm 45.79 (3.0%) 45.73 (3.1%) -0.1% ( -6% - 6%) OrHighLow 448.91 (3.7%) 448.70 (6.3%) -0.0% ( -9% - 10%) Wildcard 78.89 (3.2%) 78.95 (3.6%) 0.1% ( -6% - 7%) IntNRQConjHighTerm 78.35 (2.3%) 78.48 (2.4%) 0.2% ( -4% - 4%) IntNRQ 100.56 (2.7%) 100.84 (2.8%) 0.3% ( -5% - 5%) OrHighNotLow 732.45 (2.8%) 734.56 (5.3%) 0.3% ( -7% - 8%) OrHighNotHigh 544.87 (2.8%) 546.47 (4.6%) 0.3% ( -6% - 7%) IntNRQConjLowTerm 249.20 (4.2%) 249.99 (3.8%) 0.3% ( -7% - 8%) Respell 73.05 (3.1%) 73.28 (3.4%) 0.3% ( -6% - 7%) OrHighHigh 35.56 (3.0%) 35.68 (4.2%) 0.3% ( -6% - 7%) OrNotHighLow 695.41 (4.8%) 697.88 (6.5%) 0.4% ( -10% - 12%) MedSpanNear 59.99 (3.8%) 60.30 (4.0%) 0.5% ( -7% - 8%) AndHighMed 190.02 (3.1%) 191.04 (3.6%) 0.5% ( -5% - 7%) LowSpanNear 12.73 (3.9%) 12.81 (4.2%) 0.6% ( -7% - 8%) HighTermDayOfYearSort 88.42 (7.0%) 89.09 (7.1%) 0.8% ( -12% - 15%) PrefixConjLowTerm 54.95 (3.7%) 55.43 (3.8%) 0.9% ( -6% - 8%) OrHighNotMed 628.44 (3.4%) 634.02 (6.1%) 0.9% ( -8% - 10%) HighSpanNear 28.86 (3.2%) 29.11 (3.5%) 0.9% ( -5% - 7%) WildcardConjMedTerm 72.48 (3.4%) 73.19 (4.8%) 1.0% ( -7% - 9%) Fuzzy2 49.17 (9.9%) 49.68 (11.7%) 1.0% ( -18% - 25%) AndHighHigh 63.44 (3.8%) 64.11 (3.8%) 1.1% ( -6% - 9%) Fuzzy1 79.43 (9.9%) 80.55 (9.7%) 1.4% ( -16% - 23%) OrNotHighHigh 574.89 (3.6%) 584.43 (5.5%) 1.7% ( -7% - 11%) PrefixConjMedTerm 79.00 (3.2%) 80.50 (3.6%) 1.9% ( -4% - 8%) WildcardConjLowTerm 90.67 (2.9%) 92.49 (3.7%) 2.0% ( -4% - 8%) HighTermMonthSort 86.13 (11.8%) 88.79 (12.4%) 3.1% ( -18% - 30%) {noformat} I also ran benchmarks with the bitset optimization in place on both ends: {noformat} Report after iter 19: TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff IntNRQ 63.46 (24.6%) 62.28 (24.2%) -1.9% ( -40% - 62%) OrNotHighMed 596.89 (3.5%) 589.18 (4.6%) -1.3% ( -9% - 7%) OrNotHighHigh 769.65 (3.1%) 760.89 (2.9%) -1.1% ( -6% - 4%) Fuzzy1 76.45 (6.8%) 75.62 (7.4%) -1.1% ( -14% - 14%) OrHighNotHigh 626.48 (3.4%) 619.67 (2.6%) -1.1% ( -6% - 5%) MedTerm 1345.24 (3.6%) 1332.07 (3.7%) -1.0% ( -7% - 6%) PKLookup 136.70 (4.0%) 135.57 (3.9%) -0.8% ( -8% - 7%) HighTerm 1103.39 (3.5%) 1095.23 (3.7%) -0.7% ( -7% - 6%) OrNotHighLow 780.85 (3.8%) 775.40 (2.7%) -0.7% ( -6% - 5%) OrHighNotLow 815.32 (4.7%) 810.72 (5.7%) -0.6% ( -10% - 10%) Prefix3 310.00 (4.9%) 308.31 (3.7%) -0.5% ( -8% - 8%) LowTerm 1462.30 (3.7%) 1455.27 (4.4%) -0.5% ( -8% - 7%) OrHighLow 446.56 (4.6%) 445.11 (3.2%) -0.3% ( -7% - 7%) AndHighLow 594.90 (2.9%) 593.39 (3.4%) -0.3% ( -6% - 6%) Respell 64.46 (2.5%) 64.36 (2.6%) -0.2% ( -5% - 5%) OrHighNotMed 685.98 (4.5%) 685.69 (3.7%) -0.0% ( -7% - 8%) OrHighMed 67.90 (5.1%) 67.91 (3.4%) 0.0% ( -8% - 8%) Fuzzy2 50.18 (4.5%) 50.21 (5.8%) 0.1% ( -9% - 10%) LowSpanNear 59.27 (3.9%) 59.34 (4.0%) 0.1% ( -7% - 8%) OrHighHigh 30.89 (5.2%) 30.94 (3.3%) 0.2% ( -7% - 9%) LowPhrase 114.67 (3.1%) 114.87 (2.5%) 0.2% ( -5% - 5%) HighPhrase 22.34 (2.7%) 22.42 (2.2%) 0.4% ( -4% - 5%) AndHighHigh 59.53 (3.8%) 59.89 (4.4%) 0.6% ( -7% - 9%) MedPhrase 29.99 (2.9%) 30.19 (2.3%) 0.7% ( -4% - 6%) MedSloppyPhrase 71.57 (3.1%) 72.10 (3.0%) 0.7% ( -5% - 7%) IntNRQConjHighTerm 113.74 (7.3%) 114.66 (7.1%) 0.8% ( -12% - 16%) LowSloppyPhrase 14.18 (3.4%) 14.30 (2.6%) 0.8% ( -4% - 6%) PrefixConjLowTerm 89.05 (4.6%) 89.80 (5.1%) 0.8% ( -8% - 11%) AndHighMed 166.34 (3.1%) 167.76 (3.8%) 0.9% ( -5% - 7%) WildcardConjMedTerm 51.44 (2.6%) 51.88 (3.0%) 0.9% ( -4% - 6%) PrefixConjMedTerm 68.16 (4.8%) 68.80 (4.6%) 0.9% ( -8% - 10%) PrefixConjHighTerm 42.34 (6.1%) 42.81 (5.0%) 1.1% ( -9% - 13%) MedSpanNear 15.57 (5.5%) 15.74 (5.4%) 1.1% ( -9% - 12%) WildcardConjLowTerm 51.56 (3.7%) 52.15 (4.2%) 1.1% ( -6% - 9%) HighSpanNear 5.66 (5.8%) 5.73 (5.9%) 1.2% ( -9% - 13%) IntNRQConjLowTerm 120.28 (8.5%) 121.67 (8.8%) 1.2% ( -14% - 20%) WildcardConjHighTerm 55.43 (3.2%) 56.10 (3.4%) 1.2% ( -5% - 8%) IntNRQConjMedTerm 97.79 (8.3%) 98.98 (8.6%) 1.2% ( -14% - 19%) Wildcard 106.37 (2.9%) 107.75 (3.6%) 1.3% ( -5% - 7%) HighSloppyPhrase 18.21 (4.9%) 18.48 (4.4%) 1.5% ( -7% - 11%) HighTermMonthSort 146.10 (11.0%) 148.89 (10.5%) 1.9% ( -17% - 26%) HighTermDayOfYearSort 68.62 (6.1%) 70.08 (3.9%) 2.1% ( -7% - 12%) {noformat} I will next have a look at what Atri is suggesting. was (Author: lucacavanna): I have updated the PR after applying Yonik's suggestion and re-run benchmarks a few times. The run with the least noise had these results (note that I disabled the bitset optimization on both sides): {noformat} Report after iter 19: TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff HighTerm 1575.07 (5.9%) 1541.27 (6.9%) -2.1% ( -14% - 11%) MedTerm 1363.22 (6.5%) 1337.03 (7.0%) -1.9% ( -14% - 12%) LowTerm 1441.86 (4.2%) 1420.77 (5.2%) -1.5% ( -10% - 8%) IntNRQConjMedTerm 280.55 (4.0%) 277.64 (4.1%) -1.0% ( -8% - 7%) MedPhrase 153.84 (3.5%) 152.44 (3.3%) -0.9% ( -7% - 6%) Prefix3 224.92 (4.0%) 223.13 (3.7%) -0.8% ( -8% - 7%) HighSloppyPhrase 19.70 (3.7%) 19.56 (4.5%) -0.7% ( -8% - 7%) MedSloppyPhrase 18.23 (4.3%) 18.11 (4.7%) -0.7% ( -9% - 8%) OrNotHighMed 586.33 (3.4%) 582.47 (4.9%) -0.7% ( -8% - 7%) LowSloppyPhrase 18.56 (3.6%) 18.46 (3.9%) -0.5% ( -7% - 7%) HighPhrase 22.64 (2.7%) 22.54 (3.0%) -0.4% ( -6% - 5%) LowPhrase 144.10 (3.8%) 143.55 (3.3%) -0.4% ( -7% - 6%) AndHighLow 539.26 (3.7%) 537.25 (3.2%) -0.4% ( -7% - 6%) PKLookup 132.96 (3.0%) 132.48 (4.6%) -0.4% ( -7% - 7%) OrHighMed 115.79 (2.7%) 115.49 (3.5%) -0.3% ( -6% - 6%) PrefixConjHighTerm 36.98 (2.8%) 36.93 (3.4%) -0.1% ( -6% - 6%) WildcardConjHighTerm 45.79 (3.0%) 45.73 (3.1%) -0.1% ( -6% - 6%) OrHighLow 448.91 (3.7%) 448.70 (6.3%) -0.0% ( -9% - 10%) Wildcard 78.89 (3.2%) 78.95 (3.6%) 0.1% ( -6% - 7%) IntNRQConjHighTerm 78.35 (2.3%) 78.48 (2.4%) 0.2% ( -4% - 4%) IntNRQ 100.56 (2.7%) 100.84 (2.8%) 0.3% ( -5% - 5%) OrHighNotLow 732.45 (2.8%) 734.56 (5.3%) 0.3% ( -7% - 8%) OrHighNotHigh 544.87 (2.8%) 546.47 (4.6%) 0.3% ( -6% - 7%) IntNRQConjLowTerm 249.20 (4.2%) 249.99 (3.8%) 0.3% ( -7% - 8%) Respell 73.05 (3.1%) 73.28 (3.4%) 0.3% ( -6% - 7%) OrHighHigh 35.56 (3.0%) 35.68 (4.2%) 0.3% ( -6% - 7%) OrNotHighLow 695.41 (4.8%) 697.88 (6.5%) 0.4% ( -10% - 12%) MedSpanNear 59.99 (3.8%) 60.30 (4.0%) 0.5% ( -7% - 8%) AndHighMed 190.02 (3.1%) 191.04 (3.6%) 0.5% ( -5% - 7%) LowSpanNear 12.73 (3.9%) 12.81 (4.2%) 0.6% ( -7% - 8%) HighTermDayOfYearSort 88.42 (7.0%) 89.09 (7.1%) 0.8% ( -12% - 15%) PrefixConjLowTerm 54.95 (3.7%) 55.43 (3.8%) 0.9% ( -6% - 8%) OrHighNotMed 628.44 (3.4%) 634.02 (6.1%) 0.9% ( -8% - 10%) HighSpanNear 28.86 (3.2%) 29.11 (3.5%) 0.9% ( -5% - 7%) WildcardConjMedTerm 72.48 (3.4%) 73.19 (4.8%) 1.0% ( -7% - 9%) Fuzzy2 49.17 (9.9%) 49.68 (11.7%) 1.0% ( -18% - 25%) AndHighHigh 63.44 (3.8%) 64.11 (3.8%) 1.1% ( -6% - 9%) Fuzzy1 79.43 (9.9%) 80.55 (9.7%) 1.4% ( -16% - 23%) OrNotHighHigh 574.89 (3.6%) 584.43 (5.5%) 1.7% ( -7% - 11%) PrefixConjMedTerm 79.00 (3.2%) 80.50 (3.6%) 1.9% ( -4% - 8%) WildcardConjLowTerm 90.67 (2.9%) 92.49 (3.7%) 2.0% ( -4% - 8%) HighTermMonthSort 86.13 (11.8%) 88.79 (12.4%) 3.1% ( -18% - 30%) {noformat} I also ran benchmarks with the bitset optimization in place on both ends: {{{noformat}}} Report after iter 19: TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff IntNRQ 63.46 (24.6%) 62.28 (24.2%) -1.9% ( -40% - 62%) OrNotHighMed 596.89 (3.5%) 589.18 (4.6%) -1.3% ( -9% - 7%) OrNotHighHigh 769.65 (3.1%) 760.89 (2.9%) -1.1% ( -6% - 4%) Fuzzy1 76.45 (6.8%) 75.62 (7.4%) -1.1% ( -14% - 14%) OrHighNotHigh 626.48 (3.4%) 619.67 (2.6%) -1.1% ( -6% - 5%) MedTerm 1345.24 (3.6%) 1332.07 (3.7%) -1.0% ( -7% - 6%) PKLookup 136.70 (4.0%) 135.57 (3.9%) -0.8% ( -8% - 7%) HighTerm 1103.39 (3.5%) 1095.23 (3.7%) -0.7% ( -7% - 6%) OrNotHighLow 780.85 (3.8%) 775.40 (2.7%) -0.7% ( -6% - 5%) OrHighNotLow 815.32 (4.7%) 810.72 (5.7%) -0.6% ( -10% - 10%) Prefix3 310.00 (4.9%) 308.31 (3.7%) -0.5% ( -8% - 8%) LowTerm 1462.30 (3.7%) 1455.27 (4.4%) -0.5% ( -8% - 7%) OrHighLow 446.56 (4.6%) 445.11 (3.2%) -0.3% ( -7% - 7%) AndHighLow 594.90 (2.9%) 593.39 (3.4%) -0.3% ( -6% - 6%) Respell 64.46 (2.5%) 64.36 (2.6%) -0.2% ( -5% - 5%) OrHighNotMed 685.98 (4.5%) 685.69 (3.7%) -0.0% ( -7% - 8%) OrHighMed 67.90 (5.1%) 67.91 (3.4%) 0.0% ( -8% - 8%) Fuzzy2 50.18 (4.5%) 50.21 (5.8%) 0.1% ( -9% - 10%) LowSpanNear 59.27 (3.9%) 59.34 (4.0%) 0.1% ( -7% - 8%) OrHighHigh 30.89 (5.2%) 30.94 (3.3%) 0.2% ( -7% - 9%) LowPhrase 114.67 (3.1%) 114.87 (2.5%) 0.2% ( -5% - 5%) HighPhrase 22.34 (2.7%) 22.42 (2.2%) 0.4% ( -4% - 5%) AndHighHigh 59.53 (3.8%) 59.89 (4.4%) 0.6% ( -7% - 9%) MedPhrase 29.99 (2.9%) 30.19 (2.3%) 0.7% ( -4% - 6%) MedSloppyPhrase 71.57 (3.1%) 72.10 (3.0%) 0.7% ( -5% - 7%) IntNRQConjHighTerm 113.74 (7.3%) 114.66 (7.1%) 0.8% ( -12% - 16%) LowSloppyPhrase 14.18 (3.4%) 14.30 (2.6%) 0.8% ( -4% - 6%) PrefixConjLowTerm 89.05 (4.6%) 89.80 (5.1%) 0.8% ( -8% - 11%) AndHighMed 166.34 (3.1%) 167.76 (3.8%) 0.9% ( -5% - 7%) WildcardConjMedTerm 51.44 (2.6%) 51.88 (3.0%) 0.9% ( -4% - 6%) PrefixConjMedTerm 68.16 (4.8%) 68.80 (4.6%) 0.9% ( -8% - 10%) PrefixConjHighTerm 42.34 (6.1%) 42.81 (5.0%) 1.1% ( -9% - 13%) MedSpanNear 15.57 (5.5%) 15.74 (5.4%) 1.1% ( -9% - 12%) WildcardConjLowTerm 51.56 (3.7%) 52.15 (4.2%) 1.1% ( -6% - 9%) HighSpanNear 5.66 (5.8%) 5.73 (5.9%) 1.2% ( -9% - 13%) IntNRQConjLowTerm 120.28 (8.5%) 121.67 (8.8%) 1.2% ( -14% - 20%) WildcardConjHighTerm 55.43 (3.2%) 56.10 (3.4%) 1.2% ( -5% - 8%) IntNRQConjMedTerm 97.79 (8.3%) 98.98 (8.6%) 1.2% ( -14% - 19%) Wildcard 106.37 (2.9%) 107.75 (3.6%) 1.3% ( -5% - 7%) HighSloppyPhrase 18.21 (4.9%) 18.48 (4.4%) 1.5% ( -7% - 11%) HighTermMonthSort 146.10 (11.0%) 148.89 (10.5%) 1.9% ( -17% - 26%) HighTermDayOfYearSort 68.62 (6.1%) 70.08 (3.9%) 2.1% ( -7% - 12%) {{{noformat}}} I will next have a look at what Atri is suggesting. > Use exponential search in IntArrayDocIdSet advance method > --------------------------------------------------------- > > Key: LUCENE-8796 > URL: https://issues.apache.org/jira/browse/LUCENE-8796 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Luca Cavanna > Priority: Minor > > Chatting with [~jpountz] , he suggested to improve IntArrayDocIdSet by making > its advance method use exponential search instead of binary search. This > should help performance of queries including conjunctions: given that > ConjunctionDISI uses leap frog, it advances through doc ids in small steps, > hence exponential search should be faster when advancing on average compared > to binary search. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org