[jira] [Comment Edited] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-31 Thread Luca Cavanna (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852876#comment-16852876
 ] 

Luca Cavanna edited comment on LUCENE-8796 at 5/31/19 10:08 AM:


I updated the PR and addressed all the comments. Here are the latest benchmark 
results (with the bitset optimization disabled on both ends):
{noformat}
Report after iter 19:
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff
MedTerm 1510.74 (6.8%) 1457.20 (8.4%) -3.5% ( -17% - 12%)
Fuzzy1 70.49 (8.5%) 68.11 (9.8%) -3.4% ( -19% - 16%)
OrHighNotMed 650.57 (5.8%) 629.81 (6.0%) -3.2% ( -14% - 9%)
OrHighLow 447.13 (4.2%) 433.05 (4.5%) -3.2% ( -11% - 5%)
OrNotHighMed 623.22 (6.3%) 605.19 (6.1%) -2.9% ( -14% - 10%)
OrHighNotLow 720.89 (7.0%) 701.26 (7.9%) -2.7% ( -16% - 13%)
OrNotHighHigh 558.43 (6.3%) 544.82 (4.9%) -2.4% ( -12% - 9%)
LowTerm 1279.34 (4.9%) 1248.60 (5.2%) -2.4% ( -11% - 8%)
AndHighLow 690.75 (4.0%) 675.22 (5.3%) -2.2% ( -11% - 7%)
LowPhrase 358.90 (2.3%) 351.28 (4.0%) -2.1% ( -8% - 4%)
PKLookup 139.97 (3.0%) 137.32 (3.5%) -1.9% ( -8% - 4%)
OrNotHighLow 728.48 (6.8%) 714.79 (6.5%) -1.9% ( -14% - 12%)
HighTerm 1222.38 (6.3%) 1199.77 (7.1%) -1.8% ( -14% - 12%)
AndHighHigh 58.93 (6.2%) 58.01 (5.8%) -1.6% ( -12% - 11%)
Prefix3 152.21 (4.5%) 150.00 (5.0%) -1.5% ( -10% - 8%)
IntNRQConjMedTerm 79.15 (10.7%) 78.06 (10.5%) -1.4% ( -20% - 22%)
HighTermDayOfYearSort 95.28 (5.1%) 94.10 (7.8%) -1.2% ( -13% - 12%)
Wildcard 64.23 (2.3%) 63.45 (2.3%) -1.2% ( -5% - 3%)
MedSpanNear 81.15 (2.2%) 80.19 (2.8%) -1.2% ( -6% - 3%)
HighSpanNear 10.20 (3.9%) 10.08 (4.2%) -1.2% ( -8% - 7%)
HighIntervalsOrdered 4.07 (1.8%) 4.03 (2.2%) -1.1% ( -4% - 2%)
LowSpanNear 41.62 (3.1%) 41.20 (3.6%) -1.0% ( -7% - 5%)
IntNRQConjLowTerm 20.36 (4.1%) 20.15 (4.5%) -1.0% ( -9% - 7%)
IntNRQConjHighTerm 64.84 (9.6%) 64.21 (9.4%) -1.0% ( -18% - 19%)
AndHighMed 229.08 (2.8%) 227.00 (2.5%) -0.9% ( -6% - 4%)
MedPhrase 18.73 (1.5%) 18.57 (2.3%) -0.8% ( -4% - 2%)
LowSloppyPhrase 124.52 (2.3%) 123.48 (2.6%) -0.8% ( -5% - 4%)
Respell 69.26 (3.0%) 68.68 (2.9%) -0.8% ( -6% - 5%)
HighPhrase 12.98 (1.6%) 12.88 (2.2%) -0.7% ( -4% - 3%)
PrefixConjLowTerm 42.11 (2.6%) 41.81 (3.0%) -0.7% ( -6% - 5%)
OrHighNotHigh 680.34 (6.1%) 676.16 (7.6%) -0.6% ( -13% - 13%)
MedSloppyPhrase 34.06 (4.9%) 33.89 (4.5%) -0.5% ( -9% - 9%)
IntNRQ 89.97 (12.4%) 89.62 (12.0%) -0.4% ( -22% - 27%)
HighSloppyPhrase 8.28 (4.0%) 8.25 (3.9%) -0.3% ( -7% - 7%)
WildcardConjLowTerm 36.35 (2.7%) 36.26 (2.7%) -0.3% ( -5% - 5%)
OrHighHigh 27.89 (2.6%) 27.85 (3.1%) -0.1% ( -5% - 5%)
Fuzzy2 44.19 (3.8%) 44.17 (3.1%) -0.1% ( -6% - 7%)
OrHighMed 90.42 (2.8%) 90.57 (2.8%) 0.2% ( -5% - 6%)
PrefixConjMedTerm 45.56 (2.8%) 45.79 (2.9%) 0.5% ( -5% - 6%)
WildcardConjHighTerm 33.08 (2.6%) 33.47 (3.0%) 1.2% ( -4% - 6%)
PrefixConjHighTerm 83.65 (2.6%) 86.23 (3.7%) 3.1% ( -3% - 9%)
HighTermMonthSort 130.35 (15.8%) 135.08 (12.1%) 3.6% ( -20% - 37%)
WildcardConjMedTerm 99.19 (3.6%) 103.37 (4.1%) 4.2% ( -3% - 12%)
{noformat}
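
For readers of the report, here is one way to read the "Pct diff" column and the range in parentheses. This is a sketch under an assumption, not taken from luceneutil's source: the figures in the table are consistent with the range comparing the two runs one standard deviation apart in each direction, truncated to whole percent. The class and method names are made up for illustration.

```java
final class PctDiffSketch {
    /** Relative change of the modified QPS over the baseline QPS, in percent. */
    static double pctDiff(double qpsBase, double qpsMod) {
        return 100.0 * (qpsMod - qpsBase) / qpsBase;
    }

    /** Assumed lower end of the range: baseline one std dev up, modified one std dev down. */
    static double worstCase(double qpsBase, double sdBasePct, double qpsMod, double sdModPct) {
        double highBase = qpsBase * (1 + sdBasePct / 100.0);
        double lowMod = qpsMod * (1 - sdModPct / 100.0);
        return 100.0 * (lowMod - highBase) / highBase;
    }

    /** Assumed upper end of the range: baseline one std dev down, modified one std dev up. */
    static double bestCase(double qpsBase, double sdBasePct, double qpsMod, double sdModPct) {
        double lowBase = qpsBase * (1 - sdBasePct / 100.0);
        double highMod = qpsMod * (1 + sdModPct / 100.0);
        return 100.0 * (highMod - lowBase) / lowBase;
    }

    public static void main(String[] args) {
        // MedTerm row from the report: 1510.74 (6.8%) vs. 1457.20 (8.4%)
        System.out.printf("%.1f%% (%d%% - %d%%)%n",
            pctDiff(1510.74, 1457.20),
            (int) worstCase(1510.74, 6.8, 1457.20, 8.4),
            (int) bestCase(1510.74, 6.8, 1457.20, 8.4));
    }
}
```

For the MedTerm row this gives about -3.5% with a range of roughly -17% to 12%, which matches the reported figures. Read this way, a row whose range straddles zero does not show a measurable difference between the two versions.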


was (Author: lucacavanna):
I updated the PR and addressed all the comments, here are the latest benchmark 
results:

{noformat}
Report after iter 19:
TaskQPS baseline  StdDevQPS 

[jira] [Comment Edited] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-09 Thread Luca Cavanna (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836467#comment-16836467
 ] 

Luca Cavanna edited comment on LUCENE-8796 at 5/9/19 3:21 PM:
--

I have updated the PR after applying Yonik's suggestion and re-ran the benchmarks a 
few times. The run with the least noise had these results (note that I disabled 
the bitset optimization on both sides):
{noformat}
Report after iter 19:
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff
HighTerm 1575.07 (5.9%) 1541.27 (6.9%) -2.1% ( -14% - 11%)
MedTerm 1363.22 (6.5%) 1337.03 (7.0%) -1.9% ( -14% - 12%)
LowTerm 1441.86 (4.2%) 1420.77 (5.2%) -1.5% ( -10% - 8%)
IntNRQConjMedTerm 280.55 (4.0%) 277.64 (4.1%) -1.0% ( -8% - 7%)
MedPhrase 153.84 (3.5%) 152.44 (3.3%) -0.9% ( -7% - 6%)
Prefix3 224.92 (4.0%) 223.13 (3.7%) -0.8% ( -8% - 7%)
HighSloppyPhrase 19.70 (3.7%) 19.56 (4.5%) -0.7% ( -8% - 7%)
MedSloppyPhrase 18.23 (4.3%) 18.11 (4.7%) -0.7% ( -9% - 8%)
OrNotHighMed 586.33 (3.4%) 582.47 (4.9%) -0.7% ( -8% - 7%)
LowSloppyPhrase 18.56 (3.6%) 18.46 (3.9%) -0.5% ( -7% - 7%)
HighPhrase 22.64 (2.7%) 22.54 (3.0%) -0.4% ( -6% - 5%)
LowPhrase 144.10 (3.8%) 143.55 (3.3%) -0.4% ( -7% - 6%)
AndHighLow 539.26 (3.7%) 537.25 (3.2%) -0.4% ( -7% - 6%)
PKLookup 132.96 (3.0%) 132.48 (4.6%) -0.4% ( -7% - 7%)
OrHighMed 115.79 (2.7%) 115.49 (3.5%) -0.3% ( -6% - 6%)
PrefixConjHighTerm 36.98 (2.8%) 36.93 (3.4%) -0.1% ( -6% - 6%)
WildcardConjHighTerm 45.79 (3.0%) 45.73 (3.1%) -0.1% ( -6% - 6%)
OrHighLow 448.91 (3.7%) 448.70 (6.3%) -0.0% ( -9% - 10%)
Wildcard 78.89 (3.2%) 78.95 (3.6%) 0.1% ( -6% - 7%)
IntNRQConjHighTerm 78.35 (2.3%) 78.48 (2.4%) 0.2% ( -4% - 4%)
IntNRQ 100.56 (2.7%) 100.84 (2.8%) 0.3% ( -5% - 5%)
OrHighNotLow 732.45 (2.8%) 734.56 (5.3%) 0.3% ( -7% - 8%)
OrHighNotHigh 544.87 (2.8%) 546.47 (4.6%) 0.3% ( -6% - 7%)
IntNRQConjLowTerm 249.20 (4.2%) 249.99 (3.8%) 0.3% ( -7% - 8%)
Respell 73.05 (3.1%) 73.28 (3.4%) 0.3% ( -6% - 7%)
OrHighHigh 35.56 (3.0%) 35.68 (4.2%) 0.3% ( -6% - 7%)
OrNotHighLow 695.41 (4.8%) 697.88 (6.5%) 0.4% ( -10% - 12%)
MedSpanNear 59.99 (3.8%) 60.30 (4.0%) 0.5% ( -7% - 8%)
AndHighMed 190.02 (3.1%) 191.04 (3.6%) 0.5% ( -5% - 7%)
LowSpanNear 12.73 (3.9%) 12.81 (4.2%) 0.6% ( -7% - 8%)
HighTermDayOfYearSort 88.42 (7.0%) 89.09 (7.1%) 0.8% ( -12% - 15%)
PrefixConjLowTerm 54.95 (3.7%) 55.43 (3.8%) 0.9% ( -6% - 8%)
OrHighNotMed 628.44 (3.4%) 634.02 (6.1%) 0.9% ( -8% - 10%)
HighSpanNear 28.86 (3.2%) 29.11 (3.5%) 0.9% ( -5% - 7%)
WildcardConjMedTerm 72.48 (3.4%) 73.19 (4.8%) 1.0% ( -7% - 9%)
Fuzzy2 49.17 (9.9%) 49.68 (11.7%) 1.0% ( -18% - 25%)
AndHighHigh 63.44 (3.8%) 64.11 (3.8%) 1.1% ( -6% - 9%)
Fuzzy1 79.43 (9.9%) 80.55 (9.7%) 1.4% ( -16% - 23%)
OrNotHighHigh 574.89 (3.6%) 584.43 (5.5%) 1.7% ( -7% - 11%)
PrefixConjMedTerm 79.00 (3.2%) 80.50 (3.6%) 1.9% ( -4% - 8%)
WildcardConjLowTerm 90.67 (2.9%) 92.49 (3.7%) 2.0% ( -4% - 8%)
HighTermMonthSort 86.13 (11.8%) 88.79 (12.4%) 3.1% ( -18% - 30%)
{noformat}
I also ran benchmarks with the bitset optimization in place on both ends:

{{{noformat}}}
 Report after iter 19:
 Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff
 IntNRQ 63.46 (24.6%) 62.28 (24.2%) -1.9% ( -40% - 62%)
 

[jira] [Comment Edited] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-09 Thread Luca Cavanna (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835542#comment-16835542
 ] 

Luca Cavanna edited comment on LUCENE-8796 at 5/9/19 3:22 PM:
--

I have made the change and used luceneutil to run some benchmarks. I 
opened a PR here: [https://github.com/apache/lucene-solr/pull/667].

Luceneutil does not currently benchmark the queries that should be affected by 
this change, so I added benchmarks for numeric range queries, prefix queries 
and wildcard queries in conjunction with term queries (low, medium and high 
frequency). See the changes I made to my luceneutil fork: 
[https://github.com/mikemccand/luceneutil/compare/master...javanna:conjunctions].
Also, for the benchmarks I temporarily modified DocIdSetBuilder#grow to 
never call upgradeToBitSet (on both the baseline and the modified version), so that the 
updated code is exercised as much as possible during the benchmark runs. 
Otherwise, in many cases we would use bitsets instead and the changed code would 
not be exercised at all.
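
As a rough illustration of the technique named in the issue title: exponential (galloping) search doubles a probe step from the current position until it reaches an element that is not smaller than the target, then binary-searches only the last bracketed window. The sketch below is not the code from the PR. The class and method names are hypothetical, and it assumes a sorted array of distinct doc IDs.

```java
import java.util.Arrays;

final class ExponentialAdvanceSketch {
    /**
     * Returns the index of the first element in docs[from, length) that is
     * greater than or equal to target, or length if there is no such element.
     * Assumes docs[0, length) is sorted and holds distinct values.
     * Overflow for arrays with more than 2^30 remaining elements is ignored here.
     */
    static int advanceIndex(int[] docs, int length, int from, int target) {
        int bound = 1;
        // Gallop: double the step while the probed element is still below the target.
        while (from + bound < length && docs[from + bound] < target) {
            bound *= 2;
        }
        // The answer now lies between the last probe known to be too small
        // (or the start position) and the current probe, clamped to the array end.
        int lo = from + bound / 2;
        int hi = Math.min(from + bound + 1, length);
        int idx = Arrays.binarySearch(docs, lo, hi, target);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        return idx >= 0 ? idx : -1 - idx;
    }

    public static void main(String[] args) {
        int[] docs = {1, 3, 7, 8, 20, 21, 40, 100};
        System.out.println(advanceIndex(docs, docs.length, 0, 9));
    }
}
```

The number of comparisons grows with the logarithm of the distance between the current position and the target, not with the size of the remaining array. That should favor conjunctions where the iterator is advanced by small steps.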

I ran the wikimedium10m benchmarks a few times. Here is what is probably the run with 
the least noise. The results show a small improvement for some queries and no 
regressions in general:
  

 
{noformat}
 Report after iter 19:
 Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff
 WildcardConjMedTerm 75.49 (2.2%) 72.79 (2.0%) -3.6% ( -7% - 0%)
 OrHighNotMed 607.01 (5.7%) 593.10 (4.4%) -2.3% ( -11% - 8%)
 WildcardConjHighTerm 64.00 (1.7%) 62.55 (1.4%) -2.3% ( -5% - 0%)
 Fuzzy2 20.14 (3.4%) 19.72 (4.6%) -2.1% ( -9% - 6%)
 HighTerm 1174.41 (4.7%) 1150.11 (4.2%) -2.1% ( -10% - 7%)
 OrHighLow 483.40 (5.1%) 473.69 (6.9%) -2.0% ( -13% - 10%)
 OrNotHighLow 526.75 (3.6%) 516.47 (3.6%) -2.0% ( -8% - 5%)
 OrNotHighHigh 600.38 (4.9%) 590.21 (3.7%) -1.7% ( -9% - 7%)
 HighTermMonthSort 110.05 (11.7%) 108.58 (11.5%) -1.3% ( -21% - 24%)
 OrHighMed 107.83 (2.6%) 106.48 (4.7%) -1.3% ( -8% - 6%)
 PrefixConjMedTerm 56.98 (2.5%) 56.33 (1.7%) -1.1% ( -5% - 3%)
 AndHighLow 432.27 (3.6%) 427.46 (3.2%) -1.1% ( -7% - 5%)
 PrefixConjLowTerm 44.43 (2.8%) 43.98 (1.8%) -1.0% ( -5% - 3%)
 MedTerm 1409.97 (5.5%) 1396.33 (4.9%) -1.0% ( -10% - 9%)
 HighSloppyPhrase 11.98 (4.3%) 11.87 (5.1%) -0.9% ( -9% - 8%)
 OrNotHighMed 614.19 (4.6%) 608.74 (3.8%) -0.9% ( -8% - 7%)
 Respell 58.11 (2.4%) 57.61 (2.4%) -0.9% ( -5% - 3%)
 LowTerm 1342.33 (4.8%) 1330.86 (4.0%) -0.9% ( -9% - 8%)
 PrefixConjHighTerm 68.50 (2.9%) 67.93 (1.8%) -0.8% ( -5% - 3%)
 OrHighNotHigh 566.30 (5.2%) 561.88 (4.5%) -0.8% ( -9% - 9%)
 WildcardConjLowTerm 32.75 (2.5%) 32.56 (2.1%) -0.6% ( -5% - 4%)
 PKLookup 131.80 (2.4%) 131.28 (2.3%) -0.4% ( -5% - 4%)
 OrHighHigh 29.90 (3.4%) 29.79 (5.3%) -0.4% ( -8% - 8%)
 OrHighNotLow 497.65 (6.6%) 495.84 (5.2%) -0.4% ( -11% - 12%)
 AndHighMed 175.08 (3.5%) 174.58 (3.0%) -0.3% ( -6% - 6%)
 LowSpanNear 15.17 (1.8%) 15.13 (2.5%) -0.2% ( -4% - 4%)
 Fuzzy1 71.14 (5.9%) 70.97 (6.3%) -0.2% ( -11% - 12%)
 LowSloppyPhrase 35.23 (2.0%) 35.16 (2.6%) -0.2% ( -4% - 4%)
 LowPhrase 74.10 (1.7%) 73.98 (1.8%) -0.2% ( -3% - 3%)
 HighPhrase 34.18 (2.1%) 34.13 (2.0%) -0.1% ( -4% - 3%)
 Prefix3 45.33 (2.3%) 45.28 (2.1%) -0.1% ( -4% - 4%)
 MedPhrase 28.30 (2.1%) 28.27 (1.7%) -0.1% ( -3% - 3%)
 MedSloppyPhrase 6.80 (3.6%) 6.80 (3.2%) -0.0% ( -6% - 6%)
 AndHighHigh 53.79 (3.9%) 53.79 (4.0%) -0.0% ( -7% - 8%)
 MedSpanNear 61.78 (2.2%) 61.83 (1.7%) 0.1% ( -3% - 4%)
 Wildcard 37.83 (2.5%) 37.91 (1.7%) 0.2% ( -3% - 4%)
 IntNRQConjHighTerm 20.17 (3.8%) 20.24 (4.9%) 0.3% ( -8% - 9%)
 HighTermDayOfYearSort 53.55 (7.8%) 53.76 (7.3%) 0.4% ( -13% - 16%)
 HighSpanNear 5.39 (2.6%) 5.42 (2.6%) 0.5% ( -4% - 5%)
 IntNRQConjLowTerm 19.69 (4.3%) 19.86 (4.3%) 0.9% ( -7% - 9%)
 IntNRQConjMedTerm 15.93 (4.5%) 16.12 (5.4%) 1.2% ( -8% - 11%)
 IntNRQ 114.28 (10.3%) 116.41 (14.0%) 1.9% ( -20% - 29%)
 {noformat}
 


was (Author: lucacavanna):
I have made the change and played with luceneutil to run some benchmark. I 
opened a PR here: [https://github.com/apache/lucene-solr/pull/667] .

Luceneutil does not currently benchmark the queries that should be affected by 
this change, hence I added benchmarks for numeric range queries, prefix queries 
and wildcard queries in conjunction with term queries (low, medium and high 
frequency). See the changes I made to my luceneutil fork: 
[https://github.com/mikemccand/luceneutil/compare/master...javanna:conjunctions]
 .  Also, for the benchmarks I temporarily modified DocIdSetBuilder#grow to 
never call upgradeToBitSet (on both baseline and modified version), so that the 
updated code is exercised as much as possible during the benchmarks run, 
otherwise in many cases we would use bitsets instead and the changed code would 
not be exercised at all.

I ran the wikimedium10m benchmarks a few times, here is probably the run with 
the least noise, results show a little improvement for some queries, and no 
regressions in 

[jira] [Comment Edited] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-09 Thread Luca Cavanna (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836467#comment-16836467
 ] 

Luca Cavanna edited comment on LUCENE-8796 at 5/9/19 3:22 PM:
--

I have updated the PR after applying Yonik's suggestion and re-ran the benchmarks a 
few times. The run with the least noise had these results (note that I disabled 
the bitset optimization on both sides):
{noformat}
Report after iter 19:
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff
HighTerm 1575.07 (5.9%) 1541.27 (6.9%) -2.1% ( -14% - 11%)
MedTerm 1363.22 (6.5%) 1337.03 (7.0%) -1.9% ( -14% - 12%)
LowTerm 1441.86 (4.2%) 1420.77 (5.2%) -1.5% ( -10% - 8%)
IntNRQConjMedTerm 280.55 (4.0%) 277.64 (4.1%) -1.0% ( -8% - 7%)
MedPhrase 153.84 (3.5%) 152.44 (3.3%) -0.9% ( -7% - 6%)
Prefix3 224.92 (4.0%) 223.13 (3.7%) -0.8% ( -8% - 7%)
HighSloppyPhrase 19.70 (3.7%) 19.56 (4.5%) -0.7% ( -8% - 7%)
MedSloppyPhrase 18.23 (4.3%) 18.11 (4.7%) -0.7% ( -9% - 8%)
OrNotHighMed 586.33 (3.4%) 582.47 (4.9%) -0.7% ( -8% - 7%)
LowSloppyPhrase 18.56 (3.6%) 18.46 (3.9%) -0.5% ( -7% - 7%)
HighPhrase 22.64 (2.7%) 22.54 (3.0%) -0.4% ( -6% - 5%)
LowPhrase 144.10 (3.8%) 143.55 (3.3%) -0.4% ( -7% - 6%)
AndHighLow 539.26 (3.7%) 537.25 (3.2%) -0.4% ( -7% - 6%)
PKLookup 132.96 (3.0%) 132.48 (4.6%) -0.4% ( -7% - 7%)
OrHighMed 115.79 (2.7%) 115.49 (3.5%) -0.3% ( -6% - 6%)
PrefixConjHighTerm 36.98 (2.8%) 36.93 (3.4%) -0.1% ( -6% - 6%)
WildcardConjHighTerm 45.79 (3.0%) 45.73 (3.1%) -0.1% ( -6% - 6%)
OrHighLow 448.91 (3.7%) 448.70 (6.3%) -0.0% ( -9% - 10%)
Wildcard 78.89 (3.2%) 78.95 (3.6%) 0.1% ( -6% - 7%)
IntNRQConjHighTerm 78.35 (2.3%) 78.48 (2.4%) 0.2% ( -4% - 4%)
IntNRQ 100.56 (2.7%) 100.84 (2.8%) 0.3% ( -5% - 5%)
OrHighNotLow 732.45 (2.8%) 734.56 (5.3%) 0.3% ( -7% - 8%)
OrHighNotHigh 544.87 (2.8%) 546.47 (4.6%) 0.3% ( -6% - 7%)
IntNRQConjLowTerm 249.20 (4.2%) 249.99 (3.8%) 0.3% ( -7% - 8%)
Respell 73.05 (3.1%) 73.28 (3.4%) 0.3% ( -6% - 7%)
OrHighHigh 35.56 (3.0%) 35.68 (4.2%) 0.3% ( -6% - 7%)
OrNotHighLow 695.41 (4.8%) 697.88 (6.5%) 0.4% ( -10% - 12%)
MedSpanNear 59.99 (3.8%) 60.30 (4.0%) 0.5% ( -7% - 8%)
AndHighMed 190.02 (3.1%) 191.04 (3.6%) 0.5% ( -5% - 7%)
LowSpanNear 12.73 (3.9%) 12.81 (4.2%) 0.6% ( -7% - 8%)
HighTermDayOfYearSort 88.42 (7.0%) 89.09 (7.1%) 0.8% ( -12% - 15%)
PrefixConjLowTerm 54.95 (3.7%) 55.43 (3.8%) 0.9% ( -6% - 8%)
OrHighNotMed 628.44 (3.4%) 634.02 (6.1%) 0.9% ( -8% - 10%)
HighSpanNear 28.86 (3.2%) 29.11 (3.5%) 0.9% ( -5% - 7%)
WildcardConjMedTerm 72.48 (3.4%) 73.19 (4.8%) 1.0% ( -7% - 9%)
Fuzzy2 49.17 (9.9%) 49.68 (11.7%) 1.0% ( -18% - 25%)
AndHighHigh 63.44 (3.8%) 64.11 (3.8%) 1.1% ( -6% - 9%)
Fuzzy1 79.43 (9.9%) 80.55 (9.7%) 1.4% ( -16% - 23%)
OrNotHighHigh 574.89 (3.6%) 584.43 (5.5%) 1.7% ( -7% - 11%)
PrefixConjMedTerm 79.00 (3.2%) 80.50 (3.6%) 1.9% ( -4% - 8%)
WildcardConjLowTerm 90.67 (2.9%) 92.49 (3.7%) 2.0% ( -4% - 8%)
HighTermMonthSort 86.13 (11.8%) 88.79 (12.4%) 3.1% ( -18% - 30%)
{noformat}
I also ran benchmarks with the bitset optimization in place on both ends:

{noformat}
Report after iter 19:
Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff
  IntNRQ 

[jira] [Comment Edited] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

2019-05-09 Thread Luca Cavanna (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835542#comment-16835542
 ] 

Luca Cavanna edited comment on LUCENE-8796 at 5/9/19 3:21 PM:
--

I have made the change and used luceneutil to run some benchmarks. I 
opened a PR here: [https://github.com/apache/lucene-solr/pull/667].

Luceneutil does not currently benchmark the queries that should be affected by 
this change, so I added benchmarks for numeric range queries, prefix queries 
and wildcard queries in conjunction with term queries (low, medium and high 
frequency). See the changes I made to my luceneutil fork: 
[https://github.com/mikemccand/luceneutil/compare/master...javanna:conjunctions].
Also, for the benchmarks I temporarily modified DocIdSetBuilder#grow to 
never call upgradeToBitSet (on both the baseline and the modified version), so that the 
updated code is exercised as much as possible during the benchmark runs. 
Otherwise, in many cases we would use bitsets instead and the changed code would 
not be exercised at all.

I ran the wikimedium10m benchmarks a few times. Here is what is probably the run with 
the least noise. The results show a small improvement for some queries and no 
regressions in general:
  

{{{noformat}}}
 Report after iter 19:
 Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff
 WildcardConjMedTerm 75.49 (2.2%) 72.79 (2.0%) -3.6% ( -7% - 0%)
 OrHighNotMed 607.01 (5.7%) 593.10 (4.4%) -2.3% ( -11% - 8%)
 WildcardConjHighTerm 64.00 (1.7%) 62.55 (1.4%) -2.3% ( -5% - 0%)
 Fuzzy2 20.14 (3.4%) 19.72 (4.6%) -2.1% ( -9% - 6%)
 HighTerm 1174.41 (4.7%) 1150.11 (4.2%) -2.1% ( -10% - 7%)
 OrHighLow 483.40 (5.1%) 473.69 (6.9%) -2.0% ( -13% - 10%)
 OrNotHighLow 526.75 (3.6%) 516.47 (3.6%) -2.0% ( -8% - 5%)
 OrNotHighHigh 600.38 (4.9%) 590.21 (3.7%) -1.7% ( -9% - 7%)
 HighTermMonthSort 110.05 (11.7%) 108.58 (11.5%) -1.3% ( -21% - 24%)
 OrHighMed 107.83 (2.6%) 106.48 (4.7%) -1.3% ( -8% - 6%)
 PrefixConjMedTerm 56.98 (2.5%) 56.33 (1.7%) -1.1% ( -5% - 3%)
 AndHighLow 432.27 (3.6%) 427.46 (3.2%) -1.1% ( -7% - 5%)
 PrefixConjLowTerm 44.43 (2.8%) 43.98 (1.8%) -1.0% ( -5% - 3%)
 MedTerm 1409.97 (5.5%) 1396.33 (4.9%) -1.0% ( -10% - 9%)
 HighSloppyPhrase 11.98 (4.3%) 11.87 (5.1%) -0.9% ( -9% - 8%)
 OrNotHighMed 614.19 (4.6%) 608.74 (3.8%) -0.9% ( -8% - 7%)
 Respell 58.11 (2.4%) 57.61 (2.4%) -0.9% ( -5% - 3%)
 LowTerm 1342.33 (4.8%) 1330.86 (4.0%) -0.9% ( -9% - 8%)
 PrefixConjHighTerm 68.50 (2.9%) 67.93 (1.8%) -0.8% ( -5% - 3%)
 OrHighNotHigh 566.30 (5.2%) 561.88 (4.5%) -0.8% ( -9% - 9%)
 WildcardConjLowTerm 32.75 (2.5%) 32.56 (2.1%) -0.6% ( -5% - 4%)
 PKLookup 131.80 (2.4%) 131.28 (2.3%) -0.4% ( -5% - 4%)
 OrHighHigh 29.90 (3.4%) 29.79 (5.3%) -0.4% ( -8% - 8%)
 OrHighNotLow 497.65 (6.6%) 495.84 (5.2%) -0.4% ( -11% - 12%)
 AndHighMed 175.08 (3.5%) 174.58 (3.0%) -0.3% ( -6% - 6%)
 LowSpanNear 15.17 (1.8%) 15.13 (2.5%) -0.2% ( -4% - 4%)
 Fuzzy1 71.14 (5.9%) 70.97 (6.3%) -0.2% ( -11% - 12%)
 LowSloppyPhrase 35.23 (2.0%) 35.16 (2.6%) -0.2% ( -4% - 4%)
 LowPhrase 74.10 (1.7%) 73.98 (1.8%) -0.2% ( -3% - 3%)
 HighPhrase 34.18 (2.1%) 34.13 (2.0%) -0.1% ( -4% - 3%)
 Prefix3 45.33 (2.3%) 45.28 (2.1%) -0.1% ( -4% - 4%)
 MedPhrase 28.30 (2.1%) 28.27 (1.7%) -0.1% ( -3% - 3%)
 MedSloppyPhrase 6.80 (3.6%) 6.80 (3.2%) -0.0% ( -6% - 6%)
 AndHighHigh 53.79 (3.9%) 53.79 (4.0%) -0.0% ( -7% - 8%)
 MedSpanNear 61.78 (2.2%) 61.83 (1.7%) 0.1% ( -3% - 4%)
 Wildcard 37.83 (2.5%) 37.91 (1.7%) 0.2% ( -3% - 4%)
 IntNRQConjHighTerm 20.17 (3.8%) 20.24 (4.9%) 0.3% ( -8% - 9%)
 HighTermDayOfYearSort 53.55 (7.8%) 53.76 (7.3%) 0.4% ( -13% - 16%)
 HighSpanNear 5.39 (2.6%) 5.42 (2.6%) 0.5% ( -4% - 5%)
 IntNRQConjLowTerm 19.69 (4.3%) 19.86 (4.3%) 0.9% ( -7% - 9%)
 IntNRQConjMedTerm 15.93 (4.5%) 16.12 (5.4%) 1.2% ( -8% - 11%)
 IntNRQ 114.28 (10.3%) 116.41 (14.0%) 1.9% ( -20% - 29%)

 {{{noformat}}}

 


was (Author: lucacavanna):
I have made the change and played with luceneutil to run some benchmark. I 
opened a PR here: https://github.com/apache/lucene-solr/pull/667 .

Luceneutil does not currently benchmark the queries that should be affected by 
this change, hence I added benchmarks for numeric range queries, prefix queries 
and wildcard queries in conjunction with term queries (low, medium and high 
frequency). See the changes I made to my luceneutil fork: 
[https://github.com/mikemccand/luceneutil/compare/master...javanna:conjunctions]
 .  Also, for the benchmarks I temporarily modified DocIdSetBuilder#grow to 
never call upgradeToBitSet (on both baseline and modified version), so that the 
updated code is exercised as much as possible during the benchmarks run, 
otherwise in many cases we would use bitsets instead and the changed code would 
not be exercised at all.

I ran the wikimedium10m benchmarks a few times, here is probably the run with 
the least noise, results show a little improvement for some queries, and no