[
https://issues.apache.org/jira/browse/LUCENE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461367#comment-17461367
]
Feng Guo edited comment on LUCENE-10319 at 12/17/21, 10:49 AM:
---------------------------------------------------------------
Out of curiosity, I ran the luceneutil wikimedium1m benchmark with block
size = 256, but got an error:
{code:java}
WARNING: cat=AndHighHigh: hit counts differ: 10274+ vs 10884+
WARNING: cat=HighTerm: hit counts differ: 5969+ vs 9423+
WARNING: cat=LowTerm: hit counts differ: 2394+ vs 3325+
WARNING: cat=MedTerm: hit counts differ: 4558+ vs 7118+
WARNING: cat=OrHighHigh: hit counts differ: 5986+ vs 5987+
WARNING: cat=OrHighMed: hit counts differ: 3044+ vs 3445+
Traceback (most recent call last):
  File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/localrun.py", line 60, in <module>
    comp.benchmark("baseline_vs_patch")
  File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/competition.py", line 494, in benchmark
    searchBench.run(id, base, challenger,
  File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/searchBench.py", line 196, in run
    raise RuntimeError('errors occurred: %s' % str(cmpDiffs))
RuntimeError: errors occurred: ([], ['query=+body:web +body:up filter=None
sort=None groupField=None hitCount=10274+: wrong hitCount: 10274+ vs 10884+',
'query=body:he body:resulting filter=None sort=None groupField=None
hitCount=3044+: wrong hitCount: 3044+ vs 3445+', 'query=body:official
filter=None sort=None groupField=None hitCount=4558+: wrong hitCount: 4558+ vs
7118+', 'query=body:thumb filter=None sort=None groupField=None hitCount=5969+:
wrong hitCount: 5969+ vs 9423+', 'query=body:years body:pages filter=None
sort=None groupField=None hitCount=5986+: wrong hitCount: 5986+ vs 5987+',
'query=body:goods filter=None sort=None groupField=None hitCount=2394+: wrong
hitCount: 2394+ vs 3325+'], 1.0)
{code}
I guess this error may be related to Impacts? So I changed
{{#TOTAL_HITS_THRESHOLD}} to a very large number for both baseline and
candidate and reran the benchmark. Everything looks good now and I got a
rather good report.
But note that this report does *not* really make sense as a performance
comparison, since we changed {{#TOTAL_HITS_THRESHOLD}}; it is just to verify
that the results are right.
{code:java}
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff              p-value
Fuzzy1                          118.73  (11.5%)                   114.82  (13.0%)   -3.3% ( -24% - 23%)  0.407
LowTerm                        2369.88   (9.2%)                  2323.31   (5.7%)   -2.0% ( -15% - 14%)  0.428
PKLookup                        250.07   (5.0%)                   245.42   (4.3%)   -1.9% ( -10% - 7%)   0.214
Prefix3                         306.43   (6.9%)                   301.82   (7.0%)   -1.5% ( -14% - 13%)  0.502
Wildcard                        221.77   (5.2%)                   218.64   (4.0%)   -1.4% ( -10% - 8%)   0.348
HighTermMonthSort              1161.02  (12.7%)                  1156.95  (11.1%)   -0.4% ( -21% - 26%)  0.928
BrowseDayOfYearSSDVFacets       140.62   (1.3%)                   140.48   (1.1%)   -0.1% ( -2% - 2%)    0.791
Fuzzy2                           47.51   (8.9%)                    47.57   (7.0%)    0.1% ( -14% - 17%)  0.961
Respell                         200.51   (2.7%)                   200.82   (1.4%)    0.2% ( -3% - 4%)    0.823
OrHighMed                       197.90   (3.0%)                   198.36   (3.6%)    0.2% ( -6% - 7%)    0.830
BrowseMonthSSDVFacets           152.24   (2.8%)                   152.74   (1.0%)    0.3% ( -3% - 4%)    0.630
OrHighLow                       245.11   (3.5%)                   245.97   (3.1%)    0.4% ( -6% - 7%)    0.744
AndHighLow                     1598.05   (7.2%)                  1604.55   (4.6%)    0.4% ( -10% - 13%)  0.836
BrowseDayOfYearTaxoFacets        28.84   (3.0%)                    28.99   (3.3%)    0.5% ( -5% - 7%)    0.603
OrHighHigh                      109.37   (4.2%)                   110.14   (4.0%)    0.7% ( -7% - 9%)    0.599
BrowseMonthTaxoFacets            30.77   (3.5%)                    31.00   (4.1%)    0.8% ( -6% - 8%)    0.541
BrowseDateTaxoFacets             28.71   (3.2%)                    28.93   (3.3%)    0.8% ( -5% - 7%)    0.461
HighTermDayOfYearSort           593.30  (13.5%)                   599.82  (13.2%)    1.1% ( -22% - 32%)  0.800
AndHighHigh                     441.62   (5.0%)                   452.99   (4.1%)    2.6% ( -6% - 12%)   0.083
IntNRQ                          121.71   (6.2%)                   124.89   (4.2%)    2.6% ( -7% - 13%)   0.127
HighTerm                        599.78   (4.2%)                   615.86   (2.6%)    2.7% ( -3% - 9%)    0.019
MedSloppyPhrase                 397.69   (3.1%)                   411.46   (3.3%)    3.5% ( -2% - 10%)   0.001
MedSpanNear                      75.75   (2.8%)                    78.59   (1.5%)    3.7% ( 0% - 8%)     0.000
HighIntervalsOrdered            108.30   (2.8%)                   112.66   (2.3%)    4.0% ( 0% - 9%)     0.000
HighSpanNear                     23.10   (3.2%)                    24.25   (1.5%)    5.0% ( 0% - 9%)     0.000
MedTerm                        1001.40   (4.2%)                  1055.70   (2.4%)    5.4% ( -1% - 12%)   0.000
LowPhrase                       258.65   (2.3%)                   278.10   (2.2%)    7.5% ( 2% - 12%)    0.000
HighPhrase                       67.81   (3.0%)                    72.94   (3.7%)    7.6% ( 0% - 14%)    0.000
HighSloppyPhrase                 20.13   (6.0%)                    21.69   (5.9%)    7.7% ( -3% - 20%)   0.000
MedPhrase                       258.96   (2.6%)                   279.48   (3.0%)    7.9% ( 2% - 13%)    0.000
LowIntervalsOrdered             476.40   (3.2%)                   516.31   (2.8%)    8.4% ( 2% - 14%)    0.000
MedIntervalsOrdered             112.10   (2.4%)                   121.85   (2.9%)    8.7% ( 3% - 14%)    0.000
AndHighMed                      784.68   (5.2%)                   856.24   (5.1%)    9.1% ( -1% - 20%)   0.000
LowSpanNear                      92.93   (1.8%)                   101.80   (2.5%)    9.5% ( 5% - 14%)    0.000
LowSloppyPhrase                 250.51   (3.0%)                   279.69   (3.6%)   11.6% ( 4% - 18%)    0.000
{code}
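For context on the hit-count mismatch reported earlier: the trailing {{+}} on
counts such as {{10274+}} presumably marks a lower bound rather than an exact
total, which is what one would expect when counting stops early at a
threshold. The sketch below is only an illustration of how such values could
be compared without treating every difference as an error. It is not
luceneutil's actual check, and {{HitCount}}, {{parse_hit_count}} and
{{compatible}} are hypothetical names.

```python
from typing import NamedTuple


class HitCount(NamedTuple):
    value: int
    lower_bound: bool  # True when the count was reported as "N+"


def parse_hit_count(text: str) -> HitCount:
    """Parse '10274' (exact) or '10274+' (assumed to mean 'at least 10274')."""
    text = text.strip()
    if text.endswith("+"):
        return HitCount(int(text[:-1]), True)
    return HitCount(int(text), False)


def compatible(a: HitCount, b: HitCount) -> bool:
    """Return True when both counts could describe the same true total."""
    if not a.lower_bound and not b.lower_bound:
        # Two exact counts must match exactly.
        return a.value == b.value
    if a.lower_bound and b.lower_bound:
        # Two lower bounds never contradict each other.
        return True
    # One exact count and one lower bound: the exact count must reach the bound.
    exact, bound = (a, b) if not a.lower_bound else (b, a)
    return exact.value >= bound.value


# Values taken from the warning for cat=AndHighHigh above.
print(compatible(parse_hit_count("10274+"), parse_hit_count("10884+")))
```

Under this reading, two differing lower bounds do not by themselves prove that
the results are wrong, which fits the observation that the check passes once
the counts are exact.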
Then I removed the totalHits check in luceneutil and reran the benchmark.
As expected, we can see that the QPS of tasks with a totalHits diff decreased
and others increased. I am posting the report here in case someone is
interested.
{code:java}
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff              p-value
AndHighHigh                     214.93   (3.8%)                   183.83   (2.6%)  -14.5% ( -20% - -8%)  0.000
MedTerm                        2589.52   (4.5%)                  2303.67   (5.5%)  -11.0% ( -20% - -1%)  0.000
HighTerm                       1750.90   (4.0%)                  1560.54   (4.3%)  -10.9% ( -18% - -2%)  0.000
HighPhrase                      238.61   (2.8%)                   218.08   (4.3%)   -8.6% ( -15% - -1%)  0.000
OrHighHigh                      117.03   (1.9%)                   107.52   (4.8%)   -8.1% ( -14% - -1%)  0.000
HighTermMonthSort               905.11  (10.5%)                   864.34   (9.3%)   -4.5% ( -21% - 17%)  0.150
HighTermDayOfYearSort          1095.73  (10.4%)                  1056.20  (11.0%)   -3.6% ( -22% - 19%)  0.288
PKLookup                        249.62   (3.8%)                   241.15   (4.6%)   -3.4% ( -11% - 5%)   0.011
LowTerm                        2761.54   (4.6%)                  2681.22   (6.8%)   -2.9% ( -13% - 8%)   0.111
Respell                         163.65   (3.4%)                   159.17   (3.8%)   -2.7% ( -9% - 4%)    0.016
Wildcard                        587.89   (2.9%)                   573.02   (4.8%)   -2.5% ( -9% - 5%)    0.044
IntNRQ                          654.86   (4.4%)                   644.88   (5.4%)   -1.5% ( -10% - 8%)   0.328
LowPhrase                       596.01   (4.3%)                   587.28   (5.5%)   -1.5% ( -10% - 8%)   0.349
HighIntervalsOrdered             16.48   (8.9%)                    16.26   (6.4%)   -1.3% ( -15% - 15%)  0.586
AndHighLow                     1665.94   (6.4%)                  1649.07   (6.1%)   -1.0% ( -12% - 12%)  0.610
BrowseDayOfYearSSDVFacets       142.76   (2.5%)                   141.87   (3.3%)   -0.6% ( -6% - 5%)    0.507
BrowseDateTaxoFacets             29.49   (4.2%)                    29.40   (3.8%)   -0.3% ( -8% - 8%)    0.796
MedPhrase                       653.42   (4.6%)                   652.05   (5.6%)   -0.2% ( -9% - 10%)   0.897
Fuzzy1                          116.77   (6.3%)                   116.59  (10.4%)   -0.2% ( -15% - 17%)  0.956
BrowseDayOfYearTaxoFacets        29.58   (4.3%)                    29.55   (4.1%)   -0.1% ( -8% - 8%)    0.929
Fuzzy2                           73.12  (10.4%)                    73.04  (10.7%)   -0.1% ( -19% - 23%)  0.974
BrowseMonthTaxoFacets            31.65   (5.0%)                    31.64   (4.9%)   -0.0% ( -9% - 10%)   0.985
BrowseMonthSSDVFacets           155.25   (3.5%)                   155.27   (3.8%)    0.0% ( -7% - 7%)    0.991
OrHighMed                       267.80   (5.9%)                   268.44   (6.2%)    0.2% ( -11% - 13%)  0.900
OrHighLow                       820.94   (8.5%)                   832.70   (7.8%)    1.4% ( -13% - 19%)  0.579
Prefix3                         483.34   (5.8%)                   490.76   (7.1%)    1.5% ( -10% - 15%)  0.453
LowSloppyPhrase                 268.01   (2.2%)                   279.16   (3.9%)    4.2% ( -1% - 10%)   0.000
LowSpanNear                     518.44   (3.8%)                   542.08   (5.2%)    4.6% ( -4% - 14%)   0.002
MedSloppyPhrase                 252.28   (2.4%)                   264.31   (2.2%)    4.8% ( 0% - 9%)     0.000
HighSloppyPhrase                157.88   (2.6%)                   165.44   (3.1%)    4.8% ( 0% - 10%)    0.000
HighSpanNear                    232.57   (2.5%)                   243.72   (3.5%)    4.8% ( -1% - 11%)   0.000
LowIntervalsOrdered             697.59   (3.8%)                   734.23   (4.8%)    5.3% ( -3% - 14%)   0.000
MedSpanNear                     171.60   (3.1%)                   181.41   (4.4%)    5.7% ( -1% - 13%)   0.000
MedIntervalsOrdered             356.52   (3.1%)                   383.69   (4.1%)    7.6% ( 0% - 15%)    0.000
AndHighMed                      555.66   (4.4%)                   617.40   (5.7%)   11.1% ( 0% - 22%)    0.000
{code}
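As a reading aid for the reports above: the Pct diff column looks like the
relative change in QPS from baseline to candidate. That reading is inferred
from the numbers, not taken from the luceneutil source, and {{pct_diff}} is a
hypothetical helper. A quick check against two rows of the last report:

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative QPS change of the candidate versus the baseline, in percent."""
    return (candidate_qps / baseline_qps - 1.0) * 100.0


# (task, baseline QPS, candidate QPS, Pct diff as printed in the report)
rows = [
    ("AndHighHigh", 214.93, 183.83, -14.5),
    ("AndHighMed", 555.66, 617.40, 11.1),
]

for task, base, cand, reported in rows:
    computed = pct_diff(base, cand)
    # Print the recomputed value next to the reported one for comparison.
    print(f"{task}: computed {computed:.1f}%, reported {reported:.1f}%")
```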
> Make ForUtil#BLOCK_SIZE changeable
> ----------------------------------
>
> Key: LUCENE-10319
> URL: https://issues.apache.org/jira/browse/LUCENE-10319
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Feng Guo
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In LUCENE-10315, I tried to generate a {{ForUtil}} whose
> {{BLOCK_SIZE=512}}. I thought it would be simple, since it looked like I
> only needed to change the BLOCK_SIZE, but it turns out that a lot of
> values depend on the BLOCK_SIZE yet are hard-coded.
> So this issue tries to derive all hard-coded values from the BLOCK_SIZE,
> in case we need a ForUtil somewhere else or want to change the BLOCK_SIZE in
> postings in the future.
> I tried BLOCK_SIZE = 64 / 256 and all tests passed.
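To illustrate the idea in the issue description, here is a minimal sketch of
deriving dependent values from a single block size instead of hard-coding
them. It is not the actual {{ForUtil}} code: the function names and constant
names are hypothetical, and it assumes 64-bit longs and a block size that is
a power of two and at least 64.

```python
def derived_constants(block_size: int) -> dict:
    """Derive values that depend on the block size (hypothetical names)."""
    # Assumption: a power of two >= 64, so a block packed at any bit width
    # fills a whole number of 64-bit longs.
    if block_size < 64 or (block_size & (block_size - 1)) != 0:
        raise ValueError("block_size must be a power of two and >= 64")
    return {
        "BLOCK_SIZE": block_size,
        "BLOCK_SIZE_LOG2": block_size.bit_length() - 1,
        "BLOCK_SIZE_MASK": block_size - 1,
        # Longs needed per block for each bit of width per value.
        "LONGS_PER_BIT": block_size // 64,
    }


def longs_per_block(block_size: int, bits_per_value: int) -> int:
    """Number of 64-bit longs needed to pack one block at the given width."""
    return derived_constants(block_size)["LONGS_PER_BIT"] * bits_per_value


# The sizes mentioned in the issue (64 and 256), plus 128 for comparison.
for size in (64, 128, 256):
    print(size, derived_constants(size), longs_per_block(size, 5))
```

With everything computed from one input, changing the block size is a
one-line change rather than a hunt for related literals.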