zacharymorn commented on PR #12194:
URL: https://github.com/apache/lucene/pull/12194#issuecomment-1496995523
> Hmm, note that the actual QPS is varying quite a bit every time. In your
> luceneutil run, are you fixing the random seed so the same queries are used
> every time?

Yeah, indeed. I didn't fix the random seed during my luceneutil runs, so the
results vary a lot, as they may depend on the index and queries under test.
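
Here is a rough illustration of why fixing the seed matters. If a benchmark uses a pseudo-random generator to pick which queries to run from a task file, reusing the same seed reproduces the same selection. This is a generic Python sketch and not luceneutil's actual code. The `TASKS` list and the `pick_tasks` helper are hypothetical.

```python
import random

# Hypothetical task list; in luceneutil the real tasks come from a .tasks file.
TASKS = [
    "+its -monthPostings:jan",
    "+its -monthPostings:apr",
    "+its -monthPostings:may",
]


def pick_tasks(seed, k=2):
    """Pick k tasks deterministically for a given seed (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, unaffected by global random state
    return rng.sample(TASKS, k)


# Same seed -> same queries on every run, so QPS numbers stay comparable.
print(pick_tasks(42) == pick_tasks(42))
```

With an unseeded generator such as `random.Random()`, each run may pick different tasks. That alone can make QPS drift between runs, even with no code changes.
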
> It is odd that `PKLookup` performance drops too.

I did some more testing on this and have some interesting findings:
#### No changes (comparing baseline with baseline):
```
Task: AndHighNotMonth: +its -monthPostings:apr # freq=1160703
               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
    AndHighNotMonth       62.41      (9.4%)       62.45      (7.7%)     0.1% ( -15% - 18%) 0.979
           PKLookup      176.62     (28.2%)      177.03     (33.2%)     0.2% ( -47% - 85%) 0.981
```
```
Task: AndHighNotMonth: +its -monthPostings:apr # freq=1160703
               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
           PKLookup      175.53     (25.3%)      166.38     (26.2%)    -5.2% ( -45% - 62%) 0.522
    AndHighNotMonth       60.36     (17.1%)       62.29      (9.0%)     3.2% ( -19% - 35%) 0.459
```
PKLookup QPS varies a lot as well, even when there are no changes.
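
The `Pct diff` column can be sanity-checked from the two QPS columns. This is a minimal sketch that assumes the column is simply the relative change of the modified QPS versus the baseline QPS; luceneutil's exact code is not shown here.

```python
def pct_diff(baseline_qps, modified_qps):
    """Relative QPS change of the modified run vs. baseline, in percent."""
    return (modified_qps - baseline_qps) / baseline_qps * 100.0


# Baseline-vs-baseline rows from the first table above.
print(f"{pct_diff(62.41, 62.45):.1f}%")    # AndHighNotMonth: 0.1%
print(f"{pct_diff(176.62, 177.03):.1f}%")  # PKLookup: 0.2%
```

Both values match the reported `Pct diff` and are tiny compared with the reported `StdDev` columns. The same formula also reproduces the -53.7% and 149.9% rows in the runs with changes.
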
#### With changes (comparing modified with baseline), and with the task query modified:
```
Task: AndHighNotMonth: +its -monthPostings:apr # freq=1160703
               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
           PKLookup      182.12     (28.8%)       84.26      (8.4%)   -53.7% ( -70% - -23%) 0.000
    AndHighNotMonth       64.22      (8.4%)      160.51     (59.1%)   149.9% ( 75% - 237%) 0.000
```
```
Task: AndHighNotMonth: +its -monthPostings:jan # freq=1160703
               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
           PKLookup       81.40     (17.4%)       91.44     (45.6%)    12.3% ( -43% - 91%) 0.258
    AndHighNotMonth      116.74      (9.2%)      160.54     (45.8%)    37.5% ( -15% - 101%) 0.000
```
```
Task: AndHighNotMonth: +its -monthPostings:may # freq=1160703
               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
           PKLookup       80.18      (6.3%)       74.90      (9.4%)    -6.6% ( -20% - 9%) 0.009
    AndHighNotMonth       92.19     (12.6%)      144.56     (23.6%)    56.8% ( 18% - 106%) 0.000
```
```
No task, and only PKLookup is run
               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
           PKLookup      128.55     (27.2%)      142.59     (36.9%)    10.9% ( -41% - 103%) 0.286
```
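
To put the PKLookup numbers in perspective, here is a small sketch that summarizes the *baseline* PKLookup QPS across the six runs above (all without `-Xbatch`). The values are copied from those tables, and nothing else is assumed.

```python
import statistics

# Baseline PKLookup QPS from the six runs above (baseline code is unchanged in each).
baseline_pklookup_qps = [176.62, 175.53, 182.12, 81.40, 80.18, 128.55]

mean = statistics.mean(baseline_pklookup_qps)
spread = max(baseline_pklookup_qps) / min(baseline_pklookup_qps)
print(f"mean={mean:.1f} QPS, max/min={spread:.2f}x")  # mean=137.4 QPS, max/min=2.27x
```

The baseline alone swings by more than 2x between these runs. A single PKLookup comparison is therefore hard to interpret without a fixed seed and repeated runs.
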
In addition, I noticed that adding the `-Xbatch` JVM argument actually makes the
-50% slowdown go away (and also boosts PKLookup's QPS):
`localconstants.py`
```python
if 'JAVA_EXE' not in globals():
    JAVA_EXE = 'java'
if 'JAVAC_EXE' not in globals():
    JAVAC_EXE = 'javac'
if 'JAVA_COMMAND' not in globals():
    JAVA_COMMAND = '%s -Xbatch' % JAVA_EXE
```
```
Task: AndHighNotMonth: +its -monthPostings:apr # freq=1160703
               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
           PKLookup      328.59     (10.2%)      347.16      (7.2%)     5.7% ( -10% - 25%) 0.043
    AndHighNotMonth       60.21      (5.4%)      160.46     (41.8%)   166.5% ( 113% - 225%) 0.000
```
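
For context, HotSpot's `-Xbatch` flag disables background compilation. Methods are compiled as a foreground task, instead of running in the interpreter while the JIT compiles them in the background. That makes JIT compilation timing a plausible suspect here.

Below is a hypothetical variant of the `localconstants.py` snippet above. It keeps the extra JVM flags in a list so they are easy to toggle between runs. `EXTRA_JVM_FLAGS` is an illustrative name, not a luceneutil setting. `-XX:+PrintCompilation` is a standard HotSpot diagnostic flag that logs JIT compilation events.

```python
# localconstants.py (hypothetical variant)
if 'JAVA_EXE' not in globals():
    JAVA_EXE = 'java'

# Hypothetical knob: extra HotSpot flags appended after the java executable.
EXTRA_JVM_FLAGS = [
    '-Xbatch',                   # disable background compilation
    # '-XX:+PrintCompilation',   # uncomment to log JIT compilation events
]

if 'JAVA_COMMAND' not in globals():
    JAVA_COMMAND = ' '.join([JAVA_EXE] + EXTRA_JVM_FLAGS)
```

Enabling `-XX:+PrintCompilation` for both baseline and candidate could help show whether the hot methods get compiled at different points in the two JVMs.
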
Could it indeed be JVM compilation that's causing the difference? Below is the
full JVM command line, as printed by the benchmark with the modified
`localconstants.py` above, in case it's useful:
```
java -Xbatch
-XX:StartFlightRecording=dumponexit=true,maxsize=250M,settings=/Users/xichen/IdeaProjects/benchmarks/util/src/python/profiling.jfc,filename=/Users/xichen/IdeaProjects/benchmarks/logs/bench-search-baseline_vs_patch-my_modified_version-19.jfr
-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -classpath
/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/sandbox/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/misc/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/facet/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/analysis/common/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/analysis/icu/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/queryparser/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/grouping/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/suggest/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/highlighter/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/codecs/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/queries/build/classes/java/main:/Users/xichen/.gradle/caches/modules-2/files-2.1/com.carrotsearch/hppc/0.9.1/4bf4c51e06aec600894d841c4c004566b20dd357/hppc-0.9.1.jar:/Users/xichen/IdeaProjects/benchmarks/util/lib/HdrHistogram.jar:/Users/xichen/IdeaProjects/benchmarks/util/build
perf.SearchPerfTest -dirImpl MMapDirectory -indexPath
/Users/xichen/IdeaProjects/benchmarks/indices/wikimedium10m.lucene_baseline.facets.taxonomy:Date.taxonomy:Month.taxonomy:DayOfYear.sortedset:Date.sortedset:Month.sortedset:DayOfYear.Lucene90.Lucene90.dvfields.sort=month:custom.nd10M
-facets taxonomy:Date;Date -facets taxonomy:Month;Month
-facets taxonomy:DayOfYear;DayOfYear -facets sortedset:Date;Date -facets
sortedset:Month;Month -facets sortedset:DayOfYear;DayOfYear -analyzer
StandardAnalyzer -taskSource
/Users/xichen/IdeaProjects/benchmarks/util/tasks/wikimedium.10M.nostopwords.tasks
-searchThreadCount 2 -taskRepeatCount 20 -field body -tasksPerCat 1
-staticSeed -2249101 -seed -4093553 -similarity BM25Similarity -commit multi
-hiliteImpl FastVectorHighlighter -log
/Users/xichen/IdeaProjects/benchmarks/logs/baseline_vs_patch.my_modified_version.19
-topN 100 -pk
```
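
The printed command includes the seeds the run used (`-staticSeed` and `-seed`). One way to check whether two runs used the same seeds is to pull these values out of the logged command lines. This is a minimal sketch: `flag_value` is a hypothetical helper, and the command string is abbreviated from the one above.

```python
import shlex


def flag_value(command, flag):
    """Return the token following `flag` in a command line, or None if absent."""
    tokens = shlex.split(command)
    for i, token in enumerate(tokens[:-1]):
        if token == flag:  # exact match, so "-seed" does not match "-staticSeed"
            return tokens[i + 1]
    return None


# Abbreviated from the logged SearchPerfTest command above.
cmd = ("java -Xbatch perf.SearchPerfTest -dirImpl MMapDirectory "
       "-staticSeed -2249101 -seed -4093553 -similarity BM25Similarity")
print(flag_value(cmd, "-staticSeed"), flag_value(cmd, "-seed"))  # -2249101 -4093553
```
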
In terms of code, PKLookup executes this [section of modified
code](https://github.com/apache/lucene/pull/12194/files#diff-900619bac18cb1e2e177533efe157e9b4707d0c855180f535051f0d955828306R530-R543)
when it's doing [doc
enumeration](https://github.com/mikemccand/luceneutil/blob/2c8ccdf53e93622761a545c1a54377514c338caa/src/main/perf/PKLookupTask.java#L111),
but reverting the changes there didn't solve the issue.
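
Roughly, a primary-key lookup seeks each ID term in the terms dictionary, then enumerates that term's postings to find the matching document. The toy Python model below only illustrates that access pattern and is not Lucene's implementation. The in-memory `TermsDict` class is hypothetical; in Lucene these steps go through a `TermsEnum` and a `PostingsEnum`.

```python
import bisect


class TermsDict:
    """Hypothetical in-memory stand-in for a sorted terms dictionary."""

    def __init__(self, postings):
        # postings: dict mapping term -> sorted list of doc IDs
        self.terms = sorted(postings)
        self.postings = postings

    def seek_exact(self, term):
        """Binary-search the sorted terms; True if the term exists."""
        i = bisect.bisect_left(self.terms, term)
        return i < len(self.terms) and self.terms[i] == term

    def docs(self, term):
        """Enumerate the doc IDs of a term (the 'doc enumeration' step)."""
        yield from self.postings[term]


def pk_lookup(terms_dict, ids):
    """Map each ID to its first doc ID, or -1 if the ID is not indexed (toy model)."""
    results = {}
    for pk in ids:
        doc = -1
        if terms_dict.seek_exact(pk):
            doc = next(terms_dict.docs(pk), -1)  # a PK term normally has one posting
        results[pk] = doc
    return results


index = TermsDict({"id-001": [0], "id-002": [1], "id-003": [2]})
print(pk_lookup(index, ["id-002", "id-999"]))  # {'id-002': 1, 'id-999': -1}
```

Each lookup does very little work per call. Small differences in how these seek and enumeration paths get JIT-compiled could therefore plausibly show up in PKLookup QPS.
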