kaivalnp commented on PR #15742: URL: https://github.com/apache/lucene/pull/15742#issuecomment-3954429323
I ran this JMH benchmark on `branch_10x` with Java 24:

```sh
java --module-path lucene/benchmark-jmh/build/benchmarks --module org.apache.lucene.benchmark.jmh "VectorUtilBenchmark.binaryHalfByte.*Vector" -p size=1024
```

```
openjdk 24.0.2 2025-07-15
OpenJDK Runtime Environment (build 24.0.2+12-54)
OpenJDK 64-Bit Server VM (build 24.0.2+12-54, mixed mode, sharing)
```

Baseline:

```
Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15   0.471 ± 0.004  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  16.144 ± 0.039  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  20.829 ± 0.101  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  14.145 ± 0.030  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  16.309 ± 0.031  ops/us
VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  18.687 ± 0.113  ops/us
```

There is a drop in performance of `binaryHalfByteDotProductBothPackedVector` after some warmup iterations:

```
# Warmup Iteration   1: 10.793 ops/us
# Warmup Iteration   2: 13.988 ops/us
# Warmup Iteration   3: 0.468 ops/us
# Warmup Iteration   4: 0.464 ops/us
Iteration   1: 0.463 ops/us
Iteration   2: 0.466 ops/us
Iteration   3: 0.470 ops/us
Iteration   4: 0.470 ops/us
Iteration   5: 0.469 ops/us
```

Candidate (cherry-pick this PR):

```
Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  19.708 ± 0.095  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  21.649 ± 0.125  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  26.482 ± 0.145  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  15.783 ± 0.043  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  16.710 ± 0.056  ops/us
VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  19.008 ± 0.151  ops/us
```

The cherry-pick was successful without conflicts, and performance seems to improve in general -- should we target this for 10.5?
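To put a number on "improve in general", the ratio of candidate to baseline throughput can be computed per benchmark. This is only a minimal sketch: the scores are copied by hand from the two tables above, the short keys and the `speedups` helper are illustrative names rather than anything in the benchmark module, and the error bars are ignored.

```python
# Scores in ops/us, copied by hand from the Baseline and Candidate tables.
# Keys are shortened benchmark names (hypothetical labels, for readability only).
baseline = {
    "DotProductBothPacked": 0.471,
    "DotProductSinglePacked": 16.144,
    "DotProduct": 20.829,
    "SquareBothPacked": 14.145,
    "SquareSinglePacked": 16.309,
    "Square": 18.687,
}
candidate = {
    "DotProductBothPacked": 19.708,
    "DotProductSinglePacked": 21.649,
    "DotProduct": 26.482,
    "SquareBothPacked": 15.783,
    "SquareSinglePacked": 16.710,
    "Square": 19.008,
}


def speedups(before, after):
    """Return after/before throughput ratio for every benchmark in `before`."""
    return {name: after[name] / before[name] for name in before}


if __name__ == "__main__":
    for name, ratio in speedups(baseline, candidate).items():
        print(f"{name}: {ratio:.2f}x")
```

A ratio above 1.0 means the candidate has higher throughput for that benchmark; these are single-machine numbers at `size=1024` only.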
