Re: [PR] Optimize int4 vector computations by avoiding conversions [lucene]

via GitHub Tue, 24 Feb 2026 12:03:27 -0800


kaivalnp commented on PR #15742:
URL: https://github.com/apache/lucene/pull/15742#issuecomment-3954429323


   I ran this JMH benchmark on `branch_10x` with Java 24:
   
   ```sh
   java --module-path lucene/benchmark-jmh/build/benchmarks --module org.apache.lucene.benchmark.jmh "VectorUtilBenchmark.binaryHalfByte.*Vector" -p size=1024
   ```
   
   ```
   openjdk 24.0.2 2025-07-15
   OpenJDK Runtime Environment (build 24.0.2+12-54)
   OpenJDK 64-Bit Server VM (build 24.0.2+12-54, mixed mode, sharing)
   ```
   
   Baseline:
   
   ```
   Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15   0.471 ± 0.004  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  16.144 ± 0.039  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  20.829 ± 0.101  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  14.145 ± 0.030  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  16.309 ± 0.031  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  18.687 ± 0.113  ops/us
   ```
   
   On the baseline, throughput of `binaryHalfByteDotProductBothPackedVector` drops sharply after the second warmup iteration and stays low for all measured iterations:
   
   ```
   # Warmup Iteration   1: 10.793 ops/us
   # Warmup Iteration   2: 13.988 ops/us
   # Warmup Iteration   3: 0.468 ops/us
   # Warmup Iteration   4: 0.464 ops/us
   Iteration   1: 0.463 ops/us
   Iteration   2: 0.466 ops/us
   Iteration   3: 0.470 ops/us
   Iteration   4: 0.470 ops/us
   Iteration   5: 0.469 ops/us
   ```
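
For readers unfamiliar with the benchmark names, here is a minimal scalar sketch of what a "both packed" half-byte dot product computes. It is not Lucene's implementation: the class `Int4DotSketch`, its method names, and the nibble layout (element `i` in the high nibble, element `i + half` in the low nibble) are assumptions made for illustration. Because both inputs share the same layout, the result does not depend on which layout is chosen.

```java
// Hypothetical illustration only; names and packing layout are assumptions,
// not taken from the Lucene code base.
final class Int4DotSketch {

  /** Packs values in [0, 15] two per byte (assumed layout: i in high nibble, i + half in low nibble). */
  static byte[] pack(byte[] unpacked) {
    if ((unpacked.length & 1) != 0) {
      throw new IllegalArgumentException("length must be even");
    }
    int half = unpacked.length / 2;
    byte[] packed = new byte[half];
    for (int i = 0; i < half; i++) {
      packed[i] = (byte) (((unpacked[i] & 0x0F) << 4) | (unpacked[i + half] & 0x0F));
    }
    return packed;
  }

  /** Dot product of two packed vectors, computed nibble by nibble without unpacking to a new array. */
  static int dotProductBothPacked(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("length mismatch");
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      int aHi = (a[i] & 0xFF) >>> 4;
      int aLo = a[i] & 0x0F;
      int bHi = (b[i] & 0xFF) >>> 4;
      int bLo = b[i] & 0x0F;
      sum += aHi * bHi + aLo * bLo;
    }
    return sum;
  }

  /** Reference dot product over unpacked values, one 4-bit value per byte. */
  static int dotProductUnpacked(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("length mismatch");
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

Packing two vectors with `pack` and passing them to `dotProductBothPacked` should give the same value as `dotProductUnpacked` on the original arrays.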
   
   Candidate (cherry-pick this PR):
   
   ```
   Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  19.708 ± 0.095  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  21.649 ± 0.125  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  26.482 ± 0.145  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  15.783 ± 0.043  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  16.710 ± 0.056  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  19.008 ± 0.151  ops/us
   ```
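
To compare the two runs at a glance, the candidate-to-baseline throughput ratios can be computed from the scores reported above. The snippet below is only a convenience sketch: the class `SpeedupSketch` and the shortened benchmark labels are made up for illustration, and the numbers are copied from the two tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading the tables; not part of the Lucene benchmarks.
final class SpeedupSketch {

  /** Candidate-to-baseline throughput ratio; values above 1.0 mean the candidate is faster. */
  static double speedup(double baselineOpsPerUs, double candidateOpsPerUs) {
    return candidateOpsPerUs / baselineOpsPerUs;
  }

  public static void main(String[] args) {
    // Scores in ops/us, copied from the result tables: {baseline, candidate}.
    Map<String, double[]> scores = new LinkedHashMap<>();
    scores.put("DotProductBothPacked", new double[] {0.471, 19.708});
    scores.put("DotProductSinglePacked", new double[] {16.144, 21.649});
    scores.put("DotProduct", new double[] {20.829, 26.482});
    scores.put("SquareBothPacked", new double[] {14.145, 15.783});
    scores.put("SquareSinglePacked", new double[] {16.309, 16.710});
    scores.put("Square", new double[] {18.687, 19.008});

    scores.forEach(
        (name, s) -> System.out.printf("%-24s %.2fx%n", name, speedup(s[0], s[1])));
  }
}
```

With these inputs the both-packed dot product comes out at roughly 42x, which largely reflects the baseline's drop after warmup; the other five ratios fall between about 1.02x and 1.34x.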
   
   The cherry-pick applied without conflicts, and throughput improves on all six benchmarks, most notably `binaryHalfByteDotProductBothPackedVector` (0.471 to 19.708 ops/us) -- should we target this for 10.5?

