Re: [PR] Optimize byte vector scoring: precompute cosine query norm and pack int4 queries [lucene]

via GitHub Thu, 04 Jun 2026 08:19:23 -0700


jegentile commented on PR #16200:
URL: https://github.com/apache/lucene/pull/16200#issuecomment-4623537667


   ## Benchmark Results
   
   Byte vector cosine scoring throughput (ops/μs, higher is better). Measured 
with JMH using `VectorScorerBenchmark` (3 forks, 4 warmup iterations, 5 
measurement iterations, padBytes=0).
   
   | Benchmark | Dim | Baseline (`main`) | This PR | Speedup |
   |---|---|---|---|---|
   | **Default (scalar)** | 128 | 6.35 ± 0.13 | 10.46 ± 0.23 | **+65%** |
   | **Default (scalar)** | 512 | 1.67 ± 0.05 | 3.01 ± 0.07 | **+81%** |
   | **Default (scalar)** | 1024 | 0.79 ± 0.02 | 1.55 ± 0.04 | **+95%** |
   | **MemSeg (Panama SIMD)** | 128 | 19.12 ± 0.41 | 24.31 ± 0.49 | **+27%** |
   | **MemSeg (Panama SIMD)** | 512 | 8.29 ± 0.58 | 10.74 ± 0.34 | **+30%** |
   | **MemSeg (Panama SIMD)** | 1024 | 4.63 ± 0.16 | 6.22 ± 0.20 | **+34%** |
   
   The scalar path improvement (~65-95%) comes from removing the query norm 
computation from the scoring loop entirely: the norm is now precomputed once 
per query. The SIMD path improvement (~27-34%) comes from removing one 
accumulator, one multiply, and one `reduceLanes` call per vector iteration.
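
To illustrate where the scalar savings come from, here is a minimal sketch of the idea, not the code in this PR. The class and method names (`CosineSketch`, `cosineBaseline`, `squaredNorm`, `cosinePrecomputed`) are hypothetical and chosen only for this example. The baseline accumulates three sums per element; the precomputed variant takes the query's squared norm as an argument and accumulates two.

```java
final class CosineSketch {
    /** Baseline shape: the query norm is recomputed for every document vector. */
    static float cosineBaseline(byte[] query, byte[] doc) {
        int dot = 0, queryNorm = 0, docNorm = 0;
        for (int i = 0; i < query.length; i++) {
            dot += query[i] * doc[i];
            queryNorm += query[i] * query[i]; // same result on every call
            docNorm += doc[i] * doc[i];
        }
        return (float) (dot / Math.sqrt((double) queryNorm * docNorm));
    }

    /** Computed once per query, outside the scoring loop. */
    static int squaredNorm(byte[] v) {
        int norm = 0;
        for (byte b : v) {
            norm += b * b;
        }
        return norm;
    }

    /** Precomputed shape: one fewer multiply-accumulate per element. */
    static float cosinePrecomputed(byte[] query, int queryNorm, byte[] doc) {
        int dot = 0, docNorm = 0;
        for (int i = 0; i < query.length; i++) {
            dot += query[i] * doc[i];
            docNorm += doc[i] * doc[i];
        }
        return (float) (dot / Math.sqrt((double) queryNorm * docNorm));
    }
}
```

A caller would compute `squaredNorm(query)` once and pass it to `cosinePrecomputed` for each candidate vector. Both variants return the same score for the same inputs, since only the place where the query norm is computed changes.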
   
   **Environment:** OpenJDK 25.0.3, Linux 7.0.10-arch1-1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


