jegentile commented on PR #16200: URL: https://github.com/apache/lucene/pull/16200#issuecomment-4623537667
## Benchmark Results

Byte vector cosine scoring throughput (ops/μs, higher is better). Measured with JMH using `VectorScorerBenchmark` (3 forks, 4 warmup iterations, 5 measurement iterations, padBytes=0).

| Benchmark | Dim | Baseline (`main`) | This PR | Speedup |
|---|---|---|---|---|
| **Default (scalar)** | 128 | 6.35 ± 0.13 | 10.46 ± 0.23 | **+65%** |
| **Default (scalar)** | 512 | 1.67 ± 0.05 | 3.01 ± 0.07 | **+81%** |
| **Default (scalar)** | 1024 | 0.79 ± 0.02 | 1.55 ± 0.04 | **+95%** |
| **MemSeg (Panama SIMD)** | 128 | 19.12 ± 0.41 | 24.31 ± 0.49 | **+27%** |
| **MemSeg (Panama SIMD)** | 512 | 8.29 ± 0.58 | 10.74 ± 0.34 | **+30%** |
| **MemSeg (Panama SIMD)** | 1024 | 4.63 ± 0.16 | 6.22 ± 0.20 | **+34%** |

The scalar path improvement (~65-95%) comes from eliminating the query norm computation entirely from the loop. The SIMD path improvement (~27-34%) comes from removing one accumulator, one multiply, and one `reduceLanes` per vector iteration.

**Environment:** OpenJDK 25.0.3, Linux 7.0.10-arch1-1
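To illustrate where the scalar saving comes from, here is a minimal sketch of the idea, not the PR's actual code. The class `CosineSketch` and its method names are hypothetical, and it assumes the query's squared norm can be computed once and reused for every document vector scored against that query.

```java
// Hypothetical illustration only; names and structure are not taken from Lucene.
final class CosineSketch {

  // Baseline-style loop: three accumulators, including the query norm,
  // which is recomputed for every document vector.
  public static float cosineFull(byte[] q, byte[] d) {
    int dot = 0, qNorm = 0, dNorm = 0;
    for (int i = 0; i < q.length; i++) {
      dot += q[i] * d[i];
      qNorm += q[i] * q[i];
      dNorm += d[i] * d[i];
    }
    // Yields NaN if either vector is all zeros; callers must handle that case.
    return (float) (dot / Math.sqrt((double) qNorm * dNorm));
  }

  // Computed once per query, outside the scoring loop.
  public static int squaredNorm(byte[] v) {
    int sum = 0;
    for (byte b : v) {
      sum += b * b;
    }
    return sum;
  }

  // Loop with the query norm hoisted out: two accumulators instead of three.
  public static float cosineWithQueryNorm(byte[] q, int qNorm, byte[] d) {
    int dot = 0, dNorm = 0;
    for (int i = 0; i < q.length; i++) {
      dot += q[i] * d[i];
      dNorm += d[i] * d[i];
    }
    return (float) (dot / Math.sqrt((double) qNorm * dNorm));
  }

  public static void main(String[] args) {
    byte[] query = {1, 2, 3};
    byte[] doc = {4, 5, 6};
    int qNorm = squaredNorm(query); // reused across all documents
    System.out.println(cosineFull(query, doc));
    System.out.println(cosineWithQueryNorm(query, qNorm, doc));
  }
}
```

Both methods perform the same final arithmetic on the same integer sums, so they return identical scores. The second simply does less work per element, and the saving adds up when one query is scored against many documents.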
