kaivalnp commented on PR #15698:
URL: https://github.com/apache/lucene/pull/15698#issuecomment-3886507731

   JMH benchmarks, run with:
   
   ```
   java --module-path lucene/benchmark-jmh/build/benchmarks \
     --module org.apache.lucene.benchmark.jmh \
     "VectorUtilBenchmark.binaryHalfByte.*SinglePacked.*" -p size=1024
   ```
   
   Baseline:
   
   ```
   Benchmark                                                       (size)   Mode  Cnt  Score   Error   Units
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedScalar    1024  thrpt   15  2.443 ± 0.016  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  2.605 ± 0.010  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedScalar        1024  thrpt   15  2.020 ± 0.013  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  2.587 ± 0.025  ops/us
   ```
   
   Notice that the vector implementations are only marginally faster than the scalar ones (about 7% for the dot product and about 28% for the square distance).
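   For reference, here is a minimal scalar sketch of what the two `SinglePacked` kernels compute, assuming `SinglePacked` means that only one operand is packed (two 4-bit values per byte) while the other holds one 4-bit value per byte. The class and method names are hypothetical, and the nibble layout (element `i` in the high nibble of `packed[i]`, element `n + i` in the low nibble) is an assumption; this is not the code in `VectorUtil` or in this PR.

   ```java
   /**
    * Hypothetical scalar sketch (not Lucene's VectorUtil code). Assumes:
    * - `unpacked` holds 2 * n values in [0, 15], one per byte;
    * - `packed` holds n bytes, byte i = (element i << 4) | element (n + i).
    */
   final class HalfByteSinglePackedSketch {

     private HalfByteSinglePackedSketch() {}

     /** Packs 4-bit values (even count, each in [0, 15]) into the assumed layout. */
     static byte[] pack(byte[] values) {
       int n = values.length / 2;
       byte[] packed = new byte[n];
       for (int i = 0; i < n; i++) {
         packed[i] = (byte) ((values[i] << 4) | values[n + i]);
       }
       return packed;
     }

     /** Dot product between an unpacked 4-bit vector and a packed 4-bit vector. */
     static int dotProductSinglePacked(byte[] unpacked, byte[] packed) {
       checkLengths(unpacked, packed);
       int n = packed.length;
       int sum = 0;
       for (int i = 0; i < n; i++) {
         int hi = (packed[i] >> 4) & 0x0F; // element i of the packed vector
         int lo = packed[i] & 0x0F; // element n + i of the packed vector
         sum += unpacked[i] * hi + unpacked[n + i] * lo;
       }
       return sum;
     }

     /** Squared Euclidean distance, same operand layout as above. */
     static int squareDistanceSinglePacked(byte[] unpacked, byte[] packed) {
       checkLengths(unpacked, packed);
       int n = packed.length;
       int sum = 0;
       for (int i = 0; i < n; i++) {
         int d1 = unpacked[i] - ((packed[i] >> 4) & 0x0F);
         int d2 = unpacked[n + i] - (packed[i] & 0x0F);
         sum += d1 * d1 + d2 * d2;
       }
       return sum;
     }

     private static void checkLengths(byte[] unpacked, byte[] packed) {
       if (unpacked.length != 2 * packed.length) {
         throw new IllegalArgumentException("expected unpacked.length == 2 * packed.length");
       }
     }

     public static void main(String[] args) {
       byte[] a = {1, 2, 3, 4};
       byte[] b = pack(new byte[] {5, 6, 7, 8});
       System.out.println(dotProductSinglePacked(a, b)); // 1*5 + 2*6 + 3*7 + 4*8 = 70
       System.out.println(squareDistanceSinglePacked(a, b)); // 4 * 4^2 = 64
     }
   }
   ```

   The vectorized variants presumably do the same arithmetic on many bytes per step via the Panama Vector API; the scalar loop above only pins down what is being computed.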
   
   Here is the JMH run of `binaryHalfByteDotProductSinglePackedVector`:
   
   ```
   # Warmup Iteration   1: 11.063 ops/us
   # Warmup Iteration   2: 15.398 ops/us
   # Warmup Iteration   3: 2.589 ops/us
   # Warmup Iteration   4: 2.586 ops/us
   Iteration   1: 2.598 ops/us
   Iteration   2: 2.606 ops/us
   Iteration   3: 2.596 ops/us
   Iteration   4: 2.602 ops/us
   Iteration   5: 2.606 ops/us
   ```
   
   Notice that the vectorized function starts out fast (11 to 15 ops/us in the first two warmup iterations) but drops to about 2.6 ops/us from the third iteration onward. I'm guessing it eventually gets deoptimized, perhaps because of cache line misses?
   
   Candidate:
   
   ```
   Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedScalar    1024  thrpt   15   2.079 ± 0.003  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  10.338 ± 0.105  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedScalar        1024  thrpt   15   2.203 ± 0.002  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15   9.979 ± 0.103  ops/us
   ```
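   To put the baseline and candidate tables side by side, the ratios below are computed directly from the reported mean scores (size = 1024, ops/us), ignoring the error bars. The class name is made up for illustration.

   ```java
   import java.util.Locale;

   /** Hypothetical helper: throughput ratios from the baseline and candidate JMH tables. */
   final class SpeedupSketch {

     private SpeedupSketch() {}

     /** Ratio of two throughputs; a value > 1 means {@code a} is faster than {@code b}. */
     static double ratio(double a, double b) {
       return a / b;
     }

     public static void main(String[] args) {
       // Mean scores copied from the tables; error bars are ignored.
       double baseDotVec = 2.605, baseSqVec = 2.587;
       double candDotScalar = 2.079, candDotVec = 10.338;
       double candSqScalar = 2.203, candSqVec = 9.979;

       System.out.println(String.format(Locale.ROOT,
           "dot vector, candidate / baseline:    %.2fx", ratio(candDotVec, baseDotVec))); // 3.97x
       System.out.println(String.format(Locale.ROOT,
           "square vector, candidate / baseline: %.2fx", ratio(candSqVec, baseSqVec))); // 3.86x
       System.out.println(String.format(Locale.ROOT,
           "dot, candidate vector / scalar:      %.2fx", ratio(candDotVec, candDotScalar))); // 4.97x
       System.out.println(String.format(Locale.ROOT,
           "square, candidate vector / scalar:   %.2fx", ratio(candSqVec, candSqScalar))); // 4.53x
     }
   }
   ```

   By these numbers, the candidate's vector kernels run at roughly 3.9x to 4x the baseline's, and at roughly 4.5x to 5x their own scalar counterparts.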


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
