mccullocht commented on PR #15742:
URL: https://github.com/apache/lucene/pull/15742#issuecomment-3937456120

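   It looks like there are some losses on x86: binaryHalfByteDotProductVector sees a small loss (66.883 to 62.781 ops/us, about 6%), and binaryHalfByteSquareVector sees a fairly large loss (71.097 to 58.312 ops/us, about 18%). The packed variants improve on both machines. I'll investigate a bit.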
   
   AMD Ryzen AI 395 (AVX 512):
   Baseline:
   ```
   VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  23.318 ± 0.107  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  11.839 ± 0.075  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  66.883 ± 0.965  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  29.886 ± 0.167  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  12.464 ± 0.374  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  71.097 ± 0.476  ops/us
   ```
   Experiment:
   ```
   Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  35.444 ± 0.184  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  42.581 ± 0.464  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  62.781 ± 0.686  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  33.665 ± 0.254  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  41.314 ± 0.367  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  58.312 ± 0.703  ops/us
   ```
   
   Mac M2 (128 bit/NEON):
   Baseline:
   ```
   Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  15.866 ± 0.206  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15   2.746 ± 0.029  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  13.612 ± 0.127  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  15.815 ± 0.068  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15   2.758 ± 0.031  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  13.440 ± 0.088  ops/us
   ```
   Experiment:
   ```
   Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  23.285 ± 0.371  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  25.559 ± 0.601  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  17.269 ± 1.498  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  21.115 ± 0.188  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  24.063 ± 0.477  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  17.184 ± 0.077  ops/us
   ```
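
For anyone who wants to read the four tables as relative changes, here is a minimal sketch in Python. It is not part of the PR or of the Lucene benchmark tooling; the names `RESULTS`, `percent_change`, and `regressions` are made up for illustration. The scores are copied from the tables above, and the reported error margins are ignored, so small differences should be treated with caution.

```python
# Throughput scores (ops/us, size = 1024) copied from the JMH tables above.
# Each entry is (baseline, experiment). The "VectorUtilBenchmark." prefix is dropped.
RESULTS = {
    "AMD Ryzen AI 395 (AVX 512)": {
        "binaryHalfByteDotProductBothPackedVector": (23.318, 35.444),
        "binaryHalfByteDotProductSinglePackedVector": (11.839, 42.581),
        "binaryHalfByteDotProductVector": (66.883, 62.781),
        "binaryHalfByteSquareBothPackedVector": (29.886, 33.665),
        "binaryHalfByteSquareSinglePackedVector": (12.464, 41.314),
        "binaryHalfByteSquareVector": (71.097, 58.312),
    },
    "Mac M2 (128 bit/NEON)": {
        "binaryHalfByteDotProductBothPackedVector": (15.866, 23.285),
        "binaryHalfByteDotProductSinglePackedVector": (2.746, 25.559),
        "binaryHalfByteDotProductVector": (13.612, 17.269),
        "binaryHalfByteSquareBothPackedVector": (15.815, 21.115),
        "binaryHalfByteSquareSinglePackedVector": (2.758, 24.063),
        "binaryHalfByteSquareVector": (13.440, 17.184),
    },
}


def percent_change(baseline, experiment):
    """Relative throughput change in percent; negative means a regression."""
    return (experiment - baseline) / baseline * 100.0


def regressions(results):
    """Return (machine, benchmark, change) for every benchmark that got slower."""
    found = []
    for machine, rows in results.items():
        for name, (base, exp) in rows.items():
            change = percent_change(base, exp)
            if change < 0:
                found.append((machine, name, change))
    return found


if __name__ == "__main__":
    for machine, rows in RESULTS.items():
        print(machine)
        for name, (base, exp) in rows.items():
            print(f"  {name:<45} {percent_change(base, exp):+7.1f}%")
```

With these numbers, `regressions(RESULTS)` flags only the two unpacked benchmarks on the AMD machine, at roughly -6.1% and -18.0%; every benchmark on the M2 improves.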


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

