mccullocht commented on issue #15697:
URL: https://github.com/apache/lucene/issues/15697#issuecomment-3931478158

   Running luceneutil would be interesting, but in my experience microbenchmarks typically show much better results than larger-scale tests. This does seem to be aarch64-specific. It's weird that loading twice is faster, but since you are loading the same address twice, the second load is almost free.
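
   For concreteness, here's a minimal sketch of the two access patterns I mean, using the Panama Vector API to unpack two 4-bit values per byte. The class and method names are made up for illustration; this is not the actual Lucene kernel.
   ```java
   import jdk.incubator.vector.ByteVector;
   import jdk.incubator.vector.VectorOperators;
   import jdk.incubator.vector.VectorSpecies;

   // Requires --add-modules jdk.incubator.vector.
   public class NibbleLoadSketch {
     private static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_PREFERRED;

     // Variant A: load once, extract both nibbles from the same register.
     static ByteVector loadOnce(byte[] packed, int i) {
       ByteVector v = ByteVector.fromArray(SPECIES, packed, i);
       ByteVector lo = v.and((byte) 0x0F);                  // low nibble of each byte
       ByteVector hi = v.lanewise(VectorOperators.LSHR, 4); // high nibble; LSHR zero-extends sub-int lanes
       return lo.add(hi); // placeholder; the real kernel accumulates dot-product terms
     }

     // Variant B: load the same address twice, once per nibble. The second load
     // hits L1 (the line was just touched), so it is almost free, and the two
     // loads start independent dependency chains the CPU can schedule in parallel.
     static ByteVector loadTwice(byte[] packed, int i) {
       ByteVector lo = ByteVector.fromArray(SPECIES, packed, i).and((byte) 0x0F);
       ByteVector hi = ByteVector.fromArray(SPECIES, packed, i).lanewise(VectorOperators.LSHR, 4);
       return lo.add(hi); // placeholder; the real kernel accumulates dot-product terms
     }
   }
   ```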
   
   AMD Ryzen AI 395; AVX 512
   ```
   Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedScalar    1024  thrpt   15   2.523 ± 0.030  ops/us
   VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  11.828 ± 0.169  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedScalar        1024  thrpt   15   2.303 ± 0.011  ops/us
   VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  12.519 ± 0.489  ops/us
   ```
   
   Strangely enough, I see the same kind of weird performance falloff in the warmup iterations.
   ```
   INFO: Java vector incubator API enabled; uses preferredBitSize=512; FMA enabled
   # Warmup Iteration   1: 35.595 ops/us
   # Warmup Iteration   2: 41.108 ops/us
   # Warmup Iteration   3: 12.517 ops/us
   # Warmup Iteration   4: 12.766 ops/us
   Iteration   1: 12.801 ops/us
   Iteration   2: 12.781 ops/us
   Iteration   3: 12.763 ops/us
   Iteration   4: 12.843 ops/us
   Iteration   5: 12.810 ops/us
   ```
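
   One way to check whether that falloff lines up with a JIT recompilation would be to fork the benchmark with `-XX:+PrintCompilation`, which interleaves compile and deoptimization events with the iteration scores. A hypothetical probe, not part of `VectorUtilBenchmark`:
   ```java
   import java.util.concurrent.TimeUnit;

   import org.openjdk.jmh.annotations.Benchmark;
   import org.openjdk.jmh.annotations.BenchmarkMode;
   import org.openjdk.jmh.annotations.Fork;
   import org.openjdk.jmh.annotations.Mode;
   import org.openjdk.jmh.annotations.OutputTimeUnit;

   // Hypothetical probe class: -XX:+PrintCompilation prints JIT compile/deopt
   // events alongside the per-iteration scores, so a recompilation between
   // warmup iterations 2 and 3 would show up right where the falloff happens.
   @BenchmarkMode(Mode.Throughput)
   @OutputTimeUnit(TimeUnit.MICROSECONDS)
   @Fork(value = 1, jvmArgsAppend = "-XX:+PrintCompilation")
   public class WarmupFalloffProbe {
     @Benchmark
     public long kernel() {
       // Stand-in body; the real experiment would run the packed dot-product kernel.
       long acc = 0;
       for (int i = 0; i < 1024; i++) {
         acc += (long) i * i;
       }
       return acc;
     }
   }
   ```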

