benwtrent commented on PR #13321:
URL: https://github.com/apache/lucene/pull/13321#issuecomment-2079695750
OK, I ran on Google's ARM machines (`Tau T2A` machine series) to make sure the ARM performance improvements still exist for int4 (and it wasn't some silly macOS thing):

```
Benchmark                                       (size)   Mode  Cnt  Score   Error  Units
VectorUtilBenchmark.binaryDotProductScalar        1024  thrpt   15  2.850 ± 0.002  ops/us
VectorUtilBenchmark.binaryDotProductVector        1024  thrpt   15  2.771 ± 0.016  ops/us
VectorUtilBenchmark.binaryHalfByteScalar          1024  thrpt   15  2.845 ± 0.009  ops/us
VectorUtilBenchmark.binaryHalfByteScalarPacked    1024  thrpt   15  2.128 ± 0.003  ops/us
VectorUtilBenchmark.binaryHalfByteVector          1024  thrpt   15  7.667 ± 0.007  ops/us
VectorUtilBenchmark.binaryHalfByteVectorPacked    1024  thrpt   15  7.009 ± 0.025  ops/us
```

Something else funny: on this hardware, int4 is almost at the same speed as `floatVector`. The micro-benchmarks are VERY close to float, which means the reduction in bytes read & parsed will make int4 much faster than float overall.

```
Benchmark                                  (size)   Mode  Cnt  Score   Error  Units
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15  2.476 ± 0.028  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  8.703 ± 0.300  ops/us
```
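For readers unfamiliar with the "half byte packed" variants above, here is a minimal illustrative sketch of the idea being benchmarked: int4 ("half byte") values stored two per byte, with the dot product unpacking nibbles on the fly. This is a simplified scalar sketch for illustration only, not Lucene's actual `VectorUtil` implementation, and the class and method names are hypothetical:

```java
// Hypothetical sketch of a scalar dot product over int4 values packed
// two-per-byte (low nibble and high nibble). Not Lucene's actual code.
public class HalfByteDotProduct {

  // `a` and `b` each hold 2*a.length int4 values in [0, 15].
  static int dotProductPacked(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      int aLo = a[i] & 0x0F;          // low nibble of a
      int aHi = (a[i] >> 4) & 0x0F;   // high nibble of a
      int bLo = b[i] & 0x0F;
      int bHi = (b[i] >> 4) & 0x0F;
      sum += aLo * bLo + aHi * bHi;
    }
    return sum;
  }

  public static void main(String[] args) {
    // 0x21 packs the values (1, 2); 0x43 packs (3, 4).
    // Dot product = 1*3 + 2*4 = 11.
    byte[] a = {0x21};
    byte[] b = {0x43};
    System.out.println(dotProductPacked(a, b)); // prints 11
  }
}
```

The packed layout halves the bytes read per vector, which is why the comment above expects int4 to beat float once I/O and decoding are accounted for, even when the raw arithmetic throughput is comparable.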