rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559308415
> Anybody with a real Luceneutil vector-benchmark of the whole luceneutil testsuite?

I think the benefits won't look very impressive because it only uses 100 dims. This seems to be out of touch with people screaming for 2048 :)

Also, with non-power-of-two sizes like that, you can expect perf to not be as good: e.g. 1023 dims will be much slower than 1024. On 256-bit AVX (8 floats per vector), a float dot product on 1023 dims works 32 floats at a time in the very fast loop with 4 accumulators, which covers 992 of them. That leaves 31: we knock out 24 of them with a slower vector loop with a single accumulator, and the remaining 7 are processed scalar. With 1024 dims, everything goes through the very fast loop and nothing is left over. You can see this in the benchmarks above, where e.g. 128 dims is faster than 100.
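To make the arithmetic concrete, here is a minimal sketch of how a dimension count splits across the three loops described above. `TailSplit` and `split` are hypothetical names for illustration, not Lucene code; the parameters (8 float lanes for 256-bit AVX, 4 accumulators in the unrolled loop) are taken from the comment.

```java
// Hypothetical helper, not part of Lucene: it only counts how many elements
// each of the three loops would handle, it does not compute a dot product.
final class TailSplit {

    /**
     * Returns {unrolled, singleVector, scalar}: the number of elements handled by
     * the unrolled multi-accumulator loop, the single-accumulator vector loop,
     * and the scalar tail, in that order.
     *
     * @param dims   vector dimension count
     * @param lanes  floats per SIMD register (8 for 256-bit AVX)
     * @param unroll number of accumulators in the fast loop (4 in the comment)
     */
    static int[] split(int dims, int lanes, int unroll) {
        int block = lanes * unroll;                 // floats per fast-loop iteration
        int unrolled = (dims / block) * block;      // largest multiple of block <= dims
        int rest = dims - unrolled;                 // what the fast loop leaves behind
        int singleVector = (rest / lanes) * lanes;  // whole vectors in the remainder
        int scalar = rest - singleVector;           // elements processed one by one
        return new int[] {unrolled, singleVector, scalar};
    }
}
```

With these assumptions, `TailSplit.split(1023, 8, 4)` gives `{992, 24, 7}`, while `TailSplit.split(1024, 8, 4)` gives `{1024, 0, 0}`. For the benchmark sizes, 100 dims splits as `{96, 0, 4}` and 128 dims as `{128, 0, 0}`.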
