rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1559308415

   > Anybody with a real Luceneutil vector-benchmark of the whole luceneutil testsuite?
   
   I think the benefits won't look very impressive because luceneutil only uses 
100 dims. This seems to be out of touch with people screaming for 2048 :)
   
   Also, when using such non-power-of-two sizes, you can expect perf to not be 
as good, e.g. 1023 dims will be much slower than 1024.
   
   On 256-bit AVX, with a float dot product over 1023 dims, we work 32 floats 
at a time (the very fast loop with 4 accumulators of 8 floats each), which 
covers 992. That leaves 31: we knock out 24 of them with a slower vector loop 
that uses a single accumulator, then the remaining 7 are processed scalar. With 
1024 dims, everything goes through the very fast loop and nothing is left over.
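
   To make the arithmetic above concrete, here is a rough sketch that only 
counts how many floats each of the three loops would handle. It is not the 
actual Lucene code: the class name `DotProductTail`, the method `split`, and 
the constants are made up for illustration, and it assumes 8 float lanes per 
256-bit vector and a 4x unrolled main loop as described above.

```java
// Illustrative only: hypothetical names, not taken from the Lucene sources.
final class DotProductTail {
    static final int LANES = 8;   // assumed: 256-bit vector / 32-bit float
    static final int UNROLL = 4;  // assumed: four independent accumulators

    /** Returns {unrolledElems, singleVectorElems, scalarElems} for a given dimension count. */
    static int[] split(int dims) {
        int block = LANES * UNROLL;            // 32 floats per fast-loop iteration
        int unrolled = dims - dims % block;    // handled by the 4-accumulator loop
        int rest = dims - unrolled;
        int single = rest - rest % LANES;      // handled by the 1-accumulator vector loop
        int scalar = rest - single;            // handled one float at a time
        return new int[] {unrolled, single, scalar};
    }

    public static void main(String[] args) {
        for (int dims : new int[] {100, 128, 1023, 1024}) {
            int[] s = split(dims);
            System.out.printf("%4d dims: %4d unrolled, %2d single-vector, %d scalar%n",
                    dims, s[0], s[1], s[2]);
        }
    }
}
```

   Under those assumptions, 1023 dims splits into 992 + 24 + 7, while 1024 
goes entirely through the unrolled loop. 100 dims splits into 96 + 0 + 4, so 
it still pays for a scalar tail that 128 dims avoids.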
   
   You can see this in the benchmarks above, where e.g. 128 dims is faster 
than 100.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

