On Sunday, 25 October 2015 at 19:37:32 UTC, Iakh wrote:
Here is my implementation of SIMD find. Function returns index of ubyte in static 16 byte array with unique values.

[snip]

You need to be very careful when benchmarking tiny test cases; they can be very misleading.

Be aware that the speed of bsf() and bsr() is very, very strongly processor dependent. On some machines it is utterly pathetic. E.g. on AMD K7, BSR is 23 micro-operations; on the original Pentium it was up to 73 (!); even on AMD Bobcat it is 11 micro-ops; but on recent Intel it is one micro-op. A spread from 1 to 73 micro-ops can totally screw up your performance comparisons.

Just because it is a single machine instruction does not mean it is fast.
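
As a rough illustration of how one might check this on a given machine, here is a minimal sketch that times core.bitop.bsf against a plain shift loop over a large, varied input instead of one tiny fixed case. The fallback bsfLoop, the array size and the seed are my own assumptions for illustration, not anything from the snipped code, and the numbers it prints only mean something for the CPU and compiler flags you run it with.

```d
import core.bitop : bsf;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : Random, uniform;
import std.stdio : writefln;

// Hypothetical software fallback: index of the lowest set bit, found by shifting.
// Precondition: v != 0 (the result of bsf is undefined for 0 as well).
int bsfLoop(uint v)
{
    int i = 0;
    while ((v & 1) == 0)
    {
        v >>= 1;
        ++i;
    }
    return i;
}

void main()
{
    // Many varied inputs, so the timing is not dominated by a single tiny case.
    enum n = 1 << 20;            // arbitrary size chosen for this sketch
    auto rng = Random(42);       // fixed seed so runs are repeatable
    auto data = new uint[n];
    foreach (ref x; data)
        x = uniform(1u, uint.max, rng); // lower bound 1, so never zero

    // Sanity check: both versions must agree before timing means anything.
    foreach (x; data)
        assert(bsf(x) == bsfLoop(x));

    ulong sink; // accumulate results so the loops are not optimised away

    auto sw = StopWatch(AutoStart.yes);
    foreach (x; data)
        sink += bsf(x);
    auto tBsf = sw.peek();

    sw.reset();
    foreach (x; data)
        sink += bsfLoop(x);
    auto tLoop = sw.peek();

    writefln("bsf:     %s", tBsf);
    writefln("bsfLoop: %s", tLoop);
    writefln("(sink = %s)", sink);
}
```

Note that uniformly random inputs usually have their lowest set bit very near bit 0, which flatters the loop version. A fair comparison needs an input distribution that matches the real workload, and it needs to be repeated on each processor you care about.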
