On 16.10.2017 22:41, Florian Klämpfl wrote:
P.S.: I am currently working on another version of CompareByte that might
have a slightly higher latency for very small len but a higher throughput
(2 cycles per iteration vs. 3 cycles on an Intel Arrandale CPU (Westmere
microarchitecture)). But this would need some more testing and benchmarking.
I can come up with it here again if this would be of any interest.

Small lengths in terms of matching string or overall lengths?

Small in terms of the matching string, since there is some setup work before the loop.

BTW: I would really like to see a PCMPSTR based implementation :)

PCMPSTR is (at the moment) out of my scope. I thought PCMPSTR is part of SSE4.2. How would you deal with Intel Core microarchitecture CPUs that don't have it?
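
One common way to handle CPUs without SSE4.2 is runtime dispatch: check the CPU features once, then call either a PCMPESTRI-based routine or a plain fallback through a function pointer. The following is only a minimal sketch in C (used here for illustration, not FPC RTL code). It assumes GCC or Clang on x86, and all function names (compare_byte, compare_byte_generic, compare_byte_sse42, select_compare_byte) are hypothetical.

```c
#include <stddef.h>

/* Assumption: GCC or Clang on x86. Elsewhere only the fallback is built. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_DISPATCH 1
#include <nmmintrin.h>  /* SSE4.2 intrinsics, including _mm_cmpestri */
#else
#define HAVE_X86_DISPATCH 0
#endif

typedef int (*compare_fn)(const unsigned char *, const unsigned char *, size_t);

/* Portable fallback: returns <0, 0 or >0 like a byte-wise memcmp. */
static int compare_byte_generic(const unsigned char *a,
                                const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    return 0;
}

#if HAVE_X86_DISPATCH
/* Hypothetical SSE4.2 variant: PCMPESTRI finds the first differing byte
   in each full 16-byte block; the tail is left to the fallback. */
__attribute__((target("sse4.2")))
static int compare_byte_sse42(const unsigned char *a,
                              const unsigned char *b, size_t len)
{
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* EQUAL_EACH + NEGATIVE_POLARITY marks the bytes that differ;
           the result is the lowest such index, or 16 if none differ. */
        int idx = _mm_cmpestri(va, 16, vb, 16,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                               _SIDD_NEGATIVE_POLARITY);
        if (idx < 16)
            return (int)a[i + idx] - (int)b[i + idx];
    }
    return compare_byte_generic(a + i, b + i, len - i);
}
#endif

/* Pick an implementation based on what the running CPU reports. */
static compare_fn select_compare_byte(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.2"))
        return compare_byte_sse42;
#endif
    return compare_byte_generic;
}

/* Public entry point: resolves the implementation on the first call.
   Not thread-safe as written; a real RTL would resolve it at startup. */
int compare_byte(const unsigned char *a, const unsigned char *b, size_t len)
{
    static compare_fn impl = NULL;
    if (impl == NULL)
        impl = select_compare_byte();
    return impl(a, b, len);
}
```

Callers only ever see compare_byte, so a CPU without SSE4.2 simply ends up on the generic path. Whether the PCMPESTRI variant is actually faster than an SSE2 loop would still have to be benchmarked.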
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel