On 16.10.2017 22:41, Florian Klämpfl wrote:
>> P.S.: I am currently working on another version of CompareByte that might
>> have a slightly higher latency for very small len but a higher throughput
>> (2 cycles per iteration vs. 3 cycles on an Intel Arrandale CPU (Westmere
>> microarchitecture)). But this would need some more testing and
>> benchmarking. I can come up with it here again if this would be of any
>> interest.
> Small lengths in terms of matching string or overall lengths?
Small length in terms of the matching string, as there is some setup
work before the loop.
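To illustrate the trade-off, here is a minimal sketch in portable C. It is
not the assembler routine under discussion; the function names are
hypothetical and the memcmp-style return value is an assumption. The wide
variant pays a fixed setup cost on every call but then handles 8 bytes per
iteration, so it loses when the mismatch comes early and wins when the
matching part is long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical baseline: no setup, one byte per iteration. */
static int compare_bytes_simple(const unsigned char *a,
                                const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    return 0;
}

/* Hypothetical wide variant: some setup before the loop, then
   8 bytes per iteration while the buffers keep matching. */
static int compare_bytes_wide(const unsigned char *a,
                              const unsigned char *b, size_t len)
{
    size_t i = 0;

    /* Setup work: this is paid even if the very first byte differs. */
    if (len >= 8) {
        size_t wide_end = len - (len % 8);
        for (; i < wide_end; i += 8) {
            uint64_t wa, wb;
            memcpy(&wa, a + i, 8);   /* unaligned-safe loads */
            memcpy(&wb, b + i, 8);
            if (wa != wb)
                break;               /* locate the exact byte below */
        }
    }

    /* Tail, or the 8-byte chunk that contained the mismatch. */
    for (; i < len; i++)
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    return 0;
}
```

Both functions return the same sign for the same input; only the cost
profile differs.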
> BTW: I would really like to see a PCMPSTR based implementation :)
PCMPSTR is (at the moment) out of my scope. As far as I know, PCMPSTR is
part of SSE4.2. How would you deal with Intel Core microarchitecture CPUs
that don't have it?
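One common way to handle CPUs without SSE4.2, sketched here only as an
illustration and not as what the RTL does, is to check the CPU once at
startup and select a routine through a function pointer. The sketch assumes
GCC or Clang on x86 for the __builtin_cpu_supports check. All function names
are hypothetical, and the "SSE4.2" routine is a placeholder that simply
calls memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*compare_fn)(const void *a, const void *b, size_t len);

/* Fallback for CPUs without SSE4.2. */
static int compare_generic(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len);
}

/* Placeholder only: a real version would use the SSE4.2 string
   compare instructions here. */
static int compare_sse42_placeholder(const void *a, const void *b,
                                     size_t len)
{
    return memcmp(a, b, len);
}

/* Returns nonzero if the running CPU reports SSE4.2. On other
   compilers or architectures this conservatively returns 0. */
static int cpu_has_sse42(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") != 0;
#else
    return 0;
#endif
}

/* Pick the routine once; callers then go through the pointer. */
static compare_fn select_compare(void)
{
    return cpu_has_sse42() ? compare_sse42_placeholder : compare_generic;
}
```

The check runs once, so the per-call cost is a single indirect call.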
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel