Using the code given below as "inner", I measure this:

Current Trunk:
  O0 compare-byte-1 : 196065.112 +/- 896.754 cycles/inner [0.5 %CV 1.6 %R]
  O1 compare-byte-1 : 196510.158 +/- 577.976 cycles/inner [0.3 %CV 1.1 %R]
  O3 compare-byte-1 : 187540.922 +/- 706.167 cycles/inner [0.4 %CV 1.5 %R]

Patch from 2017-10-21:
  O0 compare-byte-2 : 175831.632 +/- 965.972 cycles/inner [0.5 %CV 2.1 %R]
  O1 compare-byte-2 : 176039.560 +/- 527.141 cycles/inner [0.3 %CV 1.0 %R]
  O3 compare-byte-2 : 158527.167 +/- 661.690 cycles/inner [0.4 %CV 1.5 %R]

(%CV: coefficient of variation * 100%. %R: span as % of mean)
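For reference, here is a minimal sketch of how the %CV and %R columns could be derived from a set of per-run cycle counts. The figures are consistent with the +/- value being the standard deviation (896.754 / 196065.112 is about 0.46 %, shown as 0.5 %CV). The procedure name Summarize, the demo samples, and the choice of the n-1 (sample) standard deviation are assumptions for illustration, not the actual benchmark harness.

```pascal
program CvSpan;
{$mode objfpc}

// Hypothetical helper (not part of the benchmark harness): summarises
// timing samples the way the legend describes %CV and %R.
procedure Summarize(const Samples: array of Double;
                    out Mean, CV, Span: Double);
var
  i: SizeInt;
  SumSq, MinV, MaxV: Double;
begin
  // Arithmetic mean of all samples.
  Mean := 0;
  for i := 0 to High(Samples) do
    Mean := Mean + Samples[i];
  Mean := Mean / Length(Samples);

  // Sum of squared deviations, plus minimum and maximum for the span.
  SumSq := 0;
  MinV := Samples[0];
  MaxV := Samples[0];
  for i := 0 to High(Samples) do
  begin
    SumSq := SumSq + Sqr(Samples[i] - Mean);
    if Samples[i] < MinV then MinV := Samples[i];
    if Samples[i] > MaxV then MaxV := Samples[i];
  end;

  // Assumption: sample standard deviation (divide by n-1).
  CV := Sqrt(SumSq / (Length(Samples) - 1)) / Mean * 100;   // %CV
  Span := (MaxV - MinV) / Mean * 100;                       // %R
end;

const
  // Made-up sample values, only to exercise the helper.
  Demo: array[0..3] of Double = (100, 102, 98, 100);
var
  M, CV, R: Double;
begin
  Summarize(Demo, M, CV, R);
  WriteLn('mean=', M:0:3, '  %CV=', CV:0:1, '  %R=', R:0:1);
end.
```

With real data, Samples would hold one cycles/inner value per measurement run.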
CPU: Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz
     Family 6 Model 60 Stepping 3 (Haswell)
     true single core clock (measured) 2.83 GHz

So the new version is a bit faster, but not by a large margin (10-15%). It is statistically significant though.

While I'm at it, i386 could use some love:
  O1 compare-byte-1 : 755247.183 +/- 8125.671 cycles/inner [1.1 %CV 4.5 %R]
That's 3.8 times slower than x64 for exactly the same code.

Code:
  len:=random(100);
  for j:=0 to len-1 do begin
    buf1[j]:=random(256);
    buf2[j]:=random(256);
  end;
  for j:=0 to random(10) do
    buf2[j]:=buf1[j];
  for j:=1 to 10000 do
    CompareBytePatch(buf1,buf2,len); // or System.CompareByte

-- 
Regards,
Martok

Ceterum censeo b32079 esse sanandam.

_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel