Compiling and running Nick Piggin's code from
http://gcc.gnu.org/ml/gcc/2008-02/msg00601.html

on my old Athlon64 Venice 3200+ (2.0 GHz,
3 GiB DDR400, 32-bit kernel, gcc 3.4.6), I got

$ gcc -O3 -falign-functions=64 -falign-loops=64 -falign-jumps=64 \
      -falign-labels=64 -march=i686 foo.c -o foo
$ ./foo
 no deps,   predictable -- C    code took  10.08ns per iteration
 no deps,   predictable -- cmov code took  11.07ns per iteration
 no deps,   predictable -- jmp  code took  11.25ns per iteration
has deps,   predictable -- C    code took  26.66ns per iteration
has deps,   predictable -- cmov code took  35.44ns per iteration
has deps,   predictable -- jmp  code took  18.89ns per iteration
 no deps, unpredictable -- C    code took  10.17ns per iteration
 no deps, unpredictable -- cmov code took  11.07ns per iteration
 no deps, unpredictable -- jmp  code took  22.51ns per iteration
has deps, unpredictable -- C    code took  104.02ns per iteration
has deps, unpredictable -- cmov code took  107.19ns per iteration
has deps, unpredictable -- jmp  code took  176.18ns per iteration
$

On this machine, the results suggest ( > means slightly better than, >> means much better than )
1. jmp >> C >> cmov when it's predictable and has data dependencies.
2. C > cmov > jmp when it's predictable and has no data dependencies.
3. C > cmov >> jmp when it's unpredictable and has no data dependencies.
4. C > cmov >> jmp when it's unpredictable and has data dependencies.

* Be careful: jmp is the worst when the branch is unpredictable
     (with or without data dependencies).
* But a conditional jmp is the best when the branch is
     predictable AND there are data dependencies.

   ;)
