Compiling and executing the code of Nick Piggin at http://gcc.gnu.org/ml/gcc/2008-02/msg00601.html
on my old Athlon64 Venice 3200+ (2.0 GHz, 3 GiB DDR400, 32-bit kernel, gcc 3.4.6), I got:

$ gcc -O3 -falign-functions=64 -falign-loops=64 -falign-jumps=64 -falign-labels=64 -march=i686 foo.c -o foo
$ ./foo
no deps, predictable -- C code took 10.08ns per iteration
no deps, predictable -- cmov code took 11.07ns per iteration
no deps, predictable -- jmp code took 11.25ns per iteration
has deps, predictable -- C code took 26.66ns per iteration
has deps, predictable -- cmov code took 35.44ns per iteration
has deps, predictable -- jmp code took 18.89ns per iteration
no deps, unpredictable -- C code took 10.17ns per iteration
no deps, unpredictable -- cmov code took 11.07ns per iteration
no deps, unpredictable -- jmp code took 22.51ns per iteration
has deps, unpredictable -- C code took 104.02ns per iteration
has deps, unpredictable -- cmov code took 107.19ns per iteration
has deps, unpredictable -- jmp code took 176.18ns per iteration

On this machine the conclusions are (">" means slightly better than, ">>" means better than):

1. jmp >> C >> cmov when it's predictable and has data dependencies.
2. C > cmov > jmp when it's predictable and has no data dependencies.
3. C > cmov >> jmp when it's unpredictable and has no data dependencies.
4. C > cmov >> jmp when it's unpredictable and has data dependencies.

* Be careful: jmp is the worst when it's unpredictable (with or without data dependencies).
* But a conditional jmp is the best when it's predictable AND has data dependencies. ;)
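
For anyone who wants to poke at the "has deps" case without opening the link, here is a rough sketch of the idea in portable C. This is NOT Nick Piggin's code (his uses inline asm to force cmov and jmp). All the names here (chain_branch, chain_mask, fill_conditions, ns_per_iteration) are made up for illustration. The compiler is free to turn the if/else into a cmov or the mask version into something else, so check the generated assembly before trusting any timing.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: fill cond[] with either an always-true pattern
 * (predictable) or pseudo-random bits from a small LCG (unpredictable). */
void fill_conditions(unsigned char *cond, size_t n, int predictable)
{
    unsigned long state = 12345UL;
    for (size_t i = 0; i < n; i++) {
        state = state * 1103515245UL + 12345UL;
        cond[i] = predictable ? 1 : (unsigned char)((state >> 16) & 1UL);
    }
}

/* "has deps" chain written with a branch: each iteration's result
 * feeds the next one. A compiler will often emit a conditional jump
 * here, but that is not guaranteed. */
unsigned long chain_branch(const unsigned char *cond, size_t n,
                           unsigned long acc)
{
    for (size_t i = 0; i < n; i++) {
        if (cond[i])
            acc = acc + 3UL;
        else
            acc = acc ^ 5UL;
    }
    return acc;
}

/* Same chain written without a branch: both candidates are computed,
 * then one is selected with a bit mask (a stand-in for cmov). */
unsigned long chain_mask(const unsigned char *cond, size_t n,
                         unsigned long acc)
{
    for (size_t i = 0; i < n; i++) {
        unsigned long a = acc + 3UL;
        unsigned long b = acc ^ 5UL;
        unsigned long mask = -(unsigned long)(cond[i] != 0); /* all ones or 0 */
        acc = (a & mask) | (b & ~mask);
    }
    return acc;
}

/* Crude timer based on clock(); stores the chain result in *result so
 * the call cannot be optimized away. Returns nanoseconds per iteration. */
double ns_per_iteration(unsigned long (*fn)(const unsigned char *, size_t,
                                            unsigned long),
                        const unsigned char *cond, size_t n,
                        unsigned long *result)
{
    clock_t start = clock();
    *result = fn(cond, n, 0UL);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)n;
}
```

To use it, fill a large array once with fill_conditions(cond, n, 1) and once with fill_conditions(cond, n, 0), then call ns_per_iteration with chain_branch and chain_mask on each. clock() is coarse, so n needs to be in the millions to get a usable number.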