http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2010.12.18 12:39:26
     Ever Confirmed|0                           |1

--- Comment #3 from Steven Bosscher <steven at gcc dot gnu.org> 2010-12-18 12:39:26 UTC ---
Compiled like so:

$ gcc-4.4.2 -S -O2 sha256_4way.i -o sha256_4way-44.s
$ gcc-4.5.0 -S -O2 sha256_4way.i -o sha256_4way-45.s
$ grep -c call *.s
sha256_4way-44.s:0
sha256_4way-45.s:484
$ grep call *.s|head
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
sha256_4way-45.s:    call    ROTR
$

ROTR should have been inlined:

static inline __m128i ROTR(__m128i x, const int n)
{
    return _mm_srli_epi32(x, n) | _mm_slli_epi32(x, 32 - n);
}

This probably explains the slowdown.
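For reference, a minimal scalar sketch of what ROTR computes in each of the four 32-bit lanes: a rotate right by n bits. The name rotr32_scalar is hypothetical and does not appear in sha256_4way.i. The mask on the left-shift count is an addition here, needed only because a shift by 32 is undefined in scalar C. This illustrates the arithmetic only and is not a fix for the inlining regression.

```c
#include <stdint.h>

/* Hypothetical scalar counterpart of one lane of ROTR:
 * rotate the 32-bit value x right by n bits, for n in 0..31.
 * The "& 31" keeps the left-shift count in range when n == 0. */
static inline uint32_t rotr32_scalar(uint32_t x, const int n)
{
    return (x >> n) | (x << ((32 - n) & 31));
}
```

For example, rotr32_scalar(0x12345678, 8) moves the low byte 0x78 to the top and yields 0x78123456. A body this small is a natural candidate for inlining at every call site.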