http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
--- Comment #6 from H.J. Lu <hjl.tools at gmail dot com> 2010-12-18 15:40:38 UTC ---
(In reply to comment #5)
> (In reply to comment #3)
> > Compiled like so:
> > $ gcc-4.4.2 -S -O2 sha256_4way.i -o sha256_4way-44.s
> > $ gcc-4.5.0 -S -O2 sha256_4way.i -o sha256_4way-45.s
> >
> > $ grep -c call *.s
> > sha256_4way-44.s:0
> > sha256_4way-45.s:484
> > $ grep call *.s|head
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > sha256_4way-45.s: call ROTR
> > $
> >
> > ROTR should have been inlined:
> >
> > static inline __m128i ROTR(__m128i x, const int n) {
> >     return _mm_srli_epi32(x, n) | _mm_slli_epi32(x, 32 - n);
> > }
> >
> > This probably explains the slowdown.
>
> This is caused by revision 151511:
>
> http://gcc.gnu.org/ml/gcc-cvs/2009-09/msg00257.html

It is fixed by revision 166517:

http://gcc.gnu.org/ml/gcc-cvs/2010-11/msg00405.html
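
For anyone stuck on an affected 4.5.x compiler, a minimal sketch of one possible workaround follows. It is not something proposed in this thread: it assumes GCC's `always_inline` function attribute is acceptable in the code base, and it swaps the `|` on `__m128i` for `_mm_or_si128` purely for portability across compilers. The helpers `rotr32` and `rotr_matches_scalar` are hypothetical names added here only to check the result against a scalar rotate.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_srli_epi32, _mm_slli_epi32, ... */
#include <stdint.h>

/* Assumption: forcing inlining with the GCC-specific always_inline attribute.
 * This is a workaround sketch, not the fix that went in as revision 166517. */
static inline __attribute__((always_inline))
__m128i ROTR(__m128i x, const int n) {
    /* Rotate each 32-bit lane right by n bits (0 < n < 32). */
    return _mm_or_si128(_mm_srli_epi32(x, n), _mm_slli_epi32(x, 32 - n));
}

/* Hypothetical scalar reference: rotate one 32-bit word right by n bits. */
static uint32_t rotr32(uint32_t w, int n) {
    assert(n > 0 && n < 32);
    return (w >> n) | (w << (32 - n));
}

/* Hypothetical helper: returns 1 if every lane of ROTR agrees with rotr32. */
static int rotr_matches_scalar(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                               int n) {
    uint32_t out[4];
    /* _mm_set_epi32 takes the highest lane first, so a ends up in lane 0. */
    __m128i v = _mm_set_epi32((int)d, (int)c, (int)b, (int)a);
    _mm_storeu_si128((__m128i *)out, ROTR(v, n));
    return out[0] == rotr32(a, n) && out[1] == rotr32(b, n) &&
           out[2] == rotr32(c, n) && out[3] == rotr32(d, n);
}
```

Whether the calls are really gone can be checked the same way as in comment #3: compile with `-S -O2` and run `grep -c call` on the resulting `.s` file.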