http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #29 from Oleg Endo <olegendo at gcc dot gnu.org> 2013-02-16 11:36:37 UTC ---
Another case taken from CSiBE / bzip2, where reusing the intermediate shift
result would be better:

void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )
{
   n->b[7] = (UChar)((hi32 >> 24) & 0xFF);
   n->b[6] = (UChar)((hi32 >> 16) & 0xFF);
   n->b[5] = (UChar)((hi32 >> 8)  & 0xFF);
   n->b[4] = (UChar) (hi32        & 0xFF);
/*
   n->b[3] = (UChar)((lo32 >> 24) & 0xFF);
   n->b[2] = (UChar)((lo32 >> 16) & 0xFF);
   n->b[1] = (UChar)((lo32 >> 8)  & 0xFF);
   n->b[0] = (UChar) (lo32        & 0xFF);
*/
}

on rev 196091 with -O2 -m4 compiles to:

	mov	r6,r0
	shlr16	r0
	shlr8	r0
	mov.b	r0,@(7,r4)
	mov	r6,r0
	shlr16	r0
	mov.b	r0,@(6,r4)
	mov	r6,r0
	shlr8	r0
	mov.b	r0,@(5,r4)
	mov	r6,r0
	mov.b	r0,@(4,r4)

which would be better as:

	mov	r6,r0
	mov.b	r0,@(4,r4)
	shlr8	r0
	mov.b	r0,@(5,r4)
	shlr8	r0
	mov.b	r0,@(6,r4)
	shlr8	r0
	mov.b	r0,@(7,r4)

This would require reordering the memory stores, which should be OK to do
as long as the memory is not volatile.

Reordering the stores manually:

void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )
{
   n->b[4] = (UChar) (hi32        & 0xFF);
   n->b[5] = (UChar)((hi32 >> 8)  & 0xFF);
   n->b[6] = (UChar)((hi32 >> 16) & 0xFF);
   n->b[7] = (UChar)((hi32 >> 24) & 0xFF);
}

still results in:

	mov	r6,r0
	mov.b	r0,@(4,r4)
	mov	r6,r0
	shlr8	r0
	mov.b	r0,@(5,r4)
	mov	r6,r0
	shlr16	r0
	mov.b	r0,@(6,r4)
	mov	r6,r0
	shlr16	r0
	shlr8	r0
	mov.b	r0,@(7,r4)

... at least this case should be handled, I think.