https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34011
--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
good function:
.L3:
        movdqu  (%rdi,%rax), %xmm0
        pslld   %xmm1, %xmm0
        movups  %xmm0, (%rsi,%rax)
        addq    $16, %rax
        cmpq    $1024, %rax
        jne     .L3

bad function:
.L11:
        movdqu  (%rdi,%rax), %xmm0
        movdqu  (%rsi,%rax), %xmm2
        pslld   %xmm1, %xmm0
        por     %xmm2, %xmm0
        movups  %xmm0, (%rsi,%rax)
        addq    $16, %rax
        cmpq    $1024, %rax
        jne     .L11

Looks good to me now.
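
For readers without the test case at hand, here is a rough sketch of C loops that could plausibly correspond to the two listings. This is a guess reconstructed from the assembly, not the source attached to the bug: the function names `shift_store` and `shift_or`, the `uint32_t` element type, and the trip count `N = 256` (inferred from the 1024-byte bound with 4-byte `pslld` lanes) are all assumptions.

```c
#include <stdint.h>

/* Assumed trip count: 1024 bytes / 4 bytes per 32-bit lane. */
enum { N = 256 };

/* Hypothetical "good" loop: plain shift by a loop-invariant amount.
   The shift count s must be less than 32. */
void shift_store(const uint32_t *in, uint32_t *out, unsigned s)
{
    for (int i = 0; i < N; i++)
        out[i] = in[i] << s;
}

/* Hypothetical "bad" loop: the shifted value is OR-ed into the
   destination, which needs the extra load and por seen above. */
void shift_or(const uint32_t *in, uint32_t *out, unsigned s)
{
    for (int i = 0; i < N; i++)
        out[i] |= in[i] << s;
}
```

Compiling something like this with vectorization enabled (for example `-O3` on x86-64) would be one way to compare against the listings, though the exact output depends on the GCC version and flags.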