http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43653
--- Comment #16 from Uros Bizjak <ubizjak at gmail dot com> 2011-02-17 21:05:11 UTC ---
The assembly from -O1 -ftree-vectorize -msse3 shows another opportunity for
the enhancement requested in PR19398 (secondary reloads don't consider "m"
alternatives):

.LFB0:
        .cfi_startproc
        subq    $416, %rsp
        .cfi_def_cfa_offset 424
        movq    .LC1(%rip), %rax
        leaq    (%rsp,%rax), %rax
        movq    %rax, -112(%rsp)
(*)     movq    -112(%rsp), %xmm1
(*)     punpcklqdq      %xmm1, %xmm1
        movdqa  %xmm1, %xmm0
        leaq    -104(%rsp), %rax
        leaq    408(%rsp), %rdx
.L2:

Looking at the definition of

(define_insn "*vec_dupv2di_sse3"
  [(set (match_operand:V2DI 0 "register_operand"     "=x,x")
        (vec_duplicate:V2DI
          (match_operand:DI 1 "nonimmediate_operand" " 0,m")))]
  "TARGET_SSE3"
  "@
   punpcklqdq\t%0, %0
   movddup\t{%1, %0|%0, %1}"
  [(set_attr "type" "sselog1")
   (set_attr "mode" "TI,DF")])

the two insns marked with (*) can be replaced by a single insn using the
second alternative:

        movddup -112(%rsp), %xmm1
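As a sanity check that the substitution preserves the result, the sketch below models the two sequences in portable C. It assumes a simplified view of an XMM register as two 64-bit lanes and checks only that the bits left in %xmm1 agree. It ignores latency, domain-crossing penalties, and the fact that movddup is formally a double-precision instruction. The names xmm_t, movq_load, punpcklqdq, movddup_load, current_sequence, proposed_sequence and check_equivalence are hypothetical, chosen for illustration. They are not GCC internals or intrinsic APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit XMM register as two 64-bit lanes:
   lo holds bits 63:0, hi holds bits 127:64. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} xmm_t;

/* movq m64, %xmm: load 64 bits into the low lane and zero the high lane. */
static xmm_t movq_load(const uint64_t *mem)
{
    xmm_t r = { *mem, 0 };
    return r;
}

/* punpcklqdq %src, %dst: keep dst's low quadword and place src's low
   quadword in the high lane. With src == dst this duplicates the low lane. */
static xmm_t punpcklqdq(xmm_t dst, xmm_t src)
{
    xmm_t r = { dst.lo, src.lo };
    return r;
}

/* movddup m64, %xmm: load 64 bits and copy them into both lanes. */
static xmm_t movddup_load(const uint64_t *mem)
{
    xmm_t r = { *mem, *mem };
    return r;
}

/* Current code: spill %rax to a stack slot, reload it with movq,
   then duplicate the low lane with punpcklqdq. */
static xmm_t current_sequence(uint64_t rax)
{
    uint64_t slot = rax;             /* movq %rax, -112(%rsp) */
    xmm_t xmm1 = movq_load(&slot);   /* (*) movq -112(%rsp), %xmm1 */
    return punpcklqdq(xmm1, xmm1);   /* (*) punpcklqdq %xmm1, %xmm1 */
}

/* Proposed code: the store stays, but the two (*) insns become one
   movddup that uses the "m" alternative of *vec_dupv2di_sse3. */
static xmm_t proposed_sequence(uint64_t rax)
{
    uint64_t slot = rax;             /* movq %rax, -112(%rsp) */
    return movddup_load(&slot);      /* movddup -112(%rsp), %xmm1 */
}

/* Asserts that both sequences leave the same bits in %xmm1 and that
   both lanes hold the original value of %rax. */
static void check_equivalence(uint64_t rax)
{
    xmm_t a = current_sequence(rax);
    xmm_t b = proposed_sequence(rax);
    assert(a.lo == b.lo && a.hi == b.hi);
    assert(a.lo == rax && a.hi == rax);
}
```

For example, check_equivalence(0x0123456789abcdefULL) passes because both paths place that value in both lanes. The model makes the point of the comment explicit. Once the value is already in a stack slot, a separate reload into a register followed by a register-to-register shuffle is unnecessary, since movddup can read the slot directly.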