https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812
--- Comment #18 from Uroš Bizjak <ubizjak at gmail dot com> ---
One interesting observation:

clang is able to do this:

  0.09 │ │  vmovddup    -0x8(%rdx,%rsi,1),%xmm3
...
  0.11 │ │  vfmadd231sd %xmm2,%xmm3,%xmm1
...
  0.74 │ │  vfmadd231pd %xmm2,%xmm3,%xmm0

It figures out that the duplicated V2DFmode value in %xmm3 can also be accessed
in the same register as a DFmode value.

OTOH, current gcc does:

        vmovsd  (%rsi,%rax,8), %xmm1
...
        vmovddup        %xmm1, %xmm4
...
        vfmadd231pd     %xmm4, %xmm0, %xmm2
...
        vfmadd231sd     %xmm1, %xmm0, %xmm3

The above code needs two registers (%xmm1 and %xmm4) to hold the same value.
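
For illustration, here is a minimal sketch of the kind of source pattern that leads to this situation: one loaded double is used both as a scalar multiplicand and, broadcast to both lanes, as a vector multiplicand. This is a hypothetical reduction, not the testcase from the PR; the names `v2df` and `accumulate` are made up for the example, and whether a given compiler version emits exactly the sequences quoted above is not guaranteed.

```c
/* Two-lane double vector via the GCC/Clang vector extension
   (corresponds to V2DFmode on x86). */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical kernel: each weight w[i] feeds a scalar accumulation
   (DFmode) and, duplicated into both lanes, a vector accumulation
   (V2DFmode). Lane 0 of the duplicate is the same value as the scalar,
   so in principle one register can serve both uses. */
void accumulate(const double *w, const double *s, const v2df *p, int n,
                double *scalar_out, v2df *vector_out)
{
    double sacc = 0.0;
    v2df vacc = {0.0, 0.0};

    for (int i = 0; i < n; i++) {
        double x = w[i];   /* scalar load (vmovsd-style) */
        v2df xx = {x, x};  /* broadcast (vmovddup-style) */

        sacc += x * s[i];  /* candidate for a scalar FMA */
        vacc += xx * p[i]; /* candidate for a packed FMA */
    }

    *scalar_out = sacc;
    *vector_out = vacc;
}
```

Compiling this with something like `-O2 -mfma -S` and comparing the gcc and clang output is one way to check whether the scalar and the duplicated value end up sharing a register.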