https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

--- Comment #18 from Uroš Bizjak <ubizjak at gmail dot com> ---
One interesting observation:

clang is able to do this:

  0.09 │     │  vmovddup     -0x8(%rdx,%rsi,1),%xmm3
  ...
  0.11 │     │  vfmadd231sd  %xmm2,%xmm3,%xmm1
  ...
  0.74 │     │  vfmadd231pd  %xmm2,%xmm3,%xmm0

It figures out that the duplicated V2DFmode value in %xmm3 can also be
accessed in the same register as a DFmode value, so a single load serves both
the scalar and the packed FMA.

OTOH, current gcc does:

        vmovsd  (%rsi,%rax,8), %xmm1
        ...
        vmovddup        %xmm1, %xmm4
        ...
        vfmadd231pd     %xmm4, %xmm0, %xmm2
        ...
        vfmadd231sd     %xmm1, %xmm0, %xmm3

The above code needs two registers: %xmm1 for the DFmode value and %xmm4 for
its V2DFmode duplicate.
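
For illustration only, here is a minimal C sketch of the source-level pattern
that could lead to such code. It is not the testcase from this PR: the
function name accumulate, its parameters, and the data layout are all
hypothetical. It uses the GCC/clang vector extension to show that lane 0 of
the duplicated vector is the same value as the scalar, so one register could
in principle feed both multiply-adds. Whether a given compiler actually emits
a single vmovddup for it depends on the compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Two doubles in one 128-bit vector (GCC/clang vector extension);
   this corresponds to V2DFmode in GCC terms. */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical kernel: each weight w[i] is used twice, once broadcast to
   both lanes for a packed multiply-add and once as a plain scalar. */
void accumulate(const double *w, const double *px, const double *s,
                size_t n, v2df *acc_pd, double *acc_sd)
{
    assert(w && px && s && acc_pd && acc_sd);

    for (size_t i = 0; i < n; i++) {
        v2df dup = { w[i], w[i] };            /* duplicated value */
        v2df pix = { px[2 * i], px[2 * i + 1] };

        *acc_pd += pix * dup;                 /* packed: uses both lanes */
        *acc_sd += s[i] * dup[0];             /* scalar: lane 0 is w[i] */
    }
}
```

Reading the scalar from dup[0] instead of reloading w[i] makes explicit at
the source level the equivalence that clang exploits in the listing above.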
