https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100088

            Bug ID: 100088
           Summary: ymm store split into two xmm stores
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yyc1992 at gmail dot com
  Target Milestone: ---

The following code

```
__attribute__((target("avx2")))
void fill_avx2(double *__restrict__ data, int n, double value)
{
    for (int i = 0; i < n * 16; i++) {
        data[i] = value;
    }
}
```

compiles to

```
fill_avx2:
        sall    $4, %esi
        testl   %esi, %esi
        jle     .L5
        shrl    $2, %esi
        vbroadcastsd    %xmm0, %ymm0
        movl    %esi, %eax
        salq    $5, %rax
        addq    %rdi, %rax
        .p2align 4,,10
        .p2align 3
.L3:
        vmovupd %xmm0, (%rdi)
        vextractf128    $0x1, %ymm0, 16(%rdi)
        addq    $32, %rdi
        cmpq    %rax, %rdi
        jne     .L3
        vzeroupper
.L5:
        ret
```

Note that AFAICT

```
        vmovupd %xmm0, (%rdi)
        vextractf128    $0x1, %ymm0, 16(%rdi)
```

is equivalent to

```
        vmovupd %ymm0, (%rdi)
```
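The claimed equivalence can be checked in C with AVX intrinsics. The following is only a sketch: the function names `store_full`, `store_split` and `split_matches_full` are made up for illustration and are not part of the report, and the intrinsics are assumed to map onto the instructions quoted above (the compiler is free to pick others). It stores the same four doubles once with a single 256-bit store and once as low half plus extracted high half, then compares the resulting bytes.

```c
#include <immintrin.h>
#include <string.h>

/* Single 256-bit unaligned store (the form the report expects). */
__attribute__((target("avx")))
static void store_full(double *dst, const double *src)
{
    __m256d v = _mm256_loadu_pd(src);
    _mm256_storeu_pd(dst, v);
}

/* Two 128-bit stores: low half first, then the extracted high half
   at byte offset 16 (the form seen in the generated loop). */
__attribute__((target("avx")))
static void store_split(double *dst, const double *src)
{
    __m256d v = _mm256_loadu_pd(src);
    _mm_storeu_pd(dst, _mm256_castpd256_pd128(v));
    _mm_storeu_pd(dst + 2, _mm256_extractf128_pd(v, 1));
}

/* Hypothetical helper: returns 1 if both variants write the same
   32 bytes for the four doubles in src, 0 otherwise. */
int split_matches_full(const double *src)
{
    double a[4] = {0}, b[4] = {0};
    store_full(a, src);
    store_split(b, src);
    return memcmp(a, b, sizeof a) == 0;
}
```

This needs a CPU with AVX at run time; a caller can guard the check with `__builtin_cpu_supports("avx")`.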

This issue does not occur when targeting SSE or AVX-512F. Passing
`-march=haswell` or `-mtune=haswell` on the command line also seems to avoid
it, but neither has any effect when added to the target attribute instead.
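For reference, the attribute variants could look like the sketch below. The report does not show the exact attribute strings that were tried, so the spellings `tune=haswell` and `arch=haswell` (both documented options of GCC's x86 `target` attribute) and the function names are assumptions. The code only shows how the variants are written; whether the store is split has to be checked in the assembly output (e.g. from `gcc -O3 -S`).

```c
/* Assumed spelling: AVX2 enabled, tuning set per function. */
__attribute__((target("avx2,tune=haswell")))
void fill_avx2_tune(double *__restrict__ data, int n, double value)
{
    for (int i = 0; i < n * 16; i++) {
        data[i] = value;
    }
}

/* Assumed spelling: whole architecture selected per function. */
__attribute__((target("arch=haswell")))
void fill_arch_haswell(double *__restrict__ data, int n, double value)
{
    for (int i = 0; i < n * 16; i++) {
        data[i] = value;
    }
}
```

Both functions fill `n * 16` doubles, exactly like `fill_avx2`; only the attribute differs.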
