https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100088
Bug ID: 100088
Summary: ymm store split into two xmm stores
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: yyc1992 at gmail dot com
Target Milestone: ---

The following code

```
__attribute__((target("avx2")))
void fill_avx2(double *__restrict__ data, int n, double value)
{
    for (int i = 0; i < n * 16; i++) {
        data[i] = value;
    }
}
```

compiles to

```
fill_avx2:
        sall    $4, %esi
        testl   %esi, %esi
        jle     .L5
        shrl    $2, %esi
        vbroadcastsd    %xmm0, %ymm0
        movl    %esi, %eax
        salq    $5, %rax
        addq    %rdi, %rax
        .p2align 4,,10
        .p2align 3
.L3:
        vmovupd %xmm0, (%rdi)
        vextractf128    $0x1, %ymm0, 16(%rdi)
        addq    $32, %rdi
        cmpq    %rax, %rdi
        jne     .L3
        vzeroupper
.L5:
        ret
```

Note that AFAICT

```
        vmovupd %xmm0, (%rdi)
        vextractf128    $0x1, %ymm0, 16(%rdi)
```

is equivalent to

```
        vmovupd %ymm0, (%rdi)
```

This issue does not exist for sse or avx512f. Setting `-march=haswell` or `-mtune=haswell` on the command line also seems to fix this, but neither of these works when added to the target attribute.
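For anyone trying to reproduce this, here is a rough standalone test file built around the loop above. It is only a sketch. `fill_baseline` and `check_fill` are hypothetical helpers that are not part of the original report, and the x86-64 guard plus the `__builtin_cpu_supports` check are there only so the file still builds and runs on machines without AVX2. The report does not state the optimization level used; the comparison below assumes something like `gcc -O3 -S`, once as-is and once with `-mtune=haswell` added.

```c
#include <assert.h>
#include <stdlib.h>

#if defined(__GNUC__) && defined(__x86_64__)
#define HAVE_FILL_AVX2 1
/* Loop from the report, unchanged. */
__attribute__((target("avx2")))
void fill_avx2(double *__restrict__ data, int n, double value)
{
    for (int i = 0; i < n * 16; i++) {
        data[i] = value;
    }
}
#else
#define HAVE_FILL_AVX2 0
#endif

/* Hypothetical baseline: the same loop without the target attribute,
 * useful for comparing the generated code side by side. */
void fill_baseline(double *__restrict__ data, int n, double value)
{
    for (int i = 0; i < n * 16; i++) {
        data[i] = value;
    }
}

/* Hypothetical helper: fills n * 16 doubles and returns 1 if every
 * element equals `value` and the sentinel slot past the end is intact.
 * Returns 0 on mismatch or allocation failure. Requires n > 0. */
int check_fill(int n, double value)
{
    assert(n > 0);
    size_t count = (size_t)n * 16;
    double *buf = calloc(count + 1, sizeof *buf);
    if (buf == NULL) {
        return 0;
    }
    buf[count] = -1.0; /* sentinel just past the filled range */

#if HAVE_FILL_AVX2
    /* Only call the AVX2 version when the running CPU supports it. */
    if (__builtin_cpu_supports("avx2")) {
        fill_avx2(buf, n, value);
    } else {
        fill_baseline(buf, n, value);
    }
#else
    fill_baseline(buf, n, value);
#endif

    int ok = (buf[count] == -1.0);
    for (size_t i = 0; i < count; i++) {
        if (buf[i] != value) {
            ok = 0;
        }
    }
    free(buf);
    return ok;
}
```

Looking at the `.L3` loop of `fill_avx2` in the two assembly outputs should show whether the store is emitted as a single `vmovupd %ymm0` or as the `vmovupd %xmm0` / `vextractf128` pair. `check_fill` only confirms that both forms write the same data; it says nothing about which one is faster.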