https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to rsand...@gcc.gnu.org from comment #8)
> Fixed for the reduced testcase.  Please reopen if there's still a problem
> with the SPEC test itself.

Please note that when the testcase from comment #5 is compiled with
"-march=skylake -O2 -mavx512f", the vzeroupper before the call to "foo" is
now missing:

bar:
        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp
        subq    $32, %rsp
        vmovdqa x1(%rip), %ymm0
        vmovdqa %ymm0, (%rsp)
        call    foo
        vmovdqa (%rsp), %ymm0
        vmovdqa %ymm0, x3(%rip)
        vzeroupper
        leave
        ret

gcc-9.2.1 compiles the function to:

bar:
        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp
        subq    $32, %rsp
        vmovdqa x1(%rip), %ymm1
        vmovdqa %ymm1, (%rsp)
        vzeroupper              <---- here
        call    foo
        vmovdqa (%rsp), %ymm1
        vmovdqa %ymm1, x3(%rip)
        vzeroupper
        leave
        ret

(I would also expect %ymm16+ to be used as a temporary, as it is not
clobbered by a vzeroupper in "foo".)
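For readers without comment #5 at hand, here is a hypothetical C function whose shape matches the listings above: a 256-bit global x1 is copied into a temporary that has to survive the call to foo, and the temporary is then stored to x3. This is a sketch under assumptions, not the actual testcase from comment #5; the v4di typedef, the global x2 and the body of foo are illustrative only. It uses GCC vector extensions so that it also builds and runs on targets without AVX.

```c
/* HYPOTHETICAL reconstruction of the shape of "bar" from the listings;
   not the testcase from comment #5.  v4di, x2 and foo's body are made up. */

/* 256-bit vector of four 64-bit integers (held in a %ymm register with AVX). */
typedef long long v4di __attribute__ ((vector_size (32)));

v4di x1, x2, x3;

/* noinline keeps the call in "bar", so the temporary must live across it. */
__attribute__ ((noinline)) void
foo (void)
{
  x1 = x2;          /* overwrites x1, so bar must keep its own copy */
}

void
bar (void)
{
  v4di t = x1;      /* corresponds to the load of x1 and the spill to (%rsp) */
  foo ();           /* the listings differ in whether a vzeroupper precedes this call */
  x3 = t;           /* corresponds to the reload from (%rsp) and the store to x3 */
}
```

To inspect the vzeroupper placement, one would compile with the flags quoted above plus -S and read the assembly for bar. Because foo is defined in the same file here, GCC may generate different code than in the listings; moving foo to a separate translation unit would mirror the external call shown there more closely.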