https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125876
Bug ID: 125876
Summary: [13/14/15/16/17 Regression] x86: register-source
vmovddup spilled to the stack instead of using the
register form
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: Ashwin.Godbole at amd dot com
CC: Sarvesh.Chandra at amd dot com, vekumar at gcc dot gnu.org
Target Milestone: ---
Host: x86_64-linux-gnu
Target: x86_64-linux-gnu
Created attachment 64770
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64770&action=edit
Fix (patchfile)
Since GCC 13, a 256/512-bit movddup (_mm512_movedup_pd / _mm256_movedup_pd)
whose source is already in a register is spilled to the stack and reloaded
via the memory form of vmovddup, instead of using the register form
"vmovddup %zmm,%zmm" / "%ymm,%ymm".
GCC 12.x is unaffected.
$ gcc -O2 -mavx512f -S test.c
f512 actual  : vmovapd %zmm0,-64(%rsp); vmovddup -64(%rsp),%zmm0 (+ frame)
f512 expected: vmovddup %zmm0,%zmm0 (clang, GCC 12)
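The test.c behind the command above is not reproduced in this report. A minimal sketch that should exercise the same patterns might look like the following. Only the name f512 appears in the output above; f256 and dup256_check are hypothetical names added for illustration, and the target attributes are an assumption standing in for -mavx512f / -mavx so that the file also builds without extra flags.

```c
#include <immintrin.h>

/* Assumption: the target attributes replace the -mavx512f / -mavx
   command-line flags; the report itself compiled with -O2 -mavx512f. */

/* 512-bit case: x arrives in %zmm0, so the register form
   "vmovddup %zmm0,%zmm0" is the expected code. */
__attribute__((target("avx512f")))
__m512d f512(__m512d x)
{
    return _mm512_movedup_pd(x);
}

/* 256-bit case (hypothetical name): x arrives in %ymm0. */
__attribute__((target("avx")))
__m256d f256(__m256d x)
{
    return _mm256_movedup_pd(x);
}

/* Hypothetical helper, not part of the reproducer proper: runs f256 on
   plain arrays so the result can be checked without passing vector
   types across a non-AVX caller. movddup duplicates the even-indexed
   elements, so {a0,a1,a2,a3} becomes {a0,a0,a2,a2}. */
__attribute__((target("avx")))
void dup256_check(const double in[4], double out[4])
{
    _mm256_storeu_pd(out, f256(_mm256_loadu_pd(in)));
}
```

Inspecting the assembly emitted for f512 and f256 with -O2 -S should show whether the source register is spilled before the vmovddup.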
Bisected to r13-3587-g4acc4c2be84 "Fix incorrect digit constraint
[PR target/107057]"; the parent r13-3586-g5c5ef2f9ab5 is good. That commit
turned avx512f_movddup512 / avx_movddup256 into define_insns whose operand 1
uses a memory-only "m" constraint, while the predicate is nonimmediate_operand,
so LRA spills a register source to satisfy it. The sibling unpcklpd patterns
use "vm", which accepts a vector register as well as memory.
Tested on x86_64-linux-gnu (AMD Zen 4).