https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118873
Bug ID: 118873
Summary: -favoid-store-forwarding makes a mess out of a STLF fail
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
The following testcase created from PR90579 shows that -favoid-store-forwarding
on x86_64 with -O2 -mavx2 doubles the number of STLF fails rather than doing
any good.
typedef int v4si __attribute__((vector_size(16)));
typedef int v8si __attribute__((vector_size(32)));
v8si a;
v4si b;
void foo (int *p)
{
  v8si aa = a;
  v4si bb = b;
  *(v8si *)p = a;
  *(v4si *)(p + 8) = b;
  a = *(v8si *)(p + 4);
}
At -O2 -mavx2 this compiles to
foo:
.LFB0:
.cfi_startproc
vmovdqa a(%rip), %ymm0
vmovdqa %ymm0, (%rdi) <--- store
vmovdqa b(%rip), %xmm0
vmovdqa %xmm0, 32(%rdi) <--- store
vmovdqa 16(%rdi), %ymm0 <--- STLF FAIL
vmovdqa %ymm0, a(%rip)
vzeroupper
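To spell out why the final load is marked as an STLF fail: relative to p, the two stores write bytes [0, 32) and [32, 48), while the load reads bytes [16, 48), so it straddles both stores and neither one alone can supply it. The sketch below checks that with a deliberately simplified model (a load forwards only from a single store that covers all of its bytes). The struct and function names are hypothetical, and the precise forwarding rules of a given microarchitecture are more detailed than this.

```c
#include <stdbool.h>

/* Half-open byte range [lo, hi), expressed as an offset from p. */
struct byte_range { int lo, hi; };

/* True if the two ranges share at least one byte. */
bool overlaps(struct byte_range x, struct byte_range y)
{
  return x.lo < y.hi && y.lo < x.hi;
}

/* True if the store supplies every byte the load reads. */
bool covers(struct byte_range store, struct byte_range load)
{
  return store.lo <= load.lo && load.hi <= store.hi;
}

/* Simplified assumption: forwarding fails when the load touches a
   pending store that does not cover the whole load. */
bool stlf_fails(struct byte_range store, struct byte_range load)
{
  return overlaps(store, load) && !covers(store, load);
}
```

With the first store as {0, 32}, the second as {32, 48} and the load as {16, 48}, stlf_fails is true for both stores under this model.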
and with -favoid-store-forwarding
foo:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
andq $-32, %rsp
vmovdqa a(%rip), %ymm0
vmovdqa %ymm0, (%rdi) <--- original first store
vmovdqa b(%rip), %xmm0
vmovdqa 16(%rdi), %ymm1 <--- STLF fail plus uninit memory read
vmovdqa %ymm1, -32(%rsp)
vmovdqa %xmm0, -16(%rsp)
vmovdqa -32(%rsp), %ymm2 <--- STLF fail newly introduced
vmovdqa %xmm0, 32(%rdi)
vmovdqa %ymm2, a(%rip)
vzeroupper
We introduce a read of uninitialized memory, and the attempt to set the
upper part results in a spill:
Store forwarding avoided with bit inserts:
With sequence:
(insn 15 0 0 (set (subreg:V4SI (reg:V8SI 100 [ _3 ]) 16)
        (reg:V4SI 104)) 2397 {movv4si_internal}
     (nil))
Subregs are not the canonical form for vector inserts; instead you are
expected to use (vec_concat (vec_select ...) (...)) IIRC.
The first load with the STLF fail should have been narrowed using knowledge
that reg:V4SI and reg:V8SI use the same register file (are tieable?).
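For reference, the value the final load produces is just the upper half of a followed by b, so it can be computed without reading p back at all. The sketch below shows that equivalence at the source level. It is only an illustration, not what the pass emits: the function names are hypothetical, plain int arrays stand in for the v8si/v4si types, and memcpy replaces the vector casts so that it has no alignment requirements.

```c
#include <string.h>

/* Memory version: mirrors the testcase's two stores and the
   overlapping load, using a plain int buffer of 12 elements. */
void via_memory(const int a[8], const int b[4], int p[12], int out[8])
{
  memcpy(p, a, 8 * sizeof(int));       /* *(v8si *)p = a        */
  memcpy(p + 8, b, 4 * sizeof(int));   /* *(v4si *)(p + 8) = b  */
  memcpy(out, p + 4, 8 * sizeof(int)); /* a = *(v8si *)(p + 4)  */
}

/* Register version: the same result assembled from the stored
   values directly, with no load from p. */
void via_registers(const int a[8], const int b[4], int out[8])
{
  for (int i = 0; i < 4; i++) {
    out[i] = a[i + 4];   /* low half comes from the high half of a */
    out[i + 4] = b[i];   /* high half comes from b */
  }
}
```

Both functions fill out with the same eight elements for any inputs, which is the value a concat of a half-select of a with b would describe.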