https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116783

            Bug ID: 116783
           Summary: [14/15 Regression] Wrong code at -O2 with late pair
                    fusion pass (wrong alias analysis)
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

Created attachment 59150
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59150&action=edit
Executable reduced testcase for the testsuite

The attached executable reproducer (exec.cc) is reduced from a Debian package
(kf6-ktexttemplate) which is getting miscompiled on AArch64 (see
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1080974).

The problem can be reproduced on aarch64 as follows:

$ g++ exec.cc -O2 -fstack-protector-strong -fno-late-combine-instructions -mno-late-ldp-fusion
$ ./a.out 
$ g++ exec.cc -O2 -fstack-protector-strong -fno-late-combine-instructions 
$ ./a.out 
Aborted

Note that late-combine hides the problem on trunk, so
-fno-late-combine-instructions is needed to reproduce it there, but not with
GCC 14.

Looking at what's going on in late ldp_fusion, I see only a single pair getting
formed:

fusing pair [L=1] (92,94), base=19, hazards: (-,106), move_range: (94,94)

and we have the following RTL fragment:

  174: x1:DI=sp:DI+0x200
   92: v30:V4SI=[x1:DI-0xb8]
      REG_DEAD x1:DI
  176: x1:DI=x19:DI
  106: [x1:DI]=const_vector
      REG_DEAD x1:DI
  177: x1:DI=sp:DI+0x200
   94: v29:V4SI=[x19:DI+0x10]
      REG_EQUIV [x19:DI+0x10]

now looking back to the last assignment to x19, we have:

  x19:DI=sp:DI+0x148

so substituting through, we have:

  x1 - 0xb8 = sp + 0x200 - 0xb8 = sp + 0x148 = x19

i.e. the load i92 is to the exact same address as the store i106, yet we fail
to detect this aliasing hazard (in the forward direction) and thus form the
load pair at i94, incorrectly re-ordering the load (i92) over the store.

The problem does not necessarily seem to lie in pair-fusion.cc itself,
however, since memory_modified_in_insn_p fails to return true for the
following arguments:

(rr) pr mem
(mem/c:V4SI (plus:DI (reg:DI 1 x1 [195])
        (const_int -184 [0xffffffffffffff48])) [0 D.5008.d+0 S16 A64])
(rr) pr insn
(insn 106 176 177 5 (set (mem/c:V4SI (reg:DI 1 x1 [198]) [0 MEM <unsigned char[25]> [(struct Private *)&D.5008]+0 S16 A64])
        (const_vector:V4SI [
                (const_int 0 [0]) repeated x4
            ])) "exec.cc":20:13 discrim 1 1270 {*aarch64_simd_movv4si}
     (expr_list:REG_DEAD (reg:DI 1 x1 [198])
        (nil)))

where (naively) it looks like the MEM_EXPRs alias, so I would have expected the
alias analysis machinery to figure this out.

I'll try to dig into why memory_modified_in_insn_p ends up returning false
here.
