https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100048

            Bug ID: 100048
           Summary: [10/11 Regression] Wrongful CSE'ing of SVE predicates.
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64-*

The following testcase

#include "arm_sve.h"

void foo(svfloat16_t in, float16_t *dst) {
  const svbool_t pg_q0 = svdupq_n_b16(1, 0, 1, 0, 0, 0, 0, 0);
  const svbool_t pg_f0 = svdupq_n_b16(1, 0, 0, 0, 0, 0, 0, 0);
  dst[0] = svaddv_f16(pg_f0, in);
  dst[1] = svaddv_f16(pg_q0, in);
}

generates the right code at -O1 with -march=armv8-a+sve but generates wrong
code at -O2.
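
For reference, the result the testcase should produce can be modelled in
portable C.  This is only a sketch under stated assumptions: it covers a single
128-bit quadword (8 half-precision lanes), uses float in place of float16_t,
and the names LANES and masked_sum are hypothetical, not part of arm_sve.h.
On longer vectors svdupq_n_b16 repeats the 8-lane pattern in every quadword,
and the order of additions in the real reduction may differ.

```c
/* Number of 16-bit lanes in one 128-bit quadword (assumption: the model
   covers a single quadword only).  */
#define LANES 8

/* Hypothetical scalar stand-in for a predicated add reduction:
   sum the lanes of IN whose entry in PG is nonzero.  */
static float masked_sum(const int pg[LANES], const float in[LANES])
{
    float sum = 0.0f;
    for (int i = 0; i < LANES; i++)
        if (pg[i])
            sum += in[i];
    return sum;
}
```

With in = {1, 2, 4, 8, 16, 32, 64, 128}, the pg_f0 pattern
{1, 0, 0, 0, 0, 0, 0, 0} gives 1 for dst[0] and the pg_q0 pattern
{1, 0, 1, 0, 0, 0, 0, 0} gives 1 + 4 = 5 for dst[1], so the two stores must
be fed by different predicates.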

From this, the following expands are created

(insn 22 21 23 2 (set (reg:VNx8BI 100)
        (subreg:VNx8BI (reg:VNx2BI 103) 0))
     (expr_list:REG_EQUAL (const_vector:VNx8BI [
                (const_int 1 [0x1])
                (const_int 0 [0])
                (const_int 1 [0x1])
                (const_int 0 [0]) repeated x5
            ])
        (nil)))

and

(insn 15 14 16 2 (set (reg:VNx8BI 96)
        (subreg:VNx8BI (reg:VNx2BI 99) 0))
     (expr_list:REG_EQUAL (const_vector:VNx8BI [
                (const_int 1 [0x1])
                (const_int 0 [0]) repeated x7
            ])
        (nil)))

where the subregs are paradoxical.  These incorrect paradoxical subregs cause
CSE to think these two predicates are the same.
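
That the two REG_EQUAL constants really are different can be checked with a
small sketch.  It assumes the architectural SVE predicate layout of one
predicate bit per vector byte, so lane i of a 16-bit element is controlled by
bit 2*i.  The helper name pack_b16 is hypothetical and the model covers one
128-bit quadword only; it says nothing about how CSE itself compares the
registers.

```c
#include <stdint.h>

/* Hypothetical helper: pack an 8-lane .h predicate pattern for one 128-bit
   quadword into its 16 predicate bits.  Lane i maps to bit 2*i; the odd
   bits are left clear.  */
static uint16_t pack_b16(const int lanes[8])
{
    uint16_t bits = 0;
    for (int i = 0; i < 8; i++)
        if (lanes[i])
            bits |= (uint16_t)(1u << (2 * i));
    return bits;
}
```

Under these assumptions the constant of insn 15 (1 followed by seven 0s) packs
to 0x0001, while the constant of insn 22 (1, 0, 1, then five 0s) packs to
0x0011, so treating registers 96 and 100 as equal is not valid.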

As such it CSEs them away into

foo:
        pfalse  p2.b
        ptrue   p1.d, all
        trn1    p1.d, p1.d, p2.d
        faddv   h1, p1, z0.h
        str     h1, [x0]
        ptrue   p0.s, all
        trn1    p0.d, p0.d, p2.d
        faddv   h0, p0, z0.h
        str     h0, [x0, 2]
        ret

instead of the expected

foo:
        pfalse  p2.b
        ptrue   p1.d, all
        ptrue   p0.s, all
        trn1    p1.s, p1.s, p2.s
        trn1    p0.s, p0.s, p2.s
        faddv   h1, p1, z0.h
        faddv   h0, p0, z0.h
        str     h1, [x0]
        str     h0, [x0, 2]
        ret
