https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125148

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2026-05-02 00:00:00         |2026-5-11
           Assignee|unassigned at gcc dot gnu.org      |tnfchris at gcc dot gnu.org
             Status|WAITING                     |ASSIGNED
           Keywords|needs-source                |
            Summary|[16/17 Regression] Highway  |[16/17 Regression] Highway
                   |SVE256 test fails on GCC    |SVE256 test fails since
                   |16.0.1 but passes on GCC    |g:210d06502f22964c7214586c5
                   |15.2.1                      |4f8eb54a6965bfd

--- Comment #10 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Confirmed.

reproducer

#pragma GCC aarch64 "arm_sve.h"
#pragma GCC target "+sve2"
int a();
bool b() {
  signed char *g = new signed char;
  char c = 0;
  if (a()) {
    g[0] = 1;
    c = 2;
  }
  svint8_t d = svld1_s8(svptrue_pat_b8(SV_VL1), g);
  svbool_t e = svcmpgt_s8(svptrue_b8(), d, svdup_n_s8(0));
  bool f = svptest_any(svptrue_pat_b8(SV_VL1), e);
  if (f && c != 2)
    __builtin_abort();
  return f;
}

Compiled with -O2 -std=c++20, this generates an invalid constant vector
initialization:

movi    v30.8b, 0x1

instead of

        mov     w0, 1
        strb    w0, [x19]
        ptrue   p7.b, vl1
        ld1b    z30.b, p7/z, [x19]

which was dumb but valid.

Bisected to

commit 210d06502f22964c7214586c54f8eb54a6965bfd
Author: Jennifer Schmitz <[email protected]>
Date:   Fri Feb 14 00:46:13 2025 -0800

    AArch64: Fold SVE load/store with certain ptrue patterns to LDR/STR.

    SVE loads/stores using predicates that select the bottom 8, 16, 32, 64,
    or 128 bits of a register can be folded to ASIMD LDR/STR, thus avoiding the
    predicate.
    For example,
    svuint8_t foo (uint8_t *x) {
      return svld1 (svwhilelt_b8 (0, 16), x);
    }
    was previously compiled to:
    foo:
            ptrue   p3.b, vl16
            ld1b    z0.b, p3/z, [x0]
            ret

    and is now compiled to:
    foo:
            ldr     q0, [x0]
            ret
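
For reference, the width condition described in the commit message can be sketched in plain C++. This is only an illustration of the stated rule (the predicate selects the bottom 8, 16, 32, 64, or 128 bits), not the code in GCC; the helper names `selected_bits` and `foldable_width` are hypothetical, and the real fold may well check further conditions that are not modelled here.

```cpp
// Hypothetical helper (not GCC code): number of low bits covered by a
// predicate that selects the first `active_elems` elements, each
// `elem_bits` wide.
constexpr unsigned selected_bits(unsigned active_elems, unsigned elem_bits) {
  return active_elems * elem_bits;
}

// Assumption: a load/store is a candidate for the fold when the selected
// prefix is exactly one of the widths listed in the commit message.
constexpr bool foldable_width(unsigned active_elems, unsigned elem_bits) {
  const unsigned bits = selected_bits(active_elems, elem_bits);
  return bits == 8 || bits == 16 || bits == 32 || bits == 64 || bits == 128;
}

// svptrue_pat_b8(SV_VL1) from the reproducer: one 8-bit element, 8 bits.
static_assert(foldable_width(1, 8), "VL1 of bytes is in the listed set");
// svwhilelt_b8(0, 16) from the commit message: sixteen bytes, 128 bits.
static_assert(foldable_width(16, 8), "16 bytes is in the listed set");
// Three bytes cover 24 bits, which is not one of the listed widths.
static_assert(!foldable_width(3, 8), "24 bits is not in the listed set");
```

Under this reading, the VL1 byte load in the reproducer falls within the widths the commit covers, which is consistent with the bisection result.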


Mine.
