https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115087

            Bug ID: 115087
           Summary: dead block not eliminated in SVE intrinsics code
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The testcase in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 shows another
"regression": the same loop seems to have been peeled, but only for 1
iteration.

However, the resulting loops are identical, so the peeling is weird.

This was caused by

f5fb9ff2396fd41fdd2e6d35a412e936d2d42f75 is the first bad commit
commit f5fb9ff2396fd41fdd2e6d35a412e936d2d42f75
Author: Jan Hubicka <j...@suse.cz>
Date:   Fri Jul 28 16:18:32 2023 +0200

    loop-split improvements, part 3

    extend tree-ssa-loop-split to understand test of the form
     if (i==0)
    and
     if (i!=0)
    which triggers only during the first iteration.  Naturally we should
    also be able to trigger last iteration or split into 3 cases if
    the test indeed can fire in the middle of the loop.

However, the commit itself is innocent; it looks like we have merely crossed a
magic threshold that triggers the issue.
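
For reference, the kind of split that commit message describes can be sketched
by hand in plain C++. This is only an illustration: the function names and the
loop body are made up for the example and are not taken from GCC or from the
testcase.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: the test `i != 0` is false only on the first iteration.
long sum_original(const std::vector<int>& v) {
  long total = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (i != 0)
      total += v[i - 1];
    total += 2 * v[i];
  }
  return total;
}

// Hand-split form: the first iteration is handled on its own, so the
// remaining loop no longer needs the test.
long sum_split(const std::vector<int>& v) {
  long total = 0;
  std::size_t i = 0;
  if (i < v.size()) {          // first iteration: `i != 0` is known false
    total += 2 * v[i];
    ++i;
  }
  for (; i < v.size(); ++i) {  // later iterations: `i != 0` is known true
    total += v[i - 1];
    total += 2 * v[i];
  }
  return total;
}

// Small self-check that both forms agree on one input.
void check_split_equivalence() {
  const std::vector<int> v{1, 2, 3};
  assert(sum_original(v) == sum_split(v));
}
```

If the guarded block turns out to be dead, as in this report, the two pieces
end up with the same body, which would match the "identical loops" observation.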

However, a simpler testcase:

#include <arm_sve.h>

void test(int size, char uplo, float16_t *p_mat)
{
  int col_stride = uplo == 'u' ? 1 : size;
  auto *a = &p_mat[0];
  auto pg = svptrue_b16();
  for (int j = 0; j < size; ++j) {
    auto *a_j = &a[j];
    if (j > 0) {
      int col_i = j + 1;
      auto v_a_ji_0 = svld1_vnum_f16(pg, (const float16_t *)&a_j[col_i], 0);
      v_a_ji_0 = svcmla_f16_x(pg, v_a_ji_0, v_a_ji_0, v_a_ji_0, 180);
    }

    int col_i = j * col_stride;
    auto v_a_ji_0 = svld1_vnum_f16(pg, (const float16_t *)&a_j[col_i], 0);
    auto v_old_a_jj_0 = svld1_vnum_f16(pg, (const float16_t *)&a_j[j], 0);
    v_a_ji_0 = svmul_f16_x(pg, v_old_a_jj_0, v_a_ji_0);

    svst1_vnum_f16(pg, (float16_t *)&a_j[col_i], 0, v_a_ji_0);
  }
}

shows that the change in the patch is a positive one.

The issue seems to be that GCC does not see the if block as dead code:

    if (j > 0) {
      int col_i = j + 1;
      auto v_a_ji_0 = svld1_vnum_f16(pg, (const float16_t *)&a_j[col_i], 0);
      v_a_ji_0 = svcmla_f16_x(pg, v_a_ji_0, v_a_ji_0, v_a_ji_0, 180);
    }

is dead because the v_a_ji_0 it computes is local to the block and is never
used; the v_a_ji_0 declared after the block is a separate variable.

  _29 = MEM <__SVFloat16_t> [(__fp16 *)_88 + ivtmp.10_52 * 2];
  svcmla_f16_x ({ -1, 0, ... }, _29, _29, _29, 180);

_29 is dead, but I guess the call is not eliminated because GCC doesn't know
what svcmla_f16_x does. Are these intrinsics not marked as CONST|PURE?

We do finally eliminate it at the RTL level, but I think we should mark these
intrinsics as ECF_CONST.
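
As a rough user-level analogue of what ECF_CONST would buy here, the sketch
below uses GCC's `const` function attribute. It is an assumption-laden
illustration, not the SVE code: combine_const and combine_opaque are
hypothetical stand-ins for an intrinsic, and the loop only mirrors the shape of
the testcase. Because both definitions are visible in one file, GCC can often
infer the property on its own; the attribute matters most when only a
declaration or a built-in is visible.

```cpp
#include <cassert>

// Hypothetical stand-in for an intrinsic. The `const` attribute promises
// that the result depends only on the arguments and that the call has no
// side effects, so a call whose result is unused may be deleted.
__attribute__((const, noinline)) int combine_const(int a, int b) {
  return a * b + a;
}

// Same body without the promise.
__attribute__((noinline)) int combine_opaque(int a, int b) {
  return a * b + a;
}

// Mirrors the reduced testcase: a block-local value is computed when
// j > 0 and then never read.
int sum_with_dead_block(int size) {
  int total = 0;
  for (int j = 0; j < size; ++j) {
    if (j > 0) {
      int unused = combine_const(j, j + 1);  // result never read
      (void)unused;
    }
    total += combine_opaque(j, 2);  // result is used, so the call stays
  }
  return total;
}

// Small self-check: the dead block must not affect the result.
void check_demo() {
  assert(sum_with_dead_block(4) == 18);
}
```

Comparing the output of `-O2 -fdump-tree-optimized` with and without the
`const` attribute on combine_const is one way to see whether the guarded call
survives to the end of the GIMPLE pipeline.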
