https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115087
Bug ID: 115087
Summary: dead block not eliminated in SVE intrinsics code
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---

The testcase in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 had another "regression": the same loop seems to have been peeled, but only for one iteration. Since the peeled copy and the remaining loop are identical, the peeling is odd.

This was caused by:

f5fb9ff2396fd41fdd2e6d35a412e936d2d42f75 is the first bad commit
commit f5fb9ff2396fd41fdd2e6d35a412e936d2d42f75
Author: Jan Hubicka <j...@suse.cz>
Date:   Fri Jul 28 16:18:32 2023 +0200

    loop-split improvements, part 3

    extend tree-ssa-loop-split to understand test of the form
     if (i==0)
    and
     if (i!=0)
    which triggers only during the first iteration.  Naturally we should
    also be able to trigger last iteration or split into 3 cases if
    the test indeed can fire in the middle of the loop.

However, the commit is innocent; it looks like we're now below some magic threshold that causes the issue. A simpler testcase:

#include <arm_sve.h>

void test(int size, char uplo, float16_t *p_mat)
{
  int col_stride = uplo == 'u' ? 1 : size;
  auto *a = &p_mat[0];
  auto pg = svptrue_b16();
  for (int j = 0; j < size; ++j) {
    auto *a_j = &a[j];
    if (j > 0) {
      int col_i = j + 1;
      auto v_a_ji_0 = svld1_vnum_f16(pg, (const float16_t *)&a_j[col_i], 0);
      v_a_ji_0 = svcmla_f16_x(pg, v_a_ji_0, v_a_ji_0, v_a_ji_0, 180);
    }

    int col_i = j * col_stride;
    auto v_a_ji_0 = svld1_vnum_f16(pg, (const float16_t *)&a_j[col_i], 0);
    auto v_old_a_jj_0 = svld1_vnum_f16(pg, (const float16_t *)&a_j[j], 0);
    v_a_ji_0 = svmul_f16_x(pg, v_old_a_jj_0, v_a_ji_0);
    svst1_vnum_f16(pg, (float16_t *)&a_j[col_i], 0, v_a_ji_0);
  }
}

shows that the change in the patch is a positive one.
The issue seems to be that GCC does not see this if block as dead code:

  if (j > 0) {
    int col_i = j + 1;
    auto v_a_ji_0 = svld1_vnum_f16(pg, (const float16_t *)&a_j[col_i], 0);
    v_a_ji_0 = svcmla_f16_x(pg, v_a_ji_0, v_a_ji_0, v_a_ji_0, 180);
  }

The block is dead because the v_a_ji_0 it computes is never read before it goes out of scope (the v_a_ji_0 used later in the loop body is a separate variable). In GIMPLE this becomes:

  _29 = MEM <__SVFloat16_t> [(__fp16 *)_88 + ivtmp.10_52 * 2];
  svcmla_f16_x ({ -1, 0, ... }, _29, _29, _29, 180);

_29 is only consumed by the svcmla_f16_x call, whose result is discarded, so both are dead. I guess they are not eliminated because GCC doesn't know what svcmla_f16_x does. But aren't these intrinsics marked as CONST|PURE? We do eventually eliminate the code at the RTL level, but I think we should mark these intrinsics as ECF_CONST so the dead call can already be removed in GIMPLE.
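For comparison, a minimal scalar sketch of the same shape, assuming GCC or Clang (no SVE, so it builds on any target). The helper cmla_like is hypothetical: it only stands in for svcmla_f16_x and is declared [[gnu::const]], the source-level attribute that corresponds to ECF_CONST. Because the value computed in the j > 0 block is never read, the optimizer is permitted to drop both the call and the load feeding it.

```cpp
#include <cassert>

// Hypothetical scalar stand-in for svcmla_f16_x (not an SVE intrinsic).
// [[gnu::const]] promises the result depends only on the arguments and that
// the call has no side effects, so a call whose result is unused may be
// deleted. [[gnu::noinline]] keeps it a real call so it stays visible in dumps.
[[gnu::const]] [[gnu::noinline]] static float cmla_like(float acc, float a, float b) {
    return acc + a * b;
}

// Scalar analogue of the reduced testcase: the j > 0 block computes a value
// that is never read, so the whole block (load + call) is dead.
void test_scalar(int size, float *a) {
    assert(size == 0 || a != nullptr);
    for (int j = 0; j < size; ++j) {
        if (j > 0) {
            float v = a[j - 1];        // load whose only consumer is the dead call
            v = cmla_like(v, v, v);    // result discarded when v goes out of scope
            (void)v;                   // silence -Wunused-but-set-variable
        }
        float v = a[j];
        a[j] = v * v;                  // live computation, always kept
    }
}
```

Compiling this with -O2 -fdump-tree-optimized and checking whether the cmla_like call survives is one way to compare against the SVE testcase, where the svcmla_f16_x call is only removed at the RTL level.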