https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106146
Bug ID: 106146
Summary: [instcombine] a redundant movprfx insn compare to llvm
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: zhongyunde at huawei dot com
Target Milestone: ---

* test case: gcc emits a redundant movprfx insn in the kernel loop body; see https://gcc.godbolt.org/z/8vG4PzM18 for details.
```
#include <arm_sve.h>

#define ARRAY_ALIGNMENT 64
#define LEN_2D 128ll
#define LEN_1D 8000ll
#define iterations 10000

typedef float real_t;

__attribute__((aligned(ARRAY_ALIGNMENT))) real_t a[LEN_1D],b[LEN_1D];

void s113_tuned(void)
{
    for (int nl = 0; nl < 4*iterations; nl++) {
        int64_t i = 1;
        svbool_t pg = svwhilelt_b32(i, LEN_1D);
        svfloat32_t a0v = svdup_f32(a[0]);
        do {
            svfloat32_t bv = svld1_f32(pg, &b[i]);
            svfloat32_t res = svadd_z(pg, bv, a0v);
            svst1(pg, &a[i], res);
            i += svcntw();
            pg = svwhilelt_b32(i, LEN_1D);
        } while (svptest_any(svptrue_b32(), pg));
    }
    return;
}
```

* gcc's kernel loop:
```
.L2:
        ld1w    z0.s, p0/z, [x3, x0, lsl 2]
        movprfx z0.s, p0/z, z0.s
        fadd    z0.s, p0/m, z0.s, z1.s
        st1w    z0.s, p0, [x1, x0, lsl 2]
        incw    x0
        whilelt p0.s, x0, x2
        b.any   .L2
```

* llvm's kernel loop:
```
.LBB0_2:                                // Parent Loop BB0_1 Depth=1
        ld1w    { z1.s }, p2/z, [x13, x14, lsl #2]
        fadd    z1.s, p2/m, z1.s, z0.s
        st1w    { z1.s }, p2, [x12, x14, lsl #2]
        add     x14, x10, x14
        whilelt p2.s, x14, x9
        b.ne    .LBB0_2
```
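
* why the movprfx looks redundant: the ld1w uses zeroing predication (p0/z), so the inactive lanes of z0 are already zero when the movprfx zeroes them again under the same predicate. The merging fadd then leaves those lanes untouched in both sequences. Below is a minimal scalar model of that reasoning in portable C. It is a sketch, not SVE code: the lane count `LANES` and the helpers `load_z`, `movprfx_z`, `add_m`, `sequences_agree` and `demo` are hypothetical names introduced only for illustration, and the model assumes the same governing predicate is used by all three instructions, as in the loop above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LANES 8 /* hypothetical number of 32-bit lanes; real SVE length varies */

/* Model of "ld1w zd.s, pg/z, [mem]": active lanes load, inactive lanes become 0. */
static void load_z(float zd[LANES], const bool pg[LANES], const float *mem)
{
    for (size_t i = 0; i < LANES; i++)
        zd[i] = pg[i] ? mem[i] : 0.0f;
}

/* Model of "movprfx zd.s, pg/z, zn.s": copy active lanes, zero inactive lanes. */
static void movprfx_z(float zd[LANES], const bool pg[LANES], const float zn[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        zd[i] = pg[i] ? zn[i] : 0.0f;
}

/* Model of "fadd zdn.s, pg/m, zdn.s, zm.s": inactive lanes keep their old value. */
static void add_m(float zdn[LANES], const bool pg[LANES], const float zm[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        if (pg[i])
            zdn[i] += zm[i];
}

/* Returns true when the sequence with movprfx and the one without it
   produce bit-identical vectors for the given predicate and input. */
bool sequences_agree(const bool pg[LANES], const float *mem, float splat)
{
    float z1[LANES], with_prfx[LANES], without_prfx[LANES];

    for (size_t i = 0; i < LANES; i++)
        z1[i] = splat; /* stands in for svdup_f32(a[0]) */

    /* gcc-style sequence: ld1w, movprfx, fadd */
    load_z(with_prfx, pg, mem);
    movprfx_z(with_prfx, pg, with_prfx);
    add_m(with_prfx, pg, z1);

    /* llvm-style sequence: ld1w, fadd */
    load_z(without_prfx, pg, mem);
    add_m(without_prfx, pg, z1);

    return memcmp(with_prfx, without_prfx, sizeof with_prfx) == 0;
}

/* Small self-check using a partial predicate, like the last loop iteration. */
void demo(void)
{
    const bool pg[LANES] = { true, true, true, false, false, false, false, false };
    const float mem[LANES] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f };

    assert(sequences_agree(pg, mem, 0.5f));
}
```

In this model the movprfx step is a no-op for every predicate, because `load_z` already established the "inactive lanes are zero" property that `movprfx_z` would otherwise provide.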