https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114346
            Bug ID: 114346
           Summary: vectorizer generates the same IV twice
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The following example:

---
double f(int n, double *data, double b)
{
  double res = b;
  for (int i=0;i<n;i++) {
    res += data[i] * i;
  }
  return res;
}
---

generates at -Ofast -march=armv9-a this code:

        cntd    x5
        mov     z28.s, w5
        index   z30.d, #0, #1
.L4:
        incw    x2
        add     z1.s, z30.s, z28.s
        ld1d    z25.d, p7/z, [x3, #1, mul vl]
        mov     z26.d, z30.d
        ld1d    z2.d, p7/z, [x3]
        sxtw    z1.d, p7/m, z1.d
        sxtw    z26.d, p7/m, z26.d
        scvtf   z1.d, p7/m, z1.d
        scvtf   z26.d, p7/m, z26.d
        incb    x3, all, mul #2
        fmla    z29.d, p7/m, z25.d, z1.d
        incw    z30.s
        fmla    z31.d, p7/m, z2.d, z26.d
        cmp     w4, w2
        bcs     .L4

Note that the incw is updating the vectorized IV of i (z30, initialized by the
index instruction), and that z28 is filled with the VL (the cntd result). So the
incw z30.s and the add z1.s, z30.s, z28.s are calculating the same thing.

There are other issues with this codegen, but this ticket is about the double
IVs.

The vectorizer generates:

  # vect_vec_iv_.7_45 = PHI <_49(6), { 0, 1, 2, ... }(15)>
  _48 = vect_vec_iv_.7_45 + { POLY_INT_CST [2, 2], ... };
  _71 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned int>(vect_vec_iv_.7_45);
  _72 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned int>({ POLY_INT_CST [4, 4], ... });
  _73 = _71 + _72;
  _49 = VIEW_CONVERT_EXPR<vector([2,2]) int>(_73);

So it looks like _48 and _49 are the same value, except that _48 is done as a
32-bit IV and _49 is calculated as a 64-bit one and truncated to 32?
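
To compare _48 and _49 lane by lane, the GIMPLE above can be modelled in scalar C. This is only a sketch and not GCC code: the names poly_value, model_48 and model_49 are hypothetical, and it assumes that POLY_INT_CST [a, b] evaluates to a + b * (vq - 1), where vq is the number of 128-bit quadwords in the SVE vector, and that vector([2,2]) has 2 * vq lanes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed reading of POLY_INT_CST [a, b]: a + b * (vq - 1), where vq is
   the number of 128-bit quadwords in the vector (vq >= 1). */
static uint32_t poly_value(uint32_t a, uint32_t b, uint32_t vq)
{
    return a + b * (vq - 1);
}

/* Models: _48 = vect_vec_iv_.7_45 + { POLY_INT_CST [2, 2], ... };
   a plain signed add on each lane. */
static void model_48(const int32_t *iv, int32_t *out, size_t lanes, uint32_t vq)
{
    int32_t step = (int32_t)poly_value(2, 2, vq);
    for (size_t i = 0; i < lanes; i++)
        out[i] = iv[i] + step;
}

/* Models _71 .. _49: view the IV as unsigned, add POLY_INT_CST [4, 4],
   then view the sum as int again. The final cast stands in for
   VIEW_CONVERT_EXPR and is only exercised here with small values. */
static void model_49(const int32_t *iv, int32_t *out, size_t lanes, uint32_t vq)
{
    uint32_t step = poly_value(4, 4, vq);
    for (size_t i = 0; i < lanes; i++)
        out[i] = (int32_t)((uint32_t)iv[i] + step);
}
```

Under these assumptions, with vq = 1 and an initial IV of { 0, 1 }, model_48 gives { 2, 3 } and model_49 gives { 4, 5 }, so in this model _49 is _48 plus one more step of POLY_INT_CST [2, 2]. That may be useful when checking whether one of the two updates can be derived from the other.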