[Bug tree-optimization/114346] New: vectorizer generates the same IV twice

tnfchris at gcc dot gnu.org via Gcc-bugs Thu, 14 Mar 2024 21:21:45 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114346


            Bug ID: 114346
           Summary: vectorizer generates the same IV twice
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The following example:

---
double f(int n, double *data, double b) {
    double res = b;

    for (int i=0;i<n;i++) {
        res += data[i] * i;
    }

    return res;
}
---

generates at -Ofast -march=armv9-a this code:


        cntd    x5
        mov     z28.s, w5
        index   z30.d, #0, #1
.L4:
        incw    x2
        add     z1.s, z30.s, z28.s
        ld1d    z25.d, p7/z, [x3, #1, mul vl]
        mov     z26.d, z30.d
        ld1d    z2.d, p7/z, [x3]
        sxtw    z1.d, p7/m, z1.d
        sxtw    z26.d, p7/m, z26.d
        scvtf   z1.d, p7/m, z1.d
        scvtf   z26.d, p7/m, z26.d
        incb    x3, all, mul #2
        fmla    z29.d, p7/m, z25.d, z1.d
        incw    z30.s
        fmla    z31.d, p7/m, z2.d, z26.d
        cmp     w4, w2
        bcs     .L4

Note that the incw z30.s is computing the vectorized IV of i (z30 is initialized
by the index instruction), and z28 is filled with the VL.

So the incw z30.s and the add z1.s, z30.s, z28.s are calculating the same
thing.

There are other issues with this codegen, but this ticket is about the
duplicated IVs.

The vectorizer generates:

  # vect_vec_iv_.7_45 = PHI <_49(6), { 0, 1, 2, ... }(15)>
  _48 = vect_vec_iv_.7_45 + { POLY_INT_CST [2, 2], ... };
  _71 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned int>(vect_vec_iv_.7_45);
  _72 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned int>({ POLY_INT_CST [4, 4], ... });
  _73 = _71 + _72;
  _49 = VIEW_CONVERT_EXPR<vector([2,2]) int>(_73);

So it looks like _48 and _49 are the same value, except that _48 is computed as
a 32-bit IV while _49 is computed as a 64-bit one and truncated to 32 bits?
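
To make the two statements easier to compare, here is a minimal scalar sketch
that evaluates one lane of each for a fixed vector length. It is only a model,
not vectorizer code: it assumes POLY_INT_CST [a, b] denotes a + b*x, with x
fixed at run time by the hardware vector length, and the helper names
poly_value, lane_48 and lane_49 are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: POLY_INT_CST [a, b] denotes a + b*x, where x >= 0 is fixed
   at run time by the hardware vector length (x = 0 for 128-bit vectors).  */
static int64_t poly_value(int64_t a, int64_t b, int64_t x)
{
    assert(x >= 0);
    return a + b * x;
}

/* One lane of:
     _48 = vect_vec_iv_.7_45 + { POLY_INT_CST [2, 2], ... };
   The add is done directly on the signed int lane.  */
static int32_t lane_48(int32_t iv, int64_t x)
{
    return iv + (int32_t)poly_value(2, 2, x);
}

/* One lane of the _71 .. _49 sequence: the lane is reinterpreted as
   unsigned int, the add is done in unsigned arithmetic, and the result
   is reinterpreted as int again.  The final conversion is
   implementation-defined in ISO C for values above INT32_MAX; GCC wraps.  */
static int32_t lane_49(int32_t iv, int64_t x)
{
    uint32_t u71 = (uint32_t)iv;                   /* _71 */
    uint32_t u72 = (uint32_t)poly_value(4, 4, x);  /* _72 */
    uint32_t u73 = u71 + u72;                      /* _73 */
    return (int32_t)u73;                           /* _49 */
}
```

Under this reading, calling lane_48(0, 0) and lane_49(0, 0) evaluates lane 0 of
the initial IV for 128-bit vectors, which may help when matching each statement
to the add and incw instructions in the assembly above.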
