https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111125

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
gcc.dg/vect/pr53773.c is interesting - we vectorize the function to

  <bb 2> [local count: 118111600]:
  _20 = {integral_4(D), decimal_5(D)};
  if (power_ten_6(D) > 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 3> [local count: 955630224]:
  # power_ten_19 = PHI <power_ten_11(3), power_ten_6(D)(2)>
  # vect_integral_15.4_1 = PHI <vect_integral_9.5_12(3), _20(2)>
  vect_integral_9.5_12 = vect_integral_15.4_1 * { 10, 10 };
  power_ten_11 = power_ten_19 + -1;
  if (power_ten_11 != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 4> [local count: 118111600]:
  # vect_integral_16.7_21 = PHI <vect_integral_9.5_12(3), _20(2)>
  _22 = VIEW_CONVERT_EXPR<vector(2) unsigned int>(vect_integral_16.7_21);
  _23 = .REDUC_PLUS (_22); [tail call]
  _24 = (int) _23;
  return _24;

where loop vectorization fails because

/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr53773.c:9:20: note: 
Analyze phi: integral_15 = PHI <integral_9(6), integral_4(D)(5)>
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr53773.c:9:20: missed: 
Peeling for epilogue is not supported for nonlinear induction except neg when
iteration count is unknown.
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr53773.c:9:20: missed:  not
vectorized: can't create required epilog loop

Loop vectorization doesn't try SLP here because we only SLP reduction groups,
not induction groups.
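
For reference, a minimal sketch of what the dump above corresponds to at the
source level. The scalar function is reconstructed from the GIMPLE, not copied
from the testsuite file, and the names foo_scalar, foo_bb_slp and v2si are
made up for illustration. The second function spells out the BB-vectorized
form by hand using the GCC vector extension: both inductions sit in one
two-lane vector, the loop does a single vector multiply, and the lanes are
summed afterwards in unsigned arithmetic as in the dump.

```c
/* Scalar form, reconstructed from the GIMPLE dump (an assumption, not the
   verbatim testcase).  Two independent multiplicative inductions.  */
int
foo_scalar (int integral, int decimal, int power_ten)
{
  while (power_ten > 0)
    {
      integral *= 10;
      decimal *= 10;
      power_ten--;
    }
  return integral + decimal;
}

/* Two-lane int vector via the GCC/Clang vector extension.  */
typedef int v2si __attribute__ ((vector_size (8)));

/* Hand-written equivalent of the BB-vectorized code.  */
int
foo_bb_slp (int integral, int decimal, int power_ten)
{
  v2si v = { integral, decimal };   /* _20 = {integral, decimal}  */
  const v2si ten = { 10, 10 };

  while (power_ten > 0)
    {
      v = v * ten;                  /* one vector multiply per iteration  */
      power_ten--;
    }

  /* Sum the two lanes in unsigned, mirroring the VIEW_CONVERT_EXPR
     followed by .REDUC_PLUS in the dump.  */
  return (int) ((unsigned int) v[0] + (unsigned int) v[1]);
}
```

Note that the vector is built and reduced even when the loop body never runs.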

So I think this vectorization is quite nice, possibly even better than
the loop vectorization we expect.  Generated code (aarch64):

foo:
.LFB0:
        .cfi_startproc
        fmov    s31, w0
        ins     v31.s[1], w1
        cmp     w2, 0
        ble     .L2
        movi    v30.2s, 0xa
        .p2align 3,,7
.L3:
        mul     v31.2s, v31.2s, v30.2s
        subs    w2, w2, #1
        bne     .L3
.L2:
        addp    v31.2s, v31.2s, v31.2s
        fmov    w0, s31
        ret

The path for power_ten <= 0 is of course sub-optimal: the vector is still
built and reduced even though the loop never runs.  Note it's again
determined not profitable with costing (we do not try to weight stmts
based on profile, thus in-loop stmts cost the same as out-of-loop stmts).
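
A toy sketch of why profile weighting would change that verdict. The
per-statement costs below are invented for illustration and are not GCC's
cost tables; scalar_cost and vector_cost are hypothetical helpers. Only the
two block counts are taken from the dump above.

```c
/* Hypothetical unit costs, NOT GCC's real target cost tables.  */
enum
{
  COST_SCALAR_MUL = 1,
  COST_SCALAR_ADD = 1,
  COST_VEC_CONSTRUCT = 2,
  COST_VEC_MUL = 1,
  COST_VEC_REDUC = 2
};

/* Block counts as printed in the dump.  */
static const long long COUNT_OUTSIDE = 118111600LL;  /* bb 2 and bb 4  */
static const long long COUNT_LOOP = 955630224LL;     /* bb 3  */

/* Scalar code: two multiplies in the loop, one add after it.  */
long long
scalar_cost (long long w_loop, long long w_out)
{
  return 2 * COST_SCALAR_MUL * w_loop + COST_SCALAR_ADD * w_out;
}

/* Vector code: one multiply in the loop, construct and reduction outside.  */
long long
vector_cost (long long w_loop, long long w_out)
{
  return COST_VEC_MUL * w_loop
	 + (COST_VEC_CONSTRUCT + COST_VEC_REDUC) * w_out;
}
```

With equal weights (1, 1) the made-up costs give scalar 3 against vector 5,
so the vector form loses.  Weighted by the block counts the in-loop saving
dominates and the vector form comes out cheaper.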

I'm going to adjust the testcase, disabling BB vectorization.
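
One way such an adjustment could look (an assumption, not necessarily the
change that was committed) is a DejaGnu directive near the top of the test
file that passes -fno-tree-slp-vectorize, so that only the loop vectorizer
is exercised:

```c
/* Assumed form of the adjustment: turn off SLP (basic-block)
   vectorization for this test only.  */
/* { dg-additional-options "-fno-tree-slp-vectorize" } */
```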
