[Bug middle-end/111125] [14 Regression] tree-ssa.exp and vect.exp failures after commit r14-3281-g99b5921bfc8f91
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111125

--- Comment #8 from Thiago Jung Bauermann ---
Confirmed. All the failures I reported are fixed in trunk. Thank you!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111125

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Richard Biener ---
Should be all fixed now.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111125

--- Comment #6 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:43da77a4f1636280c4259402c9c2c543e6ec6c0b

commit r14-3444-g43da77a4f1636280c4259402c9c2c543e6ec6c0b
Author: Richard Biener
Date:   Thu Aug 24 11:10:43 2023 +0200

    tree-optimization/111125 - avoid BB vectorization in novector loops

    When a loop is marked with #pragma GCC novector the following
    makes sure to also skip BB vectorization for contained blocks.
    That avoids gcc.dg/vect/bb-slp-29.c failing on aarch64 because
    of extra BB vectorization therein.  I'm not specifically dealing
    with sub-loops of novector loops, the desired semantics isn't
    documented.

            PR tree-optimization/111125
            * tree-vect-slp.cc (vect_slp_function): Split at novector
            loop entry, do not push blocks in novector loops.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111125

--- Comment #5 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:e80f7c13f64e10c6a3354c5d6b42da60b21ed0b8

commit r14-3440-ge80f7c13f64e10c6a3354c5d6b42da60b21ed0b8
Author: Richard Biener
Date:   Thu Aug 24 10:30:12 2023 +0200

    tree-optimization/111125 - properly cost BB reduction remain stmt handling

    We assume that all root stmts which compose the total reduction
    chain are vectorized but fail to account for the cost of adding
    back the scalar defs we are not vectorizing.  The following
    rectifies this, fixing the gcc.dg/tree-ssa/slsr-11.c FAIL on aarch64.

            PR tree-optimization/111125
            * tree-vect-slp.cc (vectorizable_bb_reduc_epilogue): Account
            for the remain_defs processing.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111125

--- Comment #4 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:308e716266787f84ba4a47546317dae83be8901c

commit r14-3436-g308e716266787f84ba4a47546317dae83be8901c
Author: Richard Biener
Date:   Thu Aug 24 10:55:06 2023 +0200

    testsuite/111125 - disable BB vectorization for the test

    The test is for loop vectorization producing non-canonical
    multiplications.  We can now BB vectorize the whole function
    when the target supports .REDUC_PLUS for V2SImode but we don't
    have a dejagnu selector for that.  Disable BB vectorization
    like we disabled epilogue vectorization.

            PR testsuite/111125
            * gcc.dg/vect/pr53773.c: Disable BB vectorization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111125

--- Comment #3 from Richard Biener ---
gcc.dg/vect/pr53773.c is interesting - we vectorize the function to

  [local count: 118111600]:
  _20 = {integral_4(D), decimal_5(D)};
  if (power_ten_6(D) > 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

  [local count: 955630224]:
  # power_ten_19 = PHI
  # vect_integral_15.4_1 = PHI
  vect_integral_9.5_12 = vect_integral_15.4_1 * { 10, 10 };
  power_ten_11 = power_ten_19 + -1;
  if (power_ten_11 != 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

  [local count: 118111600]:
  # vect_integral_16.7_21 = PHI
  _22 = VIEW_CONVERT_EXPR(vect_integral_16.7_21);
  _23 = .REDUC_PLUS (_22); [tail call]
  _24 = (int) _23;
  return _24;

where loop vectorization fails because

/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr53773.c:9:20: note: Analyze phi: integral_15 = PHI
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr53773.c:9:20: missed: Peeling for epilogue is not supported for nonlinear induction except neg when iteration count is unknown.
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr53773.c:9:20: missed: not vectorized: can't create required epilog loop

Loop vectorization doesn't try SLP here because we only SLP reduction
groups, not induction groups.

So I think this vectorization is quite nice, possibly even better than
the loop vectorization we expect.  Generated code:

foo:
.LFB0:
        .cfi_startproc
        fmov    s31, w0
        ins     v31.s[1], w1
        cmp     w2, 0
        ble     .L2
        movi    v30.2s, 0xa
        .p2align 3,,7
.L3:
        mul     v31.2s, v31.2s, v30.2s
        subs    w2, w2, #1
        bne     .L3
.L2:
        addp    v31.2s, v31.2s, v31.2s
        fmov    w0, s31
        ret

The path for power_ten == 0 is of course sub-optimal.  Note it's again
determined not profitable with costing (we do not try to weight stmts
based on profile, thus in-loop stmts cost the same as out-of-loop stmts).

I'm going to adjust the testcase, disabling BB vectorization.
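For readers without the testsuite at hand, the following is a minimal sketch of what the dump in comment #3 appears to describe. The scalar function is inferred from the SSA names in the dump (integral, decimal, power_ten) and is not copied from pr53773.c; foo_two_lanes is a hypothetical hand-written equivalent of the BB-vectorized form, with a two-element array standing in for the V2SI vector.

```c
/* Scalar shape inferred from the dump; an assumption, not the
   verbatim testsuite source.  */
int
foo (int integral, int decimal, int power_ten)
{
  while (power_ten > 0)
    {
      integral *= 10;
      decimal *= 10;
      power_ten--;
    }
  return integral + decimal;
}

/* Hypothetical hand-written mirror of the vectorized dump.  */
int
foo_two_lanes (int integral, int decimal, int power_ten)
{
  /* _20 = {integral, decimal}: both values packed into one vector.  */
  int v[2] = { integral, decimal };

  if (power_ten > 0)
    {
      do
        {
          /* vect_integral * { 10, 10 }: one multiply for both lanes.  */
          v[0] *= 10;
          v[1] *= 10;
          power_ten--;
        }
      while (power_ten != 0);
    }

  /* .REDUC_PLUS: horizontal add of the two lanes.  */
  return v[0] + v[1];
}
```

Both functions return the same value for any inputs that do not overflow, for example 3000 for (1, 2, 3). When power_ten is not positive the second form still packs and reduces the vector, which is the sub-optimal path mentioned above.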
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111125

--- Comment #2 from Richard Biener ---
For gcc.dg/vect/bb-slp-29.c we are now vectorizing

#pragma GCC novector
  for (i = 0; i < N/2; i++)
    {
      if (dst[i] != A * src[i] + B * src[i+1])
        abort ();
    }

in particular the multiplication and the addition (but not the load which
had predictive commoning applied).  When cost modeling is enabled this
vectorization is not deemed profitable (but the vect testsuite runs with
-fno-vect-cost-model).

I wonder if we want to exempt basic blocks within loops marked with
novector from BB vectorization.
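A self-contained sketch of a check loop of this kind is given below. It is not the bb-slp-29.c source: the values of N, A and B, the array types, the compute and count_mismatches helpers, and the choice to count mismatches instead of calling abort () are all assumptions made so the example can be compiled and checked on its own. The pragma is guarded because, as far as I know, #pragma GCC novector is only recognized by GCC 14 and later.

```c
#define N 32   /* hypothetical array size */
#define A 3    /* hypothetical coefficients */
#define B 4

static unsigned short src[N + 1];
static unsigned short dst[N];

/* Fill the inputs and compute the expected results.  */
static void
compute (void)
{
  for (int i = 0; i < N + 1; i++)
    src[i] = (unsigned short) i;
  for (int i = 0; i < N / 2; i++)
    dst[i] = (unsigned short) (A * src[i] + B * src[i + 1]);
}

/* Re-check the results in a loop marked novector.  Returns the number
   of mismatches (the loop quoted above calls abort () instead).  */
int
count_mismatches (void)
{
  int bad = 0;

  compute ();

#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 14
#pragma GCC novector
#endif
  for (int i = 0; i < N / 2; i++)
    {
      if (dst[i] != A * src[i] + B * src[i + 1])
        bad++;
    }
  return bad;
}
```

The point of the pragma in such a test is that the check loop stays scalar and so acts as an independent reference for the vectorized computation. BB vectorization of the multiply and add inside the check loop defeats that, which is the concern raised in this comment.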
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111125

Richard Biener changed:

           What    |Removed                       |Added
------------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2023-08-24
     Ever confirmed|0                             |1

--- Comment #1 from Richard Biener ---
For gcc.dg/tree-ssa/slsr-11.c we vectorize the reduction to

  [local count: 1073741824]:
  _15 = {s_5(D), s_5(D)};
  vect_a3_11.3_16 = _15 * { 6, 4 };
  vect__3.4_17 = (vector(2) long int) vect_a3_11.3_16;
  a1_6 = s_5(D) * 2;
  _18 = VIEW_CONVERT_EXPR(vect__3.4_17);
  _19 = .REDUC_PLUS (_18);
  _20 = (unsigned long) a1_6;
  _21 = (unsigned long) c_7(D);
  _29 = _21 * 2;
  _31 = _19 + _29;
  _30 = _20 + _21;
  _27 = _30 + _31;
  _28 = (long int) _27;
  return _28;

note: Cost model analysis for part in loop 0:
  Vector cost: 9
  Scalar cost: 9

doesn't look profitable.  I think there's something off with the scalar
accounting, I'll have a look there.
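To make the dump in comment #1 easier to follow, here is a minimal sketch in plain C. f_scalar is a guess at the scalar computation, inferred from the SSA names (s, c, a1, a3) and constants in the dump and not copied from slsr-11.c. f_as_vectorized is a hypothetical statement-by-statement mirror of the vectorized form, with arrays standing in for the two-lane vectors.

```c
/* Scalar shape inferred from the dump; an assumption, not the
   verbatim testsuite source.  */
long
f_scalar (int s, long c)
{
  int a1 = 2 * s;
  int a2 = 4 * s;
  int a3 = 6 * s;
  long x1 = c + a1;
  long x2 = c + a2;
  long x3 = c + a3;
  return x1 + x2 + x3;
}

/* Hypothetical hand-written mirror of the vectorized dump.  */
long
f_as_vectorized (int s, long c)
{
  /* _15 = {s, s}; vect_a3 = _15 * { 6, 4 }; then widen each lane.  */
  int lanes[2] = { s * 6, s * 4 };
  long wide[2] = { lanes[0], lanes[1] };

  /* _19 = .REDUC_PLUS: horizontal add, done in unsigned arithmetic.  */
  unsigned long reduc = (unsigned long) wide[0] + (unsigned long) wide[1];

  /* The remaining scalar defs that are added back after the reduction.  */
  int a1 = s * 2;                          /* a1_6 */
  unsigned long ua1 = (unsigned long) a1;  /* _20 */
  unsigned long uc = (unsigned long) c;    /* _21 */
  unsigned long c2 = uc * 2;               /* _29 */
  unsigned long t31 = reduc + c2;          /* _31 */
  unsigned long t30 = ua1 + uc;            /* _30 */
  return (long) (t30 + t31);               /* _27, _28 */
}
```

Only two of the multiplications end up in the vector. The a1 term and the three uses of c are added back with scalar statements after the reduction, and that scalar tail is what makes the equal 9 versus 9 cost estimate look suspicious.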
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111125

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
           Keywords|                            |testsuite-fail
            Summary|tree-ssa.exp and vect.exp   |[14 Regression]
                   |failures after commit       |tree-ssa.exp and vect.exp
                   |r14-3281-g99b5921bfc8f91    |failures after commit
                   |                            |r14-3281-g99b5921bfc8f91