On Mon, Aug 7, 2023 at 2:05 AM Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On Thu, 3 Aug 2023 at 17:48, Richard Biener <rguent...@suse.de> wrote: > > > > On Thu, 3 Aug 2023, Richard Biener wrote: > > > > > On Thu, 3 Aug 2023, Richard Biener wrote: > > > > > > > On Thu, 3 Aug 2023, Prathamesh Kulkarni wrote: > > > > > > > > > On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches > > > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > > > > > On Mon, 31 Jul 2023, Jeff Law wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote: > > > > > > > > The following delays sinking of loads within the same innermost > > > > > > > > loop when it was unconditional before. That's a not uncommon > > > > > > > > issue preventing vectorization when masked loads are not > > > > > > > > available. > > > > > > > > > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > > > > > > > > > > > > > I have a followup patch improving sinking that without this > > > > > > > > would > > > > > > > > cause more of the problematic sinking - now that we have a > > > > > > > > second > > > > > > > > sink pass after loop opts this looks like a reasonable approach? > > > > > > > > > > > > > > > > OK? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Richard. > > > > > > > > > > > > > > > > PR tree-optimization/92335 > > > > > > > > * tree-ssa-sink.cc (select_best_block): Before loop > > > > > > > > optimizations avoid sinking unconditional loads/stores > > > > > > > > in innermost loops to conditional executed places. > > > > > > > > > > > > > > > > * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing. > > > > > > > > * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c, > > > > > > > > expect predictive commoning to happen instead of sinking. > > > > > > > > * gcc.dg/vect/pr65947-3.c: Adjust. > > > > > > > I think it's reasonable -- there's probably going to be cases > > > > > > > where it's not > > > > > > > great, but more often than not I think it's going to be a > > > > > > > reasonable > > > > > > > heuristic. > > > > > > > > > > > > > > If there is undesirable fallout, better to find it over the > > > > > > > coming months than > > > > > > > next spring. So I'd suggest we go forward now to give more time > > > > > > > to find any > > > > > > > pathological cases (if they exist). > > > > > > > > > > > > Agreed, I've pushed this now. > > > > > Hi Richard, > > > > > After this patch (committed in > > > > > 399c8dd44ff44f4b496223c7cc980651c4d6f6a0), > > > > > pr65947-7.c "failed" for aarch64-linux-gnu: > > > > > FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP > > > > > VECTORIZED" > > > > > FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects > > > > > scan-tree-dump-not vect "LOOP VECTORIZED" > > > > > > > > > > /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { > > > > > ! vect_fold_extract_last } } } } */ > > > > > > > > > > With your commit, condition_reduction in pr65947-7.c gets vectorized > > > > > regardless of vect_fold_extract_last, > > > > > which gates the above test (which is an improvement, because the > > > > > function didn't get vectorized before the commit). > > > > > > > > > > The attached patch thus removes the gating on vect_fold_extract_last, > > > > > and the test passes again. > > > > > OK to commit ? > > > > > > > > OK. > > > > > > Or wait - the loop doesn't vectorize on x86_64, so I guess one > > > critical target condition is missing. Can you figure out which? > > > > I see > > > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > > note: vect_is_simple_use: operand last_19 = PHI <last_8(7), 108(15)>, > > type of def: reduction > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > > note: vect_is_simple_use: vectype vector(4) int > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > > missed: multiple types in double reduction or condition reduction or > > fold-left reduction. > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:13:1: > > missed: not vectorized: relevant phi not supported: last_19 = PHI > > <last_8(7), 108(15)> > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > > missed: bad operation or unsupported loop bound. > Hi Richard, > Looking at the aarch64 vect dump, it seems the loop in > condition_reduction gets vectorized with V4HI mode > while fails for other modes in vectorizable_condition: > > if ((double_reduc || reduction_type != TREE_CODE_REDUCTION) > && ncopies > 1) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "multiple types in double reduction or condition " > "reduction or fold-left reduction.\n"); > return false; > } > > From the dump: > foo.c:9:21: note: === vect_analyze_loop_operations === > foo.c:9:21: note: examining phi: last_19 = PHI <last_8(7), 108(15)> > foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of > def: internal > foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int > foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI > <last_8(7), 108(15)>, type of def: reduction > foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int > > For V8HI, VF = 8, and vectype_in = vector(4) int. > Thus ncopies = VF / length(vectype_in) = 2, which is greater than 1, > and thus fails: > foo.c:9:21: missed: multiple types in double reduction or condition > reduction or fold-left reduction. > foo.c:4:1: missed: not vectorized: relevant phi not supported: > last_19 = PHI <last_8(7), 108(15)> > While for V4HI, VF = 4 and thus ncopies = 1, so it succeeds. > > For x86_64, it seems the vectorizer doesn't seem to try V4HI mode. > If I "force" the vectorizer to use V4HI mode, we get the following dump: > foo.c:9:21: note: === vect_analyze_loop_operations === > foo.c:9:21: note: examining phi: last_19 = PHI <last_8(7), 108(15)> > foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of > def: internal > foo.c:9:21: note: vect_is_simple_use: vectype vector(2) int > foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI > <last_8(7), 108(15)>, type of def: reduction > foo.c:9:21: note: vect_is_simple_use: vectype vector(2) int > foo.c:9:21: missed: multiple types in double reduction or condition > reduction or fold-left reduction. > > Not sure tho if this is the only reason for the test to fail to > vectorize on the target. > Will investigate in more details next week.
The odd thing is that you say for (int i = 0; i < N; i++) { aval = a[i]; if (b[i] < min_v) last = aval; } fails to vectorize but for (int i = 0; i < N; i++) { if (b[i] < min_v) last = a[i]; } succeeds? The IL difference should be irrelevant for the reduction vectorization: <bb 3> [local count: 1049367889]: # last_19 = PHI <last_8(7), 108(15)> # i_21 = PHI <i_17(7), 0(15)> # ivtmp_18 = PHI <ivtmp_10(7), 43(15)> _1 = (long unsigned int) i_21; _2 = _1 * 2; _3 = a_12(D) + _2; aval_13 = *_3; _5 = _1 * 4; _6 = b_14(D) + _5; _7 = *_6; last_16 = (int) aval_13; _9 = _7 < min_v_15(D); last_8 = _9 ? last_16 : last_19; i_17 = i_21 + 1; ivtmp_10 = ivtmp_18 - 1; if (ivtmp_10 != 0) goto <bb 7>; [97.68%] vs <bb 3> [local count: 1049367889]: # last_19 = PHI <last_9(7), 108(15)> # i_21 = PHI <i_17(7), 0(15)> # ivtmp_11 = PHI <ivtmp_10(7), 43(15)> _1 = (long unsigned int) i_21; _2 = _1 * 4; _3 = b_13(D) + _2; _4 = *_3; _5 = _4 < min_v_14(D); _6 = _1 * 2; _38 = _37 + _6; _7 = (short int *) _38; _8 = .MASK_LOAD (_7, 16B, _5); last_16 = (int) _8; last_9 = _5 ? last_16 : last_19; i_17 = i_21 + 1; ivtmp_10 = ivtmp_11 - 1; if (ivtmp_10 != 0) goto <bb 7>; [97.68%] maybe since the "mask" is used twice with the .MASK_LOAD we are not actually looking at the def (the comparison) and it's the comparison which would introduce the "multiple types"? That is, I wonder why not sinking the load, avoiding a conditional load, makes a difference to vectorizing the condition/extract last reduction. It doesn't seem to make a difference for x86. That said, the "fix" is probably sticking the correct target on the dump-check, it seems that vect_fold_extract_last is no longer correct here. Richard. > Thanks, > Prathamesh > > > > Richard.