https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351
--- Comment #7 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Sorry for the delay, had a few days off.
Looking at this again: when next_ci gets inlined into
nbnxn_make_pairlist_part, the while loop
while (next_ci(iGrid, nth, ci_block, &ci_x, &ci_y, &ci_b, &ci))
becomes vectorizable with early break.
The code is a deep nest of templated C++ functions, so it's hard to map it
back to the source code.
The ifcvt loop that's getting vectorized is:
<bb 384> [local count: 9554422]:
_1643 = ci_y_1924 + 1;
if (_1643 == _1649)
goto <bb 385>; [5.50%]
else
goto <bb 458>; [94.50%]
<bb 458> [local count: 9028929]:
goto <bb 387>; [100.00%]
<bb 385> [local count: 525493]:
_1646 = ci_x_1920 + 1;
<bb 386> [local count: 1081571]:
# ci_x_1920 = PHI <ci_x_1919(382), _1646(385)>
# ci_y_1923 = PHI <ci_y_1922(382), 0(385)>
_1650 = _1649 * ci_x_1920;
<bb 387> [local count: 10110500]:
# ci_y_1924 = PHI <_1643(458), ci_y_1923(386)>
_1651 = _1650 + ci_y_1924;
_1652 = _1651 + 1;
_1653 = (long unsigned int) _1652;
_1655 = _1653 * 4;
_1656 = _1654 + _1655;
_1657 = *_1656;
if (_1657 <= ci_1918)
goto <bb 384>; [94.50%]
else
goto <bb 388>; [5.50%]
This doesn't look like it should have been vectorized. BB 384 has
non-exiting control flow: BB 385 and 386 should only be executed
conditionally, when _1643 == _1649, but they're not an exit.
The generated code:
504: 05a03984 mov z4.s, w12
508: 25ac0fef whilelo p15.s, wzr, w12
50c: 0b01018e add w14, w12, w1
510: 04a40785 sub z5.s, z28.s, z4.s
514: 25ae0fe7 whilelo p7.s, wzr, w14
518: cb2c4019 sub x25, x0, w12, uxtw
51c: 04a600a7 add z7.s, z5.s, z6.s
520: 25075fe0 not p0.b, p7/z, p15.b
524: 8b190bcd add x13, x30, x25, lsl #2
528: 14000005 b 53c
52c: 91001210 add x16, x16, #0x4
530: 25a0c087 add z7.s, z7.s, #4
534: 25ae0e00 whilelo p0.s, w16, w14
538: 54000480 b.eq 5c8 // b.none
53c: a55041ab ld1w {z11.s}, p0/z, [x13, x16, lsl #2]
540: 248b8181 cmpge p1.s, p0/z, z12.s, z11.s
544: 250e7a22 not p2.b, p14/z, p1.b
548: 2550f840 ptest p14, p2.b
54c: 54ffff00 b.eq 52c // b.none
550: 1e2600f1 fmov w17, s7
This is really broken, since it only preserved the exit check. The check for
if (_1643 == _1649)
goto <bb 385>; [5.50%]
else
goto <bb 458>; [94.50%]
seems to have disappeared, and indeed:
pairlist.cpp:2590:5: note: ==> examining statement: if (_1643 == _1649)
pairlist.cpp:2590:5: note: skip.
pairlist.cpp:2590:5: note: ==> examining pattern def stmt: patt_3554 = _1643 == _1649;
pairlist.cpp:2590:5: note: skip.
pairlist.cpp:2590:5: note: ==> examining pattern statement: if (patt_3554 != 0)
pairlist.cpp:2590:5: note: skip.
So the compiler knows it can't vectorize this. Why didn't we stop earlier?
This should have been caught in vect_analyze_loop_form, at:
  /* Check if we have any control flow that doesn't leave the loop. */
  basic_block *bbs = get_loop_body (loop);
  for (unsigned i = 0; i < loop->num_nodes; i++)
    if (EDGE_COUNT (bbs[i]->succs) != 1
        && (EDGE_COUNT (bbs[i]->succs) != 2
            || !loop_exits_from_bb_p (bbs[i]->loop_father, bbs[i])))
      {
        free (bbs);
        return opt_result::failure_at (vect_location,
                                       "not vectorized:"
                                       " unsupported control flow in loop.\n");
      }
It looks like this doesn't recognize the diamond layout.
Indeed, it thinks 384 -> 385 is an exit:
pairlist.cpp:2590:5: note: using as main loop exit: 384 -> 385 [AUX: (nil)]
pairlist.cpp:2590:5: note: === get_loop_niters ===
pairlist.cpp:2590:5: note: Loop has 2 exits.
pairlist.cpp:2590:5: note: Analyzing exit 0...
pairlist.cpp:2590:5: note: Analyzing exit 1...
Huh, interestingly it doesn't consider 385 to be part of the loop:
iterations by profile: 8.347976 (unreliable, maybe flat) entry count:1081571 (estimated locally, freq 3.3915))
{
bb_384 (preds = {bb_387 }, succs = {bb_385 bb_458 })
bb_458 (preds = {bb_384 }, succs = {bb_387 })
bb_387 (preds = {bb_458 bb_386 }, succs = {bb_384 bb_388 })
}
But it clearly is:
;; basic block 385, loop depth 2, count 525493 (estimated locally, freq 1.6478), maybe hot
;; prev block 458, next block 386, flags: (NEW, REACHABLE, VISITED)
;; pred: 384 [5.5% (guessed)] count:525493 (estimated locally, freq 1.6478) (TRUE_VALUE,EXECUTABLE)
# RANGE [irange] int [1, +INF]
_1646 = ci_x_1920 + 1;
;; succ: 386 [always] count:525493 (estimated locally, freq 1.6478) (FALLTHRU,DFS_BACK,EXECUTABLE)
;; basic block 386, loop depth 2, count 1081571 (estimated locally, freq 3.3915), maybe hot
;; prev block 385, next block 387, flags: (NEW, REACHABLE, VISITED)
;; pred: 382 [always] count:556078 (estimated locally, freq 1.7437) (FALLTHRU,EXECUTABLE)
;; 385 [always] count:525493 (estimated locally, freq 1.6478) (FALLTHRU,DFS_BACK,EXECUTABLE)
# .MEM_1725 = PHI <.MEM_171(382), .MEM_1725(385)>
# RANGE [irange] int [0, +INF]
# ci_x_1920 = PHI <ci_x_1919(382), _1646(385)>
# RANGE [irange] int [0, +INF]
# ci_y_1923 = PHI <ci_y_1922(382), 0(385)>
_1650 = _1649 * ci_x_1920;
;; succ: 387 [always] count:1081571 (estimated locally, freq 3.3915) (FALLTHRU,EXECUTABLE)
Oh, I see.
BB 385 is at loop depth 2, but BB 384 is at loop depth 3.
So the edge does leave the loop, but control then re-enters the loop at the
header, skipping the pre-header.
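To make that concrete, here is a toy model of the control-flow check quoted
earlier. It uses plain C++ containers rather than GCC's CFG API, and the
function names are made up for illustration. Fed the edges and the three-block
loop body from the dumps above, the 384 -> 385 edge counts as an exit, so the
check passes; it would only fire if 385 and 386 were treated as part of the
loop.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy CFG: block id -> successor ids.  These are not GCC's data structures.
using Cfg = std::map<int, std::vector<int>>;

// An edge leaves the loop when its destination is not in the loop body.
static bool is_exit_edge(const std::set<int> &body, int dst)
{
    return body.count(dst) == 0;
}

// Mirrors the quoted condition: a block is fine with one successor, or with
// two successors when at least one of its outgoing edges leaves the loop.
static bool has_unsupported_control_flow(const Cfg &cfg,
                                         const std::set<int> &body)
{
    for (int bb : body) {
        const std::vector<int> &succs = cfg.at(bb);
        if (succs.size() == 1)
            continue;
        bool exits = false;
        for (int dst : succs)
            exits = exits || is_exit_edge(body, dst);
        if (succs.size() != 2 || !exits)
            return true;
    }
    return false;
}
```

With the body {384, 458, 387} the function returns false (nothing is
rejected); with 385 and 386 added to the body it returns true, because 384 then
has two successors and neither edge leaves the loop.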
The exit block has a PHI that is set, within a loop iteration, by another loop:
;; basic block 387, loop depth 3, count 10110500 (estimated locally, freq 31.7034), maybe hot
;; prev block 386, next block 388, flags: (NEW, REACHABLE, VISITED)
;; pred: 458 [always] count:9028929 (estimated locally, freq 28.3119) (FALLTHRU,DFS_BACK,EXECUTABLE)
;; 386 [always] count:1081571 (estimated locally, freq 3.3915) (FALLTHRU,EXECUTABLE)
# RANGE [irange] int [0, +INF]
So this can't really be vectorized with early exits.
We need something like this:
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bb1138bfcfb..05ae7beb1cc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1883,6 +1883,12 @@ vect_analyze_loop_form (class loop *loop, gimple *loop_vectorized_call,
return opt_result::failure_at (vect_location,
"not vectorized:"
" abnormal loop exit edge.\n");
+
+ for (auto pred : e->src->preds)
+ if (pred->src->loop_father != loop)
+ return opt_result::failure_at (vect_location,
+ "not vectorized:"
+ " abnormal loop incoming edge.\n");
}
info->conds
It should also be OK, though, if one of the preds is the loop pre-header, so
the patch above is too strict.
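A sketch of what the relaxed check could look like, again on a toy predecessor
map rather than GCC's edge vectors. The block ids (9, 10, 11, 20) and the
function name are hypothetical, and passing the pre-header as a plain id is an
assumption; in GCC it would presumably come from something like
loop_preheader_edge (loop)->src.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy CFG: block id -> predecessor ids.  These are not GCC's data structures.
using Preds = std::map<int, std::vector<int>>;

// Returns true when a block that is the source of an exit edge has a
// predecessor outside the loop that is not the pre-header.
static bool has_abnormal_incoming_edge(const Preds &preds,
                                       const std::set<int> &body,
                                       const std::vector<int> &exit_srcs,
                                       int preheader)
{
    for (int src : exit_srcs)
        for (int pred : preds.at(src))
            if (body.count(pred) == 0 && pred != preheader)
                return true;
    return false;
}
```

For a loop {10, 11} entered only from pre-header 9, this returns false. Adding
a side entry 20 -> 11 makes it return true. Passing a pre-header id that
matches no block (for example -1) behaves like the strict patch above and
rejects even the clean loop, which is the over-strictness in question.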
This weirdness happens because next_ci itself contains a loop, which gets
inlined into the while condition.
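As a rough illustration of that shape (this is not the actual reproducer, and
not the real source): the helper below is read back from the GIMPLE above, with
hypothetical names. Grid, ncy, cxy_ind, next_cell and count_cells are all
stand-ins; ncy plays the role of _1649 and cxy_ind the array behind _1654.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the grid; field names are illustrative only.
struct Grid {
    int ncy;                   // plays the role of _1649
    std::vector<int> cxy_ind;  // plays the role of the array behind _1654
};

// Hypothetical next_ci-like helper.  Note that it contains a loop of its own.
static bool next_cell(const Grid &g, int ncells, int *x, int *y, int *ci)
{
    ++*ci;
    if (*ci >= ncells)
        return false;
    // Same shape as the GIMPLE: load cxy_ind[ncy * x + y + 1], compare it
    // with ci, bump y, and wrap y to 0 (bumping x) when y reaches ncy.
    while (g.cxy_ind[g.ncy * *x + *y + 1] <= *ci) {
        ++*y;
        if (*y == g.ncy) {
            ++*x;
            *y = 0;
        }
    }
    return true;
}

// Caller using the helper as its loop condition; once the helper is inlined,
// its loop ends up nested inside this one.
static int count_cells(const Grid &g, int ncells, int *x, int *y)
{
    assert(g.ncy > 0);
    int ci = -1;
    int visited = 0;
    while (next_cell(g, ncells, x, y, &ci))
        ++visited;
    return visited;
}
```

The inner while is the loop that shows up as BBs 384, 458 and 387 in the dump;
the wrap-around branch corresponds to BBs 385 and 386.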
Working on a small reproducer and patch.