On Tue, Nov 16, 2021 at 7:05 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > On Mon, Nov 15, 2021 at 3:00 PM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> > >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > >> > On Fri, Nov 12, 2021 at 7:05 PM Richard Sandiford via Gcc-patches > >> > <gcc-patches@gcc.gnu.org> wrote: > >> >> > >> >> This patch is a prerequisite for a later one. At the moment, > >> >> if-conversion converts predicated POINTER_PLUS_EXPRs into > >> >> non-wrapping forms, which for: > >> >> > >> >> … = base + offset > >> >> > >> >> becomes: > >> >> > >> >> tmp = (unsigned long) base > >> >> … = tmp + offset > >> >> > >> >> It then hoists these conversions out of the loop where possible. > >> >> > >> >> However, because “base” is a valid gimple operand, there can be > >> >> multiple POINTER_PLUS_EXPRs with the same base, which can in turn > >> >> lead to multiple instances of the same conversion. The later VN pass > >> >> is (and I think needs to be) restricted to the new if-converted code, > >> >> whereas here we're deliberately inserting the conversions before the > >> >> .LOOP_VECTORIZED condition: > >> >> > >> >> /* If we versioned loop then make sure to insert invariant > >> >> stmts before the .LOOP_VECTORIZED check since the vectorizer > >> >> will re-use that for things like runtime alias versioning > >> >> whose condition can end up using those invariants. */ > >> >> > >> >> We can therefore enter the vectoriser with redundant conversions. > >> >> > >> >> The easiest fix seemed to be to defer the hoisting until after VN. > >> >> This catches other hoisting opportunities too. > >> >> > >> >> Hoisting the code from the (artificial) loop in pr99102.c means > >> >> that it's no longer worth vectorising. The patch forces vectorisation > >> >> instead of relying on the cost model. > >> >> > >> >> The patch also reverts pr87007-4.c and pr87007-5.c back to their > >> >> original forms, undoing changes in 783dc66f9ccb0019c3dad. > >> >> The code at the time the tests were added was: > >> >> > >> >> testl %edi, %edi > >> >> je .L10 > >> >> vxorps %xmm1, %xmm1, %xmm1 > >> >> vsqrtsd d3(%rip), %xmm1, %xmm0 > >> >> vsqrtsd d2(%rip), %xmm1, %xmm1 > >> >> ... > >> >> .L10: > >> >> ret > >> >> > >> >> with the operations being hoisted, and the vxorps was specifically > >> >> wanted (compared to the previous code). This patch restores the code > >> >> to that form, with the hoisted operations and the vxorps. > >> >> > >> >> Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > >> >> > >> >> Richard > >> >> > >> >> > >> >> gcc/ > >> >> * tree-if-conv.c: Include tree-eh.h. > >> >> (predicate_statements): Remove pe argument. Don't hoist > >> >> statements here. > >> >> (combine_blocks): Remove pe argument. > >> >> (ifcvt_can_hoist, ifcvt_can_hoist_further): New functions. > >> >> (ifcvt_hoist_invariants): Likewise. > >> >> (tree_if_conversion): Update call to combine_blocks. Call > >> >> ifcvt_hoist_invariants after VN. > >> >> > >> >> gcc/testsuite/ > >> >> * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model. > >> >> > >> >> Revert: > >> >> > >> >> 2020-09-09 Richard Biener <rguent...@suse.de> > >> >> > >> >> * gcc.target/i386/pr87007-4.c: Adjust. > >> >> * gcc.target/i386/pr87007-5.c: Likewise. > >> >> --- > >> >> gcc/testsuite/gcc.dg/vect/pr99102.c | 2 +- > >> >> gcc/testsuite/gcc.target/i386/pr87007-4.c | 2 +- > >> >> gcc/testsuite/gcc.target/i386/pr87007-5.c | 2 +- > >> >> gcc/tree-if-conv.c | 122 ++++++++++++++++++++-- > >> >> 4 files changed, 114 insertions(+), 14 deletions(-) > >> >> > >> >> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c > >> >> b/gcc/testsuite/gcc.dg/vect/pr99102.c > >> >> index 6c1a13f0783..0d030d15c86 100644 > >> >> --- a/gcc/testsuite/gcc.dg/vect/pr99102.c > >> >> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c > >> >> @@ -1,4 +1,4 @@ > >> >> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ > >> >> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model > >> >> -fdump-tree-vect-details" } */ > >> >> /* { dg-additional-options "-msve-vector-bits=256" { target > >> >> aarch64_sve256_hw } } */ > >> >> long a[44]; > >> >> short d, e = -7; > >> >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c > >> >> b/gcc/testsuite/gcc.target/i386/pr87007-4.c > >> >> index 9c4b8005af3..e91bdcbac44 100644 > >> >> --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c > >> >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c > >> >> @@ -15,4 +15,4 @@ foo (int n, int k) > >> >> d1 = ceil (d3); > >> >> } > >> >> > >> >> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } > >> >> } */ > >> >> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } > >> >> } */ > >> >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c > >> >> b/gcc/testsuite/gcc.target/i386/pr87007-5.c > >> >> index e4d956a5d7f..20d13cf650b 100644 > >> >> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c > >> >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c > >> >> @@ -15,4 +15,4 @@ foo (int n, int k) > >> >> d1 = sqrt (d3); > >> >> } > >> >> > >> >> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } > >> >> } */ > >> >> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } > >> >> } */ > >> >> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c > >> >> index e88ddc9f788..0ad557a2f4d 100644 > >> >> --- a/gcc/tree-if-conv.c > >> >> +++ b/gcc/tree-if-conv.c > >> >> @@ -121,6 +121,7 @@ along with GCC; see the file COPYING3. If not see > >> >> #include "tree-cfgcleanup.h" > >> >> #include "tree-ssa-dse.h" > >> >> #include "tree-vectorizer.h" > >> >> +#include "tree-eh.h" > >> >> > >> >> /* Only handle PHIs with no more arguments unless we are asked to by > >> >> simd pragma. */ > >> >> @@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, > >> >> tree cond, > >> >> */ > >> >> > >> >> static void > >> >> -predicate_statements (loop_p loop, edge pe) > >> >> +predicate_statements (loop_p loop) > >> >> { > >> >> unsigned int i, orig_loop_num_nodes = loop->num_nodes; > >> >> auto_vec<int, 1> vect_sizes; > >> >> @@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe) > >> >> { > >> >> gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2)); > >> >> gsi_remove (&gsi2, false); > >> >> - /* Make sure to move invariant conversions out of the > >> >> - loop. */ > >> >> - if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code > >> >> (stmt2)) > >> >> - && expr_invariant_in_loop_p (loop, > >> >> - gimple_assign_rhs1 > >> >> (stmt2))) > >> >> - gsi_insert_on_edge_immediate (pe, stmt2); > >> >> - else if (first) > >> >> + if (first) > >> >> { > >> >> gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT); > >> >> first = false; > >> >> @@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop) > >> >> blocks. Replace PHI nodes with conditional modify expressions. */ > >> >> > >> >> static void > >> >> -combine_blocks (class loop *loop, edge pe) > >> >> +combine_blocks (class loop *loop) > >> >> { > >> >> basic_block bb, exit_bb, merge_target_bb; > >> >> unsigned int orig_loop_num_nodes = loop->num_nodes; > >> >> @@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe) > >> >> predicate_all_scalar_phis (loop); > >> >> > >> >> if (need_to_predicate || need_to_rewrite_undefined) > >> >> - predicate_statements (loop, pe); > >> >> + predicate_statements (loop); > >> >> > >> >> /* Merge basic blocks. */ > >> >> exit_bb = NULL; > >> >> @@ -3181,6 +3176,109 @@ ifcvt_local_dce (class loop *loop) > >> >> } > >> >> } > >> >> > >> >> +/* Return true if STMT can be hoisted from if-converted loop LOOP. */ > >> >> + > >> >> +static bool > >> >> +ifcvt_can_hoist (class loop *loop, gimple *stmt) > >> >> +{ > >> >> + if (auto *call = dyn_cast<gcall *> (stmt)) > >> >> + { > >> >> + if (gimple_call_internal_p (call) > >> >> + && internal_fn_mask_index (gimple_call_internal_fn (call)) >= > >> >> 0) > >> >> + return false; > >> >> + } > >> >> + else if (auto *assign = dyn_cast<gassign *> (stmt)) > >> >> + { > >> >> + if (gimple_assign_rhs_code (assign) == COND_EXPR) > >> >> + return false; > >> >> + } > >> >> + else > >> >> + return false; > >> >> + > >> >> + if (gimple_has_side_effects (stmt) > >> >> + || gimple_could_trap_p (stmt) > >> >> + || stmt_could_throw_p (cfun, stmt) > >> >> + || gimple_vdef (stmt) > >> >> + || gimple_vuse (stmt)) > >> >> + return false; > >> >> + > >> >> + int num_args = gimple_num_args (stmt); > >> >> + for (int i = 0; i < num_args; ++i) > >> >> + if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i))) > >> >> + return false; > >> >> + > >> >> + return true; > >> >> +} > >> >> + > >> >> +/* PE is the preferred hoisting edge selected by tree_if_conversion, > >> >> which > >> >> + s known to be different from (and to dominate) the preheader edge > >> >> of the > >> >> + if-converted loop. We already know that STMT can be inserted on > >> >> the loop > >> >> + preheader edge. Return true if we prefer to insert it on PE > >> >> instead. */ > >> >> + > >> >> +static bool > >> >> +ifcvt_can_hoist_further (edge pe, gimple *stmt) > >> >> +{ > >> >> + /* As explained in tree_if_conversion, we want to hoist invariant > >> >> + conversions further so that they can be reused by alias analysis. > >> >> */ > >> >> + auto *assign = dyn_cast<gassign *> (stmt); > >> >> + if (assign > >> >> + && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (assign))) > >> >> + { > >> >> + tree rhs = gimple_assign_rhs1 (assign); > >> >> + if (is_gimple_min_invariant (rhs)) > >> >> + return true; > >> >> + > >> >> + if (TREE_CODE (rhs) == SSA_NAME) > >> >> + { > >> >> + basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs)); > >> >> + if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, > >> >> def_bb)) > >> >> + return true; > >> >> + } > >> >> + } > >> >> + return false; > >> >> +} > >> >> + > >> >> +/* Hoist invariant statements from LOOP. PE is the preferred edge for > >> >> + hoisting conversions, as selected by tree_if_conversion; see there > >> >> + for details. */ > >> >> + > >> >> +static void > >> >> +ifcvt_hoist_invariants (class loop *loop, edge pe) > >> >> +{ > >> >> + gimple_stmt_iterator hoist_gsi = {}; > >> >> + gimple_stmt_iterator hoist_gsi_pe = {}; > >> >> + unsigned int num_blocks = loop->num_nodes; > >> >> + basic_block *body = get_loop_body (loop); > >> >> + for (unsigned int i = 0; i < num_blocks; ++i) > >> >> + for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p > >> >> (gsi);) > >> >> + { > >> >> + gimple *stmt = gsi_stmt (gsi); > >> >> + if (ifcvt_can_hoist (loop, stmt)) > >> >> + { > >> >> + /* Once we've hoisted one statement, insert other statements > >> >> + after it. */ > >> >> + edge e = loop_preheader_edge (loop); > >> >> + gimple_stmt_iterator *hoist_gsi_ptr = &hoist_gsi; > >> >> + if (e != pe && ifcvt_can_hoist_further (pe, stmt)) > >> > > >> > One issue with hoisting to loop_preheader_edge instead of 'pe' > >> > is that eventually the loop versioning code will pick up the defs > >> > through the versioning condition and that will break because the > >> > versioning condition will be inserted in place of the > >> > IFN_VECTORIZED_LOOP, which is _before_ the preheader. > >> > The code basically expects the preheader to be empty. > >> > >> Do you mean that the versioning-for-aliases code assumes that all > >> loop-invariant definitions are available in the versioning condition? > > > > Yes. Or well, it assumes it can use the .LOOP_VECTORIZED condition > > as versioning condition and there simply use insert the condition without > > further checks. And the checks are generated utlimatively from SCEV > > which just expands things up to the loops preheader. > > > >> I hadn't realised that's what the comment was saying. It sounded more > >> like an optimisation. > > > > The hoisting itself is, but that we don't just hoist to the loops preheader > > is for correctness, not an optimization ... it's indeed a bit messy though. > > Ah, OK. > > How does this version look? I was in two minds about whether to keep > the expr_invariant_in_loop_p path, but it seemed like the more natural > test for the case that it handles. I'd be happy to drop it and use > ifcvt_available_on_edge_p instead if you prefer though. > > Tested as before.
OK. Thanks, Richard. > Thanks, > Richard > > > gcc/ > * tree-if-conv.c: Include tree-eh.h. > (predicate_statements): Remove pe argument. Don't hoist > statements here. > (combine_blocks): Remove pe argument. > (ifcvt_available_on_edge_p, ifcvt_can_hoist): New functions. > (ifcvt_hoist_invariants): Likewise. > (tree_if_conversion): Update call to combine_blocks. Call > ifcvt_hoist_invariants after VN. > > gcc/testsuite/ > * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model. > > Revert: > > 2020-09-09 Richard Biener <rguent...@suse.de> > > * gcc.target/i386/pr87007-4.c: Adjust. > * gcc.target/i386/pr87007-5.c: Likewise. > --- > gcc/testsuite/gcc.dg/vect/pr99102.c | 2 +- > gcc/testsuite/gcc.target/i386/pr87007-4.c | 2 +- > gcc/testsuite/gcc.target/i386/pr87007-5.c | 2 +- > gcc/tree-if-conv.c | 112 +++++++++++++++++++--- > 4 files changed, 104 insertions(+), 14 deletions(-) > > diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c > b/gcc/testsuite/gcc.dg/vect/pr99102.c > index 6c1a13f0783..0d030d15c86 100644 > --- a/gcc/testsuite/gcc.dg/vect/pr99102.c > +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c > @@ -1,4 +1,4 @@ > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ > +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model > -fdump-tree-vect-details" } */ > /* { dg-additional-options "-msve-vector-bits=256" { target > aarch64_sve256_hw } } */ > long a[44]; > short d, e = -7; > diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c > b/gcc/testsuite/gcc.target/i386/pr87007-4.c > index 9c4b8005af3..e91bdcbac44 100644 > --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c > +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c > @@ -15,4 +15,4 @@ foo (int n, int k) > d1 = ceil (d3); > } > > -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ > +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c > b/gcc/testsuite/gcc.target/i386/pr87007-5.c > index e4d956a5d7f..20d13cf650b 100644 > --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c > +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c > @@ -15,4 +15,4 @@ foo (int n, int k) > d1 = sqrt (d3); > } > > -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ > +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ > diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c > index e88ddc9f788..d0ca04600ce 100644 > --- a/gcc/tree-if-conv.c > +++ b/gcc/tree-if-conv.c > @@ -121,6 +121,7 @@ along with GCC; see the file COPYING3. If not see > #include "tree-cfgcleanup.h" > #include "tree-ssa-dse.h" > #include "tree-vectorizer.h" > +#include "tree-eh.h" > > /* Only handle PHIs with no more arguments unless we are asked to by > simd pragma. */ > @@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond, > */ > > static void > -predicate_statements (loop_p loop, edge pe) > +predicate_statements (loop_p loop) > { > unsigned int i, orig_loop_num_nodes = loop->num_nodes; > auto_vec<int, 1> vect_sizes; > @@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe) > { > gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2)); > gsi_remove (&gsi2, false); > - /* Make sure to move invariant conversions out of the > - loop. */ > - if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2)) > - && expr_invariant_in_loop_p (loop, > - gimple_assign_rhs1 > (stmt2))) > - gsi_insert_on_edge_immediate (pe, stmt2); > - else if (first) > + if (first) > { > gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT); > first = false; > @@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop) > blocks. Replace PHI nodes with conditional modify expressions. */ > > static void > -combine_blocks (class loop *loop, edge pe) > +combine_blocks (class loop *loop) > { > basic_block bb, exit_bb, merge_target_bb; > unsigned int orig_loop_num_nodes = loop->num_nodes; > @@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe) > predicate_all_scalar_phis (loop); > > if (need_to_predicate || need_to_rewrite_undefined) > - predicate_statements (loop, pe); > + predicate_statements (loop); > > /* Merge basic blocks. */ > exit_bb = NULL; > @@ -3181,6 +3176,99 @@ ifcvt_local_dce (class loop *loop) > } > } > > +/* Return true if VALUE is already available on edge PE. */ > + > +static bool > +ifcvt_available_on_edge_p (edge pe, tree value) > +{ > + if (is_gimple_min_invariant (value)) > + return true; > + > + if (TREE_CODE (value) == SSA_NAME) > + { > + basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (value)); > + if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb)) > + return true; > + } > + > + return false; > +} > + > +/* Return true if STMT can be hoisted from if-converted loop LOOP to > + edge PE. */ > + > +static bool > +ifcvt_can_hoist (class loop *loop, edge pe, gimple *stmt) > +{ > + if (auto *call = dyn_cast<gcall *> (stmt)) > + { > + if (gimple_call_internal_p (call) > + && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0) > + return false; > + } > + else if (auto *assign = dyn_cast<gassign *> (stmt)) > + { > + if (gimple_assign_rhs_code (assign) == COND_EXPR) > + return false; > + } > + else > + return false; > + > + if (gimple_has_side_effects (stmt) > + || gimple_could_trap_p (stmt) > + || stmt_could_throw_p (cfun, stmt) > + || gimple_vdef (stmt) > + || gimple_vuse (stmt)) > + return false; > + > + int num_args = gimple_num_args (stmt); > + if (pe != loop_preheader_edge (loop)) > + { > + for (int i = 0; i < num_args; ++i) > + if (!ifcvt_available_on_edge_p (pe, gimple_arg (stmt, i))) > + return false; > + } > + else > + { > + for (int i = 0; i < num_args; ++i) > + if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i))) > + return false; > + } > + > + return true; > +} > + > +/* Hoist invariant statements from LOOP to edge PE. */ > + > +static void > +ifcvt_hoist_invariants (class loop *loop, edge pe) > +{ > + gimple_stmt_iterator hoist_gsi = {}; > + unsigned int num_blocks = loop->num_nodes; > + basic_block *body = get_loop_body (loop); > + for (unsigned int i = 0; i < num_blocks; ++i) > + for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p > (gsi);) > + { > + gimple *stmt = gsi_stmt (gsi); > + if (ifcvt_can_hoist (loop, pe, stmt)) > + { > + /* Once we've hoisted one statement, insert other statements > + after it. */ > + gsi_remove (&gsi, false); > + if (hoist_gsi.ptr) > + gsi_insert_after (&hoist_gsi, stmt, GSI_NEW_STMT); > + else > + { > + gsi_insert_on_edge_immediate (pe, stmt); > + hoist_gsi = gsi_for_stmt (stmt); > + } > + continue; > + } > + gsi_next (&gsi); > + } > + free (body); > +} > + > /* If-convert LOOP when it is legal. For the moment this pass has no > profitability analysis. Returns non-zero todo flags when something > changed. */ > @@ -3275,7 +3363,7 @@ tree_if_conversion (class loop *loop, vec<gimple *> > *preds) > /* Now all statements are if-convertible. Combine all the basic > blocks into one huge basic block doing the if-conversion > on-the-fly. */ > - combine_blocks (loop, pe); > + combine_blocks (loop); > > /* Perform local CSE, this esp. helps the vectorizer analysis if loads > and stores are involved. CSE only the loop body, not the entry > @@ -3297,6 +3385,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> > *preds) > ifcvt_local_dce (loop); > BITMAP_FREE (exit_bbs); > > + ifcvt_hoist_invariants (loop, pe); > + > todo |= TODO_cleanup_cfg; > > cleanup: > -- > 2.25.1 >