This adds two hacks to work around an old compile-time issue when
vectorizing large permuted SLP groups with gaps.  There we end up
emitting loads and IV adjustments for the gaps as well, and those
have quite a high cost until they are eventually cleaned up.
The first hack is to fold the auto-inc style IV updates early in the
vectorizer rather than in the next forwprop pass, which shortens the
SSA use-def chains of the used IV.  The second hack is to remove
the unused loads after we've picked all that we possibly use.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

I wonder if this is too gross (and I have to check the one or
two bug duplicates), but it should be at least easy to backport ...

Thanks,
Richard.

2021-06-18  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/101120
	* tree-vect-data-refs.c (bump_vector_ptr): Fold the
	built increment.
	* tree-vect-stmts.c (vectorizable_load): Remove unused
	loads in the DR chain for SLP.
---
 gcc/tree-vect-data-refs.c | 12 +++++++++++-
 gcc/tree-vect-stmts.c     | 12 ++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index bb086c6ac1c..be067c8923b 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-hash-traits.h"
 #include "vec-perm-indices.h"
 #include "internal-fn.h"
+#include "gimple-fold.h"
 
 /* Return true if load- or store-lanes optab OPTAB is implemented for
    COUNT vectors of type VECTYPE.  NAME is the name of OPTAB.  */
@@ -5026,7 +5027,7 @@ bump_vector_ptr (vec_info *vinfo,
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   tree update = TYPE_SIZE_UNIT (vectype);
-  gassign *incr_stmt;
+  gimple *incr_stmt;
   ssa_op_iter iter;
   use_operand_p use_p;
   tree new_dataref_ptr;
@@ -5041,6 +5042,15 @@ bump_vector_ptr (vec_info *vinfo,
   incr_stmt = gimple_build_assign (new_dataref_ptr, POINTER_PLUS_EXPR,
                                    dataref_ptr, update);
   vect_finish_stmt_generation (vinfo, stmt_info, incr_stmt, gsi);
+  /* Fold the increment, avoiding excessive use-def chains of
+     those, leading to compile-time issues for passes until the next
+     forwprop pass which would do this as well.  */
+  gimple_stmt_iterator fold_gsi = gsi_for_stmt (incr_stmt);
+  if (fold_stmt (&fold_gsi, follow_all_ssa_edges))
+    {
+      incr_stmt = gsi_stmt (fold_gsi);
+      update_stmt (incr_stmt);
+    }
 
   /* Copy the points-to information if it exists. */
   if (DR_PTR_INFO (dr))
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index eeef96a2eb6..1636e6716df 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9765,6 +9765,18 @@ vectorizable_load (vec_info *vinfo,
                   bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain,
                                                           gsi, vf, false, &n_perms);
                   gcc_assert (ok);
+                  /* For SLP we know we've seen all possible uses of dr_chain.
+                     See to remove stmts we didn't need.
+                     ???  This is a hack to prevent compile-time issues as seen
+                     in PR101120 and friends.  */
+                  for (tree op : dr_chain)
+                    if (has_zero_uses (op))
+                      {
+                        gimple *stmt = SSA_NAME_DEF_STMT (op);
+                        gimple_stmt_iterator rgsi = gsi_for_stmt (stmt);
+                        gsi_remove (&rgsi, true);
+                        release_defs (stmt);
+                      }
                 }
               else
                 {
-- 
2.26.2
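
For readers who want to see the effect of the two hacks outside of GCC,
here is a rough standalone sketch.  It is a toy model only and uses none
of GCC's internals: Incr, Load, bump_unfolded, bump_folded, chain_length,
total_offset and remove_unused are all hypothetical names made up for
this illustration.  It assumes each pointer bump is "base + constant",
so folding a bump whose base is itself a bump re-expresses it relative
to the original pointer, and that a load with zero uses can simply be
dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a pointer increment "result = base + offset".
// base is the index of an earlier Incr, or -1 for the original pointer.
struct Incr {
  int base;
  long offset;
};

// Toy stand-in for a vector load with a count of its uses.
struct Load {
  int ptr;
  int uses;
};

// Emit "base + step" without folding: every bump hangs off the last one.
int bump_unfolded(std::vector<Incr>& stmts, int base, long step) {
  stmts.push_back(Incr{base, step});
  return static_cast<int>(stmts.size()) - 1;
}

// Emit "base + step" and fold right away: if base is itself an
// increment, combine the constants and refer to its base instead.
int bump_folded(std::vector<Incr>& stmts, int base, long step) {
  Incr incr{base, step};
  if (base != -1) {
    assert(base < static_cast<int>(stmts.size()));
    incr.offset += stmts[base].offset;
    incr.base = stmts[base].base;
  }
  stmts.push_back(incr);
  return static_cast<int>(stmts.size()) - 1;
}

// Number of statements walked from stmt i back to the original pointer.
std::size_t chain_length(const std::vector<Incr>& stmts, int i) {
  std::size_t n = 0;
  for (; i != -1; i = stmts[i].base)
    ++n;
  return n;
}

// Total byte offset of stmt i relative to the original pointer.
long total_offset(const std::vector<Incr>& stmts, int i) {
  long sum = 0;
  for (; i != -1; i = stmts[i].base)
    sum += stmts[i].offset;
  return sum;
}

// Drop loads nobody uses; returns how many were removed.
std::size_t remove_unused(std::vector<Load>& chain) {
  const std::size_t before = chain.size();
  chain.erase(std::remove_if(chain.begin(), chain.end(),
                             [](const Load& l) { return l.uses == 0; }),
              chain.end());
  return before - chain.size();
}
```

Bumping a pointer eight times by 16 bytes with bump_unfolded gives a
chain of eight statements, while bump_folded keeps every bump one step
from the original pointer with the same total offset.  That is the
shape of the saving the first hack is after; the second corresponds to
remove_unused.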