On Mon, Oct 21, 2019 at 12:08 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Richard Biener <richard.guent...@gmail.com> writes: > > On October 20, 2019 2:54:48 PM GMT+02:00, Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >>Richard Biener <richard.guent...@gmail.com> writes: > >>> On October 19, 2019 5:06:40 PM GMT+02:00, Richard Sandiford > >><richard.sandif...@arm.com> wrote: > >>>>After the previous patch, it seems more natural to apply the > >>>>PARAM_SLP_MAX_INSNS_IN_BB threshold as soon as we know what > >>>>the region is, rather than delaying it to vect_slp_analyze_bb_1. > >>>>(But rather than carve out the biggest region possible and then > >>>>reject it, wouldn't it be better to stop when the region gets > >>>>too big, so we at least have a chance of vectorising something?) > >>>> > >>>>It also seems more natural for vect_slp_bb_region to create the > >>>>bb_vec_info itself rather than (a) having to pass bits of data down > >>>>for the initialisation and (b) forcing vect_slp_analyze_bb_1 to free > >>>>on every failure return. > >>>> > >>>>Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > >>> > >>> Ok. But I wonder what the reason was for this limit? Dependency > >>analysis was greatly simplified, being no longer quadratic here. Can > >>you check where the limit originally came from? But indeed splitting > >>the region makes more sense then, but at dataref group boundaries I'd > >>say. > >> > >>Yeah, looks it was the complexity of dependence analysis: > >> > >> https://gcc.gnu.org/ml/gcc-patches/2009-05/msg01303.html > > > > OK. We no longer > > Call compute dependence between all memory refs but only verify we can do > > those code-motions we need to do. That's of course much harder to check a > > bound on upfront (it's still quadratic in the worst case). I'm also not > > sure this is ever a problem, but we might instead count the number of stmts > > involving memory? > > OK, here's what I tried. It exposes quadratic behaviour for > gcc.dg/pr87985.c on x86_64 though. To make things worse, this is > all in the name of "vectorising" using V1DI. :-) > > In that test we have N data references in which the even elements appear > to be vectorisable but the odd elements don't. First time round, we hit: > > /* For basic block SLP, try to break the group up into multiples of the > vector size. */ > unsigned HOST_WIDE_INT const_nunits; > if (is_a <bb_vec_info> (vinfo) > && STMT_VINFO_GROUPED_ACCESS (stmt_info) > && DR_GROUP_FIRST_ELEMENT (stmt_info) > && nunits.is_constant (&const_nunits)) > > This breaks off the leading even element and recurses, discards the > leading odd element, and then recurses on the rest. We then come round > to the same code when recursing on the rest. > > It looks like matches is valid for all elements, so I guess we should > divide the group based on all false elements, not just the first?
What do you mean with "valid for all elements"? matches[i] is true or false for all elements? If it is true we shouldn't split at all. But indeed the recursion is bad if we split out one vector a step, we should split the whole group in "half" instead. Still the code should ensure we have a "half" that works OK and one that possibly doesn't. OK, I see we have {true, true, false <repeats ... times> } for matches but the 2nd iteration gets {true, false, ... } and doesn't split for me. Of course if you have V1DI then you'll notice that matches[0] is _always_ true ... I guess we should simply never try splitting for const_nunits == 1? Or even never do BB vectorization with such a vector type. Richard. > Richard > > > 2019-10-21 Richard Sandiford <richard.sandif...@arm.com> > > gcc/ > * params.def (PARAM_SLP_MAX_INSNS_IN_BB): Replace with... > (PARAM_SLP_MAX_MEMREFS_IN_BB): ...this new --param. > * doc/invoke.texi: Update accordingly. > * tree-vect-slp.c (vect_slp_bb): Compute region_end during the > search loop. Stop after finding PARAM_SLP_MAX_MEMREFS_IN_BB > data references. > > Index: gcc/params.def > =================================================================== > --- gcc/params.def 2019-10-16 10:50:00.064243704 +0100 > +++ gcc/params.def 2019-10-21 10:48:45.769347466 +0100 > @@ -1030,10 +1030,10 @@ DEFPARAM (PARAM_PROFILE_FUNC_INTERNAL_ID > 0, 0, 1) > > /* Avoid SLP vectorization of large basic blocks. */ > -DEFPARAM (PARAM_SLP_MAX_INSNS_IN_BB, > - "slp-max-insns-in-bb", > - "Maximum number of instructions in basic block to be considered for > " > - "SLP vectorization.", 1000, 0, 0) > +DEFPARAM (PARAM_SLP_MAX_MEMREFS_IN_BB, > + "slp-max-memrefs-in-bb", > + "Maximum number of memory references to be considered for " > + "SLP vectorization.", 1024, 0, 0) > > DEFPARAM (PARAM_MIN_INSN_TO_PREFETCH_RATIO, > "min-insn-to-prefetch-ratio", > Index: gcc/doc/invoke.texi > =================================================================== > --- gcc/doc/invoke.texi 2019-10-16 10:49:59.340248881 +0100 > +++ gcc/doc/invoke.texi 2019-10-21 10:48:45.769347466 +0100 > @@ -12248,9 +12248,8 @@ by the copy loop headers pass. > @item vect-epilogues-nomask > Enable loop epilogue vectorization using smaller vector size. > > -@item slp-max-insns-in-bb > -Maximum number of instructions in basic block to be > -considered for SLP vectorization. > +@item slp-max-memrefs-in-bb > +Maximum number of memory references to be considered for SLP vectorization. > > @item avoid-fma-max-bits > Maximum number of bits for which we avoid creating FMAs. > Index: gcc/tree-vect-slp.c > =================================================================== > --- gcc/tree-vect-slp.c 2019-10-21 07:41:32.997886232 +0100 > +++ gcc/tree-vect-slp.c 2019-10-21 10:48:45.769347466 +0100 > @@ -3075,12 +3075,19 @@ vect_slp_bb (basic_block bb) > while (!gsi_end_p (gsi)) > { > gimple_stmt_iterator region_begin = gsi; > + gimple_stmt_iterator region_end; > vec<data_reference_p> datarefs = vNULL; > + unsigned int max_datarefs = PARAM_VALUE (PARAM_SLP_MAX_MEMREFS_IN_BB); > int insns = 0; > > - for (; !gsi_end_p (gsi); gsi_next (&gsi)) > + while (1) > { > + region_end = gsi; > + if (gsi_end_p (gsi) || datarefs.length () >= max_datarefs) > + break; > + > gimple *stmt = gsi_stmt (gsi); > + gsi_next (&gsi); > if (is_gimple_debug (stmt)) > continue; > insns++; > @@ -3089,33 +3096,17 @@ vect_slp_bb (basic_block bb) > vect_location = stmt; > > if (!vect_find_stmt_data_reference (NULL, stmt, &datarefs)) > + /* Don't include the statement in either this region or > + the next one. */ > break; > } > > /* Skip leading unhandled stmts. */ > - if (gsi_stmt (region_begin) == gsi_stmt (gsi)) > - { > - gsi_next (&gsi); > - continue; > - } > + if (insns == 0) > + continue; > > - gimple_stmt_iterator region_end = gsi; > - > - if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB)) > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > - "not vectorized: too many instructions in " > - "basic block.\n"); > - } > - else if (vect_slp_bb_region (region_begin, region_end, datarefs, > insns)) > + if (vect_slp_bb_region (region_begin, region_end, datarefs, insns)) > any_vectorized = true; > - > - if (gsi_end_p (region_end)) > - break; > - > - /* Skip the unhandled stmt. */ > - gsi_next (&gsi); > } > > return any_vectorized;