On Mon, Oct 21, 2019 at 12:08 PM Richard Sandiford
<[email protected]> wrote:
>
> Richard Biener <[email protected]> writes:
> > On October 20, 2019 2:54:48 PM GMT+02:00, Richard Sandiford
> > <[email protected]> wrote:
> >>Richard Biener <[email protected]> writes:
> >>> On October 19, 2019 5:06:40 PM GMT+02:00, Richard Sandiford
> >><[email protected]> wrote:
> >>>>After the previous patch, it seems more natural to apply the
> >>>>PARAM_SLP_MAX_INSNS_IN_BB threshold as soon as we know what
> >>>>the region is, rather than delaying it to vect_slp_analyze_bb_1.
> >>>>(But rather than carve out the biggest region possible and then
> >>>>reject it, wouldn't it be better to stop when the region gets
> >>>>too big, so we at least have a chance of vectorising something?)
> >>>>
> >>>>It also seems more natural for vect_slp_bb_region to create the
> >>>>bb_vec_info itself rather than (a) having to pass bits of data down
> >>>>for the initialisation and (b) forcing vect_slp_analyze_bb_1 to free
> >>>>on every failure return.
> >>>>
> >>>>Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
> >>>
> >>> Ok. But I wonder what the reason was for this limit? Dependency
> >>analysis was greatly simplified, being no longer quadratic here. Can
> >>you check where the limit originally came from? But indeed splitting
> >>the region makes more sense then, but at dataref group boundaries I'd
> >>say.
> >>
> >>Yeah, looks it was the complexity of dependence analysis:
> >>
> >> https://gcc.gnu.org/ml/gcc-patches/2009-05/msg01303.html
> >
> > OK. We no longer
> > Call compute dependence between all memory refs but only verify we can do
> > those code-motions we need to do. That's of course much harder to check a
> > bound on upfront (it's still quadratic in the worst case). I'm also not
> > sure this is ever a problem, but we might instead count the number of stmts
> > involving memory?
>
> OK, here's what I tried. It exposes quadratic behaviour for
> gcc.dg/pr87985.c on x86_64 though. To make things worse, this is
> all in the name of "vectorising" using V1DI. :-)
>
> In that test we have N data references in which the even elements appear
> to be vectorisable but the odd elements don't. First time round, we hit:
>
> /* For basic block SLP, try to break the group up into multiples of the
> vector size. */
> unsigned HOST_WIDE_INT const_nunits;
> if (is_a <bb_vec_info> (vinfo)
> && STMT_VINFO_GROUPED_ACCESS (stmt_info)
> && DR_GROUP_FIRST_ELEMENT (stmt_info)
> && nunits.is_constant (&const_nunits))
>
> This breaks off the leading even element and recurses, discards the
> leading odd element, and then recurses on the rest. We then come round
> to the same code when recursing on the rest.
>
> It looks like matches is valid for all elements, so I guess we should
> divide the group based on all false elements, not just the first?
What do you mean with "valid for all elements"? matches[i] is
true or false for all elements? If it is true we shouldn't split
at all.
But indeed the recursion is bad if we split out one vector a step,
we should split the whole group in "half" instead.
Still the code should ensure we have a "half" that works OK
and one that possibly doesn't.
OK, I see we have {true, true, false <repeats ... times> } for
matches but the 2nd iteration gets {true, false, ... } and doesn't
split for me. Of course if you have V1DI then you'll notice
that matches[0] is _always_ true ... I guess we should simply
never try splitting for const_nunits == 1? Or even never
do BB vectorization with such a vector type.
Richard.
> Richard
>
>
> 2019-10-21 Richard Sandiford <[email protected]>
>
> gcc/
> * params.def (PARAM_SLP_MAX_INSNS_IN_BB): Replace with...
> (PARAM_SLP_MAX_MEMREFS_IN_BB): ...this new --param.
> * doc/invoke.texi: Update accordingly.
> * tree-vect-slp.c (vect_slp_bb): Compute region_end during the
> search loop. Stop after finding PARAM_SLP_MAX_MEMREFS_IN_BB
> data references.
>
> Index: gcc/params.def
> ===================================================================
> --- gcc/params.def 2019-10-16 10:50:00.064243704 +0100
> +++ gcc/params.def 2019-10-21 10:48:45.769347466 +0100
> @@ -1030,10 +1030,10 @@ DEFPARAM (PARAM_PROFILE_FUNC_INTERNAL_ID
> 0, 0, 1)
>
> /* Avoid SLP vectorization of large basic blocks. */
> -DEFPARAM (PARAM_SLP_MAX_INSNS_IN_BB,
> - "slp-max-insns-in-bb",
> - "Maximum number of instructions in basic block to be considered for
> "
> - "SLP vectorization.", 1000, 0, 0)
> +DEFPARAM (PARAM_SLP_MAX_MEMREFS_IN_BB,
> + "slp-max-memrefs-in-bb",
> + "Maximum number of memory references to be considered for "
> + "SLP vectorization.", 1024, 0, 0)
>
> DEFPARAM (PARAM_MIN_INSN_TO_PREFETCH_RATIO,
> "min-insn-to-prefetch-ratio",
> Index: gcc/doc/invoke.texi
> ===================================================================
> --- gcc/doc/invoke.texi 2019-10-16 10:49:59.340248881 +0100
> +++ gcc/doc/invoke.texi 2019-10-21 10:48:45.769347466 +0100
> @@ -12248,9 +12248,8 @@ by the copy loop headers pass.
> @item vect-epilogues-nomask
> Enable loop epilogue vectorization using smaller vector size.
>
> -@item slp-max-insns-in-bb
> -Maximum number of instructions in basic block to be
> -considered for SLP vectorization.
> +@item slp-max-memrefs-in-bb
> +Maximum number of memory references to be considered for SLP vectorization.
>
> @item avoid-fma-max-bits
> Maximum number of bits for which we avoid creating FMAs.
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-10-21 07:41:32.997886232 +0100
> +++ gcc/tree-vect-slp.c 2019-10-21 10:48:45.769347466 +0100
> @@ -3075,12 +3075,19 @@ vect_slp_bb (basic_block bb)
> while (!gsi_end_p (gsi))
> {
> gimple_stmt_iterator region_begin = gsi;
> + gimple_stmt_iterator region_end;
> vec<data_reference_p> datarefs = vNULL;
> + unsigned int max_datarefs = PARAM_VALUE (PARAM_SLP_MAX_MEMREFS_IN_BB);
> int insns = 0;
>
> - for (; !gsi_end_p (gsi); gsi_next (&gsi))
> + while (1)
> {
> + region_end = gsi;
> + if (gsi_end_p (gsi) || datarefs.length () >= max_datarefs)
> + break;
> +
> gimple *stmt = gsi_stmt (gsi);
> + gsi_next (&gsi);
> if (is_gimple_debug (stmt))
> continue;
> insns++;
> @@ -3089,33 +3096,17 @@ vect_slp_bb (basic_block bb)
> vect_location = stmt;
>
> if (!vect_find_stmt_data_reference (NULL, stmt, &datarefs))
> + /* Don't include the statement in either this region or
> + the next one. */
> break;
> }
>
> /* Skip leading unhandled stmts. */
> - if (gsi_stmt (region_begin) == gsi_stmt (gsi))
> - {
> - gsi_next (&gsi);
> - continue;
> - }
> + if (insns == 0)
> + continue;
>
> - gimple_stmt_iterator region_end = gsi;
> -
> - if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> - {
> - if (dump_enabled_p ())
> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> - "not vectorized: too many instructions in "
> - "basic block.\n");
> - }
> - else if (vect_slp_bb_region (region_begin, region_end, datarefs,
> insns))
> + if (vect_slp_bb_region (region_begin, region_end, datarefs, insns))
> any_vectorized = true;
> -
> - if (gsi_end_p (region_end))
> - break;
> -
> - /* Skip the unhandled stmt. */
> - gsi_next (&gsi);
> }
>
> return any_vectorized;