On Mon, Oct 21, 2019 at 12:08 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Richard Biener <richard.guent...@gmail.com> writes:
> > On October 20, 2019 2:54:48 PM GMT+02:00, Richard Sandiford 
> > <richard.sandif...@arm.com> wrote:
> >>Richard Biener <richard.guent...@gmail.com> writes:
> >>> On October 19, 2019 5:06:40 PM GMT+02:00, Richard Sandiford
> >><richard.sandif...@arm.com> wrote:
> >>>>After the previous patch, it seems more natural to apply the
> >>>>PARAM_SLP_MAX_INSNS_IN_BB threshold as soon as we know what
> >>>>the region is, rather than delaying it to vect_slp_analyze_bb_1.
> >>>>(But rather than carve out the biggest region possible and then
> >>>>reject it, wouldn't it be better to stop when the region gets
> >>>>too big, so we at least have a chance of vectorising something?)
> >>>>
> >>>>It also seems more natural for vect_slp_bb_region to create the
> >>>>bb_vec_info itself rather than (a) having to pass bits of data down
> >>>>for the initialisation and (b) forcing vect_slp_analyze_bb_1 to free
> >>>>on every failure return.
> >>>>
> >>>>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >>>
> >>> Ok. But I wonder what the reason was for this limit? Dependency
> >>analysis was greatly simplified, being no longer quadratic here. Can
> >>you check where the limit originally came from? But indeed splitting
> >>the region makes more sense then, but at dataref group boundaries I'd
> >>say.
> >>
> >>Yeah, looks it was the complexity of dependence analysis:
> >>
> >>  https://gcc.gnu.org/ml/gcc-patches/2009-05/msg01303.html
> >
> > OK. We no longer
> > Call compute dependence between all memory refs but only verify we can do 
> > those code-motions we need to do. That's of course much harder to check a 
> > bound on upfront (it's still quadratic in the worst case). I'm also not 
> > sure this is ever a problem, but we might instead count the number of stmts 
> > involving memory?
>
> OK, here's what I tried.  It exposes quadratic behaviour for
> gcc.dg/pr87985.c on x86_64 though.  To make things worse, this is
> all in the name of "vectorising" using V1DI. :-)
>
> In that test we have N data references in which the even elements appear
> to be vectorisable but the odd elements don't.  First time round, we hit:
>
>   /* For basic block SLP, try to break the group up into multiples of the
>      vector size.  */
>   unsigned HOST_WIDE_INT const_nunits;
>   if (is_a <bb_vec_info> (vinfo)
>       && STMT_VINFO_GROUPED_ACCESS (stmt_info)
>       && DR_GROUP_FIRST_ELEMENT (stmt_info)
>       && nunits.is_constant (&const_nunits))
>
> This breaks off the leading even element and recurses, discards the
> leading odd element, and then recurses on the rest.  We then come round
> to the same code when recursing on the rest.
>
> It looks like matches is valid for all elements, so I guess we should
> divide the group based on all false elements, not just the first?

What do you mean with "valid for all elements"?  matches[i] is
true or false for all elements?  If it is true we shouldn't split
at all.

But indeed the recursion is bad if we split out one vector a step,
we should split the whole group in "half" instead.

Still the code should ensure we have a "half" that works OK
and one that possibly doesn't.

OK, I see we have {true, true, false <repeats ... times> } for
matches but the 2nd iteration gets {true, false, ... } and doesn't
split for me.  Of course if you have V1DI then you'll notice
that matches[0] is _always_ true ... I guess we should simply
never try splitting for const_nunits == 1?  Or even never
do BB vectorization with such a vector type.

Richard.

> Richard
>
>
> 2019-10-21  Richard Sandiford  <richard.sandif...@arm.com>
>
> gcc/
>         * params.def (PARAM_SLP_MAX_INSNS_IN_BB): Replace with...
>         (PARAM_SLP_MAX_MEMREFS_IN_BB): ...this new --param.
>         * doc/invoke.texi: Update accordingly.
>         * tree-vect-slp.c (vect_slp_bb): Compute region_end during the
>         search loop.  Stop after finding PARAM_SLP_MAX_MEMREFS_IN_BB
>         data references.
>
> Index: gcc/params.def
> ===================================================================
> --- gcc/params.def      2019-10-16 10:50:00.064243704 +0100
> +++ gcc/params.def      2019-10-21 10:48:45.769347466 +0100
> @@ -1030,10 +1030,10 @@ DEFPARAM (PARAM_PROFILE_FUNC_INTERNAL_ID
>           0, 0, 1)
>
>  /* Avoid SLP vectorization of large basic blocks.  */
> -DEFPARAM (PARAM_SLP_MAX_INSNS_IN_BB,
> -         "slp-max-insns-in-bb",
> -         "Maximum number of instructions in basic block to be considered for 
> "
> -         "SLP vectorization.", 1000, 0, 0)
> +DEFPARAM (PARAM_SLP_MAX_MEMREFS_IN_BB,
> +         "slp-max-memrefs-in-bb",
> +         "Maximum number of memory references to be considered for "
> +         "SLP vectorization.", 1024, 0, 0)
>
>  DEFPARAM (PARAM_MIN_INSN_TO_PREFETCH_RATIO,
>           "min-insn-to-prefetch-ratio",
> Index: gcc/doc/invoke.texi
> ===================================================================
> --- gcc/doc/invoke.texi 2019-10-16 10:49:59.340248881 +0100
> +++ gcc/doc/invoke.texi 2019-10-21 10:48:45.769347466 +0100
> @@ -12248,9 +12248,8 @@ by the copy loop headers pass.
>  @item vect-epilogues-nomask
>  Enable loop epilogue vectorization using smaller vector size.
>
> -@item slp-max-insns-in-bb
> -Maximum number of instructions in basic block to be
> -considered for SLP vectorization.
> +@item slp-max-memrefs-in-bb
> +Maximum number of memory references to be considered for SLP vectorization.
>
>  @item avoid-fma-max-bits
>  Maximum number of bits for which we avoid creating FMAs.
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-10-21 07:41:32.997886232 +0100
> +++ gcc/tree-vect-slp.c 2019-10-21 10:48:45.769347466 +0100
> @@ -3075,12 +3075,19 @@ vect_slp_bb (basic_block bb)
>    while (!gsi_end_p (gsi))
>      {
>        gimple_stmt_iterator region_begin = gsi;
> +      gimple_stmt_iterator region_end;
>        vec<data_reference_p> datarefs = vNULL;
> +      unsigned int max_datarefs = PARAM_VALUE (PARAM_SLP_MAX_MEMREFS_IN_BB);
>        int insns = 0;
>
> -      for (; !gsi_end_p (gsi); gsi_next (&gsi))
> +      while (1)
>         {
> +         region_end = gsi;
> +         if (gsi_end_p (gsi) || datarefs.length () >= max_datarefs)
> +           break;
> +
>           gimple *stmt = gsi_stmt (gsi);
> +         gsi_next (&gsi);
>           if (is_gimple_debug (stmt))
>             continue;
>           insns++;
> @@ -3089,33 +3096,17 @@ vect_slp_bb (basic_block bb)
>             vect_location = stmt;
>
>           if (!vect_find_stmt_data_reference (NULL, stmt, &datarefs))
> +           /* Don't include the statement in either this region or
> +              the next one.  */
>             break;
>         }
>
>        /* Skip leading unhandled stmts.  */
> -      if (gsi_stmt (region_begin) == gsi_stmt (gsi))
> -       {
> -         gsi_next (&gsi);
> -         continue;
> -       }
> +      if (insns == 0)
> +       continue;
>
> -      gimple_stmt_iterator region_end = gsi;
> -
> -      if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> -       {
> -         if (dump_enabled_p ())
> -           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                            "not vectorized: too many instructions in "
> -                            "basic block.\n");
> -       }
> -      else if (vect_slp_bb_region (region_begin, region_end, datarefs, 
> insns))
> +      if (vect_slp_bb_region (region_begin, region_end, datarefs, insns))
>         any_vectorized = true;
> -
> -      if (gsi_end_p (region_end))
> -       break;
> -
> -      /* Skip the unhandled stmt.  */
> -      gsi_next (&gsi);
>      }
>
>    return any_vectorized;

Reply via email to