Richard Biener <richard.guent...@gmail.com> writes:
> On October 20, 2019 2:54:48 PM GMT+02:00, Richard Sandiford 
> <richard.sandif...@arm.com> wrote:
>>Richard Biener <richard.guent...@gmail.com> writes:
>>> On October 19, 2019 5:06:40 PM GMT+02:00, Richard Sandiford
>><richard.sandif...@arm.com> wrote:
>>>>After the previous patch, it seems more natural to apply the
>>>>PARAM_SLP_MAX_INSNS_IN_BB threshold as soon as we know what
>>>>the region is, rather than delaying it to vect_slp_analyze_bb_1.
>>>>(But rather than carve out the biggest region possible and then
>>>>reject it, wouldn't it be better to stop when the region gets
>>>>too big, so we at least have a chance of vectorising something?)
>>>>
>>>>It also seems more natural for vect_slp_bb_region to create the
>>>>bb_vec_info itself rather than (a) having to pass bits of data down
>>>>for the initialisation and (b) forcing vect_slp_analyze_bb_1 to free
>>>>on every failure return.
>>>>
>>>>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>>
>>> Ok. But I wonder what the reason was for this limit? Dependency
>>analysis was greatly simplified, being no longer quadratic here. Can
>>you check where the limit originally came from? But indeed splitting
>>the region makes more sense then, but at dataref group boundaries I'd
>>say. 
>>
>>Yeah, looks it was the complexity of dependence analysis:
>>
>>  https://gcc.gnu.org/ml/gcc-patches/2009-05/msg01303.html
>
> OK. We no longer
> Call compute dependence between all memory refs but only verify we can do 
> those code-motions we need to do. That's of course much harder to check a 
> bound on upfront (it's still quadratic in the worst case). I'm also not sure 
> this is ever a problem, but we might instead count the number of stmts 
> involving memory? 

OK, here's what I tried.  It exposes quadratic behaviour for
gcc.dg/pr87985.c on x86_64 though.  To make things worse, this is
all in the name of "vectorising" using V1DI. :-)

In that test we have N data references in which the even elements appear
to be vectorisable but the odd elements don't.  First time round, we hit:

  /* For basic block SLP, try to break the group up into multiples of the
     vector size.  */
  unsigned HOST_WIDE_INT const_nunits;
  if (is_a <bb_vec_info> (vinfo)
      && STMT_VINFO_GROUPED_ACCESS (stmt_info)
      && DR_GROUP_FIRST_ELEMENT (stmt_info)
      && nunits.is_constant (&const_nunits))

This breaks off the leading even element and recurses, discards the
leading odd element, and then recurses on the rest.  We then come round
to the same code when recursing on the rest.

It looks like matches is valid for all elements, so I guess we should
divide the group based on all false elements, not just the first?

Richard


2019-10-21  Richard Sandiford  <richard.sandif...@arm.com>

gcc/
        * params.def (PARAM_SLP_MAX_INSNS_IN_BB): Replace with...
        (PARAM_SLP_MAX_MEMREFS_IN_BB): ...this new --param.
        * doc/invoke.texi: Update accordingly.
        * tree-vect-slp.c (vect_slp_bb): Compute region_end during the
        search loop.  Stop after finding PARAM_SLP_MAX_MEMREFS_IN_BB
        data references.

Index: gcc/params.def
===================================================================
--- gcc/params.def      2019-10-16 10:50:00.064243704 +0100
+++ gcc/params.def      2019-10-21 10:48:45.769347466 +0100
@@ -1030,10 +1030,10 @@ DEFPARAM (PARAM_PROFILE_FUNC_INTERNAL_ID
          0, 0, 1)
 
 /* Avoid SLP vectorization of large basic blocks.  */
-DEFPARAM (PARAM_SLP_MAX_INSNS_IN_BB,
-         "slp-max-insns-in-bb",
-         "Maximum number of instructions in basic block to be considered for "
-         "SLP vectorization.", 1000, 0, 0)
+DEFPARAM (PARAM_SLP_MAX_MEMREFS_IN_BB,
+         "slp-max-memrefs-in-bb",
+         "Maximum number of memory references to be considered for "
+         "SLP vectorization.", 1024, 0, 0)
 
 DEFPARAM (PARAM_MIN_INSN_TO_PREFETCH_RATIO,
          "min-insn-to-prefetch-ratio",
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi 2019-10-16 10:49:59.340248881 +0100
+++ gcc/doc/invoke.texi 2019-10-21 10:48:45.769347466 +0100
@@ -12248,9 +12248,8 @@ by the copy loop headers pass.
 @item vect-epilogues-nomask
 Enable loop epilogue vectorization using smaller vector size.
 
-@item slp-max-insns-in-bb
-Maximum number of instructions in basic block to be
-considered for SLP vectorization.
+@item slp-max-memrefs-in-bb
+Maximum number of memory references to be considered for SLP vectorization.
 
 @item avoid-fma-max-bits
 Maximum number of bits for which we avoid creating FMAs.
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c 2019-10-21 07:41:32.997886232 +0100
+++ gcc/tree-vect-slp.c 2019-10-21 10:48:45.769347466 +0100
@@ -3075,12 +3075,19 @@ vect_slp_bb (basic_block bb)
   while (!gsi_end_p (gsi))
     {
       gimple_stmt_iterator region_begin = gsi;
+      gimple_stmt_iterator region_end;
       vec<data_reference_p> datarefs = vNULL;
+      unsigned int max_datarefs = PARAM_VALUE (PARAM_SLP_MAX_MEMREFS_IN_BB);
       int insns = 0;
 
-      for (; !gsi_end_p (gsi); gsi_next (&gsi))
+      while (1)
        {
+         region_end = gsi;
+         if (gsi_end_p (gsi) || datarefs.length () >= max_datarefs)
+           break;
+
          gimple *stmt = gsi_stmt (gsi);
+         gsi_next (&gsi);
          if (is_gimple_debug (stmt))
            continue;
          insns++;
@@ -3089,33 +3096,17 @@ vect_slp_bb (basic_block bb)
            vect_location = stmt;
 
          if (!vect_find_stmt_data_reference (NULL, stmt, &datarefs))
+           /* Don't include the statement in either this region or
+              the next one.  */
            break;
        }
 
       /* Skip leading unhandled stmts.  */
-      if (gsi_stmt (region_begin) == gsi_stmt (gsi))
-       {
-         gsi_next (&gsi);
-         continue;
-       }
+      if (insns == 0)
+       continue;
 
-      gimple_stmt_iterator region_end = gsi;
-
-      if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
-       {
-         if (dump_enabled_p ())
-           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                            "not vectorized: too many instructions in "
-                            "basic block.\n");
-       }
-      else if (vect_slp_bb_region (region_begin, region_end, datarefs, insns))
+      if (vect_slp_bb_region (region_begin, region_end, datarefs, insns))
        any_vectorized = true;
-
-      if (gsi_end_p (region_end))
-       break;
-
-      /* Skip the unhandled stmt.  */
-      gsi_next (&gsi);
     }
 
   return any_vectorized;

Reply via email to