On Thu, 10 Oct 2019, Andre Vieira (lists) wrote:

> Hi,
> 
> After all the discussions and respins I now believe this patch is close to
> what we envisioned.
> 
> This patch achieves two things when vect-epilogues-nomask=1:
> 1) It analyzes the original loop for each supported vector size and saves this
> analysis per loop, as well as the vector sizes we know we can vectorize the
> loop for.
> 2) When loop versioning it uses the 'skip_vector' code path to vectorize the
> epilogue, and uses the lowest versioning threshold between the main and
> epilogue's.
> 
> As side effects of this patch I also changed ensure_base_align to only update
> the alignment if the new alignment is lower than the current one.  This
> function already did that if the object was a symbol, now it behaves this way
> for any object.
> 
> I bootstrapped this patch with both vect-epilogues-nomask turned on and off on
> x86_64 (AVX512) and aarch64.  Regression tests looked good.
> 
> Is this OK for trunk?

+
+  /* Keep track of vector sizes we know we can vectorize the epilogue 
with.  */
+  vector_sizes epilogue_vsizes;
 };

please don't enlarge struct loop, instead track this somewhere
in the vectorizer (in loop_vinfo?  I see you already have
epilogue_vinfos there - so the loop_vinfo simply lacks 
convenient access to the vector_size?)  I don't see any
use that could be trivially adjusted to look at a loop_vinfo
member instead.

For the vect_update_inits_of_drs this means that we'd possibly
do less CSE.  Not sure if really an issue.

You use LOOP_VINFO_EPILOGUE_P sometimes and sometimes
LOOP_VINFO_ORIG_LOOP_INFO, please change predicates to
LOOP_VINFO_EPILOGUE_P.

@@ -2466,15 +2461,62 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
niters, tree nitersm1,
   else
     niters_prolog = build_int_cst (type, 0);
     
+  loop_vec_info epilogue_vinfo = NULL;
+  if (vect_epilogues)
+    { 
...
+       vect_epilogues = false;
+    }
+

I don't understand what all this does - it clearly needs a comment.
Maybe the overall comment of the function should be amended with
an overview of how we handle [multiple] epilogue loop vectorization?

+
+      if (epilogue_any_upper_bound && prolog_peeling >= 0)
+       {
+         epilog->any_upper_bound = true;
+         epilog->nb_iterations_upper_bound = eiters + 1;
+       }
+

comment missing.  How can prolog_peeling be < 0?  We likely
didn't set the upper bound because we don't know it in the
case we skipped the vector loop (skip_vector)?  So make sure
to not introduce wrong-code issues here - maybe do this
optimization as followup?

 class loop *
-vect_loop_versioning (loop_vec_info loop_vinfo,
-                     unsigned int th, bool check_profitability,
-                     poly_uint64 versioning_threshold)
+vect_loop_versioning (loop_vec_info loop_vinfo)
 { 
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo), *nloop;
   class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
@@ -2988,10 +3151,15 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
   bool version_align = LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT 
(loop_vinfo);
   bool version_alias = LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo);
   bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo);
+  poly_uint64 versioning_threshold
+    = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
   tree version_simd_if_cond
     = LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo);
+  unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);

-  if (check_profitability)
+  if (th >= vect_vf_for_cost (loop_vinfo)
+      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      && !ordered_p (th, versioning_threshold))
     cond_expr = fold_build2 (GE_EXPR, boolean_type_node, 
scalar_loop_iters,
                             build_int_cst (TREE_TYPE (scalar_loop_iters),
                                            th - 1));

split out this refactoring - preapproved.

@@ -1726,7 +1729,13 @@ vect_analyze_loop_costing (loop_vec_info 
loop_vinfo)
       return 0;
     }

-  HOST_WIDE_INT estimated_niter = estimated_stmt_executions_int (loop);
+  HOST_WIDE_INT estimated_niter = -1;
+
+  if (LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
+    estimated_niter
+      = vect_vf_for_cost (LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo)) - 1;
+  if (estimated_niter == -1)
+    estimated_niter = estimated_stmt_executions_int (loop);
   if (estimated_niter == -1)
     estimated_niter = likely_max_stmt_executions_int (loop);
   if (estimated_niter != -1

it's clearer if the old code is completely in a else {} path
even though vect_vf_for_cost - 1 should never be -1.

+/* Decides whether we need to create an epilogue loop to handle
+   remaining scalar iterations and sets PEELING_FOR_NITERS accordingly.  
*/
+      
+void                  
+determine_peel_for_niter (loop_vec_info loop_vinfo)
+{   
+  

extra vertical space

+  unsigned HOST_WIDE_INT const_vf;
+  HOST_WIDE_INT max_niter 

if it's a 1:1 copy outlined then split it out - preapproved
(so further reviews get smaller patches ;))  I'd add a
LOOP_VINFO_PEELING_FOR_NITER () = false as final else
since that's what we do by default?

-  if (LOOP_REQUIRES_VERSIONING (loop_vinfo))
+  if (LOOP_REQUIRES_VERSIONING (loop_vinfo)
+      || ((orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
+         && LOOP_REQUIRES_VERSIONING (orig_loop_vinfo)))

not sure why we need to do this for epilouges?

+
+      /*  Use the same condition as vect_transform_loop to decide when to 
use
+         the cost to determine a versioning threshold.  */
+      if (th >= vect_vf_for_cost (loop_vinfo)
+         && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+         && ordered_p (th, niters_th))
+       niters_th = ordered_max (poly_uint64 (th), niters_th);

that's an independent change, right?  Please split out, it's
pre-approved if it tests OK separately.

+static tree
+replace_ops (tree op, hash_map<tree, tree> &mapping)
+{

I'm quite sure I've seen such beast elsewhere ;)  simplify_replace_tree
comes up first (not a 1:1 match but hints at a possible tree
sharing issue in your variant).

@@ -8497,11 +8588,11 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   if (th >= vect_vf_for_cost (loop_vinfo)
       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
-      if (dump_enabled_p ())
-       dump_printf_loc (MSG_NOTE, vect_location,
-                        "Profitability threshold is %d loop 
iterations.\n",
-                         th);
-      check_profitability = true;
+       if (dump_enabled_p ())
+         dump_printf_loc (MSG_NOTE, vect_location,
+                          "Profitability threshold is %d loop 
iterations.\n",
+                          th);
+       check_profitability = true;
     }

   /* Make sure there exists a single-predecessor exit bb.  Do this before

obvious (separate)

+  tree advance;
   epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, 
&niters_vector,
                              &step_vector, &niters_vector_mult_vf, th,
-                             check_profitability, niters_no_overflow);
+                             check_profitability, niters_no_overflow,
+                             &advance);
+
+  if (epilogue)
+    {
+      basic_block *orig_bbs = get_loop_body (loop);
+      loop_vec_info epilogue_vinfo = loop_vec_info_for_loop (epilogue);
...

please record this in vect_do_peeling itself and store the
orig_stmts/drs/etc. in the epilogue loop_vinfo and ...

+      /* We are done vectorizing the main loop, so now we update the 
epilogues
+        stmt_vec_info's.  At the same time we set the gimple UID of each
+        statement in the epilogue, as these are used to look them up in 
the
+        epilogues loop_vec_info later.  We also keep track of what
...

split this out to a new function.  I wonder why you need to record
the DRs, are they not available via ->datarefs and lookup_dr ()?

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 
601a6f55fbff388c89f88d994e790aebf2bf960e..201549da6c0cbae0797a23ae1b8967b9895505e9
 
100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -6288,7 +6288,7 @@ ensure_base_align (dr_vec_info *dr_info)

       if (decl_in_symtab_p (base_decl))
        symtab_node::get (base_decl)->increase_alignment (align_base_to);
-      else
+      else if (DECL_ALIGN (base_decl) < align_base_to)
        {
          SET_DECL_ALIGN (base_decl, align_base_to);
           DECL_USER_ALIGN (base_decl) = 1;

split out - preapproved.

Still have to go over the main loop doing the analysis/transform.

Thanks, it looks really promising (albeit exepectedly ugly due to
the data rewriting).

Richard.


> gcc/ChangeLog:
> 2019-10-10  Andre Vieira  <andre.simoesdiasvie...@arm.com>
> 
>     PR 88915
>     * cfgloop.h (loop): Add epilogue_vsizes member.
>     * cfgloop.c (flow_loop_free): Release epilogue_vsizes.
>     (alloc_loop): Initialize epilogue_vsizes.
>     * gentype.c (main): Add poly_uint64 type and vector_sizes to
>     generator.
>     * tree-vect-loop.c (vect_get_loop_niters): Make externally visible.
>     (_loop_vec_info): Initialize epilogue_vinfos.
>     (~_loop_vec_info): Release epilogue_vinfos.
>     (vect_analyze_loop_costing): Use knowledge of main VF to estimate
>     number of iterations of epilogue.
>     (determine_peel_for_niter): New. Outlined code to re-use in two
>     places.
>     (vect_analyze_loop_2): Adapt to analyse main loop for all supported
>     vector sizes when vect-epilogues-nomask=1.  Also keep track of lowest
>     versioning threshold needed for main loop.
>     (vect_analyze_loop): Likewise.
>     (replace_ops): New helper function.
>     (vect_transform_loop): When vectorizing epilogues re-use analysis done
>     on main loop and update necessary information.
>     * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
>     stmts on loop preheader edge.
>     (vect_do_peeling): Enable skip-vectors when doing loop versioning if
>     we decided to vectorize epilogues.  Update epilogues NITERS and
>     construct ADVANCE to update epilogues data references where needed.
>     (vect_loop_versioning): Moved decision to check_profitability
>     based on cost model.
>     * tree-vect-stmts.c (ensure_base_align): Only update alignment
>     if new alignment is lower.
>     * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos member.
>     (vect_loop_versioning, vect_do_peeling, vect_get_loop_niters,
>     vect_update_inits_of_drs, determine_peel_for_niter,
>     vect_analyze_loop): Add or update declarations.
>     * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
>     create loop_vec_info's for epilogues when available.  Otherwise analyse
>     epilogue separately.
> 
> 
> 
> Cheers,
> Andre
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Reply via email to