RE: [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks

Richard Biener Fri, 24 Nov 2023 04:39:05 -0800

On Fri, 24 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> Here's an updated patch, which takes a slightly different approach but makes 
> things much easier later on.
> 
> Peeling for early breaks works by redirecting all early break exits to a
> single "early break" block and combine them and the normal exit edge together
> later in a different block which then goes into the epilog preheader.
> 
> This allows us to re-use all the existing code for IV updates, Additionally 
> this
> also enables correct linking for multiple vector epilogues.
> 
> flush_pending_stmts cannot be used in this scenario since it updates the PHI
> nodes in the order that they are in the exit destination blocks.  This means
> they are in CFG visit order.  With a single exit this doesn't matter but with
> multiple exits with different live values through the different exits the 
> order
> usually does not line up.
> 
> Additionally the vectorizer helper functions expect to be able to iterate over
> the nodes in the order that they occur in the loop header blocks.  This is an
> invariant we must maintain.  To do this we just inline the work
> flush_pending_stmts but maintain the order by using the header blocks to guide
> the work.
> 
> The way peeling is done result in LIM noticing that in some cases the 
> condition
> and the results are loop invariant and tries to move them out of the loop.
> 
> While the resulting code is operationally sound, moving the compare out of the
> gcond results in generating code that no longer branches, so cbranch is no
> longer applicable.  As such I now add code to check during this motion to see
> if the target supports flag setting vector comparison as general operation.
> 
> Because of the change in peeling I now also have to update the BB counts for
> the loop exit intermediate block.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>       * gcc/tree-ssa-loop-im.cc (compute_invariantness): Import insn-codes.h
>       and optabs-tree.h and check for vector compare motion out of gcond.
>       * tree-vect-loop-manip.cc
>       (slpeel_tree_duplicate_loop_to_edge_cfg): Peel using intermediate 
> blocks.
>       (vect_update_ivs_after_vectorizer): Drop assert.
>       (vect_do_peeling): Correct BB count for new intermediate block.
>       * tree-vectorizer.h (is_loop_header_bb_p): Drop assert.
>       (slpeel_tree_duplicate_loop_to_edge_cfg): Update signature.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
> index 
> 396963b6754c7671e2e5404302a69129918555e2..92a9318a1ca0a2da50ff2f29cf271d2e78fddd77
>  100644
> --- a/gcc/tree-ssa-loop-im.cc
> +++ b/gcc/tree-ssa-loop-im.cc
> @@ -48,6 +48,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-dfa.h"
>  #include "tree-ssa.h"
>  #include "dbgcnt.h"
> +#include "insn-codes.h"
> +#include "optabs-tree.h"
>  
>  /* TODO:  Support for predicated code motion.  I.e.
>  
> @@ -1138,6 +1140,24 @@ compute_invariantness (basic_block bb)
>           continue;
>         }
>  
> +     /* Check if one of the depedent statement is a vector compare whether
> +        the target supports it,  otherwise it's invalid to hoist it out of
> +        the gcond it belonged to.  */
> +     for (auto dep_stmt : lim_data->depends)
> +       {
> +          if (is_gimple_assign (dep_stmt)
> +              && VECTOR_TYPE_P (TREE_TYPE (gimple_assign_lhs (dep_stmt))))
> +             {
> +               tree type = TREE_TYPE (gimple_assign_lhs (dep_stmt));
> +               auto code = gimple_assign_rhs_code (dep_stmt);
> +               if (!target_supports_op_p (type, code, optab_vector))
> +                 pos = MOVE_IMPOSSIBLE;
> +             }
> +       }


I think it's more natural to handle this in determine_max_movement
where we specifically look at the condition we are going to turn
into a COND_EXPR condition.

I think it's also independent of this series - the issue should
be latent, but possibly only triggerable with a GIMPLE testcase.

Can you split it out?

The rest of the patch is OK.

Thanks,
Richard.

> +
> +     if (pos == MOVE_IMPOSSIBLE)
> +       continue;
> +
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         {
>           print_gimple_stmt (dump_file, stmt, 2);
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> b9161274ce401a7307f3e61ad23aa036701190d7..0b042b2baf976572af962dd40d5dc311a419ee60
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1403,13 +1403,16 @@ vect_set_loop_condition (class loop *loop, edge 
> loop_e, loop_vec_info loop_vinfo
>     copies remains the same.
>  
>     If UPDATED_DOMS is not NULL it is update with the list of basic blocks 
> whoms
> -   dominators were updated during the peeling.  */
> +   dominators were updated during the peeling.  When doing early break 
> vectorization
> +   then LOOP_VINFO needs to be provided and is used to keep track of any 
> newly created
> +   memory references that need to be updated should we decide to vectorize.  
> */
>  
>  class loop *
>  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>                                       class loop *scalar_loop,
>                                       edge scalar_exit, edge e, edge *new_e,
> -                                     bool flow_loops)
> +                                     bool flow_loops,
> +                                     vec<basic_block> *updated_doms)
>  {
>    class loop *new_loop;
>    basic_block *new_bbs, *bbs, *pbbs;
> @@ -1526,7 +1529,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
>        }
>  
>    auto loop_exits = get_loop_exit_edges (loop);
> +  bool multiple_exits_p = loop_exits.length () > 1;
>    auto_vec<basic_block> doms;
> +  class loop *update_loop = NULL;
>  
>    if (at_exit) /* Add the loop copy at exit.  */
>      {
> @@ -1536,39 +1541,65 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
>         flush_pending_stmts (new_exit);
>       }
>  
> +      bool multiple_exits_p = loop_exits.length () > 1;
> +      basic_block main_loop_exit_block = new_preheader;
> +      basic_block alt_loop_exit_block = NULL;
> +      /* Create intermediate edge for main exit.  But only useful for early
> +      exits.  */
> +      if (multiple_exits_p)
> +     {
> +       edge loop_e = single_succ_edge (new_preheader);
> +       new_preheader = split_edge (loop_e);
> +     }
> +
>        auto_vec <gimple *> new_phis;
>        hash_map <tree, tree> new_phi_args;
>        /* First create the empty phi nodes so that when we flush the
>        statements they can be filled in.   However because there is no order
>        between the PHI nodes in the exits and the loop headers we need to
>        order them base on the order of the two headers.  First record the new
> -      phi nodes.  */
> -      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> +      phi nodes. Then redirect the edges and flush the changes.  This writes
> +      out the new SSA names.  */
> +      for (auto gsi_from = gsi_start_phis (loop_exit->dest);
>          !gsi_end_p (gsi_from); gsi_next (&gsi_from))
>       {
>         gimple *from_phi = gsi_stmt (gsi_from);
>         tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> -       gphi *res = create_phi_node (new_res, new_preheader);
> +       gphi *res = create_phi_node (new_res, main_loop_exit_block);
>         new_phis.safe_push (res);
>       }
>  
> -      /* Then redirect the edges and flush the changes.  This writes out the 
> new
> -      SSA names.  */
> -      for (edge exit : loop_exits)
> +      for (auto exit : loop_exits)
>       {
> -       edge temp_e = redirect_edge_and_branch (exit, new_preheader);
> -       flush_pending_stmts (temp_e);
> +       basic_block dest = main_loop_exit_block;
> +       if (exit != loop_exit)
> +         {
> +           if (!alt_loop_exit_block)
> +             {
> +               alt_loop_exit_block = split_edge (exit);
> +               edge res = redirect_edge_and_branch (
> +                             single_succ_edge (alt_loop_exit_block),
> +                             new_preheader);
> +               flush_pending_stmts (res);
> +               continue;
> +             }
> +           dest = alt_loop_exit_block;
> +         }
> +       edge e = redirect_edge_and_branch (exit, dest);
> +       flush_pending_stmts (e);
>       }
> +
>        /* Record the new SSA names in the cache so that we can skip 
> materializing
>        them again when we fill in the rest of the LCSSA variables.  */
>        for (auto phi : new_phis)
>       {
> -       tree new_arg = gimple_phi_arg (phi, 0)->def;
> +       tree new_arg = gimple_phi_arg (phi, loop_exit->dest_idx)->def;
>  
>         if (!SSA_VAR_P (new_arg))
>           continue;
> +
>         /* If the PHI MEM node dominates the loop then we shouldn't create
> -           a new LC-SSSA PHI for it in the intermediate block.   */
> +          a new LC-SSSA PHI for it in the intermediate block.   */
>         /* A MEM phi that consitutes a new DEF for the vUSE chain can either
>            be a .VDEF or a PHI that operates on MEM. And said definition
>            must not be inside the main loop.  Or we must be a parameter.
> @@ -1584,6 +1615,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
>             remove_phi_node (&gsi, true);
>             continue;
>           }
> +
> +       /* If we decide to remove the PHI node we should also not
> +          rematerialize it later on.  */
>         new_phi_args.put (new_arg, gimple_phi_result (phi));
>  
>         if (TREE_CODE (new_arg) != SSA_NAME)
> @@ -1595,34 +1629,68 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
>        preheader block and still find the right LC nodes.  */
>        edge loop_entry = single_succ_edge (new_preheader);
>        if (flow_loops)
> -     for (auto gsi_from = gsi_start_phis (loop->header),
> -          gsi_to = gsi_start_phis (new_loop->header);
> -          !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> -          gsi_next (&gsi_from), gsi_next (&gsi_to))
> -       {
> -         gimple *from_phi = gsi_stmt (gsi_from);
> -         gimple *to_phi = gsi_stmt (gsi_to);
> -         tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> -                                               loop_latch_edge (loop));
> +     {
> +       /* Link through the main exit first.  */
> +       for (auto gsi_from = gsi_start_phis (loop->header),
> +            gsi_to = gsi_start_phis (new_loop->header);
> +            !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> +            gsi_next (&gsi_from), gsi_next (&gsi_to))
> +         {
> +           gimple *from_phi = gsi_stmt (gsi_from);
> +           gimple *to_phi = gsi_stmt (gsi_to);
> +           tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> +                                                 loop_latch_edge (loop));
> +
> +           /* Check if we've already created a new phi node during edge
> +              redirection.  If we have, only propagate the value
> +              downwards.  */
> +           if (tree *res = new_phi_args.get (new_arg))
> +             {
> +               if (multiple_exits_p)
> +                 new_arg = *res;
> +               else
> +                 {
> +                   adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
> +                   continue;
> +                 }
> +             }
>  
> -         /* Check if we've already created a new phi node during edge
> -            redirection.  If we have, only propagate the value downwards.  */
> -         if (tree *res = new_phi_args.get (new_arg))
> -           {
> -             adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
> -             continue;
> -           }
> +           tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> +           gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
>  
> -         tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> -         gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
> +           /* Main loop exit should use the final iter value.  */
> +           SET_PHI_ARG_DEF (lcssa_phi, loop_exit->dest_idx, new_arg);
>  
> -         /* Main loop exit should use the final iter value.  */
> -         add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
> +           adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
> +         }
>  
> -         adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
> -       }
> +       set_immediate_dominator (CDI_DOMINATORS, main_loop_exit_block,
> +                                loop_exit->src);
> +
> +       /* Now link the alternative exits.  */
> +       if (multiple_exits_p)
> +         {
> +           set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> +                                    main_loop_exit_block);
> +           for (auto gsi_from = gsi_start_phis (loop->header),
> +                gsi_to = gsi_start_phis (new_preheader);
> +                !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> +                gsi_next (&gsi_from), gsi_next (&gsi_to))
> +             {
> +               gimple *from_phi = gsi_stmt (gsi_from);
> +               gimple *to_phi = gsi_stmt (gsi_to);
> +
> +               tree alt_arg = gimple_phi_result (from_phi);
> +               edge main_e = single_succ_edge (alt_loop_exit_block);
> +               for (edge e : loop_exits)
> +                 if (e != loop_exit)
> +                   SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
> +             }
>  
> -      set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> +           set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> +                                    loop->header);
> +         }
> +     }
>  
>        if (was_imm_dom || duplicate_outer_loop)
>       set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
> @@ -1634,6 +1702,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
>        delete_basic_block (preheader);
>        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
>                              loop_preheader_edge (scalar_loop)->src);
> +
> +      /* Finally after wiring the new epilogue we need to update its main 
> exit
> +      to the original function exit we recorded.  Other exits are already
> +      correct.  */
> +      if (multiple_exits_p)
> +     {
> +       update_loop = new_loop;
> +       for (edge e : get_loop_exit_edges (loop))
> +         doms.safe_push (e->dest);
> +       doms.safe_push (exit_dest);
> +
> +       /* Likely a fall-through edge, so update if needed.  */
> +       if (single_succ_p (exit_dest))
> +         doms.safe_push (single_succ (exit_dest));
> +     }
>      }
>    else /* Add the copy at entry.  */
>      {
> @@ -1681,6 +1764,34 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
>        delete_basic_block (new_preheader);
>        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
>                              loop_preheader_edge (new_loop)->src);
> +
> +      if (multiple_exits_p)
> +     update_loop = loop;
> +    }
> +
> +  if (multiple_exits_p)
> +    {
> +      for (edge e : get_loop_exit_edges (update_loop))
> +     {
> +       edge ex;
> +       edge_iterator ei;
> +       FOR_EACH_EDGE (ex, ei, e->dest->succs)
> +         {
> +           /* Find the first non-fallthrough block as fall-throughs can't
> +              dominate other blocks.  */
> +           if (single_succ_p (ex->dest))
> +             {
> +               doms.safe_push (ex->dest);
> +               ex = single_succ_edge (ex->dest);
> +             }
> +           doms.safe_push (ex->dest);
> +         }
> +       doms.safe_push (e->dest);
> +     }
> +
> +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> +      if (updated_doms)
> +     updated_doms->safe_splice (doms);
>      }
>  
>    free (new_bbs);
> @@ -2050,7 +2161,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info 
> loop_vinfo,
>  
>    /* Make sure there exists a single-predecessor exit bb:  */
>    gcc_assert (single_pred_p (exit_bb));
> -  gcc_assert (single_succ_edge (exit_bb) == update_e);
>  
>    for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis 
> (update_bb);
>         !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> @@ -3138,6 +3248,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> niters, tree nitersm1,
>        epilog->force_vectorize = false;
>        bb_before_epilog = loop_preheader_edge (epilog)->src;
>  
> +      /* Fixup the probabities of the new intermediate blocks that we use to 
> connect
> +      to the merge block.  The rest are dealt with via bb_before_epilog
> +      adjustments. */
> +     e->dest->count = e->count ();
> +
>        /* Scalar version loop may be preferred.  In this case, add guard
>        and skip to epilog.  Note this only happens when the number of
>        iterations of loop is unknown at compile time, otherwise this
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 
> b5e27d1c46d9cb3dfe5b44f1b49c9e4204572ff1..39aa4d1250efe308acccf484d370f8adfd1ba843
>  100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1821,7 +1821,7 @@ is_loop_header_bb_p (basic_block bb)
>  {
>    if (bb == (bb->loop_father)->header)
>      return true;
> -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> +
>    return false;
>  }
>  
> @@ -2212,7 +2212,8 @@ extern bool slpeel_can_duplicate_loop_p (const class 
> loop *, const_edge,
>                                        const_edge);
>  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
>                                                   class loop *, edge,
> -                                                 edge, edge *, bool = true);
> +                                                 edge, edge *, bool = true,
> +                                                 vec<basic_block> * = NULL);
>  class loop *vect_loop_versioning (loop_vec_info, gimple *);
>  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
>                                   tree *, tree *, tree *, int, bool, bool,
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

RE: [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks

Reply via email to