On Wed, Nov 11, 2015 at 9:38 PM, Jeff Law <l...@redhat.com> wrote:
> On 09/04/2015 11:36 AM, Ajit Kumar Agarwal wrote:
>
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index 6b66f8f..20ddf3d 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
>>>           NEXT_PASS (pass_ccp);
>>>           /* After CCP we rewrite no longer addressed locals into SSA
>>>              form if possible.  */
>>> +          NEXT_PASS (pass_path_split);
>>>           NEXT_PASS (pass_forwprop);
>>>           NEXT_PASS (pass_sra_early);
>>
>> I can't recall if we've discussed the location of the pass at all.  I'm
>> not objecting to this location, but would like to hear why you chose
>> this particular location in the optimization pipeline.
>
> So returning to the question of where this would live in the optimization
> pipeline and how it interacts with if-conversion and vectorization.

Note that adding passes to the early pipeline that do code duplication
is a no-no.  The early pipeline should be exclusively for things making
functions more suitable for inlining.

> The concern with moving it later in the pipeline was that we'd miss
> VRP/DCE/CSE opportunities.  I'm not sure if you're aware, but we actually
> run those passes more than once.  So it would be possible to run path
> splitting after if-conversion & vectorization, but before the second pass
> of VRP & DOM.  But trying that seems to result in something scrambling the
> loop enough that the path splitting opportunity is missed.  That might be
> worth deeper investigation if we can't come up with some kind of heuristics
> to fire or suppress path splitting.

As I still think it is a transform similar to tracer, just put it next to that.

But IIRC you mentioned it should enable vectorization or so?  In this case
that's obviously too late.
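
If it does not have to run before vectorization, then placing it right
next to tracer would look roughly like the following in passes.def
(untested sketch; the exact neighbours of pass_tracer in the current
tree may differ, this is only meant to show the spot):

      NEXT_PASS (pass_path_split);
      NEXT_PASS (pass_tracer);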

Richard.

> Other random notes as I look over the code:
>
> Call the pass "path-split", not "path_split".  I don't think we have any
> passes with underscores in their names, dump files, etc.
>
> You factored out the code for transform_duplicate.  When you create new
> functions, they should all have a block comment indicating what they do,
> return values, etc.
>
> I asked you to trim down the #includes in tree-ssa-path-split.c.  Most were
> ultimately unnecessary.  The trimmed list is just 11 headers.
>
> Various functions in tree-ssa-path-split.c were missing their block
> comments.  There were several places in tree-ssa-path-split that I felt
> deserved a comment.  While you are familiar with the code, it's likely
> someone else will have to look at and modify this code at some point in the
> future.  The comments help make that easier.
>
> In find_trace_loop_latch_same_as_join_blk, we find the immediate dominator
> of the latch and verify it ends in a conditional.  That's fine.  Then we
> look at the predecessors of the latch to see if one is succeeded only by the
> latch and falls through to the latch.  That is the block we'll end up
> redirecting to a copy of the latch.  Also fine.
>
> Note how there is no testing for the relationship between the immediate
> dominator of the latch and the predecessors of the latch.  ISTM that we can
> have a fairly arbitrary region in the THEN/ELSE arms of the conditional.
> Was this intentional?  Would it be advisable to verify that the THEN/ELSE
> arms are single blocks?  Do we want to verify that neither the THEN/ELSE
> arms transfer control other than to the latch?  Do we want to verify the
> predecessors of the latch are immediate successors of the latch's immediate
> dominator?
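
To pin down the shape of the checks being asked about, an untested
sketch (the helper name is made up and the corner cases would need
auditing):

  /* Return true if every predecessor of LATCH is a single-block arm
     fed directly by BB (the latch's immediate dominator) whose only
     exit is LATCH.  */
  static bool
  latch_preds_form_simple_diamond_p (basic_block bb, basic_block latch)
  {
    edge e;
    edge_iterator ei;

    FOR_EACH_EDGE (e, ei, latch->preds)
      {
        basic_block pred = e->src;
        if (!single_pred_p (pred) || single_pred (pred) != bb)
          return false;
        if (!single_succ_p (pred) || single_succ (pred) != latch)
          return false;
      }
    return true;
  }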
>
> The is_feasible_trace routine was still checking if the block had a
> conversion and rejecting it.  I removed that check.  It does seem to me that
> we need an upper limit on the number of statements.  I wonder if we should
> factor out the maximum statements to copy code from jump threading and use
> it for both jump threading and path splitting.
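
If that factoring happens, is_feasible_trace could presumably just reuse
the existing jump-threading knob rather than a new magic number, e.g.
(this would also need params.h in the include list):

  /* Reject blocks larger than what jump threading is willing to
     duplicate.  */
  if (num_stmt > PARAM_VALUE (PARAM_MAX_JUMP_THREAD_DUPLICATION_STMTS))
    return false;
  return true;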
>
> Instead of creating a loop with multiple latches, whatever happened to the
> idea of duplicating the latch block twice -- once into each path?  Remove the
> control statement in each duplicate.  Then remove everything but the control
> statement in the original latch.
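
Reading that suggestion literally, the skeleton would be roughly the
following (very rough and untested; the PHI updates and the
control-statement surgery are glossed over and are the hard part):

  /* Duplicate the latch into each of its two predecessors, reusing
     transform_duplicate from this patch.  */
  basic_block pred0 = EDGE_PRED (latch, 0)->src;
  basic_block pred1 = EDGE_PRED (latch, 1)->src;
  transform_duplicate (pred0, latch);
  transform_duplicate (pred1, latch);
  /* ...then delete the control statement from each copy and everything
     but the control statement from the original latch.  */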
>
>
> I added some direct dump support.  Essentially anytime we split the path, we
> output something like this:
>
> Split path in loop: latch block 9, predecessor 7.
>
> That allows tests in the testsuite to look for the "Split path in loop"
> string rather than inferring the information from the SSA graph update's
> replacement table.  It also allows us to do things like count how many paths
> get split if we have more complex tests.
>
> On the topic of tests.  Is the one you provided something where path
> splitting results in a significant improvement?  From looking at the x86_64
> output, I can see the path splitting transformation occur, but not any
> improvement in the final code.
>
> While the existing test is useful, testing on code that actually improves as
> a result of path splitting is better.  Ideally we'd test both that path
> splitting occurred and that the secondary optimizations we wanted triggered.
>
> The tests should go into gcc.dg/tree-ssa rather than just gcc.dg.
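
For the counting case a test could then finish with something along
these lines (the expected count of 2 is made up):

  /* { dg-final { scan-tree-dump-times "Split path in loop:" 2 "path-split" } } */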
>
> Anyway, here's my work-in-progress.  Your thoughts on the various questions,
> concerns, ideas noted above would be appreciated.  Obviously I'd like to
> wrap things up quickly and include this patch in gcc6.
>
> Note, I haven't bootstrapped or regression tested this version.
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 34d2356..6613e83 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1474,6 +1474,7 @@ OBJS = \
>         tree-ssa-loop.o \
>         tree-ssa-math-opts.o \
>         tree-ssa-operands.o \
> +       tree-ssa-path-split.o \
>         tree-ssa-phionlycprop.o \
>         tree-ssa-phiopt.o \
>         tree-ssa-phiprop.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 757ce85..9cc94e2 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2403,6 +2403,10 @@ ftree-vrp
>  Common Report Var(flag_tree_vrp) Init(0) Optimization
>  Perform Value Range Propagation on trees.
>
> +ftree-path-split
> +Common Report Var(flag_tree_path_split) Init(0) Optimization
> +Perform Path Splitting on trees for loop backedges.
> +
>  funit-at-a-time
>  Common Report Var(flag_unit_at_a_time) Init(1)
>  Compile whole compilation unit at a time.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 213a9d0..b1e95da 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -354,6 +354,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fdump-tree-fre@r{[}-@var{n}@r{]} @gol
>  -fdump-tree-vtable-verify @gol
>  -fdump-tree-vrp@r{[}-@var{n}@r{]} @gol
> +-fdump-tree-path-split@r{[}-@var{n}@r{]} @gol
>  -fdump-tree-storeccp@r{[}-@var{n}@r{]} @gol
>  -fdump-final-insns=@var{file} @gol
>  -fcompare-debug@r{[}=@var{opts}@r{]}  -fcompare-debug-second @gol
> @@ -462,7 +463,7 @@ Objective-C and Objective-C++ Dialects}.
>  -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
>  -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
>  -ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
> --ftree-vectorize -ftree-vrp @gol
> +-ftree-vectorize -ftree-vrp -ftree-path-split @gol
>  -funit-at-a-time -funroll-all-loops -funroll-loops @gol
>  -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
>  -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
> @@ -7169,6 +7170,11 @@ output on to @file{stderr}. If two conflicting dump filenames are
>  given for the same pass, then the latter option overrides the earlier
>  one.
>
> +@item path-split
> +@opindex fdump-tree-path-split
> +Dump each function after path splitting.  The file name is made by
> +appending @file{.path-split} to the source file name.
> +
>  @item all
>  Turn on all options, except @option{raw}, @option{slim}, @option{verbose}
>  and @option{lineno}.
> @@ -7811,6 +7817,7 @@ also turns on the following optimization flags:
>  -ftree-switch-conversion -ftree-tail-merge @gol
>  -ftree-pre @gol
>  -ftree-vrp @gol
> +-ftree-path-split @gol
>  -fipa-ra}
>
>  Please note the warning under @option{-fgcse} about
> @@ -8819,7 +8826,7 @@ currently enabled, but may be enabled by @option{-O2} in the future.
>
>  @item -ftree-sink
>  @opindex ftree-sink
> -Perform forward store motion  on trees.  This flag is
> +Perform forward store motion on trees.  This flag is
>  enabled by default at @option{-O} and higher.
>
>  @item -ftree-bit-ccp
> @@ -9125,6 +9132,13 @@ enabled by default at @option{-O2} and higher.  Null pointer check
>  elimination is only done if @option{-fdelete-null-pointer-checks} is
>  enabled.
>
> +@item -ftree-path-split
> +@opindex ftree-path-split
> +Perform Path Splitting on trees.  When the two execution paths of an
> +if-then-else merge at the loop latch node, try to duplicate the
> +merge node into two paths.  This is enabled by default at @option{-O2}
> +and above.
> +
>  @item -fsplit-ivs-in-unroller
>  @opindex fsplit-ivs-in-unroller
>  Enables expression of values of induction variables in later iterations
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 9a3fbb3..9a0b27c 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -509,6 +509,7 @@ static const struct default_options default_options_table[] =
>      { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1 },
>      { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
>      { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
> +    { OPT_LEVELS_2_PLUS, OPT_ftree_path_split, NULL, 1 },
>
>      /* -O3 optimizations.  */
>      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> diff --git a/gcc/passes.def b/gcc/passes.def
> index c0ab6b9..e0c1cd8 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_remove_cgraph_callee_edges);
>           NEXT_PASS (pass_object_sizes);
>           NEXT_PASS (pass_ccp);
> +         NEXT_PASS (pass_path_split);
>           /* After CCP we rewrite no longer addressed locals into SSA
>              form if possible.  */
>           NEXT_PASS (pass_forwprop);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c
> new file mode 100644
> index 0000000..4b8637b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-path-split-details" } */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +
> +#define RGBMAX 255
> +
> +int
> +test()
> +{
> +  int i, Pels;
> +  unsigned char sum = 0;
> +  unsigned char xr, xg, xb;
> +  unsigned char xc, xm, xy, xk;
> +  unsigned char *ReadPtr, *EritePtr;
> +
> +  ReadPtr = ( unsigned char *) malloc (sizeof (unsigned char) * 100);
> +  EritePtr = ( unsigned char *) malloc (sizeof (unsigned char) * 100);
> +
> +  for (i = 0; i < 100;i++)
> +     {
> +       ReadPtr[i] = 100 - i;
> +     }
> +
> +  for (i = 0; i < 100; i++)
> +     {
> +       xr = *ReadPtr++;
> +       xg = *ReadPtr++;
> +       xb = *ReadPtr++;
> +
> +       xc = (unsigned char) (RGBMAX - xr);
> +       xm = (unsigned char) (RGBMAX - xg);
> +       xy = (unsigned char) (RGBMAX - xb);
> +
> +       if (xc < xm)
> +         {
> +           xk = (unsigned char) (xc < xy ? xc : xy);
> +         }
> +       else
> +        {
> +          xk = (unsigned char) (xm < xy ? xm : xy);
> +        }
> +
> +       xc = (unsigned char) (xc - xk);
> +       xm = (unsigned char) (xm - xk);
> +       xy = (unsigned char) (xy - xk);
> +
> +       *EritePtr++ = xc;
> +       *EritePtr++ = xm;
> +       *EritePtr++ = xy;
> +       *EritePtr++ = xk;
> +       sum += *EritePtr;
> +    }
> +  return sum;
> +}
> +
> +int
> +main()
> +{
> +  if (test() != 33)
> +    abort();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "Split path in loop:" "path-split" } } */
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index b429faf..3dba68b 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -300,3 +300,4 @@ DEFTIMEVAR (TV_LINK              , "link JIT code")
>  DEFTIMEVAR (TV_LOAD                 , "load JIT result")
>  DEFTIMEVAR (TV_JIT_ACQUIRING_MUTEX   , "acquiring JIT mutex")
>  DEFTIMEVAR (TV_JIT_CLIENT_CODE   , "JIT client code")
> +DEFTIMEVAR (TV_TREE_PATH_SPLIT  , "tree path split")
> diff --git a/gcc/tracer.c b/gcc/tracer.c
> index 941dc20..c7b5150 100644
> --- a/gcc/tracer.c
> +++ b/gcc/tracer.c
> @@ -51,9 +51,9 @@
>  #include "tree-inline.h"
>  #include "cfgloop.h"
>  #include "fibonacci_heap.h"
> +#include "tracer.h"
>
>  static int count_insns (basic_block);
> -static bool ignore_bb_p (const_basic_block);
>  static bool better_p (const_edge, const_edge);
>  static edge find_best_successor (basic_block);
>  static edge find_best_predecessor (basic_block);
> @@ -85,7 +85,7 @@ bb_seen_p (basic_block bb)
>  }
>
>  /* Return true if we should ignore the basic block for purposes of tracing.  */
> -static bool
> +bool
>  ignore_bb_p (const_basic_block bb)
>  {
>    if (bb->index < NUM_FIXED_BLOCKS)
> @@ -226,6 +226,24 @@ find_trace (basic_block bb, basic_block *trace)
>    return i;
>  }
>
> +/* Duplicate block BB2, placing it after BB in the CFG.  Return the
> +   newly created block.  */
> +basic_block
> +transform_duplicate (basic_block bb, basic_block bb2)
> +{
> +  edge e;
> +  basic_block copy;
> +
> +  e = find_edge (bb, bb2);
> +
> +  copy = duplicate_block (bb2, e, bb);
> +  flush_pending_stmts (e);
> +
> +  add_phi_args_after_copy (&copy, 1, NULL);
> +
> +  return (copy);
> +}
> +
>  /* Look for basic blocks in frequency order, construct traces and tail
>     duplicate if profitable.  */
>
> @@ -321,17 +339,8 @@ tail_duplicate (void)
>                  entries or at least rotate the loop.  */
>               && bb2->loop_father->header != bb2)
>             {
> -             edge e;
> -             basic_block copy;
> -
> -             nduplicated += counts [bb2->index];
> -
> -             e = find_edge (bb, bb2);
> -
> -             copy = duplicate_block (bb2, e, bb);
> -             flush_pending_stmts (e);
> -
> -             add_phi_args_after_copy (&copy, 1, NULL);
> +              nduplicated += counts [bb2->index];
> +              basic_block copy = transform_duplicate (bb, bb2);
>
>               /* Reconsider the original copy of block we've duplicated.
>                  Removing the most common predecessor may make it to be
> diff --git a/gcc/tracer.h b/gcc/tracer.h
> new file mode 100644
> index 0000000..454d3b7
> --- /dev/null
> +++ b/gcc/tracer.h
> @@ -0,0 +1,26 @@
> +/* Header file for Tracer.
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_TRACER_H
> +#define GCC_TRACER_H
> +
> +extern basic_block transform_duplicate (basic_block bb, basic_block bb2);
> +extern bool ignore_bb_p (const_basic_block bb);
> +
> +#endif /* GCC_TRACER_H */
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 49e22a9..6963acc 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -390,6 +390,7 @@ extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_path_split (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-path-split.c b/gcc/tree-ssa-path-split.c
> new file mode 100644
> index 0000000..33dbc4d
> --- /dev/null
> +++ b/gcc/tree-ssa-path-split.c
> @@ -0,0 +1,254 @@
> +/* Support routines for Path Splitting.
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +   Contributed by Ajit Kumar Agarwal <ajit...@xilinx.com>.
> +
> + This file is part of GCC.
> +
> + GCC is free software; you can redistribute it and/or modify
> + it under the terms of the GNU General Public License as published by
> + the Free Software Foundation; either version 3, or (at your option)
> + any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "cfganal.h"
> +#include "cfgloop.h"
> +#include "gimple-iterator.h"
> +#include "tracer.h"
> +
> +/* Given LOOP, if its latch has an immediate dominator that ends
> +   in a simple conditional, then search for a block that is a
> +   predecessor of the latch that falls through to the latch and
> +   has no other successors.
> +
> +   When found, put that block into TRACE[0] and the latch block into
> +   TRACE[1].  Otherwise do nothing.  */
> +
> +static void
> +find_trace_loop_latch_same_as_join_blk (loop_p loop, basic_block *trace)
> +{
> +  basic_block latch = loop->latch;
> +
> +  /* We don't support path splitting if the latch has more than two
> +     predecessors.  */
> +  if (EDGE_COUNT (latch->preds) == 2)
> +    {
> +      basic_block bb = get_immediate_dominator (CDI_DOMINATORS, latch);
> +      gimple *last = gsi_stmt (gsi_last_bb (bb));
> +
> +      /* The immediate dominator of the latch must end in a conditional.  */
> +      if (!last || gimple_code (last) != GIMPLE_COND)
> +       return;
> +
> +      /* This looks for a predecessor of the latch which has a single
> +        fallthru successor (that is the latch).  Only one of the blocks
> +        can match that pattern.
> +
> +        Do we need to check that E->src is a successor of BB?  */
> +      edge_iterator ei;
> +      edge e;
> +      FOR_EACH_EDGE (e, ei, latch->preds)
> +       {
> +         if (!single_succ_p (e->src)
> +             || !(single_succ_edge (e->src)->flags & EDGE_FALLTHRU))
> +           continue;
> +         else
> +           {
> +             trace[0] = e->src;
> +             trace[1] = latch;
> +             break;
> +           }
> +       }
> +    }
> +}
> +
> +/* Return TRUE if BB is a reasonable block to duplicate by examining
> +   its size, false otherwise.  BB will always be a loop latch block.
> +
> +   Should this use the same tests as we do for jump threading?  */
> +
> +static bool
> +is_feasible_trace (basic_block bb)
> +{
> +  int num_stmt = 0;
> +  gimple_stmt_iterator gsi;
> +
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple *stmt = gsi_stmt (gsi);
> +      if (!is_gimple_debug (stmt))
> +       num_stmt++;
> +    }
> +
> +  /* We may want to limit how many statements we copy.  */
> +  if (num_stmt > 1)
> +    return true;
> +
> +  return false;
> +}
> +
> +/* If the immediate dominator of the loop's latch is a block
> +   ending in a conditional branch, then the latch is duplicated
> +   into its predecessor paths, preserving the SSA semantics.
> +
> +   CFG before transformation.
> +
> +   <bb 6>:
> +      xk_35 = MIN_EXPR <xy_34, xc_32>;
> +      goto <bb 8>;
> +
> +   <bb 7>:
> +      xk_36 = MIN_EXPR <xy_34, xm_33>;
> +
> +   <bb 8>:
> +      # xk_4 = PHI <xk_35(6), xk_36(7)>
> +      xc_37 = xc_32 - xk_4;
> +      xm_38 = xm_33 - xk_4;
> +      xy_39 = xy_34 - xk_4;
> +
> +   CFG after the path splitting transformation,
> +   before the cleanup phase.
> +
> +   <bb 7>:
> +     xk_35 = MIN_EXPR <xy_34, xc_32>;
> +
> +   <bb 8>:
> +     # xk_29 = PHI <xk_35(7)>
> +     xc_56 = xc_32 - xk_29;
> +     xm_57 = xm_33 - xk_29;
> +     xy_58 = xy_34 - xk_29;
> +     goto <bb 11>;
> +
> +   <bb 9>:
> +     xk_36 = MIN_EXPR <xy_34, xm_33>;
> +
> +   <bb 10>:
> +     # xk_4 = PHI <xk_36(9)>
> +     xc_37 = xc_32 - xk_4;
> +     xm_38 = xm_33 - xk_4;
> +     xy_39 = xy_34 - xk_4;
> +
> +  <bb 11>: .......  */
> +
> +static bool
> +perform_path_splitting ()
> +{
> +  bool changed = false;
> +  basic_block trace[2] = {NULL, NULL};
> +  loop_p loop;
> +
> +  loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
> +  initialize_original_copy_tables ();
> +  calculate_dominance_info (CDI_DOMINATORS);
> +
> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +    {
> +      /* If we're optimizing for size, or should not duplicate
> +        the latch block for some other reason, then ignore this
> +        loop.  */
> +      if (ignore_bb_p (loop->latch))
> +       continue;
> +
> +      /* Get the latch node and its predecessor, storing them into
> +        trace[0] and trace[1] respectively.  */
> +      find_trace_loop_latch_same_as_join_blk (loop, trace);
> +
> +      if (trace[0] && trace[1] && is_feasible_trace (trace[1]))
> +       {
> +          if (dump_file && (dump_flags & TDF_DETAILS))
> +            fprintf (dump_file,
> +                    "Split path in loop: latch block %d, predecessor %d.\n",
> +                    trace[1]->index, trace[0]->index);
> +         transform_duplicate (trace[0], trace[1]);
> +         trace[0] = NULL;
> +         trace[1] = NULL;
> +         changed = true;
> +       }
> +    }
> +
> +  loop_optimizer_finalize ();
> +  free_original_copy_tables ();
> +  return changed;
> +}
> +
> +/* Main entry point for path splitting.  Returns TODO_cleanup_cfg if any
> +   paths were split, otherwise returns zero.  */
> +
> +static unsigned int
> +execute_path_split (void)
> +{
> +  /* If we don't have at least 2 real blocks and backedges in the
> +     CFG, then there's no point in trying to perform path splitting.  */
> +  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1
> +      || !mark_dfs_back_edges ())
> +    return 0;
> +
> +  bool changed = perform_path_splitting ();
> +  if (changed)
> +    {
> +      free_dominance_info (CDI_DOMINATORS);
> +      /* If we changed the CFG, schedule loops for fixup by cleanup_cfg.  */
> +      if (current_loops)
> +       loops_state_set (LOOPS_NEED_FIXUP);
> +    }
> +
> +  return changed ? TODO_cleanup_cfg : 0;
> +
> +}
> +
> +static bool
> +gate_path_split (void)
> +{
> +  return flag_tree_path_split != 0;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_path_split =
> +{
> +  GIMPLE_PASS, /* type */
> +  "path-split", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_TREE_PATH_SPLIT, /* tv_id */
> +  PROP_ssa, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa, /* todo_flags_finish */
> +};
> +
> +class pass_path_split : public gimple_opt_pass
> +{
> +public:
> +  pass_path_split (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_path_split, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  opt_pass * clone () { return new pass_path_split (m_ctxt); }
> +  virtual bool gate (function *) { return gate_path_split (); }
> +  virtual unsigned int execute (function *) { return execute_path_split (); }
> +
> +}; // class pass_path_split
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_path_split (gcc::context *ctxt)
> +{
> +  return new pass_path_split (ctxt);
> +}
>
