Re: [Patch] Improving jump-thread pass for PR 54742

Richard Biener Tue, 02 Dec 2014 02:15:49 -0800

On Mon, Dec 1, 2014 at 10:06 PM, Jeff Law <l...@redhat.com> wrote:
> On 11/25/14 14:16, Sebastian Pop wrote:
>>
>> Sebastian Pop wrote:
>>>
>>> >I will bootstrap and regression test this patch on x86_64-linux and
>>> >powerpc64-linux.  I will also run it on our internal benchmarks,
>>> > coremark, and
>>> >the llvm test-suite.
>>> >
>>> >I will also include a longer testcase that makes sure we do not regress
>>> > on
>>> >coremark.
>>
>> Done all the above.  Attached is the new patch with a new testcase.  I
>> have also
>> added verify_seme inspired by the recent patch adding verify_sese.
>>
>> Sebastian
>>
>>
>> 0001-extend-jump-thread-for-finite-state-automata-PR-5474.patch
>>
>>
>>  From ca222d5222fb976c7aa258d3e3c04e593f42f7a2 Mon Sep 17 00:00:00 2001
>> From: Sebastian Pop<s....@samsung.com>
>> Date: Fri, 26 Sep 2014 14:54:20 -0500
>> Subject: [PATCH] extend jump thread for finite state automata PR 54742
>>
>> Adapted from a patch from James Greenhalgh.
>>
>>         * params.def (max-fsm-thread-path-insns, max-fsm-thread-length,
>>         max-fsm-thread-paths): New.
>>
>>         * doc/invoke.texi (max-fsm-thread-path-insns,
>> max-fsm-thread-length,
>>         max-fsm-thread-paths): Documented.
>>
>>         * tree-cfg.c (split_edge_bb_loc): Export.
>>         * tree-cfg.h (split_edge_bb_loc): Declared extern.
>>
>>         * tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore
>> the
>>         original value of cond when simplification fails.
>>         (fsm_find_thread_path): New.
>>         (fsm_find_control_statement_thread_paths): New.
>>         (fsm_thread_through_normal_block): Call
>> find_control_statement_thread_paths.
>>
>>         * tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print
>>         EDGE_START_FSM_THREAD.
>>         (verify_seme): New.
>>         (duplicate_seme_region): New.
>>         (thread_through_all_blocks): Generate code for
>> EDGE_START_FSM_THREAD edges
>>         calling gimple_duplicate_sese_region.
>>
>>         * tree-ssa-threadupdate.h (jump_thread_edge_type): Add
>> EDGE_START_FSM_THREAD.
>>
>>         * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New.
>>         * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c: New.
>> ---
>>   gcc/doc/invoke.texi                              |   12 ++
>>   gcc/params.def                                   |   15 ++
>>   gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c |   43 +++++
>>   gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c |  127 +++++++++++++
>>   gcc/tree-cfg.c                                   |    2 +-
>>   gcc/tree-cfg.h                                   |    1 +
>>   gcc/tree-ssa-threadedge.c                        |  215
>> +++++++++++++++++++++-
>>   gcc/tree-ssa-threadupdate.c                      |  201
>> +++++++++++++++++++-
>>   gcc/tree-ssa-threadupdate.h                      |    1 +
>>   9 files changed, 614 insertions(+), 3 deletions(-)
>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
>>
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 89edddb..074183f 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -10624,6 +10624,18 @@ large and significantly increase compile time at
>> optimization level
>>   @option{-O1} and higher.  This parameter is a maximum nubmer of
>> statements
>>   in a single generated constructor.  Default value is 5000.
>>
>> +@item max-fsm-thread-path-insns
>> +Maximum number of instructions to copy when duplicating blocks on a
>> +finite state automaton jump thread path.  The default is 100.
>> +
>> +@item max-fsm-thread-length
>> +Maximum number of basic blocks on a finite state automaton jump thread
>> +path.  The default is 10.
>> +
>> +@item max-fsm-thread-paths
>> +Maximum number of new jump thread paths to create for a finite state
>> +automaton.  The default is 50.
>
> Has there been any tuning on these defaults.  I don't have any strong
> opinions about what they ought to be, this is more to get any such
> information recorded on the lists for historical purposes.
>
> I think it's worth a note in the debug dump anytime you abort threading when
> you hit a limit.
>
> I'm a bit worried about compile-time impacts of the all the recursion, but
> I'm willing to wait and see if it turns out to be a problem in practice.


Please consider restricting it to -fexpensive-optimizations (-O2+).

Richard.

>
>> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>> index 8b0b7b8..c9fe212 100644
>> --- a/gcc/tree-ssa-threadedge.c
>> +++ b/gcc/tree-ssa-threadedge.c
>> @@ -56,6 +56,8 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "params.h"
>>   #include "tree-ssa-threadedge.h"
>>   #include "builtins.h"
>> +#include "cfg.h"
>> +#include "cfganal.h"
>>
>>   /* To avoid code explosion due to jump threading, we limit the
>>      number of statements we are going to copy.  This variable
>> @@ -661,6 +663,7 @@ simplify_control_stmt_condition (edge e,
>>        rather than use a relational operator.  These are simpler to
>> handle.  */
>>     if (TREE_CODE (cond) == SSA_NAME)
>>       {
>> +      tree original_lhs = cond;
>>         cached_lhs = cond;
>>
>>         /* Get the variable's current value from the equivalence chains.
>> @@ -689,6 +692,12 @@ simplify_control_stmt_condition (edge e,
>>          pass specific callback to try and simplify it further.  */
>>         if (cached_lhs && ! is_gimple_min_invariant (cached_lhs))
>>           cached_lhs = (*simplify) (stmt, stmt);
>> +
>> +      /* We couldn't find an invariant.  But, callers of this
>> +        function may be able to do something useful with the
>> +        unmodified destination.  */
>> +      if (!cached_lhs)
>> +       cached_lhs = original_lhs;
>>       }
>>     else
>>       cached_lhs = NULL;
>
> Can't you just use COND rather than stuffing its value away into
> ORIGINAL_LHS?    CACHED_LHS may be better in some cases if it's an SSA_NAME
> (and it should be), but I doubt it matters in practice.
>
> Or is it the case that you have to have the original condition -- without
> any context sensitive equivalences used to "simplify" the condition.
>
>
>> @@ -948,6 +957,188 @@ thread_around_empty_blocks (edge taken_edge,
>>     return false;
>>   }
>>
>> +/* Return true if there is at least one path from START_BB to END_BB.
>> +   VISITED_BBS is used to make sure we don't fall into an infinite loop.
>> */
>> +
>> +static bool
>> +fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
>> +                     vec<basic_block, va_gc> *&path,
>> +                     hash_set<basic_block> *visited_bbs, int n_insns)
>> +{
>> +  if (start_bb == end_bb)
>> +    {
>> +      vec_safe_push (path, start_bb);
>> +      return true;
>> +    }
>> +
>> +  if (!visited_bbs->add (start_bb))
>> +    {
>> +      edge e;
>> +      edge_iterator ei;
>> +      FOR_EACH_EDGE (e, ei, start_bb->succs)
>> +       if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs,
>> n_insns))
>> +         {
>> +           vec_safe_push (path, start_bb);
>> +           return true;
>> +         }
>> +    }
>> +
>> +  return false;
>
> Update comment to indicate how PATH is used to return a path from START_BB
> to END_BB.
>
>
>
>> +                 for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
>> +                      gsi_next (&gsi))
>> +                   ++n_insns;
>
> Probably don't want to count labels and GIMPLE_NOPs.  Probably do want to
> count non-virtual PHIs since those may end up as a copies or constant
> initializations.
>
>> +                     if (j == 0)
>> +                       kind = EDGE_START_FSM_THREAD;
>> +                     else if (single_pred_p (e->src))
>> +                       kind = EDGE_NO_COPY_SRC_BLOCK;
>> +                     else {
>> +                       kind = EDGE_COPY_SRC_JOINER_BLOCK;
>> +                       ++joiners;
>> +                     }
>
> Presumably the mis-formatting was added when you tracked the # joiners.
> AFAICT that is a write-only variable and ought to be removed.  Along with
> the braces on the final ELSE which should restore proper formatting.
>
>
>> @@ -2343,6 +2493,55 @@ thread_through_all_blocks (bool
>> may_peel_loop_headers)
>>     threaded_blocks = BITMAP_ALLOC (NULL);
>>     memset (&thread_stats, 0, sizeof (thread_stats));
>>
>> +  for (i = 0; i < paths.length ();)
>
> Comment before this loop.  I can see what you're doing, but I'm already very
> familiar with this code.  Basically what are you looking for in this loop
> and what do you do?
>
> Overall I think this is very very close and I really like the overall
> direction.  There's a few minor issues noted above and with those addressed,
> I think we should be ready to go.
>
> Looking further out....
>
>
> Removing most of tree-ssa-threadupdate.c and using SEME duplication would be
> a huge step forward for making this code more understandable. I look forward
> to any work you do in this space in the future.
>
> Similarly moving towards a backwards dataflow driven model is definitely on
> my long term plan for this code.  Ideally with some kind of knob that says
> "optimize the trivial jump threads you can find and do so very quickly" (say
> by restricting the lookup to a single block) and a more expensive version.
>
> The simple version could run early which would solve some problems Jan has
> run into.  Running the simple version early would also help DOM/VRP.
>
> Ideally I want to disentangle threading from VRP and DOM -- most threading
> opportunities are fairly simple to find and exploit.  Yet right now we have
> to run DOM or VRP which are insanely expensive.
>
> Jeff

Re: [Patch] Improving jump-thread pass for PR 54742

Reply via email to