On 11/25/14 14:16, Sebastian Pop wrote:
Sebastian Pop wrote:
>I will bootstrap and regression test this patch on x86_64-linux and
>powerpc64-linux.  I will also run it on our internal benchmarks, coremark, and
>the llvm test-suite.
>
>I will also include a longer testcase that makes sure we do not regress on
>coremark.
Done all the above.  Attached is the new patch with a new testcase.  I have also
added verify_seme inspired by the recent patch adding verify_sese.

Sebastian


0001-extend-jump-thread-for-finite-state-automata-PR-5474.patch


 From ca222d5222fb976c7aa258d3e3c04e593f42f7a2 Mon Sep 17 00:00:00 2001
From: Sebastian Pop<s....@samsung.com>
Date: Fri, 26 Sep 2014 14:54:20 -0500
Subject: [PATCH] extend jump thread for finite state automata PR 54742

Adapted from a patch from James Greenhalgh.

        * params.def (max-fsm-thread-path-insns, max-fsm-thread-length,
        max-fsm-thread-paths): New.

        * doc/invoke.texi (max-fsm-thread-path-insns, max-fsm-thread-length,
        max-fsm-thread-paths): Documented.

        * tree-cfg.c (split_edge_bb_loc): Export.
        * tree-cfg.h (split_edge_bb_loc): Declared extern.

        * tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the
        original value of cond when simplification fails.
        (fsm_find_thread_path): New.
        (fsm_find_control_statement_thread_paths): New.
        (fsm_thread_through_normal_block): Call 
find_control_statement_thread_paths.

        * tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print
        EDGE_START_FSM_THREAD.
        (verify_seme): New.
        (duplicate_seme_region): New.
        (thread_through_all_blocks): Generate code for EDGE_START_FSM_THREAD 
edges
        calling gimple_duplicate_sese_region.

        * tree-ssa-threadupdate.h (jump_thread_edge_type): Add 
EDGE_START_FSM_THREAD.

        * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New.
        * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c: New.
---
  gcc/doc/invoke.texi                              |   12 ++
  gcc/params.def                                   |   15 ++
  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c |   43 +++++
  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c |  127 +++++++++++++
  gcc/tree-cfg.c                                   |    2 +-
  gcc/tree-cfg.h                                   |    1 +
  gcc/tree-ssa-threadedge.c                        |  215 +++++++++++++++++++++-
  gcc/tree-ssa-threadupdate.c                      |  201 +++++++++++++++++++-
  gcc/tree-ssa-threadupdate.h                      |    1 +
  9 files changed, 614 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 89edddb..074183f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10624,6 +10624,18 @@ large and significantly increase compile time at 
optimization level
  @option{-O1} and higher.  This parameter is a maximum nubmer of statements
  in a single generated constructor.  Default value is 5000.

+@item max-fsm-thread-path-insns
+Maximum number of instructions to copy when duplicating blocks on a
+finite state automaton jump thread path.  The default is 100.
+
+@item max-fsm-thread-length
+Maximum number of basic blocks on a finite state automaton jump thread
+path.  The default is 10.
+
+@item max-fsm-thread-paths
+Maximum number of new jump thread paths to create for a finite state
+automaton.  The default is 50.
Has there been any tuning on these defaults. I don't have any strong opinions about what they ought to be, this is more to get any such information recorded on the lists for historical purposes.

I think it's worth a note in the debug dump anytime you abort threading when you hit a limit.

I'm a bit worried about compile-time impacts of the all the recursion, but I'm willing to wait and see if it turns out to be a problem in practice.


diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 8b0b7b8..c9fe212 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -56,6 +56,8 @@ along with GCC; see the file COPYING3.  If not see
  #include "params.h"
  #include "tree-ssa-threadedge.h"
  #include "builtins.h"
+#include "cfg.h"
+#include "cfganal.h"

  /* To avoid code explosion due to jump threading, we limit the
     number of statements we are going to copy.  This variable
@@ -661,6 +663,7 @@ simplify_control_stmt_condition (edge e,
       rather than use a relational operator.  These are simpler to handle.  */
    if (TREE_CODE (cond) == SSA_NAME)
      {
+      tree original_lhs = cond;
        cached_lhs = cond;

        /* Get the variable's current value from the equivalence chains.
@@ -689,6 +692,12 @@ simplify_control_stmt_condition (edge e,
         pass specific callback to try and simplify it further.  */
        if (cached_lhs && ! is_gimple_min_invariant (cached_lhs))
          cached_lhs = (*simplify) (stmt, stmt);
+
+      /* We couldn't find an invariant.  But, callers of this
+        function may be able to do something useful with the
+        unmodified destination.  */
+      if (!cached_lhs)
+       cached_lhs = original_lhs;
      }
    else
      cached_lhs = NULL;
Can't you just use COND rather than stuffing its value away into ORIGINAL_LHS? CACHED_LHS may be better in some cases if it's an SSA_NAME (and it should be), but I doubt it matters in practice.

Or is it the case that you have to have the original condition -- without any context sensitive equivalences used to "simplify" the condition.


@@ -948,6 +957,188 @@ thread_around_empty_blocks (edge taken_edge,
    return false;
  }

+/* Return true if there is at least one path from START_BB to END_BB.
+   VISITED_BBS is used to make sure we don't fall into an infinite loop.  */
+
+static bool
+fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
+                     vec<basic_block, va_gc> *&path,
+                     hash_set<basic_block> *visited_bbs, int n_insns)
+{
+  if (start_bb == end_bb)
+    {
+      vec_safe_push (path, start_bb);
+      return true;
+    }
+
+  if (!visited_bbs->add (start_bb))
+    {
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, start_bb->succs)
+       if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, n_insns))
+         {
+           vec_safe_push (path, start_bb);
+           return true;
+         }
+    }
+
+  return false;
Update comment to indicate how PATH is used to return a path from START_BB to END_BB.



+                 for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+                      gsi_next (&gsi))
+                   ++n_insns;
Probably don't want to count labels and GIMPLE_NOPs. Probably do want to count non-virtual PHIs since those may end up as a copies or constant initializations.

+                     if (j == 0)
+                       kind = EDGE_START_FSM_THREAD;
+                     else if (single_pred_p (e->src))
+                       kind = EDGE_NO_COPY_SRC_BLOCK;
+                     else {
+                       kind = EDGE_COPY_SRC_JOINER_BLOCK;
+                       ++joiners;
+                     }
Presumably the mis-formatting was added when you tracked the # joiners. AFAICT that is a write-only variable and ought to be removed. Along with the braces on the final ELSE which should restore proper formatting.


@@ -2343,6 +2493,55 @@ thread_through_all_blocks (bool may_peel_loop_headers)
    threaded_blocks = BITMAP_ALLOC (NULL);
    memset (&thread_stats, 0, sizeof (thread_stats));

+  for (i = 0; i < paths.length ();)
Comment before this loop. I can see what you're doing, but I'm already very familiar with this code. Basically what are you looking for in this loop and what do you do?

Overall I think this is very very close and I really like the overall direction. There's a few minor issues noted above and with those addressed, I think we should be ready to go.

Looking further out....


Removing most of tree-ssa-threadupdate.c and using SEME duplication would be a huge step forward for making this code more understandable. I look forward to any work you do in this space in the future.

Similarly moving towards a backwards dataflow driven model is definitely on my long term plan for this code. Ideally with some kind of knob that says "optimize the trivial jump threads you can find and do so very quickly" (say by restricting the lookup to a single block) and a more expensive version.

The simple version could run early which would solve some problems Jan has run into. Running the simple version early would also help DOM/VRP.

Ideally I want to disentangle threading from VRP and DOM -- most threading opportunities are fairly simple to find and exploit. Yet right now we have to run DOM or VRP which are insanely expensive.

Jeff

Reply via email to