On 11/25/14 14:16, Sebastian Pop wrote:
Sebastian Pop wrote:
>I will bootstrap and regression test this patch on x86_64-linux and
>powerpc64-linux. I will also run it on our internal benchmarks, coremark, and
>the llvm test-suite.
>
>I will also include a longer testcase that makes sure we do not regress on
>coremark.
Done all the above. Attached is the new patch with a new testcase. I have also
added verify_seme inspired by the recent patch adding verify_sese.
Sebastian
0001-extend-jump-thread-for-finite-state-automata-PR-5474.patch
From ca222d5222fb976c7aa258d3e3c04e593f42f7a2 Mon Sep 17 00:00:00 2001
From: Sebastian Pop<s....@samsung.com>
Date: Fri, 26 Sep 2014 14:54:20 -0500
Subject: [PATCH] extend jump thread for finite state automata PR 54742
Adapted from a patch from James Greenhalgh.
* params.def (max-fsm-thread-path-insns, max-fsm-thread-length,
max-fsm-thread-paths): New.
* doc/invoke.texi (max-fsm-thread-path-insns, max-fsm-thread-length,
max-fsm-thread-paths): Documented.
* tree-cfg.c (split_edge_bb_loc): Export.
* tree-cfg.h (split_edge_bb_loc): Declared extern.
* tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the
original value of cond when simplification fails.
(fsm_find_thread_path): New.
(fsm_find_control_statement_thread_paths): New.
(fsm_thread_through_normal_block): Call
find_control_statement_thread_paths.
* tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print
EDGE_START_FSM_THREAD.
(verify_seme): New.
(duplicate_seme_region): New.
(thread_through_all_blocks): Generate code for EDGE_START_FSM_THREAD
edges
calling gimple_duplicate_sese_region.
* tree-ssa-threadupdate.h (jump_thread_edge_type): Add
EDGE_START_FSM_THREAD.
* testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New.
* testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c: New.
---
gcc/doc/invoke.texi | 12 ++
gcc/params.def | 15 ++
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 43 +++++
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 127 +++++++++++++
gcc/tree-cfg.c | 2 +-
gcc/tree-cfg.h | 1 +
gcc/tree-ssa-threadedge.c | 215 +++++++++++++++++++++-
gcc/tree-ssa-threadupdate.c | 201 +++++++++++++++++++-
gcc/tree-ssa-threadupdate.h | 1 +
9 files changed, 614 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 89edddb..074183f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10624,6 +10624,18 @@ large and significantly increase compile time at
optimization level
@option{-O1} and higher. This parameter is a maximum nubmer of statements
in a single generated constructor. Default value is 5000.
+@item max-fsm-thread-path-insns
+Maximum number of instructions to copy when duplicating blocks on a
+finite state automaton jump thread path. The default is 100.
+
+@item max-fsm-thread-length
+Maximum number of basic blocks on a finite state automaton jump thread
+path. The default is 10.
+
+@item max-fsm-thread-paths
+Maximum number of new jump thread paths to create for a finite state
+automaton. The default is 50.
Has there been any tuning on these defaults. I don't have any strong
opinions about what they ought to be, this is more to get any such
information recorded on the lists for historical purposes.
I think it's worth a note in the debug dump anytime you abort threading
when you hit a limit.
I'm a bit worried about compile-time impacts of the all the recursion,
but I'm willing to wait and see if it turns out to be a problem in practice.
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 8b0b7b8..c9fe212 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -56,6 +56,8 @@ along with GCC; see the file COPYING3. If not see
#include "params.h"
#include "tree-ssa-threadedge.h"
#include "builtins.h"
+#include "cfg.h"
+#include "cfganal.h"
/* To avoid code explosion due to jump threading, we limit the
number of statements we are going to copy. This variable
@@ -661,6 +663,7 @@ simplify_control_stmt_condition (edge e,
rather than use a relational operator. These are simpler to handle. */
if (TREE_CODE (cond) == SSA_NAME)
{
+ tree original_lhs = cond;
cached_lhs = cond;
/* Get the variable's current value from the equivalence chains.
@@ -689,6 +692,12 @@ simplify_control_stmt_condition (edge e,
pass specific callback to try and simplify it further. */
if (cached_lhs && ! is_gimple_min_invariant (cached_lhs))
cached_lhs = (*simplify) (stmt, stmt);
+
+ /* We couldn't find an invariant. But, callers of this
+ function may be able to do something useful with the
+ unmodified destination. */
+ if (!cached_lhs)
+ cached_lhs = original_lhs;
}
else
cached_lhs = NULL;
Can't you just use COND rather than stuffing its value away into
ORIGINAL_LHS? CACHED_LHS may be better in some cases if it's an
SSA_NAME (and it should be), but I doubt it matters in practice.
Or is it the case that you have to have the original condition --
without any context sensitive equivalences used to "simplify" the condition.
@@ -948,6 +957,188 @@ thread_around_empty_blocks (edge taken_edge,
return false;
}
+/* Return true if there is at least one path from START_BB to END_BB.
+ VISITED_BBS is used to make sure we don't fall into an infinite loop. */
+
+static bool
+fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
+ vec<basic_block, va_gc> *&path,
+ hash_set<basic_block> *visited_bbs, int n_insns)
+{
+ if (start_bb == end_bb)
+ {
+ vec_safe_push (path, start_bb);
+ return true;
+ }
+
+ if (!visited_bbs->add (start_bb))
+ {
+ edge e;
+ edge_iterator ei;
+ FOR_EACH_EDGE (e, ei, start_bb->succs)
+ if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, n_insns))
+ {
+ vec_safe_push (path, start_bb);
+ return true;
+ }
+ }
+
+ return false;
Update comment to indicate how PATH is used to return a path from
START_BB to END_BB.
+ for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+ gsi_next (&gsi))
+ ++n_insns;
Probably don't want to count labels and GIMPLE_NOPs. Probably do want
to count non-virtual PHIs since those may end up as a copies or constant
initializations.
+ if (j == 0)
+ kind = EDGE_START_FSM_THREAD;
+ else if (single_pred_p (e->src))
+ kind = EDGE_NO_COPY_SRC_BLOCK;
+ else {
+ kind = EDGE_COPY_SRC_JOINER_BLOCK;
+ ++joiners;
+ }
Presumably the mis-formatting was added when you tracked the # joiners.
AFAICT that is a write-only variable and ought to be removed. Along
with the braces on the final ELSE which should restore proper formatting.
@@ -2343,6 +2493,55 @@ thread_through_all_blocks (bool may_peel_loop_headers)
threaded_blocks = BITMAP_ALLOC (NULL);
memset (&thread_stats, 0, sizeof (thread_stats));
+ for (i = 0; i < paths.length ();)
Comment before this loop. I can see what you're doing, but I'm already
very familiar with this code. Basically what are you looking for in
this loop and what do you do?
Overall I think this is very very close and I really like the overall
direction. There's a few minor issues noted above and with those
addressed, I think we should be ready to go.
Looking further out....
Removing most of tree-ssa-threadupdate.c and using SEME duplication
would be a huge step forward for making this code more understandable. I
look forward to any work you do in this space in the future.
Similarly moving towards a backwards dataflow driven model is definitely
on my long term plan for this code. Ideally with some kind of knob that
says "optimize the trivial jump threads you can find and do so very
quickly" (say by restricting the lookup to a single block) and a more
expensive version.
The simple version could run early which would solve some problems Jan
has run into. Running the simple version early would also help DOM/VRP.
Ideally I want to disentangle threading from VRP and DOM -- most
threading opportunities are fairly simple to find and exploit. Yet
right now we have to run DOM or VRP which are insanely expensive.
Jeff