On Mon, Dec 1, 2014 at 10:06 PM, Jeff Law <l...@redhat.com> wrote: > On 11/25/14 14:16, Sebastian Pop wrote: >> >> Sebastian Pop wrote: >>> >>> >I will bootstrap and regression test this patch on x86_64-linux and >>> >powerpc64-linux. I will also run it on our internal benchmarks, >>> > coremark, and >>> >the llvm test-suite. >>> > >>> >I will also include a longer testcase that makes sure we do not regress >>> > on >>> >coremark. >> >> Done all the above. Attached is the new patch with a new testcase. I >> have also >> added verify_seme inspired by the recent patch adding verify_sese. >> >> Sebastian >> >> >> 0001-extend-jump-thread-for-finite-state-automata-PR-5474.patch >> >> >> From ca222d5222fb976c7aa258d3e3c04e593f42f7a2 Mon Sep 17 00:00:00 2001 >> From: Sebastian Pop<s....@samsung.com> >> Date: Fri, 26 Sep 2014 14:54:20 -0500 >> Subject: [PATCH] extend jump thread for finite state automata PR 54742 >> >> Adapted from a patch from James Greenhalgh. >> >> * params.def (max-fsm-thread-path-insns, max-fsm-thread-length, >> max-fsm-thread-paths): New. >> >> * doc/invoke.texi (max-fsm-thread-path-insns, >> max-fsm-thread-length, >> max-fsm-thread-paths): Documented. >> >> * tree-cfg.c (split_edge_bb_loc): Export. >> * tree-cfg.h (split_edge_bb_loc): Declared extern. >> >> * tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore >> the >> original value of cond when simplification fails. >> (fsm_find_thread_path): New. >> (fsm_find_control_statement_thread_paths): New. >> (fsm_thread_through_normal_block): Call >> find_control_statement_thread_paths. >> >> * tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print >> EDGE_START_FSM_THREAD. >> (verify_seme): New. >> (duplicate_seme_region): New. >> (thread_through_all_blocks): Generate code for >> EDGE_START_FSM_THREAD edges >> calling gimple_duplicate_sese_region. >> >> * tree-ssa-threadupdate.h (jump_thread_edge_type): Add >> EDGE_START_FSM_THREAD. >> >> * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New. >> * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c: New. >> --- >> gcc/doc/invoke.texi | 12 ++ >> gcc/params.def | 15 ++ >> gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 43 +++++ >> gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 127 +++++++++++++ >> gcc/tree-cfg.c | 2 +- >> gcc/tree-cfg.h | 1 + >> gcc/tree-ssa-threadedge.c | 215 >> +++++++++++++++++++++- >> gcc/tree-ssa-threadupdate.c | 201 >> +++++++++++++++++++- >> gcc/tree-ssa-threadupdate.h | 1 + >> 9 files changed, 614 insertions(+), 3 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c >> >> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi >> index 89edddb..074183f 100644 >> --- a/gcc/doc/invoke.texi >> +++ b/gcc/doc/invoke.texi >> @@ -10624,6 +10624,18 @@ large and significantly increase compile time at >> optimization level >> @option{-O1} and higher. This parameter is a maximum nubmer of >> statements >> in a single generated constructor. Default value is 5000. >> >> +@item max-fsm-thread-path-insns >> +Maximum number of instructions to copy when duplicating blocks on a >> +finite state automaton jump thread path. The default is 100. >> + >> +@item max-fsm-thread-length >> +Maximum number of basic blocks on a finite state automaton jump thread >> +path. The default is 10. >> + >> +@item max-fsm-thread-paths >> +Maximum number of new jump thread paths to create for a finite state >> +automaton. The default is 50. > > Has there been any tuning on these defaults. I don't have any strong > opinions about what they ought to be, this is more to get any such > information recorded on the lists for historical purposes. > > I think it's worth a note in the debug dump anytime you abort threading when > you hit a limit. > > I'm a bit worried about compile-time impacts of the all the recursion, but > I'm willing to wait and see if it turns out to be a problem in practice.
Please consider restricting it to -fexpensive-optimizations (-O2+). Richard. > >> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c >> index 8b0b7b8..c9fe212 100644 >> --- a/gcc/tree-ssa-threadedge.c >> +++ b/gcc/tree-ssa-threadedge.c >> @@ -56,6 +56,8 @@ along with GCC; see the file COPYING3. If not see >> #include "params.h" >> #include "tree-ssa-threadedge.h" >> #include "builtins.h" >> +#include "cfg.h" >> +#include "cfganal.h" >> >> /* To avoid code explosion due to jump threading, we limit the >> number of statements we are going to copy. This variable >> @@ -661,6 +663,7 @@ simplify_control_stmt_condition (edge e, >> rather than use a relational operator. These are simpler to >> handle. */ >> if (TREE_CODE (cond) == SSA_NAME) >> { >> + tree original_lhs = cond; >> cached_lhs = cond; >> >> /* Get the variable's current value from the equivalence chains. >> @@ -689,6 +692,12 @@ simplify_control_stmt_condition (edge e, >> pass specific callback to try and simplify it further. */ >> if (cached_lhs && ! is_gimple_min_invariant (cached_lhs)) >> cached_lhs = (*simplify) (stmt, stmt); >> + >> + /* We couldn't find an invariant. But, callers of this >> + function may be able to do something useful with the >> + unmodified destination. */ >> + if (!cached_lhs) >> + cached_lhs = original_lhs; >> } >> else >> cached_lhs = NULL; > > Can't you just use COND rather than stuffing its value away into > ORIGINAL_LHS? CACHED_LHS may be better in some cases if it's an SSA_NAME > (and it should be), but I doubt it matters in practice. > > Or is it the case that you have to have the original condition -- without > any context sensitive equivalences used to "simplify" the condition. > > >> @@ -948,6 +957,188 @@ thread_around_empty_blocks (edge taken_edge, >> return false; >> } >> >> +/* Return true if there is at least one path from START_BB to END_BB. >> + VISITED_BBS is used to make sure we don't fall into an infinite loop. >> */ >> + >> +static bool >> +fsm_find_thread_path (basic_block start_bb, basic_block end_bb, >> + vec<basic_block, va_gc> *&path, >> + hash_set<basic_block> *visited_bbs, int n_insns) >> +{ >> + if (start_bb == end_bb) >> + { >> + vec_safe_push (path, start_bb); >> + return true; >> + } >> + >> + if (!visited_bbs->add (start_bb)) >> + { >> + edge e; >> + edge_iterator ei; >> + FOR_EACH_EDGE (e, ei, start_bb->succs) >> + if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, >> n_insns)) >> + { >> + vec_safe_push (path, start_bb); >> + return true; >> + } >> + } >> + >> + return false; > > Update comment to indicate how PATH is used to return a path from START_BB > to END_BB. > > > >> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); >> + gsi_next (&gsi)) >> + ++n_insns; > > Probably don't want to count labels and GIMPLE_NOPs. Probably do want to > count non-virtual PHIs since those may end up as a copies or constant > initializations. > >> + if (j == 0) >> + kind = EDGE_START_FSM_THREAD; >> + else if (single_pred_p (e->src)) >> + kind = EDGE_NO_COPY_SRC_BLOCK; >> + else { >> + kind = EDGE_COPY_SRC_JOINER_BLOCK; >> + ++joiners; >> + } > > Presumably the mis-formatting was added when you tracked the # joiners. > AFAICT that is a write-only variable and ought to be removed. Along with > the braces on the final ELSE which should restore proper formatting. > > >> @@ -2343,6 +2493,55 @@ thread_through_all_blocks (bool >> may_peel_loop_headers) >> threaded_blocks = BITMAP_ALLOC (NULL); >> memset (&thread_stats, 0, sizeof (thread_stats)); >> >> + for (i = 0; i < paths.length ();) > > Comment before this loop. I can see what you're doing, but I'm already very > familiar with this code. Basically what are you looking for in this loop > and what do you do? > > Overall I think this is very very close and I really like the overall > direction. There's a few minor issues noted above and with those addressed, > I think we should be ready to go. > > Looking further out.... > > > Removing most of tree-ssa-threadupdate.c and using SEME duplication would be > a huge step forward for making this code more understandable. I look forward > to any work you do in this space in the future. > > Similarly moving towards a backwards dataflow driven model is definitely on > my long term plan for this code. Ideally with some kind of knob that says > "optimize the trivial jump threads you can find and do so very quickly" (say > by restricting the lookup to a single block) and a more expensive version. > > The simple version could run early which would solve some problems Jan has > run into. Running the simple version early would also help DOM/VRP. > > Ideally I want to disentangle threading from VRP and DOM -- most threading > opportunities are fairly simple to find and exploit. Yet right now we have > to run DOM or VRP which are insanely expensive. > > Jeff