On Fri, Oct 18, 2013 at 7:12 PM, Jeff Law <l...@redhat.com> wrote: > > > Back in 2011 I wrote code to detect cases when traversing a particular path > could be proven to trigger undefined behaviour (such as a null pointer > dereference). That original patch would find the control dependent edges > leading to the dereference and eliminate those edges. The result being those > paths with undefined behaviour simply vanished. > > The problem with that implementation is there could have been observable > side effects on those paths prior to triggering the undefined behaviour. > > At that time I speculated we could isolate the path (via block copying) with > undefined behaviour. On the duplicate path we'd replace the undefined > behaviour with a trap and remove only the outgoing edges. That would still > preserve any visible side effects on the undefined path up to the undefined > behaviour and we still often get to simplify the main path, though not as > much as the original version from 2011. > > That's precisely what this patch does. When we have a PHI which feeds a > memory dereference in the same block as the PHI and one (or more) of the PHI > args is NULL we duplicate the block, redirect incoming edges with the NULL > PHI args to the duplicate and change the statement with the memory > dereference to a trap. > > You can certainly ask is this actually helpful. In my experience, yes. I'm > regularly seeing stuff like > > x2 = PHI (x1, x1, 0, 0); > *x2 > > When we isolate the paths with NULL incoming values, we're left with > x2 = PHI (x1, x1) > > And we can propagate x1 into uses of x2. The block now has just one pred > and the current block can sometimes be merged into that single pred. I've > seen the combination of those two then lead to identification additional > common subexpressions. > > You might also ask how often such paths show up. In a bootstrap about a > week ago, this triggered ~3500 times. > > The code tries to be reasonably smart and re-use an existing duplicate when > there's multiple NULL values in a PHI node. That triggers often enough to > be worth the marginal amount of book keeping. > > The code doesn't yet handle the case where there's multiple dereferences in > the block. There's no guarantee the first dereference will be the first one > we find on the immediate use list. It's on my TODO list. > > > There's no reason this same framework couldn't be used to do the same thing > for an out of bounds array access. Similarly, it wouldn't be hard to issue > a warning when these transformations are applied. > > Posting at this time to get some feedback. Planning a formal RFA prior to > close of 4.9 stage1. > > And, of course, this bootstrapps and regression tests. 20030711-3 is > clearly optimized further with this patch, but I'll cobble together some > more testcases next week. > > > I'm not familiar with the options stuff, so if someone could look at that > more closely, I'd appreciate it. Similarly for the bits to create a new > pass structure. > > > Thoughts, comments, flames?
I wonder why this isn't part of the regular jump-threading code - after all the opportunity is to thead to __builtin_unreachable () ;) Otherwise the question is always where you'd place this pass and whether it enables jump threading or CSE opportunities (that at least it does). Richard. > > > > > > * Makefile.in (OBJS): Add tree-ssa-isolate-paths. > * common.opt (-ftree-isolate-paths): Add option and documentation. > * opts.c (default_options_table): Add OPT_ftree_isolate_paths. > * passes.def: Add path isolation pass. > * timevar.def (TV_TREE_SSA_ISOLATE_PATHS): New timevar. > * tree-pass.h (make_isolate_paths): Declare. > * tree-ssa-isolate-paths.c: New file. > > * gcc.dg/pr38984.c: Add -fno-tree-isolate-paths. > * gcc.dg/tree-ssa/20030711-3.c: Update expected output. > > * testsuite/libmudflap.c/fail20-frag.c: Add -fno-tree-isolate-paths. > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index ba39ac9..e7c18bc 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -1412,6 +1412,7 @@ OBJS = \ > tree-ssa-dse.o \ > tree-ssa-forwprop.o \ > tree-ssa-ifcombine.o \ > + tree-ssa-isolate-paths.o \ > tree-ssa-live.o \ > tree-ssa-loop-ch.o \ > tree-ssa-loop-im.o \ > diff --git a/gcc/common.opt b/gcc/common.opt > index 1f11fcd..1fd1bcc 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -2104,6 +2104,12 @@ foptimize-strlen > Common Report Var(flag_optimize_strlen) Optimization > Enable string length optimizations on trees > > +ftree-isolate-paths > +Common Report Var(flag_tree_isolate_paths) Init(1) Optimization > +Detect paths which trigger undefined behaviour. Isolate those paths from > +the main control flow and turn the statement with undefined behaviour into > +a trap. > + > ftree-loop-distribution > Common Report Var(flag_tree_loop_distribution) Optimization > Enable loop distribution on trees > diff --git a/gcc/opts.c b/gcc/opts.c > index 728d36d..b6ed0c2 100644 > --- a/gcc/opts.c > +++ b/gcc/opts.c > @@ -453,6 +453,7 @@ static const struct default_options > default_options_table[] = > { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 }, > { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 }, > { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 }, > + { OPT_LEVELS_1_PLUS, OPT_ftree_isolate_paths, NULL, 1 }, > { OPT_LEVELS_1_PLUS, OPT_ftree_slsr, NULL, 1 }, > > /* -O2 optimizations. */ > diff --git a/gcc/passes.def b/gcc/passes.def > index 84eb3f3..1f2c810 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -173,6 +173,13 @@ along with GCC; see the file COPYING3. If not see > only examines PHIs to discover const/copy propagation > opportunities. */ > NEXT_PASS (pass_phi_only_cprop); > + /* At this point the majority of const/copy propagations > + are exposed. Go ahead and identify paths that should never > + be executed in a conforming program and isolate those paths. > + > + This will simplify the CFG in the main path and provide PRE > + DOM and other passes additional opportunities to optimize. */ > + NEXT_PASS (pass_isolate_paths); > NEXT_PASS (pass_dse); > NEXT_PASS (pass_reassoc); > NEXT_PASS (pass_dce); > diff --git a/gcc/testsuite/gcc.dg/pr38984.c b/gcc/testsuite/gcc.dg/pr38984.c > index 11f1e7f..196ca39 100644 > --- a/gcc/testsuite/gcc.dg/pr38984.c > +++ b/gcc/testsuite/gcc.dg/pr38984.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -fno-delete-null-pointer-checks -fdump-tree-optimized" > } > +/* { dg-options "-O2 -fno-delete-null-pointer-checks -fdump-tree-optimized > -fno-tree-isolate-paths" } > * */ > > int f(int *p) > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030711-3.c > b/gcc/testsuite/gcc.dg/tree-ssa/20030711-3.c > index ec04e17..bcd5681 100644 > --- a/gcc/testsuite/gcc.dg/tree-ssa/20030711-3.c > +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030711-3.c > @@ -52,10 +52,10 @@ get_alias_set (t) > /* There should be one load of decl.rtl. */ > /* { dg-final { scan-tree-dump-times "decl\\.rtl" 1 "dom2"} } */ > > -/* There should be two loads of rtmem. */ > -/* { dg-final { scan-tree-dump-times "rtmem" 2 "dom2"} } */ > +/* There should be one load of rtmem. */ > +/* { dg-final { scan-tree-dump-times "rtmem" 1 "dom2"} } */ > > -/* There should be one load of alias. */ > -/* { dg-final { scan-tree-dump-times "->alias" 1 "dom2"} } */ > +/* There should be no load of alias. */ > +/* { dg-final { scan-tree-dump-not "->alias" "dom2"} } */ > > /* { dg-final { cleanup-tree-dump "dom2" } } */ > diff --git a/gcc/timevar.def b/gcc/timevar.def > index 5a880a8..119bd95 100644 > --- a/gcc/timevar.def > +++ b/gcc/timevar.def > @@ -144,6 +144,7 @@ DEFTIMEVAR (TV_TREE_SSA_INCREMENTAL , "tree SSA > incremental") > DEFTIMEVAR (TV_TREE_OPS , "tree operand scan") > DEFTIMEVAR (TV_TREE_SSA_DOMINATOR_OPTS , "dominator optimization") > DEFTIMEVAR (TV_TREE_SRA , "tree SRA") > +DEFTIMEVAR (TV_TREE_SSA_ISOLATE_PATHS , "tree SSA isolate paths") > DEFTIMEVAR (TV_TREE_CCP , "tree CCP") > DEFTIMEVAR (TV_TREE_PHI_CPROP , "tree PHI const/copy prop") > DEFTIMEVAR (TV_TREE_SPLIT_EDGES , "tree split crit edges") > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h > index e72fe9a..30a3afd 100644 > --- a/gcc/tree-pass.h > +++ b/gcc/tree-pass.h > @@ -427,6 +427,7 @@ extern gimple_opt_pass *make_pass_sink_code > (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_fre (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_check_data_deps (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_copy_prop (gcc::context *ctxt); > +extern gimple_opt_pass *make_pass_isolate_paths (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_vrp (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_uncprop (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_return_slot (gcc::context *ctxt); > diff --git a/libmudflap/testsuite/libmudflap.c/fail20-frag.c > b/libmudflap/testsuite/libmudflap.c/fail20-frag.c > index 0dd8bb7..54200cc 100644 > --- a/libmudflap/testsuite/libmudflap.c/fail20-frag.c > +++ b/libmudflap/testsuite/libmudflap.c/fail20-frag.c > @@ -11,3 +11,4 @@ return 0; > /* { dg-output "Nearby object 1.*" } */ > /* { dg-output "mudflap object.*.NULL.*" } */ > /* { dg-do run { xfail *-*-* } } */ > +/* { dg-options "-lmudflap -fmudflap -fno-tree-isolate-paths" } */ > *** /dev/null Mon Sep 30 09:08:45 2013 > --- tree-ssa-isolate-paths.c Thu Oct 17 15:38:23 2013 > *************** > *** 0 **** > --- 1,399 ---- > + /* Detect paths through the CFG which can never be executed in a > conforming > + program and isolate them. > + > + Copyright (C) 2013 > + Free Software Foundation, Inc. > + > + This file is part of GCC. > + > + GCC is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3, or (at your option) > + any later version. > + > + GCC is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + You should have received a copy of the GNU General Public License > + along with GCC; see the file COPYING3. If not see > + <http://www.gnu.org/licenses/>. */ > + > + #include "config.h" > + #include "system.h" > + #include "coretypes.h" > + #include "tm.h" > + #include "tree.h" > + #include "flags.h" > + #include "tm_p.h" > + #include "basic-block.h" > + #include "output.h" > + #include "function.h" > + #include "tree-pretty-print.h" > + #include "gimple-pretty-print.h" > + #include "timevar.h" > + #include "tree-dump.h" > + #include "tree-flow.h" > + #include "tree-pass.h" > + #include "tree-ssa.h" > + #include "tree-ssa-propagate.h" > + #include "langhooks.h" > + #include "cfgloop.h" > + > + > + /* Note if we've optimized anything and thus want to run CFG cleanups > + when this pass is complete. Obviously if this pass does nothing, then > + CFG cleanups would be a waste of time. */ > + bool cfg_altered; > + > + /* BB when reached via incoming edge E will exhibit undefined behaviour > + at STMT. Isolate and optimize the path which exhibits undefined > + behaviour. > + > + Isolation is simple. Duplicate BB and redirect E to BB'. > + > + Optimization is simple as well. Replace STMT in BB' with an > + unconditional trap and remove all outgoing edges from BB'. > + > + DUPLICATE is a pre-existing duplicate, use it as BB' if it exists. > + > + Return BB'. */ > + > + basic_block > + isolate_path (basic_block bb, basic_block duplicate, edge e, gimple stmt) > + { > + gimple_stmt_iterator si, si2; > + edge_iterator ei; > + edge e2; > + > + > + /* First duplicate BB if we have not done so already and remove all > + the duplicate's outgoing edges as duplicate is going to > unconditionally > + trap. Removing the outgoing edges is both an optimization and > ensures > + we don't need to do any PHI node updates. */ > + if (!duplicate) > + { > + duplicate = duplicate_block (bb, NULL, NULL); > + for (ei = ei_start (duplicate->succs); (e2 = ei_safe_edge (ei)); ) > + remove_edge (e2); > + } > + > + /* Complete the isolation step by redirecting E to reach DUPLICATE. */ > + e2 = redirect_edge_and_branch (e, duplicate); > + flush_pending_stmts (e2); > + > + > + /* There may be more than one statement in DUPLICATE which exhibits > + undefined behaviour. Ultimately we want the first such statement in > + DUPLCIATE so that we're able to delete as much code as possible. > + > + So each time we discover undefined behaviour in DUPLICATE, search for > + the statement which triggers undefined behaviour. If found, then > transform > + the statement into a trap and delete everything after the statement. > If > + not found, then this particular instance was subsumed by an earlier > instance > + of undefined behaviour and there's nothing to do. > + > + This is made more complicated by the fact that we have STMT, which is > in BB > + rather than in DUPLICATE. So we set up two iterators, one for each > block and > + walk forward looking for STMT in BB, advancing each iterator at each > step. > + > + When we find STMT the second iterator should point to STMT's > equivalent in > + duplicate. If DUPLICATE ends before STMT is found in BB, then > there's nothing > + to do. > + > + Ignore labels and debug statements. */ > + for (si = gsi_start_nondebug_bb (bb), si2 = gsi_start_nondebug_bb > (duplicate); > + !gsi_end_p (si) && !gsi_end_p (si2); > + gsi_next_nondebug (&si), gsi_next_nondebug (&si2)) > + { > + while (gimple_code (gsi_stmt (si)) == GIMPLE_LABEL) > + gsi_next_nondebug (&si); > + while (gimple_code (gsi_stmt (si2)) == GIMPLE_LABEL) > + gsi_next_nondebug (&si2); > + if (gsi_stmt (si) == stmt) > + break; > + } > + > + /* This would be an indicator that we never found STMT in BB, which > should > + never happen. */ > + gcc_assert (!gsi_end_p (si)); > + > + /* If we did not run to the end of DUPLICATE, then SI points to STMT and > + SI2 points to the duplicate of STMT in DUPLICATE. */ > + if (!gsi_end_p (si2)) > + { > + /* SI2 is the iterator in the duplicate block and it now points > + to our victim statement. */ > + gimple_seq seq = NULL; > + tree t = build_call_expr_loc (0, > + builtin_decl_explicit (BUILT_IN_TRAP), > 0); > + gimplify_and_add (t, &seq); > + gsi_insert_before (&si2, seq, GSI_SAME_STMT); > + > + /* Now delete all remaining statements in this block. */ > + for (; !gsi_end_p (si2);) > + gsi_remove (&si2, true); > + } > + > + return duplicate; > + } > + > + /* Search the function for statements which, if executed, would cause > + the program to fault such as a dereference of a NULL pointer. > + > + Such a program can't be valid if such a statement was to execute > + according to ISO standards. > + > + We detect explicit NULL pointer dereferences as well as those implied > + by a PHI argument having a NULL value which unconditionally flows into > + a dereference in the same block as the PHI. > + > + In the former case we replace the offending statement with an > + unconditional trap and eliminate the outgoing edges from the > statement's > + basic block. This may expose secondary optimization opportunities. > + > + In the latter case, we isolate the path(s) with the NULL PHI > + feeding the dereference. We can then replace the offending statement > + and eliminate the outgoing edges in the duplicate. Again, this may > + expose secondary optimization opportunities. > + > + A warning for both cases may be advisable as well. > + > + Other statically detectable violations of the ISO standard could be > + handled in a similar way, such as out-of-bounds array indexing. */ > + > + static unsigned int > + tree_ssa_isolate_paths (void) > + { > + basic_block bb; > + > + initialize_original_copy_tables (); > + > + /* Search all the blocks for edges which, if traversed, will > + result in undefined behaviour. */ > + cfg_altered = false; > + FOR_EACH_BB (bb) > + { > + gimple_stmt_iterator si; > + > + /* First look for a PHI which sets a pointer to NULL and which > + is then dereferenced within BB. This is somewhat overly > + conservative, but probably catches most of the interesting > + cases. */ > + for (si = gsi_start_phis (bb); !gsi_end_p (si); gsi_next (&si)) > + { > + gimple phi = gsi_stmt (si); > + tree lhs = gimple_phi_result (phi); > + > + /* If the result is not a pointer, then there is no need to > + examine the arguments. */ > + if (!POINTER_TYPE_P (TREE_TYPE (lhs))) > + continue; > + > + /* PHI produces a pointer result. See if any of the PHI's > + arguments are NULL. > + > + When we remove an edge, we want to reprocess the current > + index, hence the ugly way we update I for each iteration. */ > + basic_block duplicate = NULL; > + for (unsigned i = 0, next_i = 0; i < gimple_phi_num_args (phi); i > = next_i) > + { > + tree op = gimple_phi_arg_def (phi, i); > + > + next_i = i + 1; > + if (operand_equal_p (op, integer_zero_node, 0)) > + { > + gimple use_stmt; > + imm_use_iterator iter; > + > + /* We've got a NULL PHI argument. Now see if the > + PHI's result is dereferenced within BB. */ > + FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) > + { > + unsigned int j; > + bool removed_edge = false; > + > + /* We only care about uses in BB which are > assignements > + with memory operands. > + > + We could see if any uses are as function arguments > + when the callee has marked the argument as being > + non-null. */ > + if (gimple_bb (use_stmt) != bb > + || !gimple_has_mem_ops (use_stmt) > + || gimple_code (use_stmt) != GIMPLE_ASSIGN) > + continue; > + > + /* Look at each operand for a memory reference using > + LHS. */ > + for (j = 0; j < gimple_num_ops (use_stmt); j++) > + { > + tree op = gimple_op (use_stmt, j); > + > + if (op == NULL) > + continue; > + > + while (handled_component_p (op)) > + op = TREE_OPERAND (op, 0); > + > + if ((TREE_CODE (op) == MEM_REF > + || TREE_CODE (op) == INDIRECT_REF) > + && TREE_OPERAND (op, 0) == lhs) > + { > + edge e = gimple_phi_arg_edge (phi, i); > + duplicate = isolate_path (bb, duplicate, > + e, use_stmt); > + > + /* When we remove an incoming edge, we need to > + reprocess the Ith element. */ > + next_i = i; > + > + removed_edge = true; > + cfg_altered = true; > + break; > + } > + } > + > + /* We really want to process other immediate uses > here. > + There may be another immediate use of LHS in this > + block prior to the one we just turned into a trap. > + > + However, for now, we ignore other immediate uses. > */ > + if (removed_edge) > + BREAK_FROM_IMM_USE_STMT (iter); > + } > + } > + } > + } > + > + /* Now look at the statements in the block and see if any of > + them explicitly dereference a NULL pointer. Believe it or > + not, this does happen from time to time. */ > + bool found = false; > + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) > + { > + gimple stmt = gsi_stmt (si); > + unsigned int i; > + > + /* We only care about assignments with memory operands. > + > + Again, we could see if any uses are as function arguments > + when the callee has marked the argument as being non-null. */ > + if (!gimple_has_mem_ops (stmt) > + || gimple_code (stmt) != GIMPLE_ASSIGN) > + continue; > + > + /* Look at each operand for a memory reference using an explicit > + NULL pointer. */ > + for (i = 0; i < gimple_num_ops (stmt); i++) > + { > + tree op = gimple_op (stmt, i); > + > + if (op == NULL) > + continue; > + > + while (handled_component_p (op)) > + op = TREE_OPERAND (op, 0); > + > + if (TREE_CODE (op) == MEM_REF || TREE_CODE (op) == > INDIRECT_REF) > + { > + tree op0 = TREE_OPERAND (op, 0); > + /* If the operand is NULL, then this block will trigger > undefined > + behaviour at runtime if we reach STMT. */ > + if (operand_equal_p (op0, integer_zero_node, 0)) > + { > + /* First insert a TRAP before this statement. */ > + gimple_seq seq = NULL; > + tree t > + = build_call_expr_loc (0,builtin_decl_explicit > (BUILT_IN_TRAP), 0); > + gimplify_and_add (t, &seq); > + gsi_insert_before (&si, seq, GSI_SAME_STMT); > + > + /* Now delete all remaining statements in this block. > */ > + for (gimple_stmt_iterator si2 = si; !gsi_end_p (si2);) > + gsi_remove (&si2, true); > + > + /* And finally, remove all outgoing edges from BB. */ > + edge e; > + for (edge_iterator ei = ei_start (bb->succs); > + (e = ei_safe_edge (ei)); ) > + remove_edge (e); > + > + /* Ignore any more operands on this statement and > + continue the statement iterator (which should > + terminate its loop immediately. */ > + found = true; > + cfg_altered = true; > + break; > + } > + } > + } > + > + if (found) > + break; > + } > + } > + free_original_copy_tables (); > + > + /* We scramble the CFG and loop structures a bit, clean up > + appropriately. We really should incrementally update the > + loop structures, in theory it shouldn't be that hard. */ > + if (cfg_altered) > + { > + free_dominance_info (CDI_DOMINATORS); > + free_dominance_info (CDI_POST_DOMINATORS); > + loops_state_set (LOOPS_NEED_FIXUP); > + return TODO_cleanup_cfg | TODO_update_ssa; > + } > + return 0; > + } > + > + static bool > + gate_isolate_paths (void) > + { > + /* If we do not have a suitable builtin function for the trap statement, > + then do not perform the optimization. */ > + return (flag_tree_isolate_paths != 0 > + && builtin_decl_explicit (BUILT_IN_TRAP) != NULL); > + > + } > + > + namespace { > + const pass_data pass_data_isolate_paths = > + { > + GIMPLE_PASS, /* type */ > + "isolatepaths", /* name */ > + OPTGROUP_NONE, /* optinfo_flags */ > + true, /* has_gate */ > + true, /* has_execute */ > + TV_TREE_SSA_ISOLATE_PATHS, /* tv_id */ > + ( PROP_cfg | PROP_ssa ), /* properties_required */ > + 0, /* properties_provided */ > + 0, /* properties_destroyed */ > + 0, /* todo_flags_start */ > + TODO_verify_ssa, /* todo_flags_finish */ > + }; > + > + class pass_isolate_paths : public gimple_opt_pass > + { > + public: > + pass_isolate_paths (gcc::context *ctxt) > + : gimple_opt_pass (pass_data_isolate_paths, ctxt) > + {} > + > + /* opt_pass methods: */ > + opt_pass * clone () { return new pass_isolate_paths (m_ctxt); } > + bool gate () { return gate_isolate_paths (); } > + unsigned int execute () { return tree_ssa_isolate_paths (); } > + > + }; // class pass_uncprop > + } > + > + gimple_opt_pass * > + make_pass_isolate_paths (gcc::context *ctxt) > + { > + return new pass_isolate_paths (ctxt); > + } > + > + >