Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On 05/16/2013 11:10 AM, Steve Ellcey wrote: On Thu, 2013-05-16 at 10:58 +0200, Richard Biener wrote: Hmm, not terribly happy with that wording, but that gives you an idea of what I'm after. When would someone set UPDATE_DOMINANCE to true and what are their responsibilities when they do so. Approved with the name change and a better comment for UPDATE_DOMINANCE. Btw, the function does _not_ handle arbitrary SEME regions - it only handles a single exit correctly and assumes no (SSA) data flows across the others. So I'd rather not rename it. Richard. Jeff I went ahead and checked in the change with the comment updates that Jeff wanted but left the name of the function as is. Ok. Thanks. jeff
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On Thu, 2013-05-16 at 10:58 +0200, Richard Biener wrote: > > > > Hmm, not terribly happy with that wording, but that gives you an idea of > > what I'm after. When would someone set UPDATE_DOMINANCE to true and what > > are their responsibilities when they do so. > > > > Approved with the name change and a better comment for UPDATE_DOMINANCE. > > Btw, the function does _not_ handle arbitrary SEME regions - it only handles > a single exit correctly and assumes no (SSA) data flows across the others. > So I'd rather not rename it. > > Richard. > > > Jeff I went ahead and checked in the change with the comment updates that Jeff wanted but left the name of the function as is. Steve Ellcey sell...@imgtec.com
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On Thu, May 16, 2013 at 5:38 AM, Jeff Law wrote: > On 05/15/2013 12:28 PM, Steve Ellcey wrote: >> >> Here is a patch that adds a flag to gimple_duplicate_sese_region to tell >> it whether or not to update the dominator information. I had to add the >> same flag to copy_bbs to make it all work. How does this look? I >> tested it with a bootstrap and test on x86 (with my optimization >> enabled) and got no regressions. >> >> 2013-05-15 Steve Ellcey >> >> * cfghooks.c (copy_bbs): Add update_dominance argument. >> * cfghooks.h (copy_bbs): Update prototype. >> * tree-cfg.c (gimple_duplicate_sese_region): >> Add update_dominance argument. >> * tree-flow.h (gimple_duplicate_sese_region): Update prototype. >> * tree-ssa-loop-ch.c (copy_loop_headers): Update >> gimple_duplicate_sese_region call. >> * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): >> Update copy_bbs call. >> * cfgloopmanip.c (duplicate_loop_to_header_edge): Ditto. >> * trans-mem.c (ipa_uninstrument_transaction): Ditto. > > So I'd change gimple_duplicate_sese_region to gimple_duplicate_seme region > per comments from others. > > Where you document UPDATE_DOMINANCE, I'd add something like: When > UPDATE_DOMINANCE is true, it is assumed that duplicating the region (or > copying the blocks for copy_bbs) may change the dominator tree in ways that > are not suitable for an incremental update and the caller is responsible for > destroying and recomputing the dominator tree. > > Hmm, not terribly happy with that wording, but that gives you an idea of > what I'm after. When would someone set UPDATE_DOMINANCE to true and what > are their responsibilities when they do so. > > Approved with the name change and a better comment for UPDATE_DOMINANCE. Btw, the function does _not_ handle arbitrary SEME regions - it only handles a single exit correctly and assumes no (SSA) data flows across the others. So I'd rather not rename it. Richard. > Jeff >
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On 05/15/2013 12:28 PM, Steve Ellcey wrote: Here is a patch that adds a flag to gimple_duplicate_sese_region to tell it whether or not to update the dominator information. I had to add the same flag to copy_bbs to make it all work. How does this look? I tested it with a bootstrap and test on x86 (with my optimization enabled) and got no regressions. 2013-05-15 Steve Ellcey * cfghooks.c (copy_bbs): Add update_dominance argument. * cfghooks.h (copy_bbs): Update prototype. * tree-cfg.c (gimple_duplicate_sese_region): Add update_dominance argument. * tree-flow.h (gimple_duplicate_sese_region): Update prototype. * tree-ssa-loop-ch.c (copy_loop_headers): Update gimple_duplicate_sese_region call. * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Update copy_bbs call. * cfgloopmanip.c (duplicate_loop_to_header_edge): Ditto. * trans-mem.c (ipa_uninstrument_transaction): Ditto. So I'd change gimple_duplicate_sese_region to gimple_duplicate_seme region per comments from others. Where you document UPDATE_DOMINANCE, I'd add something like: When UPDATE_DOMINANCE is true, it is assumed that duplicating the region (or copying the blocks for copy_bbs) may change the dominator tree in ways that are not suitable for an incremental update and the caller is responsible for destroying and recomputing the dominator tree. Hmm, not terribly happy with that wording, but that gives you an idea of what I'm after. When would someone set UPDATE_DOMINANCE to true and what are their responsibilities when they do so. Approved with the name change and a better comment for UPDATE_DOMINANCE. Jeff
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
Hi, > > I realize you're trying to do the same, but by using the SESE copier, you're > > implicitly trying to do an incremental update. I think you're going to > > really need to look at the assumptions of that code and verify that the > > switch FSA optimization doesn't violate those assumptions. > > Indeed. I'd rather have a flag to the SESE copier that tells it the region > is SEME and in that case make it not update dominators but leave that > to the caller (which means, recompute them). It seems the code already > handles ME regions if the other exits are "not important" (we don't have > to insert/update PHI nodes in those exit blocks), so it may be that there > are even more constraints on those unimportant exits due to the > iterative dominator update - I think that the entry block of the region > needs to dominate all exit destinations, otherwise get_dominated_by_region > is not working correctly. In your case one exit is a back-edge? We should > be able to add checking to the code to verify unimportant edges are > unimportant "enough". > > I'm sure Zdenek knows the limitations best. as far as I remember, indeed only the single-entry assumption is important (but I do not remember all that much, unfortunately), Zdenek
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On Wed, 2013-05-15 at 10:44 +0200, Richard Biener wrote: > Indeed. I'd rather have a flag to the SESE copier that tells it the region > is SEME and in that case make it not update dominators but leave that > to the caller (which means, recompute them). It seems the code already > handles ME regions if the other exits are "not important" (we don't have > to insert/update PHI nodes in those exit blocks), so it may be that there > are even more constraints on those unimportant exits due to the > iterative dominator update - I think that the entry block of the region > needs to dominate all exit destinations, otherwise get_dominated_by_region > is not working correctly. In your case one exit is a back-edge? We should > be able to add checking to the code to verify unimportant edges are > unimportant "enough". > > I'm sure Zdenek knows the limitations best. > > Richard. Here is a patch that adds a flag to gimple_duplicate_sese_region to tell it whether or not to update the dominator information. I had to add the same flag to copy_bbs to make it all work. How does this look? I tested it with a bootstrap and test on x86 (with my optimization enabled) and got no regressions. 2013-05-15 Steve Ellcey * cfghooks.c (copy_bbs): Add update_dominance argument. * cfghooks.h (copy_bbs): Update prototype. * tree-cfg.c (gimple_duplicate_sese_region): Add update_dominance argument. * tree-flow.h (gimple_duplicate_sese_region): Update prototype. * tree-ssa-loop-ch.c (copy_loop_headers): Update gimple_duplicate_sese_region call. * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Update copy_bbs call. * cfgloopmanip.c (duplicate_loop_to_header_edge): Ditto. * trans-mem.c (ipa_uninstrument_transaction): Ditto. diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c index 22b962b..18af38b 100644 --- a/gcc/cfghooks.c +++ b/gcc/cfghooks.c @@ -1287,7 +1287,8 @@ end: function assigns bbs into loops (copy of basic block bb is assigned to bb->loop_father->copy loop, so this must be set up correctly in advance) and updates dominators locally (LOOPS structure that contains the information - about dominators is passed to enable this). + about dominators is passed to enable this) if update_dominance is set to + true. BASE is the superloop to that basic block belongs; if its header or latch is copied, we do not set the new blocks as header or latch. @@ -1301,7 +1302,7 @@ end: void copy_bbs (basic_block *bbs, unsigned n, basic_block *new_bbs, edge *edges, unsigned num_edges, edge *new_edges, - struct loop *base, basic_block after) + struct loop *base, basic_block after, bool update_dominance) { unsigned i, j; basic_block bb, new_bb, dom_bb; @@ -1327,16 +1328,19 @@ copy_bbs (basic_block *bbs, unsigned n, basic_block *new_bbs, } /* Set dominators. */ - for (i = 0; i < n; i++) + if (update_dominance) { - bb = bbs[i]; - new_bb = new_bbs[i]; - - dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); - if (dom_bb->flags & BB_DUPLICATED) + for (i = 0; i < n; i++) { - dom_bb = get_bb_copy (dom_bb); - set_immediate_dominator (CDI_DOMINATORS, new_bb, dom_bb); + bb = bbs[i]; + new_bb = new_bbs[i]; + + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); + if (dom_bb->flags & BB_DUPLICATED) + { + dom_bb = get_bb_copy (dom_bb); + set_immediate_dominator (CDI_DOMINATORS, new_bb, dom_bb); + } } } diff --git a/gcc/cfghooks.h b/gcc/cfghooks.h index bff0a0c..ec595a5 100644 --- a/gcc/cfghooks.h +++ b/gcc/cfghooks.h @@ -201,7 +201,7 @@ extern void lv_add_condition_to_bb (basic_block, basic_block, basic_block, extern bool can_copy_bbs_p (basic_block *, unsigned); extern void copy_bbs (basic_block *, unsigned, basic_block *, edge *, unsigned, edge *, struct loop *, - basic_block); + basic_block, bool); void account_profile_record (struct profile_record *, int); diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c index 9581677..bc87755 100644 --- a/gcc/cfgloopmanip.c +++ b/gcc/cfgloopmanip.c @@ -1300,7 +1300,7 @@ duplicate_loop_to_header_edge (struct loop *loop, edge e, /* Copy bbs. */ copy_bbs (bbs, n, new_bbs, spec_edges, 2, new_spec_edges, loop, - place_after); + place_after, true); place_after = new_spec_edges[SE_LATCH]->src; if (flags & DLTHE_RECORD_COPY_NUMBER) diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c index 5cb8286..c66278c 100644 --- a/gcc/trans-mem.c +++ b/gcc/trans-mem.c @@ -3978,7 +3978,8 @@ ipa_uninstrument_transaction (struct tm_region *region, int n = queue.length (); basic_block *new_bbs = XNEWVEC (basic_block, n); - copy_bbs (queue.address (), n, new_bbs, NULL, 0, NULL, NULL, t
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On 05/15/2013 02:44 AM, Richard Biener wrote: Indeed. I'd rather have a flag to the SESE copier that tells it the region is SEME and in that case make it not update dominators but leave that to the caller (which means, recompute them). It seems the code already handles ME regions if the other exits are "not important" (we don't have to insert/update PHI nodes in those exit blocks), so it may be that there are even more constraints on those unimportant exits due to the iterative dominator update - I think that the entry block of the region needs to dominate all exit destinations, otherwise get_dominated_by_region is not working correctly. In your case one exit is a back-edge? We should be able to add checking to the code to verify unimportant edges are unimportant "enough". There's definitely a backedge -- the whole point is to thread across the back edge which allows us to statically determine where the switch statement on the next iteration. I'm sure Zdenek knows the limitations best. Yea, I relied on Zdenek for a lot of the graph stuff in the past :-0 jeff
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On Wed, May 15, 2013 at 10:44 AM, Richard Biener wrote: > Indeed. I'd rather have a flag to the SESE copier that tells it the region > is SEME and in that case make it not update dominators but leave that > to the caller (which means, recompute them). And rename the copier to SEME copier instead of SESE ;-) Ciao! Steven
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On Wed, May 15, 2013 at 7:02 AM, Jeff Law wrote: > On 05/14/2013 03:14 PM, Steve Ellcey wrote: >> >> >> While Jeff works on the threader, I was wondering if I could get approval >> for >> just the dominance.c part of the patch. This would allow me to use my >> pass as >> a dynamically loaded optimization pass. But without this change to >> dominance.c, >> the compiler aborts in iterate_fix_dominators when I do that. >> >> Steve Ellcey >> sell...@imgtec.com >> >> >> >> 2013-05-14 Steve Ellcey >> >> * dominance.c (iterate_fix_dominators): Add null check. > > I'd like to understand this a little more before we go forward with it. > > AFAICT, that routine is trying to incrementally update the dominators using > knowledge that the region you've copied is SESE. It's unclear what happens > in the region is not SESE. > > Threading mucks up the dominator tree in fairly serious ways and to the best > of my knowledge neither of the calls to thread across edges make any attempt > to incrementally update the dominator tree. They wipe it completely, they > also have to be quite careful in how they manipulate the various graphs to > avoid getting into an inconsistent state, then calling routines that assume > consistent state. > > > I realize you're trying to do the same, but by using the SESE copier, you're > implicitly trying to do an incremental update. I think you're going to > really need to look at the assumptions of that code and verify that the > switch FSA optimization doesn't violate those assumptions. Indeed. I'd rather have a flag to the SESE copier that tells it the region is SEME and in that case make it not update dominators but leave that to the caller (which means, recompute them). It seems the code already handles ME regions if the other exits are "not important" (we don't have to insert/update PHI nodes in those exit blocks), so it may be that there are even more constraints on those unimportant exits due to the iterative dominator update - I think that the entry block of the region needs to dominate all exit destinations, otherwise get_dominated_by_region is not working correctly. In your case one exit is a back-edge? We should be able to add checking to the code to verify unimportant edges are unimportant "enough". I'm sure Zdenek knows the limitations best. Richard. > Jeff
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On 05/14/2013 03:14 PM, Steve Ellcey wrote: While Jeff works on the threader, I was wondering if I could get approval for just the dominance.c part of the patch. This would allow me to use my pass as a dynamically loaded optimization pass. But without this change to dominance.c, the compiler aborts in iterate_fix_dominators when I do that. Steve Ellcey sell...@imgtec.com 2013-05-14 Steve Ellcey * dominance.c (iterate_fix_dominators): Add null check. I'd like to understand this a little more before we go forward with it. AFAICT, that routine is trying to incrementally update the dominators using knowledge that the region you've copied is SESE. It's unclear what happens in the region is not SESE. Threading mucks up the dominator tree in fairly serious ways and to the best of my knowledge neither of the calls to thread across edges make any attempt to incrementally update the dominator tree. They wipe it completely, they also have to be quite careful in how they manipulate the various graphs to avoid getting into an inconsistent state, then calling routines that assume consistent state. I realize you're trying to do the same, but by using the SESE copier, you're implicitly trying to do an incremental update. I think you're going to really need to look at the assumptions of that code and verify that the switch FSA optimization doesn't violate those assumptions. Jeff
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
While Jeff works on the threader, I was wondering if I could get approval for just the dominance.c part of the patch. This would allow me to use my pass as a dynamically loaded optimization pass. But without this change to dominance.c, the compiler aborts in iterate_fix_dominators when I do that. Steve Ellcey sell...@imgtec.com 2013-05-14 Steve Ellcey * dominance.c (iterate_fix_dominators): Add null check. diff --git a/gcc/dominance.c b/gcc/dominance.c index 5c96dad..d858ad1 100644 --- a/gcc/dominance.c +++ b/gcc/dominance.c @@ -1251,6 +1251,7 @@ iterate_fix_dominators (enum cdi_direction dir, vec bbs, struct pointer_map_t *map; int *parent, *son, *brother; unsigned int dir_index = dom_convert_dir_to_idx (dir); + void **slot; /* We only support updating dominators. There are some problems with updating postdominators (need to add fake edges from infinite loops @@ -1357,7 +1358,10 @@ iterate_fix_dominators (enum cdi_direction dir, vec bbs, if (dom == bb) continue; - dom_i = (size_t) *pointer_map_contains (map, dom); + slot = pointer_map_contains (map, dom); + if (slot == NULL) + continue; + dom_i = (size_t) *slot; /* Do not include parallel edges to G. */ if (!bitmap_set_bit ((bitmap) g->vertices[dom_i].data, i))
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On Mon, May 13, 2013 at 10:28 PM, Jeff Law wrote: > On 05/13/2013 02:16 PM, Steve Ellcey wrote: >> >> Here is the latest version of my SSA optimization pass to do the switch >> statement optimization described in PR 54742 (core_state_transition from >> coremark). >> >> I have tested this optimization with a x86 bootstrap and GCC test run >> with no errors and tested the MIPS cross compiler with no errors. >> Because of that I decided to submit it as a statically linked >> optimization pass instead of a dynamically loaded one, though I did keep >> the ifdefs for using it as a dynamic pass in the code. They could be >> removed if this patch is approved as a statically linked pass. Also, >> while this patch shows the optimization only being turned on with the >> -ftree-switch-shortcut flag, my bootstrap and testing had it turned on >> for all -O2 optimizations in order to maximize the testing. >> We may want to turn this on for -O3 and/or for >> -fexpensive-optimizations. >> >> I had to make one change to dominance.c in order to avoid some compiler >> aborts where it was dereferencing a null pointer. I believe this was >> happening because I am calling gimple_duplicate_sese_region with regions >> that are not really SESE. Because I am doing this, I regenerate the cfg >> and SSA information after each call, but I also had to change >> iterate_fix_dominators to fix the abort. Another way we might want to >> fix this would be to pass a flag to gimple_duplicate_sese_region that >> tells it whether or not we want it to recalculate the dominance >> information at all. If set to false, it would assume the caller will >> take care of it. >> >> Opinions? OK to checkin? >> >> Steve Ellcey >> sell...@imgtec.com >> >> >> 2013-05-13 Steve Ellcey >> >> PR tree-optimization/54742 >> * Makefile.in (OBJS): Add tree-switch-shortcut.o. >> * common.opt (ftree-switch-shortcut): New. >> * dominance.c (iterate_fix_dominators): Add null check. >> * params.def (PARAM_MAX_SWITCH_INSNS): New. >> (PARAM_MAX_SWITCH_PATHS): New. >> * passes.c (init_optimization_passes): Add pass_switch_shortcut. >> * timevar.def (TV_SWITCH_SHORTCUT): New. >> * tree-pass.c (pass_switch_shortcut): New. >> * tree-switch-shortcut.c: New file. > > I was looking at this last week (stuck for hours on tarmac at BWI). > > I think we should fix this in the threader rather than doing a special > purpose pass. This is primarily because we get it for free if we address > one limitation in the threader (some of the other issues I was concerned > about don't apply). > > Specifically if we look at thread_around_empty_block we have: > > /* This block must have a single predecessor (E->dest). */ > if (!single_pred_p (bb)) > return NULL; > > /* This block must have more than one successor. */ > if (single_succ_p (bb)) > return NULL; > > /* This block can have no PHI nodes. This is overly conservative. */ > if (!gsi_end_p (gsi_start_phis (bb))) > return NULL; > > > The test that the block have > 1 successor and no PHI nodes are the keys. > > ;; basic block 17, loop depth 1 > ;;pred: 9 > ;;13 > ;;16 > ;;15 > ;;4 > ;;12 > ;;7 > ;;14 > ;;10 > ;;5 > ;;6 > ;;8 > ;;11 > # state_1 = PHI <0(9), 2(13), 3(16), 3(15), state_36(4), 1(12), 0(7), > 2(14), 1(10), 1(5), 0(6), 2(8), 3(11)> > # .MEM_4 = PHI <.MEM_29(9), .MEM_20(13), .MEM_24(16), .MEM_29(15), > .MEM_29(4), .MEM_29(12), .MEM_12(7), .MEM_29(14), .MEM_16(10), .MEM_29(5), > .MEM_29(6), .MEM_29(8), .MEM_29(11)> > : > str_25 = str_37 + 1; > # VUSE <.MEM_4> > _8 = MEM[base: str_25, offset: 0B]; > if (_8 != 0) > goto ; > else > goto ; > ;;succ: 18 > ;;19 > > ;; basic block 18, loop depth 1 > ;;pred: 17 > goto ; > ;;succ: 4 > > > > bb will be block #18. In theory we just do > if (single_succ_p (bb)) > bb = single_succ_edge (bb)->dest; > > And allow PHI nodes in this specific instance. > > That's enough to allow existing code to identify the potential jump > threading candidates -- and more importantly, it's much more general. > > > > The updater needs copy bb17 & bb4. bb17' will transfer to bb4' which will > directly transfer to the destination of the switch. > > The advantage of doing this in the threader is it'll help more than just the > switch statement case. It's probably highly likely that we're missing cases > to thread through empty blocks with PHIs. > > I've started cobbling together some of the updating code. So far it doesn't > look too terrible. I agree - thanks for looking at this! Richard. > > Jeff >
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On 05/13/2013 02:16 PM, Steve Ellcey wrote: Here is the latest version of my SSA optimization pass to do the switch statement optimization described in PR 54742 (core_state_transition from coremark). I have tested this optimization with a x86 bootstrap and GCC test run with no errors and tested the MIPS cross compiler with no errors. Because of that I decided to submit it as a statically linked optimization pass instead of a dynamically loaded one, though I did keep the ifdefs for using it as a dynamic pass in the code. They could be removed if this patch is approved as a statically linked pass. Also, while this patch shows the optimization only being turned on with the -ftree-switch-shortcut flag, my bootstrap and testing had it turned on for all -O2 optimizations in order to maximize the testing. We may want to turn this on for -O3 and/or for -fexpensive-optimizations. I had to make one change to dominance.c in order to avoid some compiler aborts where it was dereferencing a null pointer. I believe this was happening because I am calling gimple_duplicate_sese_region with regions that are not really SESE. Because I am doing this, I regenerate the cfg and SSA information after each call, but I also had to change iterate_fix_dominators to fix the abort. Another way we might want to fix this would be to pass a flag to gimple_duplicate_sese_region that tells it whether or not we want it to recalculate the dominance information at all. If set to false, it would assume the caller will take care of it. Opinions? OK to checkin? Steve Ellcey sell...@imgtec.com 2013-05-13 Steve Ellcey PR tree-optimization/54742 * Makefile.in (OBJS): Add tree-switch-shortcut.o. * common.opt (ftree-switch-shortcut): New. * dominance.c (iterate_fix_dominators): Add null check. * params.def (PARAM_MAX_SWITCH_INSNS): New. (PARAM_MAX_SWITCH_PATHS): New. * passes.c (init_optimization_passes): Add pass_switch_shortcut. * timevar.def (TV_SWITCH_SHORTCUT): New. * tree-pass.c (pass_switch_shortcut): New. * tree-switch-shortcut.c: New file. I was looking at this last week (stuck for hours on tarmac at BWI). I think we should fix this in the threader rather than doing a special purpose pass. This is primarily because we get it for free if we address one limitation in the threader (some of the other issues I was concerned about don't apply). Specifically if we look at thread_around_empty_block we have: /* This block must have a single predecessor (E->dest). */ if (!single_pred_p (bb)) return NULL; /* This block must have more than one successor. */ if (single_succ_p (bb)) return NULL; /* This block can have no PHI nodes. This is overly conservative. */ if (!gsi_end_p (gsi_start_phis (bb))) return NULL; The test that the block have > 1 successor and no PHI nodes are the keys. ;; basic block 17, loop depth 1 ;;pred: 9 ;;13 ;;16 ;;15 ;;4 ;;12 ;;7 ;;14 ;;10 ;;5 ;;6 ;;8 ;;11 # state_1 = PHI <0(9), 2(13), 3(16), 3(15), state_36(4), 1(12), 0(7), 2(14), 1(10), 1(5), 0(6), 2(8), 3(11)> # .MEM_4 = PHI <.MEM_29(9), .MEM_20(13), .MEM_24(16), .MEM_29(15), .MEM_29(4), .MEM_29(12), .MEM_12(7), .MEM_29(14), .MEM_16(10), .MEM_29(5), .MEM_29(6), .MEM_29(8), .MEM_29(11)> : str_25 = str_37 + 1; # VUSE <.MEM_4> _8 = MEM[base: str_25, offset: 0B]; if (_8 != 0) goto ; else goto ; ;;succ: 18 ;;19 ;; basic block 18, loop depth 1 ;;pred: 17 goto ; ;;succ: 4 bb will be block #18. In theory we just do if (single_succ_p (bb)) bb = single_succ_edge (bb)->dest; And allow PHI nodes in this specific instance. That's enough to allow existing code to identify the potential jump threading candidates -- and more importantly, it's much more general. The updater needs copy bb17 & bb4. bb17' will transfer to bb4' which will directly transfer to the destination of the switch. The advantage of doing this in the threader is it'll help more than just the switch statement case. It's probably highly likely that we're missing cases to thread through empty blocks with PHIs. I've started cobbling together some of the updating code. So far it doesn't look too terrible. Jeff
[PATCH] New switch optimization pass (PR tree-optimization/54742)
Here is the latest version of my SSA optimization pass to do the switch statement optimization described in PR 54742 (core_state_transition from coremark). I have tested this optimization with a x86 bootstrap and GCC test run with no errors and tested the MIPS cross compiler with no errors. Because of that I decided to submit it as a statically linked optimization pass instead of a dynamically loaded one, though I did keep the ifdefs for using it as a dynamic pass in the code. They could be removed if this patch is approved as a statically linked pass. Also, while this patch shows the optimization only being turned on with the -ftree-switch-shortcut flag, my bootstrap and testing had it turned on for all -O2 optimizations in order to maximize the testing. We may want to turn this on for -O3 and/or for -fexpensive-optimizations. I had to make one change to dominance.c in order to avoid some compiler aborts where it was dereferencing a null pointer. I believe this was happening because I am calling gimple_duplicate_sese_region with regions that are not really SESE. Because I am doing this, I regenerate the cfg and SSA information after each call, but I also had to change iterate_fix_dominators to fix the abort. Another way we might want to fix this would be to pass a flag to gimple_duplicate_sese_region that tells it whether or not we want it to recalculate the dominance information at all. If set to false, it would assume the caller will take care of it. Opinions? OK to checkin? Steve Ellcey sell...@imgtec.com 2013-05-13 Steve Ellcey PR tree-optimization/54742 * Makefile.in (OBJS): Add tree-switch-shortcut.o. * common.opt (ftree-switch-shortcut): New. * dominance.c (iterate_fix_dominators): Add null check. * params.def (PARAM_MAX_SWITCH_INSNS): New. (PARAM_MAX_SWITCH_PATHS): New. * passes.c (init_optimization_passes): Add pass_switch_shortcut. * timevar.def (TV_SWITCH_SHORTCUT): New. * tree-pass.c (pass_switch_shortcut): New. * tree-switch-shortcut.c: New file. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 903125e..db0ffcb 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1399,6 +1399,7 @@ OBJS = \ tree-scalar-evolution.o \ tree-sra.o \ tree-switch-conversion.o \ + tree-switch-shortcut.o \ tree-ssa-address.o \ tree-ssa-alias.o \ tree-ssa-ccp.o \ diff --git a/gcc/common.opt b/gcc/common.opt index 4c7933e..e028e2d 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2160,6 +2160,10 @@ ftree-sra Common Report Var(flag_tree_sra) Optimization Perform scalar replacement of aggregates +ftree-switch-shortcut +Common Report Var(flag_tree_switch_shortcut) Init(0) Optimization +Do fancy switch statement shortcutting + ftree-ter Common Report Var(flag_tree_ter) Optimization Replace temporary expressions in the SSA->normal pass diff --git a/gcc/dominance.c b/gcc/dominance.c index 5c96dad..d858ad1 100644 --- a/gcc/dominance.c +++ b/gcc/dominance.c @@ -1251,6 +1251,7 @@ iterate_fix_dominators (enum cdi_direction dir, vec bbs, struct pointer_map_t *map; int *parent, *son, *brother; unsigned int dir_index = dom_convert_dir_to_idx (dir); + void **slot; /* We only support updating dominators. There are some problems with updating postdominators (need to add fake edges from infinite loops @@ -1357,7 +1358,10 @@ iterate_fix_dominators (enum cdi_direction dir, vec bbs, if (dom == bb) continue; - dom_i = (size_t) *pointer_map_contains (map, dom); + slot = pointer_map_contains (map, dom); + if (slot == NULL) + continue; + dom_i = (size_t) *slot; /* Do not include parallel edges to G. */ if (!bitmap_set_bit ((bitmap) g->vertices[dom_i].data, i)) diff --git a/gcc/params.def b/gcc/params.def index 3c52651..bdabe07 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -1020,6 +1020,20 @@ DEFPARAM (PARAM_MAX_SLSR_CANDIDATE_SCAN, "strength reduction", 50, 1, 99) +/* Maximum number of instructions to duplicate when shortcutting a switch. */ +DEFPARAM (PARAM_MAX_SWITCH_INSNS, + "max-switch-insns", + "Maximum number of instructions to duplicate when " + "shortcutting a switch statement", + 100, 1, 99) + +/* Maximum number of paths to duplicate when shortcutting a switch. */ +DEFPARAM (PARAM_MAX_SWITCH_PATHS, + "max-switch-paths", + "Maximum number of new paths to create when" + " shortcutting a switch statement", + 50, 1, 99) + /* Local variables: mode:c diff --git a/gcc/passes.c b/gcc/passes.c index fd67ee6..0fb826c 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -1416,6 +1416,7 @@ init_optimization_passes (void) NEXT_PASS (pass_call_cdce); NEXT_PASS (pass_cselim); NEXT_PASS (pass_tree_ifcombine); + NEXT_PASS (pass_switch_shortcut); NEXT_PASS (pass_phiopt); NEXT_PASS (pass_tail_recursion); NEXT_PASS (pass_ch); diff --git a/gcc/timevar.def b/gcc/timevar.def index 44f0eac