Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-16 Thread Jeff Law

On 05/16/2013 11:10 AM, Steve Ellcey wrote:

On Thu, 2013-05-16 at 10:58 +0200, Richard Biener wrote:



Hmm, not terribly happy with that wording, but that gives you an idea of
what I'm after.  When would someone set UPDATE_DOMINANCE to true and what
are their responsibilities when they do so.

Approved with the name change and a better comment for UPDATE_DOMINANCE.


Btw, the function does _not_ handle arbitrary SEME regions - it only handles
a single exit correctly and assumes no (SSA) data flows across the others.
So I'd rather not rename it.

Richard.


Jeff


I went ahead and checked in the change with the comment updates that
Jeff wanted but left the name of the function as is.

Ok.  Thanks.

jeff



Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-16 Thread Steve Ellcey
On Thu, 2013-05-16 at 10:58 +0200, Richard Biener wrote:

> >
> > Hmm, not terribly happy with that wording, but that gives you an idea of
> > what I'm after.  When would someone set UPDATE_DOMINANCE to true and what
> > are their responsibilities when they do so.
> >
> > Approved with the name change and a better comment for UPDATE_DOMINANCE.
> 
> Btw, the function does _not_ handle arbitrary SEME regions - it only handles
> a single exit correctly and assumes no (SSA) data flows across the others.
> So I'd rather not rename it.
> 
> Richard.
> 
> > Jeff

I went ahead and checked in the change with the comment updates that
Jeff wanted but left the name of the function as is.

Steve Ellcey
sell...@imgtec.com






Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-16 Thread Richard Biener
On Thu, May 16, 2013 at 5:38 AM, Jeff Law  wrote:
> On 05/15/2013 12:28 PM, Steve Ellcey wrote:
>>
>> Here is a patch that adds a flag to gimple_duplicate_sese_region to tell
>> it whether or not to update the dominator information.  I had to add the
>> same flag to copy_bbs to make it all work.  How does this look?  I
>> tested it with a bootstrap and test on x86 (with my optimization
>> enabled) and got no regressions.
>>
>> 2013-05-15  Steve Ellcey  
>>
>> * cfghooks.c (copy_bbs): Add update_dominance argument.
>> * cfghooks.h (copy_bbs): Update prototype.
>> * tree-cfg.c (gimple_duplicate_sese_region):
>> Add update_dominance argument.
>> * tree-flow.h (gimple_duplicate_sese_region): Update prototype.
>> * tree-ssa-loop-ch.c (copy_loop_headers): Update
>> gimple_duplicate_sese_region call.
>> * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
>> Update copy_bbs call.
>> * cfgloopmanip.c (duplicate_loop_to_header_edge): Ditto.
>> * trans-mem.c (ipa_uninstrument_transaction): Ditto.
>
> So I'd change gimple_duplicate_sese_region to gimple_duplicate_seme region
> per comments from others.
>
> Where you document UPDATE_DOMINANCE, I'd add something like: When
> UPDATE_DOMINANCE is true, it is assumed that duplicating the region (or
> copying the blocks for copy_bbs) may change the dominator tree in ways that
> are not suitable for an incremental update and the caller is responsible for
> destroying and recomputing the dominator tree.
>
> Hmm, not terribly happy with that wording, but that gives you an idea of
> what I'm after.  When would someone set UPDATE_DOMINANCE to true and what
> are their responsibilities when they do so.
>
> Approved with the name change and a better comment for UPDATE_DOMINANCE.

Btw, the function does _not_ handle arbitrary SEME regions - it only handles
a single exit correctly and assumes no (SSA) data flows across the others.
So I'd rather not rename it.

Richard.

> Jeff
>


Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-15 Thread Jeff Law

On 05/15/2013 12:28 PM, Steve Ellcey wrote:

Here is a patch that adds a flag to gimple_duplicate_sese_region to tell
it whether or not to update the dominator information.  I had to add the
same flag to copy_bbs to make it all work.  How does this look?  I
tested it with a bootstrap and test on x86 (with my optimization
enabled) and got no regressions.

2013-05-15  Steve Ellcey  

* cfghooks.c (copy_bbs): Add update_dominance argument.
* cfghooks.h (copy_bbs): Update prototype.
* tree-cfg.c (gimple_duplicate_sese_region):
Add update_dominance argument.
* tree-flow.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.c (copy_loop_headers): Update
gimple_duplicate_sese_region call.
* tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
Update copy_bbs call.
* cfgloopmanip.c (duplicate_loop_to_header_edge): Ditto.
* trans-mem.c (ipa_uninstrument_transaction): Ditto.
So I'd change gimple_duplicate_sese_region to gimple_duplicate_seme 
region per comments from others.


Where you document UPDATE_DOMINANCE, I'd add something like: When 
UPDATE_DOMINANCE is true, it is assumed that duplicating the region (or 
copying the blocks for copy_bbs) may change the dominator tree in ways 
that are not suitable for an incremental update and the caller is 
responsible for destroying and recomputing the dominator tree.


Hmm, not terribly happy with that wording, but that gives you an idea of 
what I'm after.  When would someone set UPDATE_DOMINANCE to true and 
what are their responsibilities when they do so.


Approved with the name change and a better comment for UPDATE_DOMINANCE.

Jeff



Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-15 Thread Zdenek Dvorak
Hi,

> > I realize you're trying to do the same, but by using the SESE copier, you're
> > implicitly trying to do an incremental update.  I think you're going to
> > really need to look at the assumptions of that code and verify that the
> > switch FSA optimization doesn't violate those assumptions.
> 
> Indeed.  I'd rather have a flag to the SESE copier that tells it the region
> is SEME and in that case make it not update dominators but leave that
> to the caller (which means, recompute them).  It seems the code already
> handles ME regions if the other exits are "not important" (we don't have
> to insert/update PHI nodes in those exit blocks), so it may be that there
> are even more constraints on those unimportant exits due to the
> iterative dominator update - I think that the entry block of the region
> needs to dominate all exit destinations, otherwise get_dominated_by_region
> is not working correctly.  In your case one exit is a back-edge?  We should
> be able to add checking to the code to verify unimportant edges are
> unimportant "enough".
> 
> I'm sure Zdenek knows the limitations best.

as far as I remember, indeed only the single-entry assumption is important
(but I do not remember all that much, unfortunately),

Zdenek


Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-15 Thread Steve Ellcey
On Wed, 2013-05-15 at 10:44 +0200, Richard Biener wrote:

> Indeed.  I'd rather have a flag to the SESE copier that tells it the region
> is SEME and in that case make it not update dominators but leave that
> to the caller (which means, recompute them).  It seems the code already
> handles ME regions if the other exits are "not important" (we don't have
> to insert/update PHI nodes in those exit blocks), so it may be that there
> are even more constraints on those unimportant exits due to the
> iterative dominator update - I think that the entry block of the region
> needs to dominate all exit destinations, otherwise get_dominated_by_region
> is not working correctly.  In your case one exit is a back-edge?  We should
> be able to add checking to the code to verify unimportant edges are
> unimportant "enough".
> 
> I'm sure Zdenek knows the limitations best.
> 
> Richard.

Here is a patch that adds a flag to gimple_duplicate_sese_region to tell
it whether or not to update the dominator information.  I had to add the
same flag to copy_bbs to make it all work.  How does this look?  I
tested it with a bootstrap and test on x86 (with my optimization
enabled) and got no regressions.

2013-05-15  Steve Ellcey  

* cfghooks.c (copy_bbs): Add update_dominance argument.
* cfghooks.h (copy_bbs): Update prototype.
* tree-cfg.c (gimple_duplicate_sese_region):
Add update_dominance argument.
* tree-flow.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.c (copy_loop_headers): Update
gimple_duplicate_sese_region call.
* tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
Update copy_bbs call.
* cfgloopmanip.c (duplicate_loop_to_header_edge): Ditto.
* trans-mem.c (ipa_uninstrument_transaction): Ditto.

diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
index 22b962b..18af38b 100644
--- a/gcc/cfghooks.c
+++ b/gcc/cfghooks.c
@@ -1287,7 +1287,8 @@ end:
function assigns bbs into loops (copy of basic block bb is assigned to
bb->loop_father->copy loop, so this must be set up correctly in advance)
and updates dominators locally (LOOPS structure that contains the 
information
-   about dominators is passed to enable this).
+   about dominators is passed to enable this) if update_dominance is set to
+   true.
 
BASE is the superloop to that basic block belongs; if its header or latch
is copied, we do not set the new blocks as header or latch.
@@ -1301,7 +1302,7 @@ end:
 void
 copy_bbs (basic_block *bbs, unsigned n, basic_block *new_bbs,
  edge *edges, unsigned num_edges, edge *new_edges,
- struct loop *base, basic_block after)
+ struct loop *base, basic_block after, bool update_dominance)
 {
   unsigned i, j;
   basic_block bb, new_bb, dom_bb;
@@ -1327,16 +1328,19 @@ copy_bbs (basic_block *bbs, unsigned n, basic_block 
*new_bbs,
 }
 
   /* Set dominators.  */
-  for (i = 0; i < n; i++)
+  if (update_dominance)
 {
-  bb = bbs[i];
-  new_bb = new_bbs[i];
-
-  dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
-  if (dom_bb->flags & BB_DUPLICATED)
+  for (i = 0; i < n; i++)
{
- dom_bb = get_bb_copy (dom_bb);
- set_immediate_dominator (CDI_DOMINATORS, new_bb, dom_bb);
+ bb = bbs[i];
+ new_bb = new_bbs[i];
+
+ dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+ if (dom_bb->flags & BB_DUPLICATED)
+   {
+ dom_bb = get_bb_copy (dom_bb);
+ set_immediate_dominator (CDI_DOMINATORS, new_bb, dom_bb);
+   }
}
 }
 
diff --git a/gcc/cfghooks.h b/gcc/cfghooks.h
index bff0a0c..ec595a5 100644
--- a/gcc/cfghooks.h
+++ b/gcc/cfghooks.h
@@ -201,7 +201,7 @@ extern void lv_add_condition_to_bb (basic_block, 
basic_block, basic_block,
 extern bool can_copy_bbs_p (basic_block *, unsigned);
 extern void copy_bbs (basic_block *, unsigned, basic_block *,
  edge *, unsigned, edge *, struct loop *,
- basic_block);
+ basic_block, bool);
 
 void account_profile_record (struct profile_record *, int);
 
diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 9581677..bc87755 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1300,7 +1300,7 @@ duplicate_loop_to_header_edge (struct loop *loop, edge e,
 
   /* Copy bbs.  */
   copy_bbs (bbs, n, new_bbs, spec_edges, 2, new_spec_edges, loop,
-   place_after);
+   place_after, true);
   place_after = new_spec_edges[SE_LATCH]->src;
 
   if (flags & DLTHE_RECORD_COPY_NUMBER)
diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index 5cb8286..c66278c 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -3978,7 +3978,8 @@ ipa_uninstrument_transaction (struct tm_region *region,
   int n = queue.length ();
   basic_block *new_bbs = XNEWVEC (basic_block, n);
 
-  copy_bbs (queue.address (), n, new_bbs, NULL, 0, NULL, NULL, t

Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-15 Thread Jeff Law

On 05/15/2013 02:44 AM, Richard Biener wrote:


Indeed.  I'd rather have a flag to the SESE copier that tells it the region
is SEME and in that case make it not update dominators but leave that
to the caller (which means, recompute them).  It seems the code already
handles ME regions if the other exits are "not important" (we don't have
to insert/update PHI nodes in those exit blocks), so it may be that there
are even more constraints on those unimportant exits due to the
iterative dominator update - I think that the entry block of the region
needs to dominate all exit destinations, otherwise get_dominated_by_region
is not working correctly.  In your case one exit is a back-edge?  We should
be able to add checking to the code to verify unimportant edges are
unimportant "enough".
There's definitely a backedge -- the whole point is to thread across the 
back edge which allows us to statically determine where the switch 
statement on the next iteration.



I'm sure Zdenek knows the limitations best.

Yea, I relied on Zdenek for a lot of the graph stuff in the past :-0

jeff



Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-15 Thread Steven Bosscher
On Wed, May 15, 2013 at 10:44 AM, Richard Biener wrote:
> Indeed.  I'd rather have a flag to the SESE copier that tells it the region
> is SEME and in that case make it not update dominators but leave that
> to the caller (which means, recompute them).

And rename the copier to SEME copier instead of SESE ;-)

Ciao!
Steven


Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-15 Thread Richard Biener
On Wed, May 15, 2013 at 7:02 AM, Jeff Law  wrote:
> On 05/14/2013 03:14 PM, Steve Ellcey wrote:
>>
>>
>> While Jeff works on the threader, I was wondering if I could get approval
>> for
>> just the dominance.c part of the patch.  This would allow me to use my
>> pass as
>> a dynamically loaded optimization pass.  But without this change to
>> dominance.c,
>> the compiler aborts in iterate_fix_dominators when I do that.
>>
>> Steve Ellcey
>> sell...@imgtec.com
>>
>>
>>
>> 2013-05-14  Steve Ellcey  
>>
>> * dominance.c (iterate_fix_dominators): Add null check.
>
> I'd like to understand this a little more before we go forward with it.
>
> AFAICT, that routine is trying to incrementally update the dominators using
> knowledge that the region you've copied is SESE.  It's unclear what happens
> in the region is not SESE.
>
> Threading mucks up the dominator tree in fairly serious ways and to the best
> of my knowledge neither of the calls to thread across edges make any attempt
> to incrementally update the dominator tree.  They wipe it completely, they
> also have to be quite careful in how they manipulate the various graphs to
> avoid getting into an inconsistent state, then calling routines that assume
> consistent state.
>
>
> I realize you're trying to do the same, but by using the SESE copier, you're
> implicitly trying to do an incremental update.  I think you're going to
> really need to look at the assumptions of that code and verify that the
> switch FSA optimization doesn't violate those assumptions.

Indeed.  I'd rather have a flag to the SESE copier that tells it the region
is SEME and in that case make it not update dominators but leave that
to the caller (which means, recompute them).  It seems the code already
handles ME regions if the other exits are "not important" (we don't have
to insert/update PHI nodes in those exit blocks), so it may be that there
are even more constraints on those unimportant exits due to the
iterative dominator update - I think that the entry block of the region
needs to dominate all exit destinations, otherwise get_dominated_by_region
is not working correctly.  In your case one exit is a back-edge?  We should
be able to add checking to the code to verify unimportant edges are
unimportant "enough".

I'm sure Zdenek knows the limitations best.

Richard.

> Jeff


Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-14 Thread Jeff Law

On 05/14/2013 03:14 PM, Steve Ellcey wrote:


While Jeff works on the threader, I was wondering if I could get approval for
just the dominance.c part of the patch.  This would allow me to use my pass as
a dynamically loaded optimization pass.  But without this change to dominance.c,
the compiler aborts in iterate_fix_dominators when I do that.

Steve Ellcey
sell...@imgtec.com



2013-05-14  Steve Ellcey  

* dominance.c (iterate_fix_dominators): Add null check.

I'd like to understand this a little more before we go forward with it.

AFAICT, that routine is trying to incrementally update the dominators 
using knowledge that the region you've copied is SESE.  It's unclear 
what happens in the region is not SESE.


Threading mucks up the dominator tree in fairly serious ways and to the 
best of my knowledge neither of the calls to thread across edges make 
any attempt to incrementally update the dominator tree.  They wipe it 
completely, they also have to be quite careful in how they manipulate 
the various graphs to avoid getting into an inconsistent state, then 
calling routines that assume consistent state.



I realize you're trying to do the same, but by using the SESE copier, 
you're implicitly trying to do an incremental update.  I think you're 
going to really need to look at the assumptions of that code and verify 
that the switch FSA optimization doesn't violate those assumptions.


Jeff


Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-14 Thread Steve Ellcey

While Jeff works on the threader, I was wondering if I could get approval for
just the dominance.c part of the patch.  This would allow me to use my pass as
a dynamically loaded optimization pass.  But without this change to dominance.c,
the compiler aborts in iterate_fix_dominators when I do that.

Steve Ellcey
sell...@imgtec.com



2013-05-14  Steve Ellcey  

* dominance.c (iterate_fix_dominators): Add null check.


diff --git a/gcc/dominance.c b/gcc/dominance.c
index 5c96dad..d858ad1 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -1251,6 +1251,7 @@ iterate_fix_dominators (enum cdi_direction dir, 
vec bbs,
   struct pointer_map_t *map;
   int *parent, *son, *brother;
   unsigned int dir_index = dom_convert_dir_to_idx (dir);
+  void **slot;
 
   /* We only support updating dominators.  There are some problems with
  updating postdominators (need to add fake edges from infinite loops
@@ -1357,7 +1358,10 @@ iterate_fix_dominators (enum cdi_direction dir, 
vec bbs,
  if (dom == bb)
continue;
 
- dom_i = (size_t) *pointer_map_contains (map, dom);
+ slot = pointer_map_contains (map, dom);
+ if (slot == NULL)
+   continue;
+ dom_i = (size_t) *slot;
 
  /* Do not include parallel edges to G.  */
  if (!bitmap_set_bit ((bitmap) g->vertices[dom_i].data, i))





Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-14 Thread Richard Biener
On Mon, May 13, 2013 at 10:28 PM, Jeff Law  wrote:
> On 05/13/2013 02:16 PM, Steve Ellcey wrote:
>>
>> Here is the latest version of my SSA optimization pass to do the switch
>> statement optimization described in PR 54742 (core_state_transition from
>> coremark).
>>
>> I have tested this optimization with a x86 bootstrap and GCC test run
>> with no errors and tested the MIPS cross compiler with no errors.
>> Because of that I decided to submit it as a statically linked
>> optimization pass instead of a dynamically loaded one, though I did keep
>> the ifdefs for using it as a dynamic pass in the code.  They could be
>> removed if this patch is approved as a statically linked pass.  Also,
>> while this patch shows the optimization only being turned on with the
>> -ftree-switch-shortcut flag, my bootstrap and testing had it turned on
>> for all -O2 optimizations in order to maximize the testing.
>> We may want to turn this on for -O3 and/or for
>> -fexpensive-optimizations.
>>
>> I had to make one change to dominance.c in order to avoid some compiler
>> aborts where it was dereferencing a null pointer.  I believe this was
>> happening because I am calling gimple_duplicate_sese_region with regions
>> that are not really SESE.  Because I am doing this, I regenerate the cfg
>> and SSA information after each call, but I also had to change
>> iterate_fix_dominators to fix the abort.  Another way we might want to
>> fix this would be to pass a flag to gimple_duplicate_sese_region that
>> tells it whether or not we want it to recalculate the dominance
>> information at all.  If set to false, it would assume the caller will
>> take care of it.
>>
>> Opinions?  OK to checkin?
>>
>> Steve Ellcey
>> sell...@imgtec.com
>>
>>
>> 2013-05-13  Steve Ellcey  
>>
>> PR tree-optimization/54742
>> * Makefile.in (OBJS): Add tree-switch-shortcut.o.
>> * common.opt (ftree-switch-shortcut): New.
>> * dominance.c (iterate_fix_dominators): Add null check.
>> * params.def (PARAM_MAX_SWITCH_INSNS): New.
>> (PARAM_MAX_SWITCH_PATHS): New.
>> * passes.c (init_optimization_passes): Add pass_switch_shortcut.
>> * timevar.def (TV_SWITCH_SHORTCUT): New.
>> * tree-pass.c (pass_switch_shortcut): New.
>> * tree-switch-shortcut.c: New file.
>
> I was looking at this last week (stuck for hours on tarmac at BWI).
>
> I think we should fix this in the threader rather than doing a special
> purpose pass.  This is primarily because we get it for free if we address
> one limitation in the threader (some of the other issues I was concerned
> about don't apply).
>
> Specifically if we look at thread_around_empty_block we have:
>
>   /* This block must have a single predecessor (E->dest).  */
>   if (!single_pred_p (bb))
> return NULL;
>
>   /* This block must have more than one successor.  */
>   if (single_succ_p (bb))
> return NULL;
>
>   /* This block can have no PHI nodes.  This is overly conservative.  */
>   if (!gsi_end_p (gsi_start_phis (bb)))
> return NULL;
>
>
> The test that the block have > 1 successor and no PHI nodes are the keys.
>
> ;;   basic block 17, loop depth 1
> ;;pred:   9
> ;;13
> ;;16
> ;;15
> ;;4
> ;;12
> ;;7
> ;;14
> ;;10
> ;;5
> ;;6
> ;;8
> ;;11
>   # state_1 = PHI <0(9), 2(13), 3(16), 3(15), state_36(4), 1(12), 0(7),
> 2(14), 1(10), 1(5), 0(6), 2(8), 3(11)>
>   # .MEM_4 = PHI <.MEM_29(9), .MEM_20(13), .MEM_24(16), .MEM_29(15),
> .MEM_29(4), .MEM_29(12), .MEM_12(7), .MEM_29(14), .MEM_16(10), .MEM_29(5),
> .MEM_29(6), .MEM_29(8), .MEM_29(11)>
> :
>   str_25 = str_37 + 1;
>   # VUSE <.MEM_4>
>   _8 = MEM[base: str_25, offset: 0B];
>   if (_8 != 0)
> goto ;
>   else
> goto ;
> ;;succ:   18
> ;;19
>
> ;;   basic block 18, loop depth 1
> ;;pred:   17
>   goto ;
> ;;succ:   4
>
>
>
> bb will be block #18.  In theory we just do
> if (single_succ_p (bb))
>   bb = single_succ_edge (bb)->dest;
>
> And allow PHI nodes in this specific instance.
>
> That's enough to allow existing code to identify the potential jump
> threading candidates -- and more importantly, it's much more general.
>
>
>
> The updater needs copy bb17 & bb4.  bb17' will transfer to bb4' which will
> directly transfer to the destination of the switch.
>
> The advantage of doing this in the threader is it'll help more than just the
> switch statement case.  It's probably highly likely that we're missing cases
> to thread through empty blocks with PHIs.
>
> I've started cobbling together some of the updating code.  So far it doesn't
> look too terrible.

I agree - thanks for looking at this!

Richard.

>
> Jeff
>


Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-13 Thread Jeff Law

On 05/13/2013 02:16 PM, Steve Ellcey wrote:

Here is the latest version of my SSA optimization pass to do the switch
statement optimization described in PR 54742 (core_state_transition from
coremark).

I have tested this optimization with a x86 bootstrap and GCC test run
with no errors and tested the MIPS cross compiler with no errors.
Because of that I decided to submit it as a statically linked
optimization pass instead of a dynamically loaded one, though I did keep
the ifdefs for using it as a dynamic pass in the code.  They could be
removed if this patch is approved as a statically linked pass.  Also,
while this patch shows the optimization only being turned on with the
-ftree-switch-shortcut flag, my bootstrap and testing had it turned on
for all -O2 optimizations in order to maximize the testing.
We may want to turn this on for -O3 and/or for
-fexpensive-optimizations.

I had to make one change to dominance.c in order to avoid some compiler
aborts where it was dereferencing a null pointer.  I believe this was
happening because I am calling gimple_duplicate_sese_region with regions
that are not really SESE.  Because I am doing this, I regenerate the cfg
and SSA information after each call, but I also had to change
iterate_fix_dominators to fix the abort.  Another way we might want to
fix this would be to pass a flag to gimple_duplicate_sese_region that
tells it whether or not we want it to recalculate the dominance
information at all.  If set to false, it would assume the caller will
take care of it.

Opinions?  OK to checkin?

Steve Ellcey
sell...@imgtec.com


2013-05-13  Steve Ellcey  

PR tree-optimization/54742
* Makefile.in (OBJS): Add tree-switch-shortcut.o.
* common.opt (ftree-switch-shortcut): New.
* dominance.c (iterate_fix_dominators): Add null check.
* params.def (PARAM_MAX_SWITCH_INSNS): New.
(PARAM_MAX_SWITCH_PATHS): New.
* passes.c (init_optimization_passes): Add pass_switch_shortcut.
* timevar.def (TV_SWITCH_SHORTCUT): New.
* tree-pass.c (pass_switch_shortcut): New.
* tree-switch-shortcut.c: New file.

I was looking at this last week (stuck for hours on tarmac at BWI).

I think we should fix this in the threader rather than doing a special 
purpose pass.  This is primarily because we get it for free if we 
address one limitation in the threader (some of the other issues I was 
concerned about don't apply).


Specifically if we look at thread_around_empty_block we have:

  /* This block must have a single predecessor (E->dest).  */
  if (!single_pred_p (bb))
return NULL;

  /* This block must have more than one successor.  */
  if (single_succ_p (bb))
return NULL;

  /* This block can have no PHI nodes.  This is overly conservative.  */
  if (!gsi_end_p (gsi_start_phis (bb)))
return NULL;


The test that the block have > 1 successor and no PHI nodes are the keys.

;;   basic block 17, loop depth 1
;;pred:   9
;;13
;;16
;;15
;;4
;;12
;;7
;;14
;;10
;;5
;;6
;;8
;;11
  # state_1 = PHI <0(9), 2(13), 3(16), 3(15), state_36(4), 1(12), 0(7), 
2(14), 1(10), 1(5), 0(6), 2(8), 3(11)>
  # .MEM_4 = PHI <.MEM_29(9), .MEM_20(13), .MEM_24(16), .MEM_29(15), 
.MEM_29(4), .MEM_29(12), .MEM_12(7), .MEM_29(14), .MEM_16(10), 
.MEM_29(5), .MEM_29(6), .MEM_29(8), .MEM_29(11)>

:
  str_25 = str_37 + 1;
  # VUSE <.MEM_4>
  _8 = MEM[base: str_25, offset: 0B];
  if (_8 != 0)
goto ;
  else
goto ;
;;succ:   18
;;19

;;   basic block 18, loop depth 1
;;pred:   17
  goto ;
;;succ:   4



bb will be block #18.  In theory we just do
if (single_succ_p (bb))
  bb = single_succ_edge (bb)->dest;

And allow PHI nodes in this specific instance.

That's enough to allow existing code to identify the potential jump 
threading candidates -- and more importantly, it's much more general.




The updater needs copy bb17 & bb4.  bb17' will transfer to bb4' which 
will directly transfer to the destination of the switch.


The advantage of doing this in the threader is it'll help more than just 
the switch statement case.  It's probably highly likely that we're 
missing cases to thread through empty blocks with PHIs.


I've started cobbling together some of the updating code.  So far it 
doesn't look too terrible.



Jeff



[PATCH] New switch optimization pass (PR tree-optimization/54742)

2013-05-13 Thread Steve Ellcey
Here is the latest version of my SSA optimization pass to do the switch
statement optimization described in PR 54742 (core_state_transition from
coremark).

I have tested this optimization with a x86 bootstrap and GCC test run
with no errors and tested the MIPS cross compiler with no errors.
Because of that I decided to submit it as a statically linked
optimization pass instead of a dynamically loaded one, though I did keep
the ifdefs for using it as a dynamic pass in the code.  They could be
removed if this patch is approved as a statically linked pass.  Also,
while this patch shows the optimization only being turned on with the
-ftree-switch-shortcut flag, my bootstrap and testing had it turned on
for all -O2 optimizations in order to maximize the testing.
We may want to turn this on for -O3 and/or for
-fexpensive-optimizations.

I had to make one change to dominance.c in order to avoid some compiler
aborts where it was dereferencing a null pointer.  I believe this was
happening because I am calling gimple_duplicate_sese_region with regions
that are not really SESE.  Because I am doing this, I regenerate the cfg
and SSA information after each call, but I also had to change
iterate_fix_dominators to fix the abort.  Another way we might want to
fix this would be to pass a flag to gimple_duplicate_sese_region that
tells it whether or not we want it to recalculate the dominance
information at all.  If set to false, it would assume the caller will
take care of it.

Opinions?  OK to checkin?

Steve Ellcey
sell...@imgtec.com


2013-05-13  Steve Ellcey  

PR tree-optimization/54742
* Makefile.in (OBJS): Add tree-switch-shortcut.o.
* common.opt (ftree-switch-shortcut): New.
* dominance.c (iterate_fix_dominators): Add null check.
* params.def (PARAM_MAX_SWITCH_INSNS): New.
(PARAM_MAX_SWITCH_PATHS): New.
* passes.c (init_optimization_passes): Add pass_switch_shortcut.
* timevar.def (TV_SWITCH_SHORTCUT): New.
* tree-pass.c (pass_switch_shortcut): New.
* tree-switch-shortcut.c: New file.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 903125e..db0ffcb 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1399,6 +1399,7 @@ OBJS = \
 	tree-scalar-evolution.o \
 	tree-sra.o \
 	tree-switch-conversion.o \
+	tree-switch-shortcut.o \
 	tree-ssa-address.o \
 	tree-ssa-alias.o \
 	tree-ssa-ccp.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 4c7933e..e028e2d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2160,6 +2160,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates
 
+ftree-switch-shortcut
+Common Report Var(flag_tree_switch_shortcut) Init(0) Optimization
+Do fancy switch statement shortcutting
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA->normal pass
diff --git a/gcc/dominance.c b/gcc/dominance.c
index 5c96dad..d858ad1 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -1251,6 +1251,7 @@ iterate_fix_dominators (enum cdi_direction dir, vec bbs,
   struct pointer_map_t *map;
   int *parent, *son, *brother;
   unsigned int dir_index = dom_convert_dir_to_idx (dir);
+  void **slot;
 
   /* We only support updating dominators.  There are some problems with
  updating postdominators (need to add fake edges from infinite loops
@@ -1357,7 +1358,10 @@ iterate_fix_dominators (enum cdi_direction dir, vec bbs,
 	  if (dom == bb)
 	continue;
 
-	  dom_i = (size_t) *pointer_map_contains (map, dom);
+	  slot = pointer_map_contains (map, dom);
+	  if (slot == NULL)
+	continue;
+	  dom_i = (size_t) *slot;
 
 	  /* Do not include parallel edges to G.  */
 	  if (!bitmap_set_bit ((bitmap) g->vertices[dom_i].data, i))
diff --git a/gcc/params.def b/gcc/params.def
index 3c52651..bdabe07 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1020,6 +1020,20 @@ DEFPARAM (PARAM_MAX_SLSR_CANDIDATE_SCAN,
 	  "strength reduction",
 	  50, 1, 99)
 
+/* Maximum number of instructions to duplicate when shortcutting a switch.  */
+DEFPARAM (PARAM_MAX_SWITCH_INSNS,
+	  "max-switch-insns",
+	  "Maximum number of instructions to duplicate when "
+	  "shortcutting a switch statement",
+	  100, 1, 99)
+
+/* Maximum number of paths to duplicate when shortcutting a switch.  */
+DEFPARAM (PARAM_MAX_SWITCH_PATHS,
+	  "max-switch-paths",
+	  "Maximum number of new paths to create when"
+	  " shortcutting a switch statement",
+	  50, 1, 99)
+
 /*
 Local variables:
 mode:c
diff --git a/gcc/passes.c b/gcc/passes.c
index fd67ee6..0fb826c 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1416,6 +1416,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_call_cdce);
   NEXT_PASS (pass_cselim);
   NEXT_PASS (pass_tree_ifcombine);
+  NEXT_PASS (pass_switch_shortcut);
   NEXT_PASS (pass_phiopt);
   NEXT_PASS (pass_tail_recursion);
   NEXT_PASS (pass_ch);
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 44f0eac