On Mon, May 13, 2013 at 10:28 PM, Jeff Law <l...@redhat.com> wrote:
> On 05/13/2013 02:16 PM, Steve Ellcey wrote:
>>
>> Here is the latest version of my SSA optimization pass to do the switch
>> statement optimization described in PR 54742 (core_state_transition from
>> coremark).
>>
>> I have tested this optimization with a x86 bootstrap and GCC test run
>> with no errors and tested the MIPS cross compiler with no errors.
>> Because of that I decided to submit it as a statically linked
>> optimization pass instead of a dynamically loaded one, though I did keep
>> the ifdefs for using it as a dynamic pass in the code.  They could be
>> removed if this patch is approved as a statically linked pass.  Also,
>> while this patch shows the optimization only being turned on with the
>> -ftree-switch-shortcut flag, my bootstrap and testing had it turned on
>> for all -O2 optimizations in order to maximize the testing.
>> We may want to turn this on for -O3 and/or for
>> -fexpensive-optimizations.
>>
>> I had to make one change to dominance.c in order to avoid some compiler
>> aborts where it was dereferencing a null pointer.  I believe this was
>> happening because I am calling gimple_duplicate_sese_region with regions
>> that are not really SESE.  Because I am doing this, I regenerate the cfg
>> and SSA information after each call, but I also had to change
>> iterate_fix_dominators to fix the abort.  Another way we might want to
>> fix this would be to pass a flag to gimple_duplicate_sese_region that
>> tells it whether or not we want it to recalculate the dominance
>> information at all.  If set to false, it would assume the caller will
>> take care of it.
>>
>> Opinions?  OK to checkin?
>>
>> Steve Ellcey
>> sell...@imgtec.com
>>
>>
>> 2013-05-13  Steve Ellcey  <sell...@imgtec.com>
>>
>>         PR tree-optimization/54742
>>         * Makefile.in (OBJS): Add tree-switch-shortcut.o.
>>         * common.opt (ftree-switch-shortcut): New.
>>         * dominance.c (iterate_fix_dominators): Add null check.
>>         * params.def (PARAM_MAX_SWITCH_INSNS): New.
>>         (PARAM_MAX_SWITCH_PATHS): New.
>>         * passes.c (init_optimization_passes): Add pass_switch_shortcut.
>>         * timevar.def (TV_SWITCH_SHORTCUT): New.
>>         * tree-pass.c (pass_switch_shortcut): New.
>>         * tree-switch-shortcut.c: New file.
>
> I was looking at this last week (stuck for hours on tarmac at BWI).
>
> I think we should fix this in the threader rather than doing a special
> purpose pass.  This is primarily because we get it for free if we address
> one limitation in the threader (some of the other issues I was concerned
> about don't apply).
>
> Specifically if we look at thread_around_empty_block we have:
>
>   /* This block must have a single predecessor (E->dest).  */
>   if (!single_pred_p (bb))
>     return NULL;
>
>   /* This block must have more than one successor.  */
>   if (single_succ_p (bb))
>     return NULL;
>
>   /* This block can have no PHI nodes.  This is overly conservative.  */
>   if (!gsi_end_p (gsi_start_phis (bb)))
>     return NULL;
>
>
> The test that the block have > 1 successor and no PHI nodes are the keys.
>
> ;;   basic block 17, loop depth 1
> ;;    pred:       9
> ;;                13
> ;;                16
> ;;                15
> ;;                4
> ;;                12
> ;;                7
> ;;                14
> ;;                10
> ;;                5
> ;;                6
> ;;                8
> ;;                11
>   # state_1 = PHI <0(9), 2(13), 3(16), 3(15), state_36(4), 1(12), 0(7),
> 2(14), 1(10), 1(5), 0(6), 2(8), 3(11)>
>   # .MEM_4 = PHI <.MEM_29(9), .MEM_20(13), .MEM_24(16), .MEM_29(15),
> .MEM_29(4), .MEM_29(12), .MEM_12(7), .MEM_29(14), .MEM_16(10), .MEM_29(5),
> .MEM_29(6), .MEM_29(8), .MEM_29(11)>
> <L32>:
>   str_25 = str_37 + 1;
>   # VUSE <.MEM_4>
>   _8 = MEM[base: str_25, offset: 0B];
>   if (_8 != 0)
>     goto <bb 18>;
>   else
>     goto <bb 19>;
> ;;    succ:       18
> ;;                19
>
> ;;   basic block 18, loop depth 1
> ;;    pred:       17
>   goto <bb 4>;
> ;;    succ:       4
>
>
>
> bb will be block #18.  In theory we just do
> if (single_succ_p (bb))
>   bb = single_succ_edge (bb)->dest;
>
> And allow PHI nodes in this specific instance.
>
> That's enough to allow existing code to identify the potential jump
> threading candidates -- and more importantly, it's much more general.
>
>
>
> The updater needs copy bb17 & bb4.  bb17' will transfer to bb4' which will
> directly transfer to the destination of the switch.
>
> The advantage of doing this in the threader is it'll help more than just the
> switch statement case.  It's probably highly likely that we're missing cases
> to thread through empty blocks with PHIs.
>
> I've started cobbling together some of the updating code.  So far it doesn't
> look too terrible.

I agree - thanks for looking at this!

Richard.

>
> Jeff
>

Reply via email to