On Mon, May 13, 2013 at 10:28 PM, Jeff Law <l...@redhat.com> wrote: > On 05/13/2013 02:16 PM, Steve Ellcey wrote: >> >> Here is the latest version of my SSA optimization pass to do the switch >> statement optimization described in PR 54742 (core_state_transition from >> coremark). >> >> I have tested this optimization with a x86 bootstrap and GCC test run >> with no errors and tested the MIPS cross compiler with no errors. >> Because of that I decided to submit it as a statically linked >> optimization pass instead of a dynamically loaded one, though I did keep >> the ifdefs for using it as a dynamic pass in the code. They could be >> removed if this patch is approved as a statically linked pass. Also, >> while this patch shows the optimization only being turned on with the >> -ftree-switch-shortcut flag, my bootstrap and testing had it turned on >> for all -O2 optimizations in order to maximize the testing. >> We may want to turn this on for -O3 and/or for >> -fexpensive-optimizations. >> >> I had to make one change to dominance.c in order to avoid some compiler >> aborts where it was dereferencing a null pointer. I believe this was >> happening because I am calling gimple_duplicate_sese_region with regions >> that are not really SESE. Because I am doing this, I regenerate the cfg >> and SSA information after each call, but I also had to change >> iterate_fix_dominators to fix the abort. Another way we might want to >> fix this would be to pass a flag to gimple_duplicate_sese_region that >> tells it whether or not we want it to recalculate the dominance >> information at all. If set to false, it would assume the caller will >> take care of it. >> >> Opinions? OK to checkin? >> >> Steve Ellcey >> sell...@imgtec.com >> >> >> 2013-05-13 Steve Ellcey <sell...@imgtec.com> >> >> PR tree-optimization/54742 >> * Makefile.in (OBJS): Add tree-switch-shortcut.o. >> * common.opt (ftree-switch-shortcut): New. >> * dominance.c (iterate_fix_dominators): Add null check. >> * params.def (PARAM_MAX_SWITCH_INSNS): New. >> (PARAM_MAX_SWITCH_PATHS): New. >> * passes.c (init_optimization_passes): Add pass_switch_shortcut. >> * timevar.def (TV_SWITCH_SHORTCUT): New. >> * tree-pass.c (pass_switch_shortcut): New. >> * tree-switch-shortcut.c: New file. > > I was looking at this last week (stuck for hours on tarmac at BWI). > > I think we should fix this in the threader rather than doing a special > purpose pass. This is primarily because we get it for free if we address > one limitation in the threader (some of the other issues I was concerned > about don't apply). > > Specifically if we look at thread_around_empty_block we have: > > /* This block must have a single predecessor (E->dest). */ > if (!single_pred_p (bb)) > return NULL; > > /* This block must have more than one successor. */ > if (single_succ_p (bb)) > return NULL; > > /* This block can have no PHI nodes. This is overly conservative. */ > if (!gsi_end_p (gsi_start_phis (bb))) > return NULL; > > > The test that the block have > 1 successor and no PHI nodes are the keys. > > ;; basic block 17, loop depth 1 > ;; pred: 9 > ;; 13 > ;; 16 > ;; 15 > ;; 4 > ;; 12 > ;; 7 > ;; 14 > ;; 10 > ;; 5 > ;; 6 > ;; 8 > ;; 11 > # state_1 = PHI <0(9), 2(13), 3(16), 3(15), state_36(4), 1(12), 0(7), > 2(14), 1(10), 1(5), 0(6), 2(8), 3(11)> > # .MEM_4 = PHI <.MEM_29(9), .MEM_20(13), .MEM_24(16), .MEM_29(15), > .MEM_29(4), .MEM_29(12), .MEM_12(7), .MEM_29(14), .MEM_16(10), .MEM_29(5), > .MEM_29(6), .MEM_29(8), .MEM_29(11)> > <L32>: > str_25 = str_37 + 1; > # VUSE <.MEM_4> > _8 = MEM[base: str_25, offset: 0B]; > if (_8 != 0) > goto <bb 18>; > else > goto <bb 19>; > ;; succ: 18 > ;; 19 > > ;; basic block 18, loop depth 1 > ;; pred: 17 > goto <bb 4>; > ;; succ: 4 > > > > bb will be block #18. In theory we just do > if (single_succ_p (bb)) > bb = single_succ_edge (bb)->dest; > > And allow PHI nodes in this specific instance. > > That's enough to allow existing code to identify the potential jump > threading candidates -- and more importantly, it's much more general. > > > > The updater needs copy bb17 & bb4. bb17' will transfer to bb4' which will > directly transfer to the destination of the switch. > > The advantage of doing this in the threader is it'll help more than just the > switch statement case. It's probably highly likely that we're missing cases > to thread through empty blocks with PHIs. > > I've started cobbling together some of the updating code. So far it doesn't > look too terrible.
I agree - thanks for looking at this! Richard. > > Jeff >