RE: Gimple loop splitting v2
Hi Michael, The commit seems to be causing an ICE on aarch64 (just tested latest trunk). I've created a Bugzilla ticket with a test input https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78107 Regards, Tamar > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Michael Matz > Sent: 24 October 2016 10:02 > To: Bin.Cheng > Cc: Jeff Law; gcc-patches List > Subject: Re: Gimple loop splitting v2 > > Hi, > > On Mon, 24 Oct 2016, Bin.Cheng wrote: > > > Unfortunately I didn't reproduce the improvement in my run on AArch64, > > I will double check if I made some mistakes. > > Yeah, our regular testers also didn't pick up these kinds of improvements. > As I said, the machine was quite jumpy (though not loaded at all, and fixated > to run on one CPU) :-/ > > > Ciao, > Michael.
Re: Gimple loop splitting v2
Hi, On Mon, 24 Oct 2016, Bin.Cheng wrote: > Unfortunately I didn't reproduce the improvement in my run on AArch64, I > will double check if I made some mistakes. Yeah, our regular testers also didn't pick up these kinds of improvements. As I said, the machine was quite jumpy (though not loaded at all, and fixated to run on one CPU) :-/ Ciao, Michael.
Re: Gimple loop splitting v2
On Thu, Oct 20, 2016 at 3:55 PM, Bin.Cheng wrote: > On Thu, Oct 20, 2016 at 3:43 PM, Michael Matz wrote: >> Hi, >> >> On Sat, 5 Dec 2015, Jeff Law wrote: >> >>> Nit. I don't think you want a comma after "so". And it looks like your >>> comment got truncated as well. >>> >>> With the comment above fixed, this is fine for the trunk. >> >> I'm terribly sorry to have dropped the ball here, but I've committed this >> now after not even a year ;-/ (r241374) Obviously after rebootstrapping >> with all,ada languages. I also did some benchmark run which should be >> taken with a grain of salt as the machine had fairly variant results but >> the improvements are real, though perhaps not always in that range (it's a >> normal three repeats run). I'm really curious if our automatic tester can >> pick up similar improvements, because if so, it's extreme (5 to 15 percent >> in some benchmarks) and we can brag about it for GCC 7 ;-) > This is nice, thanks for doing it. I will check the improvement on AArch64. Hi, Unfortunately I didn't reproduce the improvement in my run on AArch64, I will double check if I made some mistakes. Thanks, bin >> >> 400.perlbench9770519 18.8 *9770 508 19.2 * >> 401.bzip29650668 14.5 *9650 666 14.5 * >> 403.gcc 8050455 17.7 *8050 432 18.6 * >> 429.mcf 9120477 19.1 *9120 467 19.5 * >> 445.gobmk 10490643 16.3 * 10490 644 16.3 * >> 456.hmmer9330641 14.6 *9330 614 15.2 * >> 458.sjeng 12100784 15.4 * 12100 762 15.9 * >> 462.libquantum 20720605 34.2 * 20720 600 34.5 * >> 464.h264ref 22130969 22.8 * 22130 969 22.8 * >> 471.omnetpp 6250438 14.3 *6250 358 17.5 * >> 473.astar7020494 14.2 *7020 492 14.3 * >> 483.xalancbmk6900342 20.2 *6900 336 20.6 * >> Est. SPECint(R)_base2006 17.9 >> Est. SPECint2006 18.5 >> >> 410.bwaves 13590563 24.1 * 13590 506 26.9 * >> 416.gamess NR NR >> 433.milc 9180375 24.5 *9180 349 26.3 * >> 434.zeusmp 9100433 21.0 *9100 423 21.5 * >> 435.gromacs 7140402 17.7 *7140 411 17.4 * >> 436.cactusADM 11950486 24.6 * 11950 486 24.6 * >> 437.leslie3d 9400421 22.4 *9400 419 22.4 * >> 444.namd 8020520 15.4 *8020 520 15.4 * >> 447.dealII NR NR >> 450.soplex 8340393 21.2 *8340 391 21.3 * >> 453.povray 5320277 19.2 *5320 278 19.1 * >> 454.calculix 8250453 18.2 *8250 460 17.9 * >> 459.GemsFDTD10610542 19.6 * 10610 537 19.8 * >> 465.tonto9840492 20.0 *9840 491 20.0 * >> 470.lbm 13740466 29.5 * 13740 430 32.0 * >> 481.wrf 11170492 22.7 * 11170 457 24.4 * >> 482.sphinx3 19490659 29.6 * 19490 655 29.8 * >> Est. SPECfp(R)_base2006 21.6 >> Est. SPECfp2006 22.1 >> >> >> Ciao, >> Michael.
Re: Gimple loop splitting v2
On 10/20/2016 08:43 AM, Michael Matz wrote: Hi, On Sat, 5 Dec 2015, Jeff Law wrote: Nit. I don't think you want a comma after "so". And it looks like your comment got truncated as well. With the comment above fixed, this is fine for the trunk. I'm terribly sorry to have dropped the ball here, but I've committed this now after not even a year ;-/ (r241374) It'd totally fallen off my radar. I had to go find it in my archives :-). Obviously after rebootstrapping with all,ada languages. I also did some benchmark run which should be taken with a grain of salt as the machine had fairly variant results but the improvements are real, though perhaps not always in that range (it's a normal three repeats run). I'm really curious if our automatic tester can pick up similar improvements, because if so, it's extreme (5 to 15 percent in some benchmarks) and we can brag about it for GCC 7 ;-) Yea. I don't expect it applies that often and ISTM that it's probably most beneficial by enabling other stuff later in the loop optimizer pipeline to see more loops without embedded flow control. ANyway, glad to see it go in. jeff
Re: Gimple loop splitting v2
On Thu, Oct 20, 2016 at 3:43 PM, Michael Matz wrote: > Hi, > > On Sat, 5 Dec 2015, Jeff Law wrote: > >> Nit. I don't think you want a comma after "so". And it looks like your >> comment got truncated as well. >> >> With the comment above fixed, this is fine for the trunk. > > I'm terribly sorry to have dropped the ball here, but I've committed this > now after not even a year ;-/ (r241374) Obviously after rebootstrapping > with all,ada languages. I also did some benchmark run which should be > taken with a grain of salt as the machine had fairly variant results but > the improvements are real, though perhaps not always in that range (it's a > normal three repeats run). I'm really curious if our automatic tester can > pick up similar improvements, because if so, it's extreme (5 to 15 percent > in some benchmarks) and we can brag about it for GCC 7 ;-) This is nice, thanks for doing it. I will check the improvement on AArch64. Thanks, bin > > 400.perlbench9770519 18.8 *9770 508 19.2 * > 401.bzip29650668 14.5 *9650 666 14.5 * > 403.gcc 8050455 17.7 *8050 432 18.6 * > 429.mcf 9120477 19.1 *9120 467 19.5 * > 445.gobmk 10490643 16.3 * 10490 644 16.3 * > 456.hmmer9330641 14.6 *9330 614 15.2 * > 458.sjeng 12100784 15.4 * 12100 762 15.9 * > 462.libquantum 20720605 34.2 * 20720 600 34.5 * > 464.h264ref 22130969 22.8 * 22130 969 22.8 * > 471.omnetpp 6250438 14.3 *6250 358 17.5 * > 473.astar7020494 14.2 *7020 492 14.3 * > 483.xalancbmk6900342 20.2 *6900 336 20.6 * > Est. SPECint(R)_base2006 17.9 > Est. SPECint2006 18.5 > > 410.bwaves 13590563 24.1 * 13590 506 26.9 * > 416.gamess NR NR > 433.milc 9180375 24.5 *9180 349 26.3 * > 434.zeusmp 9100433 21.0 *9100 423 21.5 * > 435.gromacs 7140402 17.7 *7140 411 17.4 * > 436.cactusADM 11950486 24.6 * 11950 486 24.6 * > 437.leslie3d 9400421 22.4 *9400 419 22.4 * > 444.namd 8020520 15.4 *8020 520 15.4 * > 447.dealII NR NR > 450.soplex 8340393 21.2 *8340 391 21.3 * > 453.povray 5320277 19.2 *5320 278 19.1 * > 454.calculix 8250453 18.2 *8250 460 17.9 * > 459.GemsFDTD10610542 19.6 * 10610 537 19.8 * > 465.tonto9840492 20.0 *9840 491 20.0 * > 470.lbm 13740466 29.5 * 13740 430 32.0 * > 481.wrf 11170492 22.7 * 11170 457 24.4 * > 482.sphinx3 19490659 29.6 * 19490 655 29.8 * > Est. SPECfp(R)_base2006 21.6 > Est. SPECfp2006 22.1 > > > Ciao, > Michael.
Re: Gimple loop splitting v2
Hi, On Sat, 5 Dec 2015, Jeff Law wrote: > Nit. I don't think you want a comma after "so". And it looks like your > comment got truncated as well. > > With the comment above fixed, this is fine for the trunk. I'm terribly sorry to have dropped the ball here, but I've committed this now after not even a year ;-/ (r241374) Obviously after rebootstrapping with all,ada languages. I also did some benchmark run which should be taken with a grain of salt as the machine had fairly variant results but the improvements are real, though perhaps not always in that range (it's a normal three repeats run). I'm really curious if our automatic tester can pick up similar improvements, because if so, it's extreme (5 to 15 percent in some benchmarks) and we can brag about it for GCC 7 ;-) 400.perlbench9770519 18.8 *9770 508 19.2 * 401.bzip29650668 14.5 *9650 666 14.5 * 403.gcc 8050455 17.7 *8050 432 18.6 * 429.mcf 9120477 19.1 *9120 467 19.5 * 445.gobmk 10490643 16.3 * 10490 644 16.3 * 456.hmmer9330641 14.6 *9330 614 15.2 * 458.sjeng 12100784 15.4 * 12100 762 15.9 * 462.libquantum 20720605 34.2 * 20720 600 34.5 * 464.h264ref 22130969 22.8 * 22130 969 22.8 * 471.omnetpp 6250438 14.3 *6250 358 17.5 * 473.astar7020494 14.2 *7020 492 14.3 * 483.xalancbmk6900342 20.2 *6900 336 20.6 * Est. SPECint(R)_base2006 17.9 Est. SPECint2006 18.5 410.bwaves 13590563 24.1 * 13590 506 26.9 * 416.gamess NR NR 433.milc 9180375 24.5 *9180 349 26.3 * 434.zeusmp 9100433 21.0 *9100 423 21.5 * 435.gromacs 7140402 17.7 *7140 411 17.4 * 436.cactusADM 11950486 24.6 * 11950 486 24.6 * 437.leslie3d 9400421 22.4 *9400 419 22.4 * 444.namd 8020520 15.4 *8020 520 15.4 * 447.dealII NR NR 450.soplex 8340393 21.2 *8340 391 21.3 * 453.povray 5320277 19.2 *5320 278 19.1 * 454.calculix 8250453 18.2 *8250 460 17.9 * 459.GemsFDTD10610542 19.6 * 10610 537 19.8 * 465.tonto9840492 20.0 *9840 491 20.0 * 470.lbm 13740466 29.5 * 13740 430 32.0 * 481.wrf 11170492 22.7 * 11170 457 24.4 * 482.sphinx3 19490659 29.6 * 19490 655 29.8 * Est. SPECfp(R)_base2006 21.6 Est. SPECfp2006 22.1 Ciao, Michael.
Re: Gimple loop splitting v2
On Wed, Jul 27, 2016 at 8:17 AM, Andrew Pinski wrote: > On Tue, Jul 26, 2016 at 4:32 AM, Richard Biener > wrote: >> On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski wrote: >>> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz wrote: Hi, On Tue, 1 Dec 2015, Jeff Law wrote: > > So, okay for trunk? > -ENOPATCH Sigh :) Here it is. >>> >>> >>> I found one problem with it. >>> Take: >>> void f(int *a, int M, int *b) >>> { >>> for(int i = 0; i <= M; i++) >>> { >>>if (i < M) >>> a[i] = i; >>> } >>> } >>> CUT --- >>> There are two issues with the code as below. The outer most loop's >>> aux is still set which causes the vectorizer not to vector the loop. >>> The other issue is I need to run pass_scev_cprop after pass_loop_split >>> to get the induction variable usage after the loop gone so the >>> vectorizer will work. >> >> I think scev_cprop needs to be re-written to an utility so that the >> vectorizer >> itself can (within its own cost-model) eliminate an induction using it. >> >> Richard. >> >>> Something like (note this is copy and paste from a terminal): >>> diff --git a/gcc/passes.def b/gcc/passes.def >>> index c327900..e8d6ea6 100644 >>> --- a/gcc/passes.def >>> +++ b/gcc/passes.def >>> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3. If not see >>> NEXT_PASS (pass_copy_prop); >>> NEXT_PASS (pass_dce); >>> NEXT_PASS (pass_tree_unswitch); >>> - NEXT_PASS (pass_scev_cprop); >>> NEXT_PASS (pass_loop_split); >>> + NEXT_PASS (pass_scev_cprop); >>> NEXT_PASS (pass_record_bounds); >>> NEXT_PASS (pass_loop_distribution); >>> NEXT_PASS (pass_copy_prop); >>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c >>> index 5411530..e72ef19 100644 >>> --- a/gcc/tree-ssa-loop-split.c >>> +++ b/gcc/tree-ssa-loop-split.c >>> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void) >>> >>>gcc_assert (scev_initialized_p ()); >>>FOR_EACH_LOOP (loop, 0) >>> -loop->aux = NULL; >>> +{ >>> + loop->aux = NULL; >>> + if (loop_outer (loop)) >>> + loop_outer (loop)->aux = NULL; >>> +} >> >> How does the iterator not visit loop_outer (loop)?! > > The iterator with flags of 0 does not visit the the root. So the way > to fix this is change 0 (which is the flags) with LI_INCLUDE_ROOT so > we zero out the root too. Or not set ->aux on the root in the first place. Richard. > Thanks, > Andrew > >> >>> >>>/* Go through all loops starting from innermost. */ >>>FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) >>> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void) >>> } >>> >>>FOR_EACH_LOOP (loop, 0) >>> -loop->aux = NULL; >>> +{ >>> + loop->aux = NULL; >>> + if (loop_outer (loop)) >>> + loop_outer (loop)->aux = NULL; >>> +} >>> >>>if (changed) >>> return TODO_cleanup_cfg; >>> - CUT - >>> >>> Thanks, >>> Andrew >>> >>> Ciao, Michael. * common.opt (-fsplit-loops): New flag. * passes.def (pass_loop_split): Add. * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3. (enable_fdo_optimizations): Add loop splitting. * timevar.def (TV_LOOP_SPLIT): Add. * tree-pass.h (make_pass_loop_split): Declare. * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, * tree-ssa-loop-split.c: New file. * Makefile.in (OBJS): Add tree-ssa-loop-split.o. * doc/invoke.texi (fsplit-loops): Document. * doc/passes.texi (Loop optimization): Add paragraph about loop splitting. testsuite/ * gcc.dg/loop-split.c: New test. Index: common.opt === --- common.opt (revision 231115) +++ common.opt (working copy) @@ -2453,6 +2457,10 @@ funswitch-loops Common Report Var(flag_unswitch_loops) Optimization Perform loop unswitching. +fsplit-loops +Common Report Var(flag_split_loops) Optimization +Perform loop splitting. + funwind-tables Common Report Var(flag_unwind_tables) Optimization Just generate unwind tables for exception handling. Index: passes.def === --- passes.def (revision 231115) +++ passes.def (working copy) @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_dce); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); + NEXT_PASS (pass_loop_split); NEXT_PASS (pass_record_bounds); NEXT_PASS (pass_loop_distribution); NEXT_PASS (pass_copy_prop); I
Re: Gimple loop splitting v2
On Tue, Jul 26, 2016 at 4:32 AM, Richard Biener wrote: > On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski wrote: >> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz wrote: >>> Hi, >>> >>> On Tue, 1 Dec 2015, Jeff Law wrote: >>> > So, okay for trunk? -ENOPATCH >>> >>> Sigh :) >>> Here it is. >> >> >> I found one problem with it. >> Take: >> void f(int *a, int M, int *b) >> { >> for(int i = 0; i <= M; i++) >> { >>if (i < M) >> a[i] = i; >> } >> } >> CUT --- >> There are two issues with the code as below. The outer most loop's >> aux is still set which causes the vectorizer not to vector the loop. >> The other issue is I need to run pass_scev_cprop after pass_loop_split >> to get the induction variable usage after the loop gone so the >> vectorizer will work. > > I think scev_cprop needs to be re-written to an utility so that the vectorizer > itself can (within its own cost-model) eliminate an induction using it. > > Richard. > >> Something like (note this is copy and paste from a terminal): >> diff --git a/gcc/passes.def b/gcc/passes.def >> index c327900..e8d6ea6 100644 >> --- a/gcc/passes.def >> +++ b/gcc/passes.def >> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3. If not see >> NEXT_PASS (pass_copy_prop); >> NEXT_PASS (pass_dce); >> NEXT_PASS (pass_tree_unswitch); >> - NEXT_PASS (pass_scev_cprop); >> NEXT_PASS (pass_loop_split); >> + NEXT_PASS (pass_scev_cprop); >> NEXT_PASS (pass_record_bounds); >> NEXT_PASS (pass_loop_distribution); >> NEXT_PASS (pass_copy_prop); >> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c >> index 5411530..e72ef19 100644 >> --- a/gcc/tree-ssa-loop-split.c >> +++ b/gcc/tree-ssa-loop-split.c >> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void) >> >>gcc_assert (scev_initialized_p ()); >>FOR_EACH_LOOP (loop, 0) >> -loop->aux = NULL; >> +{ >> + loop->aux = NULL; >> + if (loop_outer (loop)) >> + loop_outer (loop)->aux = NULL; >> +} > > How does the iterator not visit loop_outer (loop)?! The iterator with flags of 0 does not visit the the root. So the way to fix this is change 0 (which is the flags) with LI_INCLUDE_ROOT so we zero out the root too. Thanks, Andrew > >> >>/* Go through all loops starting from innermost. */ >>FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) >> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void) >> } >> >>FOR_EACH_LOOP (loop, 0) >> -loop->aux = NULL; >> +{ >> + loop->aux = NULL; >> + if (loop_outer (loop)) >> + loop_outer (loop)->aux = NULL; >> +} >> >>if (changed) >> return TODO_cleanup_cfg; >> - CUT - >> >> Thanks, >> Andrew >> >> >>> >>> >>> Ciao, >>> Michael. >>> * common.opt (-fsplit-loops): New flag. >>> * passes.def (pass_loop_split): Add. >>> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3. >>> (enable_fdo_optimizations): Add loop splitting. >>> * timevar.def (TV_LOOP_SPLIT): Add. >>> * tree-pass.h (make_pass_loop_split): Declare. >>> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. >>> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, >>> * tree-ssa-loop-split.c: New file. >>> * Makefile.in (OBJS): Add tree-ssa-loop-split.o. >>> * doc/invoke.texi (fsplit-loops): Document. >>> * doc/passes.texi (Loop optimization): Add paragraph about loop >>> splitting. >>> >>> testsuite/ >>> * gcc.dg/loop-split.c: New test. >>> >>> Index: common.opt >>> === >>> --- common.opt (revision 231115) >>> +++ common.opt (working copy) >>> @@ -2453,6 +2457,10 @@ funswitch-loops >>> Common Report Var(flag_unswitch_loops) Optimization >>> Perform loop unswitching. >>> >>> +fsplit-loops >>> +Common Report Var(flag_split_loops) Optimization >>> +Perform loop splitting. >>> + >>> funwind-tables >>> Common Report Var(flag_unwind_tables) Optimization >>> Just generate unwind tables for exception handling. >>> Index: passes.def >>> === >>> --- passes.def (revision 231115) >>> +++ passes.def (working copy) >>> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3. >>> NEXT_PASS (pass_dce); >>> NEXT_PASS (pass_tree_unswitch); >>> NEXT_PASS (pass_scev_cprop); >>> + NEXT_PASS (pass_loop_split); >>> NEXT_PASS (pass_record_bounds); >>> NEXT_PASS (pass_loop_distribution); >>> NEXT_PASS (pass_copy_prop); >>> Index: opts.c >>> === >>> --- opts.c (revision 231115) >>> +++ opts.c (working copy) >>> @@ -532,6 +532,7 @@ static const struct default_options defa >>> regardless of them being declared inline. *
Re: Gimple loop splitting v2
On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski wrote: > On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz wrote: >> Hi, >> >> On Tue, 1 Dec 2015, Jeff Law wrote: >> >>> > So, okay for trunk? >>> -ENOPATCH >> >> Sigh :) >> Here it is. > > > I found one problem with it. > Take: > void f(int *a, int M, int *b) > { > for(int i = 0; i <= M; i++) > { >if (i < M) > a[i] = i; > } > } > CUT --- > There are two issues with the code as below. The outer most loop's > aux is still set which causes the vectorizer not to vector the loop. > The other issue is I need to run pass_scev_cprop after pass_loop_split > to get the induction variable usage after the loop gone so the > vectorizer will work. I think scev_cprop needs to be re-written to an utility so that the vectorizer itself can (within its own cost-model) eliminate an induction using it. Richard. > Something like (note this is copy and paste from a terminal): > diff --git a/gcc/passes.def b/gcc/passes.def > index c327900..e8d6ea6 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3. If not see > NEXT_PASS (pass_copy_prop); > NEXT_PASS (pass_dce); > NEXT_PASS (pass_tree_unswitch); > - NEXT_PASS (pass_scev_cprop); > NEXT_PASS (pass_loop_split); > + NEXT_PASS (pass_scev_cprop); > NEXT_PASS (pass_record_bounds); > NEXT_PASS (pass_loop_distribution); > NEXT_PASS (pass_copy_prop); > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c > index 5411530..e72ef19 100644 > --- a/gcc/tree-ssa-loop-split.c > +++ b/gcc/tree-ssa-loop-split.c > @@ -592,7 +592,11 @@ tree_ssa_split_loops (void) > >gcc_assert (scev_initialized_p ()); >FOR_EACH_LOOP (loop, 0) > -loop->aux = NULL; > +{ > + loop->aux = NULL; > + if (loop_outer (loop)) > + loop_outer (loop)->aux = NULL; > +} How does the iterator not visit loop_outer (loop)?! > >/* Go through all loops starting from innermost. */ >FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) > @@ -631,7 +635,11 @@ tree_ssa_split_loops (void) > } > >FOR_EACH_LOOP (loop, 0) > -loop->aux = NULL; > +{ > + loop->aux = NULL; > + if (loop_outer (loop)) > + loop_outer (loop)->aux = NULL; > +} > >if (changed) > return TODO_cleanup_cfg; > - CUT - > > Thanks, > Andrew > > >> >> >> Ciao, >> Michael. >> * common.opt (-fsplit-loops): New flag. >> * passes.def (pass_loop_split): Add. >> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3. >> (enable_fdo_optimizations): Add loop splitting. >> * timevar.def (TV_LOOP_SPLIT): Add. >> * tree-pass.h (make_pass_loop_split): Declare. >> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. >> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, >> * tree-ssa-loop-split.c: New file. >> * Makefile.in (OBJS): Add tree-ssa-loop-split.o. >> * doc/invoke.texi (fsplit-loops): Document. >> * doc/passes.texi (Loop optimization): Add paragraph about loop >> splitting. >> >> testsuite/ >> * gcc.dg/loop-split.c: New test. >> >> Index: common.opt >> === >> --- common.opt (revision 231115) >> +++ common.opt (working copy) >> @@ -2453,6 +2457,10 @@ funswitch-loops >> Common Report Var(flag_unswitch_loops) Optimization >> Perform loop unswitching. >> >> +fsplit-loops >> +Common Report Var(flag_split_loops) Optimization >> +Perform loop splitting. >> + >> funwind-tables >> Common Report Var(flag_unwind_tables) Optimization >> Just generate unwind tables for exception handling. >> Index: passes.def >> === >> --- passes.def (revision 231115) >> +++ passes.def (working copy) >> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3. >> NEXT_PASS (pass_dce); >> NEXT_PASS (pass_tree_unswitch); >> NEXT_PASS (pass_scev_cprop); >> + NEXT_PASS (pass_loop_split); >> NEXT_PASS (pass_record_bounds); >> NEXT_PASS (pass_loop_distribution); >> NEXT_PASS (pass_copy_prop); >> Index: opts.c >> === >> --- opts.c (revision 231115) >> +++ opts.c (working copy) >> @@ -532,6 +532,7 @@ static const struct default_options defa >> regardless of them being declared inline. */ >> { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 }, >> { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, >> 1 }, >> +{ OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 }, >> { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 }, >> { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 }, >> { OPT_LEVELS_3_PLUS, OPT_ftree_lo
Re: Gimple loop splitting v2
On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz wrote: > Hi, > > On Tue, 1 Dec 2015, Jeff Law wrote: > >> > So, okay for trunk? >> -ENOPATCH > > Sigh :) > Here it is. I found one problem with it. Take: void f(int *a, int M, int *b) { for(int i = 0; i <= M; i++) { if (i < M) a[i] = i; } } CUT --- There are two issues with the code as below. The outer most loop's aux is still set which causes the vectorizer not to vector the loop. The other issue is I need to run pass_scev_cprop after pass_loop_split to get the induction variable usage after the loop gone so the vectorizer will work. Something like (note this is copy and paste from a terminal): diff --git a/gcc/passes.def b/gcc/passes.def index c327900..e8d6ea6 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_dce); NEXT_PASS (pass_tree_unswitch); - NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_loop_split); + NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); NEXT_PASS (pass_loop_distribution); NEXT_PASS (pass_copy_prop); diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c index 5411530..e72ef19 100644 --- a/gcc/tree-ssa-loop-split.c +++ b/gcc/tree-ssa-loop-split.c @@ -592,7 +592,11 @@ tree_ssa_split_loops (void) gcc_assert (scev_initialized_p ()); FOR_EACH_LOOP (loop, 0) -loop->aux = NULL; +{ + loop->aux = NULL; + if (loop_outer (loop)) + loop_outer (loop)->aux = NULL; +} /* Go through all loops starting from innermost. */ FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) @@ -631,7 +635,11 @@ tree_ssa_split_loops (void) } FOR_EACH_LOOP (loop, 0) -loop->aux = NULL; +{ + loop->aux = NULL; + if (loop_outer (loop)) + loop_outer (loop)->aux = NULL; +} if (changed) return TODO_cleanup_cfg; - CUT - Thanks, Andrew > > > Ciao, > Michael. > * common.opt (-fsplit-loops): New flag. > * passes.def (pass_loop_split): Add. > * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3. > (enable_fdo_optimizations): Add loop splitting. > * timevar.def (TV_LOOP_SPLIT): Add. > * tree-pass.h (make_pass_loop_split): Declare. > * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. > * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, > * tree-ssa-loop-split.c: New file. > * Makefile.in (OBJS): Add tree-ssa-loop-split.o. > * doc/invoke.texi (fsplit-loops): Document. > * doc/passes.texi (Loop optimization): Add paragraph about loop > splitting. > > testsuite/ > * gcc.dg/loop-split.c: New test. > > Index: common.opt > === > --- common.opt (revision 231115) > +++ common.opt (working copy) > @@ -2453,6 +2457,10 @@ funswitch-loops > Common Report Var(flag_unswitch_loops) Optimization > Perform loop unswitching. > > +fsplit-loops > +Common Report Var(flag_split_loops) Optimization > +Perform loop splitting. > + > funwind-tables > Common Report Var(flag_unwind_tables) Optimization > Just generate unwind tables for exception handling. > Index: passes.def > === > --- passes.def (revision 231115) > +++ passes.def (working copy) > @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3. > NEXT_PASS (pass_dce); > NEXT_PASS (pass_tree_unswitch); > NEXT_PASS (pass_scev_cprop); > + NEXT_PASS (pass_loop_split); > NEXT_PASS (pass_record_bounds); > NEXT_PASS (pass_loop_distribution); > NEXT_PASS (pass_copy_prop); > Index: opts.c > === > --- opts.c (revision 231115) > +++ opts.c (working copy) > @@ -532,6 +532,7 @@ static const struct default_options defa > regardless of them being declared inline. */ > { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 }, > { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, > 1 }, > +{ OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 }, > { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 }, > { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 }, > { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 }, > @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt > opts->x_flag_ipa_cp_alignment = value; >if (!opts_set->x_flag_predictive_commoning) > opts->x_flag_predictive_commoning = value; > + if (!opts_set->x_flag_split_loops) > +opts->x_flag_split_loops = value; >if (!opts_set->x_flag_unswitch_loops) > opts->x_flag_unswitch_loops = value; >if (!opts_set->x_flag_gcse_after_reload) > Index: timevar.def > =
Re: Gimple loop splitting
Hi, On Sun, 24 Jul 2016, Andrew Pinski wrote: > What ever happened to this patch? It got accepted but I deferred inclusion in GCC 6 because it was late in the cycle then and performance results didn't show super improvements (only looked at cpu2006). No regressions, but no nice speedups either. > I was looking into doing this myself today but I found this patch. It is > stage 1 of GCC 7, it might be a good idea to get this patch into GCC. Indeed. If you want to performance test it on something you know where it should help, I'm all ears. Ciao, Michael.
Re: Gimple loop splitting
On Thu, Nov 12, 2015 at 8:52 AM, Michael Matz wrote: > Hello, > > this new pass implements loop iteration space splitting for loops that > contain a conditional that's always true for one part of the iteration > space and false for the other, i.e. such situations: > > for (i = beg; i < end; i++) > if (i < p) > dothis(); > else > dothat(); > > this is transformed into roughly: > > for (i = beg; i < p; i++) > dothis(); > for (; i < end; i++) > dothat(); > > Of course, not quite the above as there needs to be provisions for the > border conditions, if e.g. 'p' is outside the original iteration space, or > the conditional doesn't directly use the control IV, but some other, or > the IV runs backwards. The testcase checks many of these border > conditions. > > This transformation is in itself a good one but can also be an enabler for > the vectorizer. It does increase code size, when the loop body contains > also unconditional code (that one is duplicated), so we only transform hot > loops. I'm a bit unsure of the placement of the new pass, or if it should > be an own pass at all. Right now I've placed it after unswitching and > scev_cprop, before loop distribution. Ideally I think all three, together > with loop fusion and an gimple unroller should be integrated into one loop > nest optimizer, alas, we aren't there yet. > > I'm planning to work on loop fusion in the future as well, but that's not > for GCC 6. > > I've regstrapped this pass enabled with -O2 on x86-64-linux, without > regressions. I've also checked cpu2006 (the non-fortran part) for > correctness, not yet for performance. In the end it should probably only > be enabled for -O3+ (although if the whole loop body is conditional it > makes sense to also have it with -O2 because code growth is very small > then). > > So, okay for trunk? What ever happened to this patch? I was looking into doing this myself today but I found this patch. It is stage 1 of GCC 7, it might be a good idea to get this patch into GCC. Thanks, Andrew > > > Ciao, > Michael. > * passes.def (pass_loop_split): Add. > * timevar.def (TV_LOOP_SPLIT): Add. > * tree-pass.h (make_pass_loop_split): Declare. > * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. > * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, > cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h, > gimple-pretty-print.h, gimple-fold.h, gimplify-me.h. > (split_at_bb_p, patch_loop_exit, find_or_create_guard_phi, > split_loop, tree_ssa_split_loops, > make_pass_loop_split): New functions. > (pass_data_loop_split): New. > (pass_loop_split): New. > > testsuite/ > * gcc.dg/loop-split.c: New test. > > Index: passes.def > === > --- passes.def (revision 229763) > +++ passes.def (working copy) > @@ -233,6 +233,7 @@ along with GCC; see the file COPYING3. > NEXT_PASS (pass_dce); > NEXT_PASS (pass_tree_unswitch); > NEXT_PASS (pass_scev_cprop); > + NEXT_PASS (pass_loop_split); > NEXT_PASS (pass_record_bounds); > NEXT_PASS (pass_loop_distribution); > NEXT_PASS (pass_copy_prop); > Index: timevar.def > === > --- timevar.def (revision 229763) > +++ timevar.def (working copy) > @@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM , " > DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv") > DEFTIMEVAR (TV_SCEV_CONST, "scev constant prop") > DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH, "tree loop unswitching") > +DEFTIMEVAR (TV_LOOP_SPLIT, "loop splitting") > DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling") > DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops") > DEFTIMEVAR (TV_TREE_VECTORIZATION, "tree vectorization") > Index: tree-pass.h > === > --- tree-pass.h (revision 229763) > +++ tree-pass.h (working copy) > @@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n > extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt); > +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt); > Index: tree-ssa-loop-manip.h > === > --- tree-ssa-loop-manip.h (revision 229763) > +++ tree-ssa-loop-manip.h (working copy) > @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc > > extern void create_iv (tr
Re: Gimple loop splitting v2
On 12/02/2015 06:23 AM, Michael Matz wrote: Hi, On Tue, 1 Dec 2015, Jeff Law wrote: So, okay for trunk? -ENOPATCH Sigh :) Here it is. Ciao, Michael. * common.opt (-fsplit-loops): New flag. * passes.def (pass_loop_split): Add. * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3. (enable_fdo_optimizations): Add loop splitting. * timevar.def (TV_LOOP_SPLIT): Add. * tree-pass.h (make_pass_loop_split): Declare. * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, * tree-ssa-loop-split.c: New file. * Makefile.in (OBJS): Add tree-ssa-loop-split.o. * doc/invoke.texi (fsplit-loops): Document. * doc/passes.texi (Loop optimization): Add paragraph about loop splitting. testsuite/ * gcc.dg/loop-split.c: New test. Index: tree-ssa-loop-split.c +/* Return true when BB inside LOOP is a potential iteration space + split point, i.e. ends with a condition like "IV < comp", which + is true on one side of the iteration space and false on the other, + and the split point can be computed. If so, also return the border + point in *BORDER and the comparison induction variable in IV. */ + +static tree +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv) +{ + gimple *last; + gcond *stmt; + affine_iv iv2; + + + /* Make it so, that the first argument of the condition is + the looping one (only swap. */ Nit. I don't think you want a comma after "so". And it looks like your comment got truncated as well. With the comment above fixed, this is fine for the trunk. jeff
Re: Gimple loop splitting v2
Hi, On Tue, 1 Dec 2015, Jeff Law wrote: > > So, okay for trunk? > -ENOPATCH Sigh :) Here it is. Ciao, Michael. * common.opt (-fsplit-loops): New flag. * passes.def (pass_loop_split): Add. * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3. (enable_fdo_optimizations): Add loop splitting. * timevar.def (TV_LOOP_SPLIT): Add. * tree-pass.h (make_pass_loop_split): Declare. * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, * tree-ssa-loop-split.c: New file. * Makefile.in (OBJS): Add tree-ssa-loop-split.o. * doc/invoke.texi (fsplit-loops): Document. * doc/passes.texi (Loop optimization): Add paragraph about loop splitting. testsuite/ * gcc.dg/loop-split.c: New test. Index: common.opt === --- common.opt (revision 231115) +++ common.opt (working copy) @@ -2453,6 +2457,10 @@ funswitch-loops Common Report Var(flag_unswitch_loops) Optimization Perform loop unswitching. +fsplit-loops +Common Report Var(flag_split_loops) Optimization +Perform loop splitting. + funwind-tables Common Report Var(flag_unwind_tables) Optimization Just generate unwind tables for exception handling. Index: passes.def === --- passes.def (revision 231115) +++ passes.def (working copy) @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_dce); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); + NEXT_PASS (pass_loop_split); NEXT_PASS (pass_record_bounds); NEXT_PASS (pass_loop_distribution); NEXT_PASS (pass_copy_prop); Index: opts.c === --- opts.c (revision 231115) +++ opts.c (working copy) @@ -532,6 +532,7 @@ static const struct default_options defa regardless of them being declared inline. */ { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 }, { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 }, +{ OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 }, { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 }, { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 }, { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 }, @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt opts->x_flag_ipa_cp_alignment = value; if (!opts_set->x_flag_predictive_commoning) opts->x_flag_predictive_commoning = value; + if (!opts_set->x_flag_split_loops) +opts->x_flag_split_loops = value; if (!opts_set->x_flag_unswitch_loops) opts->x_flag_unswitch_loops = value; if (!opts_set->x_flag_gcse_after_reload) Index: timevar.def === --- timevar.def (revision 231115) +++ timevar.def (working copy) @@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM , " DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv") DEFTIMEVAR (TV_SCEV_CONST, "scev constant prop") DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH, "tree loop unswitching") +DEFTIMEVAR (TV_LOOP_SPLIT, "loop splitting") DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling") DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops") DEFTIMEVAR (TV_TREE_VECTORIZATION, "tree vectorization") Index: tree-pass.h === --- tree-pass.h (revision 231115) +++ tree-pass.h (working copy) @@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt); extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt); extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt); extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt); extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt); Index: tree-ssa-loop-manip.h === --- tree-ssa-loop-manip.h (revision 231115) +++ tree-ssa-loop-manip.h (working copy) @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *, bool, tree *, tree *); +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int, + struct loop *); extern void rewrite_into_loop_closed_ssa (bitmap, unsigned); extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *); extern void verify_loop_closed_ssa (bool); Index: Makefile.in ==
Re: Gimple loop splitting v2
On 12/01/2015 09:46 AM, Michael Matz wrote: Hi, So, okay for trunk? -ENOPATCH Jeff
Gimple loop splitting v2
Hi, On Mon, 16 Nov 2015, Jeff Law wrote: > OK, if you want to keep them, then have a consistent way to turn them > on/off for future debugging. if0/if1 doesn't provide much of a clue to > someone else what to turn on/off if they need to debug this stuff. > > > I don't see any negative tests -- ie tests that should not be split > > > due to boundary conditions. Do you have any from development? > > > > Good point, I had some but only ones where I was able to extend the > > splitters to cover them. I'll think of some that really shouldn't be > > split. > If you've got them, certainly add them. Though I realize they may get > lost over time. Actually, thinking a bit more about this, I don't have any that wouldn't be merely restrictions in the implementation that couldn't be lifted in the future (e.g. unequal step sizes), so I've added no additional ones. > But in that case, the immediate dominator of pre2 & join is still the > initial if statement. So I think we're OK. That was the conclusion I > was starting to come to yesterday, having the ascii art makes it pretty > clear. I'm just not good at conceptualizing a CFG. I have to see it > explicitly and then everything seems so clear and simple. So, this second version should reflect the review. I've moved everything to a new file, split the long function into several logically separate ones, and even included ascii art in the comments :) The testcase got a comment about what to #define for debugging. I've included the pass to -O3 or alternatively if profile-use is on, similar to funswitch-loops. I've also added a proper -fsplit-loops option. There's two functional changes in v2: a bugfix to not try splitting a non-iterating loop (irritatingly such a look returns true from number_of_iterations_exit, but with an ERROR_MARK comparator), and a limitation to avoid combinatorical explosion in artificial testcases: Once we have done a splitting, we don't do any in that loops parents (we may still do splitting in siblings or childs of siblings). I've also done some measurements: first, bootstrap time is unaffected, and regstrapping succeeds without regressions when I activate the pass by default. Then SPECcpu2006: build times are unaffected, everything builds and works also with -fsplit-loops, performance is mostly unaffected, base is -Ofast -funroll-loops -fpeel-loops, peak adds -fsplit-loops. Estimated Estimated Base Base BasePeak Peak Peak Benchmarks Ref. Run Time Ratio Ref. Run Time Ratio -- -- - --- - - 400.perlbench9770325 30.1 *9770323 30.3 * 401.bzip29650382 25.2 *9650382 25.3 * 403.gcc 8050242 33.3 *8050241 33.4 * 429.mcf 9120311 29.3 *9120311 29.3 * 445.gobmk 10490392 26.8 * 10490391 26.8 * 456.hmmer9330345 27.0 *9330342 27.3 * 458.sjeng 12100422 28.7 * 12100420 28.8 * 462.libquantum 20720308 67.3 * 20720308 67.3 * 464.h264ref 22130423 52.3 * 22130423 52.3 * 471.omnetpp 6250273 22.9 *6250273 22.9 * 473.astar7020311 22.6 *7020311 22.6 * 483.xalancbmk6900191 36.2 *6900190 36.2 * Est. SPECint_base2006 31.7 Est. SPECint2006 31.7 Estimated Estimated Base Base BasePeak Peak Peak Benchmarks Ref. Run Time Ratio Ref. Run Time Ratio -- -- - --- - - 410.bwaves 13590235 57.7 * 13590235 57.8 * 416.gamess NR NR 433.milc 9180347 26.5 *9180345 26.6 * 434.zeusmp 9100269 33.9 *9100268 33.9 * 435.gromacs 7140260 27.4 *7140262 27.3 * 436.cactusADM 11950237 50.5 * 11950240 49.9 * 437.leslie3d 9400228 41.3 *9400228 41.2 * 444.namd 8020312 25.7 *8020311 25.7 * 447.dealII 11440254 45.0 * 11440254 45.0 * 450.soplex 8340201 41.4 *8340202 41.4 * 453.povray NR NR 454.calculix
Re: Gimple loop splitting
On 11/16/2015 09:05 AM, Michael Matz wrote: It's a prerequisite indeed, but not enough in itself. Sigh. OK. One can always hope. hmmer is actually quite interesting because it's a fairly isolated hot loop posing quite challenging problems for us :) Sounds like it. Essentially, it's a TODO list :-) Probably ought to be disabled when we're not optimizing for speed as well. That should be dealt with by '!optimize_loop_for_size_p (loop)'. Doh, must have missed that. Please clean up the #if 0/#if 1 code in the new tests. Actually I'd prefer if that test contains the by-hand code and the TRACE stuff as well, I'd only change the #if 0 into some #if BYHAND or so ... You might also want to clean out the TRACE stuff. Essentially the tests look like you just dropped in a test you'd been running by hand until now :-) ... the reason being, that bugs in the splitter are somewhat unwieldy to debug by just staring at the dumps, you only get a checksum mismatch, so TRACE=1 is for finding out which of the params and loops is actually miscompiled, TRACE=2 for finding the specific iteration that's broken, and the #if0 code for putting that situation into a non-macroized and smaller function than dotest. (That's actually how I've run the testcase after I had it basically working, extending dotest with a couple more lines, aka example loop sitations, adjusting the checksum, and then making a face and scratching my head and mucking with the TRACE and #if0 macros :) ). OK, if you want to keep them, then have a consistent way to turn them on/off for future debugging. if0/if1 doesn't provide much of a clue to someone else what to turn on/off if they need to debug this stuff. I don't see any negative tests -- ie tests that should not be split due to boundary conditions. Do you have any from development? Good point, I had some but only ones where I was able to extend the splitters to cover them. I'll think of some that really shouldn't be split. If you've got them, certainly add them. Though I realize they may get lost over time. Only with such situations: for (int i = start; i < end; i++) { if (i + offset < bound) ... } Here the condition-IV is not directly defined by a PHI node. If it happens often I don't know, I guess the usual situation is testing the control IV directly. The deficiency is not hard to fix. I'm comfortable waiting until we see the need. I think I convinced myself on paper that the dominator tree is correct due to our helpers doing the right thing (loop_version() for the initial loop copying and split_edge for the above diddling). Let's see if I can paint some ASCII art. So, after loop_version (which updates dom) we have: OK. I was worried about the next step -- where we insert the conditional on the exit from pre1 to have it transfer to join or pre2. But in that case, the immediate dominator of pre2 & join is still the initial if statement. So I think we're OK. That was the conclusion I was starting to come to yesterday, having the ascii art makes it pretty clear. I'm just not good at conceptualizing a CFG. I have to see it explicitly and then everything seems so clear and simple. jeff
Re: Gimple loop splitting
Hi, On Thu, 12 Nov 2015, Jeff Law wrote: > > this new pass implements loop iteration space splitting for loops that > > contain a conditional that's always true for one part of the iteration > > space and false for the other, i.e. such situations: > FWIW, Ajit suggested the same transformation earlier this year. During that > discussion Richi indicated that for hmmer this transformation would enable > vectorization. It's a prerequisite indeed, but not enough in itself. The next problem will be that only parts of access chains inside the hot loop are vectorizable, but for that those parts need to be disambiguated. ICC is doing that by a massive chain of conditionals testing non-overlapping of the respective arrays at runtime. Our vectorizer could also do that (possibly by increasing the allowed number of conditionals), but the next problem then is that one of these (then indeed separated) parts is not vectorizable by our vectorizer: it's a 'a[i] = f(a[i-1])' dependency that can't yet be handled by us. If the separation of parts would be done by loop distribution that would be fine (we'd have separate loops for the parts, some of them vectorizable), but our loop distribution can't do runtime disambiguation, only our vectorizer. hmmer is actually quite interesting because it's a fairly isolated hot loop posing quite challenging problems for us :) > > It does increase code size, when the loop body contains > > also unconditional code (that one is duplicated), so we only transform hot > > loops. > > Probably ought to be disabled when we're not optimizing for speed as well. That should be dealt with by '!optimize_loop_for_size_p (loop)'. > > I've regstrapped this pass enabled with -O2 on x86-64-linux, without > > regressions. I've also checked cpu2006 (the non-fortran part) for > > correctness, not yet for performance. In the end it should probably only > > be enabled for -O3+ (although if the whole loop body is conditional it > > makes sense to also have it with -O2 because code growth is very small > > then). > > Very curious on the performance side, so if you could get some #s on that, > it'd be greatly appreciated. My test machine misbehaved over the weekend, but as soon as I have them I'll update here. > > testsuite/ > > * gcc.dg/loop-split.c: New test. > > Please clean up the #if 0/#if 1 code in the new tests. Actually I'd prefer if that test contains the by-hand code and the TRACE stuff as well, I'd only change the #if 0 into some #if BYHAND or so ... > You might also want to clean out the TRACE stuff. Essentially the tests > look like you just dropped in a test you'd been running by hand until > now :-) ... the reason being, that bugs in the splitter are somewhat unwieldy to debug by just staring at the dumps, you only get a checksum mismatch, so TRACE=1 is for finding out which of the params and loops is actually miscompiled, TRACE=2 for finding the specific iteration that's broken, and the #if0 code for putting that situation into a non-macroized and smaller function than dotest. (That's actually how I've run the testcase after I had it basically working, extending dotest with a couple more lines, aka example loop sitations, adjusting the checksum, and then making a face and scratching my head and mucking with the TRACE and #if0 macros :) ). > I don't see any negative tests -- ie tests that should not be split due > to boundary conditions. Do you have any from development? Good point, I had some but only ones where I was able to extend the splitters to cover them. I'll think of some that really shouldn't be split. > > Index: tree-ssa-loop-unswitch.c > > === > > --- tree-ssa-loop-unswitch.c(revision 229763) > > +++ tree-ssa-loop-unswitch.c(working copy) > Given the amount of new code, unless there's a strong need, I'd prefer this > transformation to be implemented in its own file. Okay. > > +/* Give an induction variable GUARD_IV, and its affine descriptor IV, > > + find the loop phi node in LOOP defining it directly, or create > > + such phi node. Return that phi node. */ > > + > > +static gphi * > > +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * > > /*iv*/) > > +{ > > + gimple *def = SSA_NAME_DEF_STMT (guard_iv); > > + gphi *phi; > > + if ((phi = dyn_cast (def)) > > + && gimple_bb (phi) == loop->header) > > +return phi; > > + > > + /* XXX Create the PHI instead. */ > > + return NULL; > > So right now we just punt if we need to create the PHI? Does that > happen with any kind of regularity in practice? Only with such situations: for (int i = start; i < end; i++) { if (i + offset < bound) ... } Here the condition-IV is not directly defined by a PHI node. If it happens often I don't know, I guess the usual situation is testing the control IV directly. The deficiency is not hard to fix. > >
Re: Gimple loop splitting
On 11/12/2015 09:52 AM, Michael Matz wrote: Hello, this new pass implements loop iteration space splitting for loops that contain a conditional that's always true for one part of the iteration space and false for the other, i.e. such situations: FWIW, Ajit suggested the same transformation earlier this year. During that discussion Richi indicated that for hmmer this transformation would enable vectorization. This transformation is in itself a good one but can also be an enabler for the vectorizer. Agreed. It does increase code size, when the loop body contains also unconditional code (that one is duplicated), so we only transform hot loops. Probably ought to be disabled when we're not optimizing for speed as well. I'm a bit unsure of the placement of the new pass, or if it should be an own pass at all. Right now I've placed it after unswitching and scev_cprop, before loop distribution. Ideally I think all three, together with loop fusion and an gimple unroller should be integrated into one loop nest optimizer, alas, we aren't there yet. Given its impact on the looping structure, I'd think early in the loop optimizer. Given the similarities with unswitching, I think before/after unswitching is a natural first cut. We can always iterate if it looks like putting it elsewhere would make sense. I've regstrapped this pass enabled with -O2 on x86-64-linux, without regressions. I've also checked cpu2006 (the non-fortran part) for correctness, not yet for performance. In the end it should probably only be enabled for -O3+ (although if the whole loop body is conditional it makes sense to also have it with -O2 because code growth is very small then). Very curious on the performance side, so if you could get some #s on that, it'd be greatly appreciated. I'd be comfortable with this at -O2, but won't object if you'd prefer -O3. So, okay for trunk? Ciao, Michael. * passes.def (pass_loop_split): Add. * timevar.def (TV_LOOP_SPLIT): Add. * tree-pass.h (make_pass_loop_split): Declare. * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h, gimple-pretty-print.h, gimple-fold.h, gimplify-me.h. (split_at_bb_p, patch_loop_exit, find_or_create_guard_phi, split_loop, tree_ssa_split_loops, make_pass_loop_split): New functions. (pass_data_loop_split): New. (pass_loop_split): New. testsuite/ * gcc.dg/loop-split.c: New test. Please clean up the #if 0/#if 1 code in the new tests. You might also want to clean out the TRACE stuff. Essentially the tests look like you just dropped in a test you'd been running by hand until now :-) I don't see any negative tests -- ie tests that should not be split due to boundary conditions. Do you have any from development? If so it'd be good to have those too. Index: tree-ssa-loop-manip.h === --- tree-ssa-loop-manip.h (revision 229763) +++ tree-ssa-loop-manip.h (working copy) @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *, bool, tree *, tree *); +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int, + struct loop *); extern void rewrite_into_loop_closed_ssa (bitmap, unsigned); extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *); extern void verify_loop_closed_ssa (bool); Index: tree-ssa-loop-unswitch.c === --- tree-ssa-loop-unswitch.c(revision 229763) +++ tree-ssa-loop-unswitch.c(working copy) Given the amount of new code, unless there's a strong need, I'd prefer this transformation to be implemented in its own file. + +/* Give an induction variable GUARD_IV, and its affine descriptor IV, + find the loop phi node in LOOP defining it directly, or create + such phi node. Return that phi node. */ + +static gphi * +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/) +{ + gimple *def = SSA_NAME_DEF_STMT (guard_iv); + gphi *phi; + if ((phi = dyn_cast (def)) + && gimple_bb (phi) == loop->header) +return phi; + + /* XXX Create the PHI instead. */ + return NULL; So right now we just punt if we need to create the PHI? Does that happen with any kind of regularity in practice? +} + +/* Checks if LOOP contains an conditional block whose condition + depends on which side in the iteration space it is, and if so + splits the iteration space into two loops. Returns true if the + loop was split. NITER must contain the iteration descriptor for the + single exit of LOOP. */ + +static bool +split_loop (struct loop *loop, s
Gimple loop splitting
Hello, this new pass implements loop iteration space splitting for loops that contain a conditional that's always true for one part of the iteration space and false for the other, i.e. such situations: for (i = beg; i < end; i++) if (i < p) dothis(); else dothat(); this is transformed into roughly: for (i = beg; i < p; i++) dothis(); for (; i < end; i++) dothat(); Of course, not quite the above as there needs to be provisions for the border conditions, if e.g. 'p' is outside the original iteration space, or the conditional doesn't directly use the control IV, but some other, or the IV runs backwards. The testcase checks many of these border conditions. This transformation is in itself a good one but can also be an enabler for the vectorizer. It does increase code size, when the loop body contains also unconditional code (that one is duplicated), so we only transform hot loops. I'm a bit unsure of the placement of the new pass, or if it should be an own pass at all. Right now I've placed it after unswitching and scev_cprop, before loop distribution. Ideally I think all three, together with loop fusion and an gimple unroller should be integrated into one loop nest optimizer, alas, we aren't there yet. I'm planning to work on loop fusion in the future as well, but that's not for GCC 6. I've regstrapped this pass enabled with -O2 on x86-64-linux, without regressions. I've also checked cpu2006 (the non-fortran part) for correctness, not yet for performance. In the end it should probably only be enabled for -O3+ (although if the whole loop body is conditional it makes sense to also have it with -O2 because code growth is very small then). So, okay for trunk? Ciao, Michael. * passes.def (pass_loop_split): Add. * timevar.def (TV_LOOP_SPLIT): Add. * tree-pass.h (make_pass_loop_split): Declare. * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare. * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h, cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h, gimple-pretty-print.h, gimple-fold.h, gimplify-me.h. (split_at_bb_p, patch_loop_exit, find_or_create_guard_phi, split_loop, tree_ssa_split_loops, make_pass_loop_split): New functions. (pass_data_loop_split): New. (pass_loop_split): New. testsuite/ * gcc.dg/loop-split.c: New test. Index: passes.def === --- passes.def (revision 229763) +++ passes.def (working copy) @@ -233,6 +233,7 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_dce); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); + NEXT_PASS (pass_loop_split); NEXT_PASS (pass_record_bounds); NEXT_PASS (pass_loop_distribution); NEXT_PASS (pass_copy_prop); Index: timevar.def === --- timevar.def (revision 229763) +++ timevar.def (working copy) @@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM , " DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv") DEFTIMEVAR (TV_SCEV_CONST, "scev constant prop") DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH, "tree loop unswitching") +DEFTIMEVAR (TV_LOOP_SPLIT, "loop splitting") DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling") DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops") DEFTIMEVAR (TV_TREE_VECTORIZATION, "tree vectorization") Index: tree-pass.h === --- tree-pass.h (revision 229763) +++ tree-pass.h (working copy) @@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt); extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt); extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt); extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt); extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt); Index: tree-ssa-loop-manip.h === --- tree-ssa-loop-manip.h (revision 229763) +++ tree-ssa-loop-manip.h (working copy) @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *, bool, tree *, tree *); +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int, + struct loop *); extern void rewrite_into_loop_closed_ssa (bitmap, unsigned); extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *); extern void verify_loop_closed_ssa (bool); Index: tree-ssa-loop-unswitch