RE: Gimple loop splitting v2

2016-10-25 Thread Tamar Christina
Hi Michael,

The commit seems to be causing an ICE on aarch64 (just tested latest trunk).

I've created a Bugzilla ticket with a test input 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78107

Regards,
Tamar

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Michael Matz
> Sent: 24 October 2016 10:02
> To: Bin.Cheng
> Cc: Jeff Law; gcc-patches List
> Subject: Re: Gimple loop splitting v2
> 
> Hi,
> 
> On Mon, 24 Oct 2016, Bin.Cheng wrote:
> 
> > Unfortunately I didn't reproduce the improvement in my run on AArch64,
> > I will double check if I made some mistakes.
> 
> Yeah, our regular testers also didn't pick up these kinds of improvements.
> As I said, the machine was quite jumpy (though not loaded at all, and fixated
> to run on one CPU) :-/
> 
> 
> Ciao,
> Michael.



Re: Gimple loop splitting v2

2016-10-24 Thread Michael Matz
Hi,

On Mon, 24 Oct 2016, Bin.Cheng wrote:

> Unfortunately I didn't reproduce the improvement in my run on AArch64, I 
> will double check if I made some mistakes.

Yeah, our regular testers also didn't pick up these kinds of improvements.  
As I said, the machine was quite jumpy (though not loaded at all, and 
fixated to run on one CPU) :-/


Ciao,
Michael.


Re: Gimple loop splitting v2

2016-10-24 Thread Bin.Cheng
On Thu, Oct 20, 2016 at 3:55 PM, Bin.Cheng  wrote:
> On Thu, Oct 20, 2016 at 3:43 PM, Michael Matz  wrote:
>> Hi,
>>
>> On Sat, 5 Dec 2015, Jeff Law wrote:
>>
>>> Nit.  I don't think you want a comma after "so".  And it looks like your
>>> comment got truncated as well.
>>>
>>> With the comment above fixed, this is fine for the trunk.
>>
>> I'm terribly sorry to have dropped the ball here, but I've committed this
>> now after not even a year ;-/ (r241374)  Obviously after rebootstrapping
>> with all,ada languages.  I also did some benchmark run which should be
>> taken with a grain of salt as the machine had fairly variant results but
>> the improvements are real, though perhaps not always in that range (it's a
>> normal three repeats run).  I'm really curious if our automatic tester can
>> pick up similar improvements, because if so, it's extreme (5 to 15 percent
>> in some benchmarks) and we can brag about it for GCC 7 ;-)
> This is nice, thanks for doing it.  I will check the improvement on AArch64.
Hi,
Unfortunately I didn't reproduce the improvement in my run on AArch64,
I will double check if I made some mistakes.

Thanks,
bin
>>
>> 400.perlbench9770519   18.8 *9770   508   19.2 *
>> 401.bzip29650668   14.5 *9650   666   14.5 *
>> 403.gcc  8050455   17.7 *8050   432   18.6 *
>> 429.mcf  9120477   19.1 *9120   467   19.5 *
>> 445.gobmk   10490643   16.3 *   10490   644   16.3 *
>> 456.hmmer9330641   14.6 *9330   614   15.2 *
>> 458.sjeng   12100784   15.4 *   12100   762   15.9 *
>> 462.libquantum  20720605   34.2 *   20720   600   34.5 *
>> 464.h264ref 22130969   22.8 *   22130   969   22.8 *
>> 471.omnetpp  6250438   14.3 *6250   358   17.5 *
>> 473.astar7020494   14.2 *7020   492   14.3 *
>> 483.xalancbmk6900342   20.2 *6900   336   20.6 *
>>  Est. SPECint(R)_base2006  17.9
>>  Est. SPECint2006 18.5
>>
>> 410.bwaves  13590563   24.1 *   13590   506   26.9 *
>> 416.gamess  NR NR
>> 433.milc 9180375   24.5 *9180   349   26.3 *
>> 434.zeusmp   9100433   21.0 *9100   423   21.5 *
>> 435.gromacs  7140402   17.7 *7140   411   17.4 *
>> 436.cactusADM   11950486   24.6 *   11950   486   24.6 *
>> 437.leslie3d 9400421   22.4 *9400   419   22.4 *
>> 444.namd 8020520   15.4 *8020   520   15.4 *
>> 447.dealII  NR NR
>> 450.soplex   8340393   21.2 *8340   391   21.3 *
>> 453.povray   5320277   19.2 *5320   278   19.1 *
>> 454.calculix 8250453   18.2 *8250   460   17.9 *
>> 459.GemsFDTD10610542   19.6 *   10610   537   19.8 *
>> 465.tonto9840492   20.0 *9840   491   20.0 *
>> 470.lbm 13740466   29.5 *   13740   430   32.0 *
>> 481.wrf 11170492   22.7 *   11170   457   24.4 *
>> 482.sphinx3 19490659   29.6 *   19490   655   29.8 *
>>  Est. SPECfp(R)_base2006   21.6
>>  Est. SPECfp2006  22.1
>>
>>
>> Ciao,
>> Michael.


Re: Gimple loop splitting v2

2016-10-20 Thread Jeff Law

On 10/20/2016 08:43 AM, Michael Matz wrote:

Hi,

On Sat, 5 Dec 2015, Jeff Law wrote:


Nit.  I don't think you want a comma after "so".  And it looks like your
comment got truncated as well.

With the comment above fixed, this is fine for the trunk.


I'm terribly sorry to have dropped the ball here, but I've committed this
now after not even a year ;-/ (r241374)

It'd totally fallen off my radar.  I had to go find it in my archives :-).




 Obviously after rebootstrapping

with all,ada languages.  I also did some benchmark run which should be
taken with a grain of salt as the machine had fairly variant results but
the improvements are real, though perhaps not always in that range (it's a
normal three repeats run).  I'm really curious if our automatic tester can
pick up similar improvements, because if so, it's extreme (5 to 15 percent
in some benchmarks) and we can brag about it for GCC 7 ;-)
Yea.  I don't expect it applies that often and ISTM that it's probably 
most beneficial by enabling other stuff later in the loop optimizer 
pipeline to see more loops without embedded flow control.



ANyway, glad to see it go in.

jeff



Re: Gimple loop splitting v2

2016-10-20 Thread Bin.Cheng
On Thu, Oct 20, 2016 at 3:43 PM, Michael Matz  wrote:
> Hi,
>
> On Sat, 5 Dec 2015, Jeff Law wrote:
>
>> Nit.  I don't think you want a comma after "so".  And it looks like your
>> comment got truncated as well.
>>
>> With the comment above fixed, this is fine for the trunk.
>
> I'm terribly sorry to have dropped the ball here, but I've committed this
> now after not even a year ;-/ (r241374)  Obviously after rebootstrapping
> with all,ada languages.  I also did some benchmark run which should be
> taken with a grain of salt as the machine had fairly variant results but
> the improvements are real, though perhaps not always in that range (it's a
> normal three repeats run).  I'm really curious if our automatic tester can
> pick up similar improvements, because if so, it's extreme (5 to 15 percent
> in some benchmarks) and we can brag about it for GCC 7 ;-)
This is nice, thanks for doing it.  I will check the improvement on AArch64.

Thanks,
bin
>
> 400.perlbench9770519   18.8 *9770   508   19.2 *
> 401.bzip29650668   14.5 *9650   666   14.5 *
> 403.gcc  8050455   17.7 *8050   432   18.6 *
> 429.mcf  9120477   19.1 *9120   467   19.5 *
> 445.gobmk   10490643   16.3 *   10490   644   16.3 *
> 456.hmmer9330641   14.6 *9330   614   15.2 *
> 458.sjeng   12100784   15.4 *   12100   762   15.9 *
> 462.libquantum  20720605   34.2 *   20720   600   34.5 *
> 464.h264ref 22130969   22.8 *   22130   969   22.8 *
> 471.omnetpp  6250438   14.3 *6250   358   17.5 *
> 473.astar7020494   14.2 *7020   492   14.3 *
> 483.xalancbmk6900342   20.2 *6900   336   20.6 *
>  Est. SPECint(R)_base2006  17.9
>  Est. SPECint2006 18.5
>
> 410.bwaves  13590563   24.1 *   13590   506   26.9 *
> 416.gamess  NR NR
> 433.milc 9180375   24.5 *9180   349   26.3 *
> 434.zeusmp   9100433   21.0 *9100   423   21.5 *
> 435.gromacs  7140402   17.7 *7140   411   17.4 *
> 436.cactusADM   11950486   24.6 *   11950   486   24.6 *
> 437.leslie3d 9400421   22.4 *9400   419   22.4 *
> 444.namd 8020520   15.4 *8020   520   15.4 *
> 447.dealII  NR NR
> 450.soplex   8340393   21.2 *8340   391   21.3 *
> 453.povray   5320277   19.2 *5320   278   19.1 *
> 454.calculix 8250453   18.2 *8250   460   17.9 *
> 459.GemsFDTD10610542   19.6 *   10610   537   19.8 *
> 465.tonto9840492   20.0 *9840   491   20.0 *
> 470.lbm 13740466   29.5 *   13740   430   32.0 *
> 481.wrf 11170492   22.7 *   11170   457   24.4 *
> 482.sphinx3 19490659   29.6 *   19490   655   29.8 *
>  Est. SPECfp(R)_base2006   21.6
>  Est. SPECfp2006  22.1
>
>
> Ciao,
> Michael.


Re: Gimple loop splitting v2

2016-10-20 Thread Michael Matz
Hi,

On Sat, 5 Dec 2015, Jeff Law wrote:

> Nit.  I don't think you want a comma after "so".  And it looks like your
> comment got truncated as well.
> 
> With the comment above fixed, this is fine for the trunk.

I'm terribly sorry to have dropped the ball here, but I've committed this 
now after not even a year ;-/ (r241374)  Obviously after rebootstrapping 
with all,ada languages.  I also did some benchmark run which should be 
taken with a grain of salt as the machine had fairly variant results but 
the improvements are real, though perhaps not always in that range (it's a 
normal three repeats run).  I'm really curious if our automatic tester can 
pick up similar improvements, because if so, it's extreme (5 to 15 percent 
in some benchmarks) and we can brag about it for GCC 7 ;-)

400.perlbench9770519   18.8 *9770   508   19.2 *  
401.bzip29650668   14.5 *9650   666   14.5 *  
403.gcc  8050455   17.7 *8050   432   18.6 *  
429.mcf  9120477   19.1 *9120   467   19.5 *  
445.gobmk   10490643   16.3 *   10490   644   16.3 *  
456.hmmer9330641   14.6 *9330   614   15.2 *  
458.sjeng   12100784   15.4 *   12100   762   15.9 *  
462.libquantum  20720605   34.2 *   20720   600   34.5 *  
464.h264ref 22130969   22.8 *   22130   969   22.8 *  
471.omnetpp  6250438   14.3 *6250   358   17.5 *  
473.astar7020494   14.2 *7020   492   14.3 *  
483.xalancbmk6900342   20.2 *6900   336   20.6 *  
 Est. SPECint(R)_base2006  17.9
 Est. SPECint2006 18.5

410.bwaves  13590563   24.1 *   13590   506   26.9 *  
416.gamess  NR NR 
433.milc 9180375   24.5 *9180   349   26.3 *  
434.zeusmp   9100433   21.0 *9100   423   21.5 *  
435.gromacs  7140402   17.7 *7140   411   17.4 *  
436.cactusADM   11950486   24.6 *   11950   486   24.6 *  
437.leslie3d 9400421   22.4 *9400   419   22.4 *  
444.namd 8020520   15.4 *8020   520   15.4 *  
447.dealII  NR NR 
450.soplex   8340393   21.2 *8340   391   21.3 *  
453.povray   5320277   19.2 *5320   278   19.1 *  
454.calculix 8250453   18.2 *8250   460   17.9 *  
459.GemsFDTD10610542   19.6 *   10610   537   19.8 *  
465.tonto9840492   20.0 *9840   491   20.0 *  
470.lbm 13740466   29.5 *   13740   430   32.0 *  
481.wrf 11170492   22.7 *   11170   457   24.4 *  
482.sphinx3 19490659   29.6 *   19490   655   29.8 *  
 Est. SPECfp(R)_base2006   21.6
 Est. SPECfp2006  22.1


Ciao,
Michael.


Re: Gimple loop splitting v2

2016-07-27 Thread Richard Biener
On Wed, Jul 27, 2016 at 8:17 AM, Andrew Pinski  wrote:
> On Tue, Jul 26, 2016 at 4:32 AM, Richard Biener
>  wrote:
>> On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski  wrote:
>>> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz  wrote:
 Hi,

 On Tue, 1 Dec 2015, Jeff Law wrote:

> > So, okay for trunk?
> -ENOPATCH

 Sigh :)
 Here it is.
>>>
>>>
>>> I found one problem with it.
>>> Take:
>>> void f(int *a, int M, int *b)
>>> {
>>>   for(int i = 0; i <= M; i++)
>>> {
>>>if (i < M)
>>> a[i] = i;
>>> }
>>> }
>>>  CUT ---
>>> There are two issues with the code as below.  The outer most loop's
>>> aux is still set which causes the vectorizer not to vector the loop.
>>> The other issue is I need to run pass_scev_cprop after pass_loop_split
>>> to get the induction variable usage after the loop gone so the
>>> vectorizer will work.
>>
>> I think scev_cprop needs to be re-written to an utility so that the 
>> vectorizer
>> itself can (within its own cost-model) eliminate an induction using it.
>>
>> Richard.
>>
>>> Something like (note this is copy and paste from a terminal):
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index c327900..e8d6ea6 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3.  If not see
>>>   NEXT_PASS (pass_copy_prop);
>>>   NEXT_PASS (pass_dce);
>>>   NEXT_PASS (pass_tree_unswitch);
>>> - NEXT_PASS (pass_scev_cprop);
>>>   NEXT_PASS (pass_loop_split);
>>> + NEXT_PASS (pass_scev_cprop);
>>>   NEXT_PASS (pass_record_bounds);
>>>   NEXT_PASS (pass_loop_distribution);
>>>   NEXT_PASS (pass_copy_prop);
>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>> index 5411530..e72ef19 100644
>>> --- a/gcc/tree-ssa-loop-split.c
>>> +++ b/gcc/tree-ssa-loop-split.c
>>> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
>>>
>>>gcc_assert (scev_initialized_p ());
>>>FOR_EACH_LOOP (loop, 0)
>>> -loop->aux = NULL;
>>> +{
>>> +  loop->aux = NULL;
>>> +  if (loop_outer (loop))
>>> +   loop_outer (loop)->aux = NULL;
>>> +}
>>
>> How does the iterator not visit loop_outer (loop)?!
>
> The iterator with flags of 0 does not visit the the root.  So the way
> to fix this is change 0 (which is the flags) with LI_INCLUDE_ROOT so
> we zero out the root too.

Or not set ->aux on the root in the first place.

Richard.

> Thanks,
> Andrew
>
>>
>>>
>>>/* Go through all loops starting from innermost.  */
>>>FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>>> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
>>>  }
>>>
>>>FOR_EACH_LOOP (loop, 0)
>>> -loop->aux = NULL;
>>> +{
>>> +  loop->aux = NULL;
>>> +  if (loop_outer (loop))
>>> +   loop_outer (loop)->aux = NULL;
>>> +}
>>>
>>>if (changed)
>>>  return TODO_cleanup_cfg;
>>> -  CUT -
>>>
>>> Thanks,
>>> Andrew
>>>
>>>


 Ciao,
 Michael.
 * common.opt (-fsplit-loops): New flag.
 * passes.def (pass_loop_split): Add.
 * opts.c (default_options_table): Add OPT_fsplit_loops entry at 
 -O3.
 (enable_fdo_optimizations): Add loop splitting.
 * timevar.def (TV_LOOP_SPLIT): Add.
 * tree-pass.h (make_pass_loop_split): Declare.
 * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
 * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
 * tree-ssa-loop-split.c: New file.
 * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
 * doc/invoke.texi (fsplit-loops): Document.
 * doc/passes.texi (Loop optimization): Add paragraph about loop
 splitting.

 testsuite/
 * gcc.dg/loop-split.c: New test.

 Index: common.opt
 ===
 --- common.opt  (revision 231115)
 +++ common.opt  (working copy)
 @@ -2453,6 +2457,10 @@ funswitch-loops
  Common Report Var(flag_unswitch_loops) Optimization
  Perform loop unswitching.

 +fsplit-loops
 +Common Report Var(flag_split_loops) Optimization
 +Perform loop splitting.
 +
  funwind-tables
  Common Report Var(flag_unwind_tables) Optimization
  Just generate unwind tables for exception handling.
 Index: passes.def
 ===
 --- passes.def  (revision 231115)
 +++ passes.def  (working copy)
 @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
   NEXT_PASS (pass_dce);
   NEXT_PASS (pass_tree_unswitch);
   NEXT_PASS (pass_scev_cprop);
 + NEXT_PASS (pass_loop_split);
   NEXT_PASS (pass_record_bounds);
   NEXT_PASS (pass_loop_distribution);
   NEXT_PASS (pass_copy_prop);
 I

Re: Gimple loop splitting v2

2016-07-26 Thread Andrew Pinski
On Tue, Jul 26, 2016 at 4:32 AM, Richard Biener
 wrote:
> On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski  wrote:
>> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz  wrote:
>>> Hi,
>>>
>>> On Tue, 1 Dec 2015, Jeff Law wrote:
>>>
 > So, okay for trunk?
 -ENOPATCH
>>>
>>> Sigh :)
>>> Here it is.
>>
>>
>> I found one problem with it.
>> Take:
>> void f(int *a, int M, int *b)
>> {
>>   for(int i = 0; i <= M; i++)
>> {
>>if (i < M)
>> a[i] = i;
>> }
>> }
>>  CUT ---
>> There are two issues with the code as below.  The outer most loop's
>> aux is still set which causes the vectorizer not to vector the loop.
>> The other issue is I need to run pass_scev_cprop after pass_loop_split
>> to get the induction variable usage after the loop gone so the
>> vectorizer will work.
>
> I think scev_cprop needs to be re-written to an utility so that the vectorizer
> itself can (within its own cost-model) eliminate an induction using it.
>
> Richard.
>
>> Something like (note this is copy and paste from a terminal):
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index c327900..e8d6ea6 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3.  If not see
>>   NEXT_PASS (pass_copy_prop);
>>   NEXT_PASS (pass_dce);
>>   NEXT_PASS (pass_tree_unswitch);
>> - NEXT_PASS (pass_scev_cprop);
>>   NEXT_PASS (pass_loop_split);
>> + NEXT_PASS (pass_scev_cprop);
>>   NEXT_PASS (pass_record_bounds);
>>   NEXT_PASS (pass_loop_distribution);
>>   NEXT_PASS (pass_copy_prop);
>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>> index 5411530..e72ef19 100644
>> --- a/gcc/tree-ssa-loop-split.c
>> +++ b/gcc/tree-ssa-loop-split.c
>> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
>>
>>gcc_assert (scev_initialized_p ());
>>FOR_EACH_LOOP (loop, 0)
>> -loop->aux = NULL;
>> +{
>> +  loop->aux = NULL;
>> +  if (loop_outer (loop))
>> +   loop_outer (loop)->aux = NULL;
>> +}
>
> How does the iterator not visit loop_outer (loop)?!

The iterator with flags of 0 does not visit the the root.  So the way
to fix this is change 0 (which is the flags) with LI_INCLUDE_ROOT so
we zero out the root too.

Thanks,
Andrew

>
>>
>>/* Go through all loops starting from innermost.  */
>>FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
>>  }
>>
>>FOR_EACH_LOOP (loop, 0)
>> -loop->aux = NULL;
>> +{
>> +  loop->aux = NULL;
>> +  if (loop_outer (loop))
>> +   loop_outer (loop)->aux = NULL;
>> +}
>>
>>if (changed)
>>  return TODO_cleanup_cfg;
>> -  CUT -
>>
>> Thanks,
>> Andrew
>>
>>
>>>
>>>
>>> Ciao,
>>> Michael.
>>> * common.opt (-fsplit-loops): New flag.
>>> * passes.def (pass_loop_split): Add.
>>> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
>>> (enable_fdo_optimizations): Add loop splitting.
>>> * timevar.def (TV_LOOP_SPLIT): Add.
>>> * tree-pass.h (make_pass_loop_split): Declare.
>>> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
>>> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
>>> * tree-ssa-loop-split.c: New file.
>>> * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
>>> * doc/invoke.texi (fsplit-loops): Document.
>>> * doc/passes.texi (Loop optimization): Add paragraph about loop
>>> splitting.
>>>
>>> testsuite/
>>> * gcc.dg/loop-split.c: New test.
>>>
>>> Index: common.opt
>>> ===
>>> --- common.opt  (revision 231115)
>>> +++ common.opt  (working copy)
>>> @@ -2453,6 +2457,10 @@ funswitch-loops
>>>  Common Report Var(flag_unswitch_loops) Optimization
>>>  Perform loop unswitching.
>>>
>>> +fsplit-loops
>>> +Common Report Var(flag_split_loops) Optimization
>>> +Perform loop splitting.
>>> +
>>>  funwind-tables
>>>  Common Report Var(flag_unwind_tables) Optimization
>>>  Just generate unwind tables for exception handling.
>>> Index: passes.def
>>> ===
>>> --- passes.def  (revision 231115)
>>> +++ passes.def  (working copy)
>>> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>>>   NEXT_PASS (pass_dce);
>>>   NEXT_PASS (pass_tree_unswitch);
>>>   NEXT_PASS (pass_scev_cprop);
>>> + NEXT_PASS (pass_loop_split);
>>>   NEXT_PASS (pass_record_bounds);
>>>   NEXT_PASS (pass_loop_distribution);
>>>   NEXT_PASS (pass_copy_prop);
>>> Index: opts.c
>>> ===
>>> --- opts.c  (revision 231115)
>>> +++ opts.c  (working copy)
>>> @@ -532,6 +532,7 @@ static const struct default_options defa
>>> regardless of them being declared inline.  *

Re: Gimple loop splitting v2

2016-07-26 Thread Richard Biener
On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski  wrote:
> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz  wrote:
>> Hi,
>>
>> On Tue, 1 Dec 2015, Jeff Law wrote:
>>
>>> > So, okay for trunk?
>>> -ENOPATCH
>>
>> Sigh :)
>> Here it is.
>
>
> I found one problem with it.
> Take:
> void f(int *a, int M, int *b)
> {
>   for(int i = 0; i <= M; i++)
> {
>if (i < M)
> a[i] = i;
> }
> }
>  CUT ---
> There are two issues with the code as below.  The outer most loop's
> aux is still set which causes the vectorizer not to vector the loop.
> The other issue is I need to run pass_scev_cprop after pass_loop_split
> to get the induction variable usage after the loop gone so the
> vectorizer will work.

I think scev_cprop needs to be re-written to an utility so that the vectorizer
itself can (within its own cost-model) eliminate an induction using it.

Richard.

> Something like (note this is copy and paste from a terminal):
> diff --git a/gcc/passes.def b/gcc/passes.def
> index c327900..e8d6ea6 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3.  If not see
>   NEXT_PASS (pass_copy_prop);
>   NEXT_PASS (pass_dce);
>   NEXT_PASS (pass_tree_unswitch);
> - NEXT_PASS (pass_scev_cprop);
>   NEXT_PASS (pass_loop_split);
> + NEXT_PASS (pass_scev_cprop);
>   NEXT_PASS (pass_record_bounds);
>   NEXT_PASS (pass_loop_distribution);
>   NEXT_PASS (pass_copy_prop);
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 5411530..e72ef19 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
>
>gcc_assert (scev_initialized_p ());
>FOR_EACH_LOOP (loop, 0)
> -loop->aux = NULL;
> +{
> +  loop->aux = NULL;
> +  if (loop_outer (loop))
> +   loop_outer (loop)->aux = NULL;
> +}

How does the iterator not visit loop_outer (loop)?!

>
>/* Go through all loops starting from innermost.  */
>FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
>  }
>
>FOR_EACH_LOOP (loop, 0)
> -loop->aux = NULL;
> +{
> +  loop->aux = NULL;
> +  if (loop_outer (loop))
> +   loop_outer (loop)->aux = NULL;
> +}
>
>if (changed)
>  return TODO_cleanup_cfg;
> -  CUT -
>
> Thanks,
> Andrew
>
>
>>
>>
>> Ciao,
>> Michael.
>> * common.opt (-fsplit-loops): New flag.
>> * passes.def (pass_loop_split): Add.
>> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
>> (enable_fdo_optimizations): Add loop splitting.
>> * timevar.def (TV_LOOP_SPLIT): Add.
>> * tree-pass.h (make_pass_loop_split): Declare.
>> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
>> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
>> * tree-ssa-loop-split.c: New file.
>> * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
>> * doc/invoke.texi (fsplit-loops): Document.
>> * doc/passes.texi (Loop optimization): Add paragraph about loop
>> splitting.
>>
>> testsuite/
>> * gcc.dg/loop-split.c: New test.
>>
>> Index: common.opt
>> ===
>> --- common.opt  (revision 231115)
>> +++ common.opt  (working copy)
>> @@ -2453,6 +2457,10 @@ funswitch-loops
>>  Common Report Var(flag_unswitch_loops) Optimization
>>  Perform loop unswitching.
>>
>> +fsplit-loops
>> +Common Report Var(flag_split_loops) Optimization
>> +Perform loop splitting.
>> +
>>  funwind-tables
>>  Common Report Var(flag_unwind_tables) Optimization
>>  Just generate unwind tables for exception handling.
>> Index: passes.def
>> ===
>> --- passes.def  (revision 231115)
>> +++ passes.def  (working copy)
>> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>>   NEXT_PASS (pass_dce);
>>   NEXT_PASS (pass_tree_unswitch);
>>   NEXT_PASS (pass_scev_cprop);
>> + NEXT_PASS (pass_loop_split);
>>   NEXT_PASS (pass_record_bounds);
>>   NEXT_PASS (pass_loop_distribution);
>>   NEXT_PASS (pass_copy_prop);
>> Index: opts.c
>> ===
>> --- opts.c  (revision 231115)
>> +++ opts.c  (working copy)
>> @@ -532,6 +532,7 @@ static const struct default_options defa
>> regardless of them being declared inline.  */
>>  { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
>>  { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 
>> 1 },
>> +{ OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
>>  { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>>  { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>>  { OPT_LEVELS_3_PLUS, OPT_ftree_lo

Re: Gimple loop splitting v2

2016-07-25 Thread Andrew Pinski
On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz  wrote:
> Hi,
>
> On Tue, 1 Dec 2015, Jeff Law wrote:
>
>> > So, okay for trunk?
>> -ENOPATCH
>
> Sigh :)
> Here it is.


I found one problem with it.
Take:
void f(int *a, int M, int *b)
{
  for(int i = 0; i <= M; i++)
{
   if (i < M)
a[i] = i;
}
}
 CUT ---
There are two issues with the code as below.  The outer most loop's
aux is still set which causes the vectorizer not to vector the loop.
The other issue is I need to run pass_scev_cprop after pass_loop_split
to get the induction variable usage after the loop gone so the
vectorizer will work.

Something like (note this is copy and paste from a terminal):
diff --git a/gcc/passes.def b/gcc/passes.def
index c327900..e8d6ea6 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -262,8 +262,8 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_copy_prop);
  NEXT_PASS (pass_dce);
  NEXT_PASS (pass_tree_unswitch);
- NEXT_PASS (pass_scev_cprop);
  NEXT_PASS (pass_loop_split);
+ NEXT_PASS (pass_scev_cprop);
  NEXT_PASS (pass_record_bounds);
  NEXT_PASS (pass_loop_distribution);
  NEXT_PASS (pass_copy_prop);
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 5411530..e72ef19 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -592,7 +592,11 @@ tree_ssa_split_loops (void)

   gcc_assert (scev_initialized_p ());
   FOR_EACH_LOOP (loop, 0)
-loop->aux = NULL;
+{
+  loop->aux = NULL;
+  if (loop_outer (loop))
+   loop_outer (loop)->aux = NULL;
+}

   /* Go through all loops starting from innermost.  */
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
@@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
 }

   FOR_EACH_LOOP (loop, 0)
-loop->aux = NULL;
+{
+  loop->aux = NULL;
+  if (loop_outer (loop))
+   loop_outer (loop)->aux = NULL;
+}

   if (changed)
 return TODO_cleanup_cfg;
-  CUT -

Thanks,
Andrew


>
>
> Ciao,
> Michael.
> * common.opt (-fsplit-loops): New flag.
> * passes.def (pass_loop_split): Add.
> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
> (enable_fdo_optimizations): Add loop splitting.
> * timevar.def (TV_LOOP_SPLIT): Add.
> * tree-pass.h (make_pass_loop_split): Declare.
> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
> * tree-ssa-loop-split.c: New file.
> * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
> * doc/invoke.texi (fsplit-loops): Document.
> * doc/passes.texi (Loop optimization): Add paragraph about loop
> splitting.
>
> testsuite/
> * gcc.dg/loop-split.c: New test.
>
> Index: common.opt
> ===
> --- common.opt  (revision 231115)
> +++ common.opt  (working copy)
> @@ -2453,6 +2457,10 @@ funswitch-loops
>  Common Report Var(flag_unswitch_loops) Optimization
>  Perform loop unswitching.
>
> +fsplit-loops
> +Common Report Var(flag_split_loops) Optimization
> +Perform loop splitting.
> +
>  funwind-tables
>  Common Report Var(flag_unwind_tables) Optimization
>  Just generate unwind tables for exception handling.
> Index: passes.def
> ===
> --- passes.def  (revision 231115)
> +++ passes.def  (working copy)
> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>   NEXT_PASS (pass_dce);
>   NEXT_PASS (pass_tree_unswitch);
>   NEXT_PASS (pass_scev_cprop);
> + NEXT_PASS (pass_loop_split);
>   NEXT_PASS (pass_record_bounds);
>   NEXT_PASS (pass_loop_distribution);
>   NEXT_PASS (pass_copy_prop);
> Index: opts.c
> ===
> --- opts.c  (revision 231115)
> +++ opts.c  (working copy)
> @@ -532,6 +532,7 @@ static const struct default_options defa
> regardless of them being declared inline.  */
>  { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
>  { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 
> 1 },
> +{ OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
>  { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>  { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>  { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
> @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
>  opts->x_flag_ipa_cp_alignment = value;
>if (!opts_set->x_flag_predictive_commoning)
>  opts->x_flag_predictive_commoning = value;
> +  if (!opts_set->x_flag_split_loops)
> +opts->x_flag_split_loops = value;
>if (!opts_set->x_flag_unswitch_loops)
>  opts->x_flag_unswitch_loops = value;
>if (!opts_set->x_flag_gcse_after_reload)
> Index: timevar.def
> =

Re: Gimple loop splitting

2016-07-25 Thread Michael Matz
Hi,

On Sun, 24 Jul 2016, Andrew Pinski wrote:

> What ever happened to this patch?

It got accepted but I deferred inclusion in GCC 6 because it 
was late in the cycle then and performance results didn't show super 
improvements (only looked at cpu2006).  No regressions, but no nice 
speedups either.

> I was looking into doing this myself today but I found this patch. It is 
> stage 1 of GCC 7, it might be a good idea to get this patch into GCC.

Indeed.  If you want to performance test it on something you know where it 
should help, I'm all ears.


Ciao,
Michael.


Re: Gimple loop splitting

2016-07-25 Thread Andrew Pinski
On Thu, Nov 12, 2015 at 8:52 AM, Michael Matz  wrote:
> Hello,
>
> this new pass implements loop iteration space splitting for loops that
> contain a conditional that's always true for one part of the iteration
> space and false for the other, i.e. such situations:
>
>   for (i = beg; i < end; i++)
> if (i < p)
>   dothis();
> else
>   dothat();
>
> this is transformed into roughly:
>
>   for (i = beg; i < p; i++)
> dothis();
>   for (; i < end; i++)
> dothat();
>
> Of course, not quite the above as there needs to be provisions for the
> border conditions, if e.g. 'p' is outside the original iteration space, or
> the conditional doesn't directly use the control IV, but some other, or
> the IV runs backwards.  The testcase checks many of these border
> conditions.
>
> This transformation is in itself a good one but can also be an enabler for
> the vectorizer.  It does increase code size, when the loop body contains
> also unconditional code (that one is duplicated), so we only transform hot
> loops.  I'm a bit unsure of the placement of the new pass, or if it should
> be an own pass at all.  Right now I've placed it after unswitching and
> scev_cprop, before loop distribution.  Ideally I think all three, together
> with loop fusion and an gimple unroller should be integrated into one loop
> nest optimizer, alas, we aren't there yet.
>
> I'm planning to work on loop fusion in the future as well, but that's not
> for GCC 6.
>
> I've regstrapped this pass enabled with -O2 on x86-64-linux, without
> regressions.  I've also checked cpu2006 (the non-fortran part) for
> correctness, not yet for performance.  In the end it should probably only
> be enabled for -O3+ (although if the whole loop body is conditional it
> makes sense to also have it with -O2 because code growth is very small
> then).
>
> So, okay for trunk?

What ever happened to this patch?  I was looking into doing this
myself today but I found this patch.
It is stage 1 of GCC 7, it might be a good idea to get this patch into GCC.

Thanks,
Andrew

>
>
> Ciao,
> Michael.
> * passes.def (pass_loop_split): Add.
> * timevar.def (TV_LOOP_SPLIT): Add.
> * tree-pass.h (make_pass_loop_split): Declare.
> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
> cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
> gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
> (split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
> split_loop, tree_ssa_split_loops,
> make_pass_loop_split): New functions.
> (pass_data_loop_split): New.
> (pass_loop_split): New.
>
> testsuite/
> * gcc.dg/loop-split.c: New test.
>
> Index: passes.def
> ===
> --- passes.def  (revision 229763)
> +++ passes.def  (working copy)
> @@ -233,6 +233,7 @@ along with GCC; see the file COPYING3.
>   NEXT_PASS (pass_dce);
>   NEXT_PASS (pass_tree_unswitch);
>   NEXT_PASS (pass_scev_cprop);
> + NEXT_PASS (pass_loop_split);
>   NEXT_PASS (pass_record_bounds);
>   NEXT_PASS (pass_loop_distribution);
>   NEXT_PASS (pass_copy_prop);
> Index: timevar.def
> ===
> --- timevar.def (revision 229763)
> +++ timevar.def (working copy)
> @@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM   , "
>  DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
>  DEFTIMEVAR (TV_SCEV_CONST, "scev constant prop")
>  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH, "tree loop unswitching")
> +DEFTIMEVAR (TV_LOOP_SPLIT, "loop splitting")
>  DEFTIMEVAR (TV_COMPLETE_UNROLL   , "complete unrolling")
>  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
>  DEFTIMEVAR (TV_TREE_VECTORIZATION, "tree vectorization")
> Index: tree-pass.h
> ===
> --- tree-pass.h (revision 229763)
> +++ tree-pass.h (working copy)
> @@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n
>  extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
> Index: tree-ssa-loop-manip.h
> ===
> --- tree-ssa-loop-manip.h   (revision 229763)
> +++ tree-ssa-loop-manip.h   (working copy)
> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>
>  extern void create_iv (tr

Re: Gimple loop splitting v2

2015-12-04 Thread Jeff Law

On 12/02/2015 06:23 AM, Michael Matz wrote:

Hi,

On Tue, 1 Dec 2015, Jeff Law wrote:


So, okay for trunk?

-ENOPATCH


Sigh :)
Here it is.


Ciao,
Michael.
* common.opt (-fsplit-loops): New flag.
* passes.def (pass_loop_split): Add.
* opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
(enable_fdo_optimizations): Add loop splitting.
* timevar.def (TV_LOOP_SPLIT): Add.
* tree-pass.h (make_pass_loop_split): Declare.
* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
* tree-ssa-loop-split.c: New file.
* Makefile.in (OBJS): Add tree-ssa-loop-split.o.
* doc/invoke.texi (fsplit-loops): Document.
* doc/passes.texi (Loop optimization): Add paragraph about loop
splitting.

testsuite/
* gcc.dg/loop-split.c: New test.

Index: tree-ssa-loop-split.c



+/* Return true when BB inside LOOP is a potential iteration space
+   split point, i.e. ends with a condition like "IV < comp", which
+   is true on one side of the iteration space and false on the other,
+   and the split point can be computed.  If so, also return the border
+   point in *BORDER and the comparison induction variable in IV.  */
+
+static tree
+split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
+{
+  gimple *last;
+  gcond *stmt;
+  affine_iv iv2;
+

 +

+  /* Make it so, that the first argument of the condition is
+ the looping one (only swap.  */
Nit.  I don't think you want a comma after "so".  And it looks like your 
comment got truncated as well.


With the comment above fixed, this is fine for the trunk.

jeff


Re: Gimple loop splitting v2

2015-12-02 Thread Michael Matz
Hi,

On Tue, 1 Dec 2015, Jeff Law wrote:

> > So, okay for trunk?
> -ENOPATCH

Sigh :)
Here it is.


Ciao,
Michael.
* common.opt (-fsplit-loops): New flag.
* passes.def (pass_loop_split): Add.
* opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
(enable_fdo_optimizations): Add loop splitting.
* timevar.def (TV_LOOP_SPLIT): Add.
* tree-pass.h (make_pass_loop_split): Declare.
* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
* tree-ssa-loop-split.c: New file.
* Makefile.in (OBJS): Add tree-ssa-loop-split.o.
* doc/invoke.texi (fsplit-loops): Document.
* doc/passes.texi (Loop optimization): Add paragraph about loop
splitting.

testsuite/
* gcc.dg/loop-split.c: New test.

Index: common.opt
===
--- common.opt  (revision 231115)
+++ common.opt  (working copy)
@@ -2453,6 +2457,10 @@ funswitch-loops
 Common Report Var(flag_unswitch_loops) Optimization
 Perform loop unswitching.
 
+fsplit-loops
+Common Report Var(flag_split_loops) Optimization
+Perform loop splitting.
+
 funwind-tables
 Common Report Var(flag_unwind_tables) Optimization
 Just generate unwind tables for exception handling.
Index: passes.def
===
--- passes.def  (revision 231115)
+++ passes.def  (working copy)
@@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
  NEXT_PASS (pass_dce);
  NEXT_PASS (pass_tree_unswitch);
  NEXT_PASS (pass_scev_cprop);
+ NEXT_PASS (pass_loop_split);
  NEXT_PASS (pass_record_bounds);
  NEXT_PASS (pass_loop_distribution);
  NEXT_PASS (pass_copy_prop);
Index: opts.c
===
--- opts.c  (revision 231115)
+++ opts.c  (working copy)
@@ -532,6 +532,7 @@ static const struct default_options defa
regardless of them being declared inline.  */
 { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 
},
+{ OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
@@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
 opts->x_flag_ipa_cp_alignment = value;
   if (!opts_set->x_flag_predictive_commoning)
 opts->x_flag_predictive_commoning = value;
+  if (!opts_set->x_flag_split_loops)
+opts->x_flag_split_loops = value;
   if (!opts_set->x_flag_unswitch_loops)
 opts->x_flag_unswitch_loops = value;
   if (!opts_set->x_flag_gcse_after_reload)
Index: timevar.def
===
--- timevar.def (revision 231115)
+++ timevar.def (working copy)
@@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM   , "
 DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
 DEFTIMEVAR (TV_SCEV_CONST, "scev constant prop")
 DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH, "tree loop unswitching")
+DEFTIMEVAR (TV_LOOP_SPLIT, "loop splitting")
 DEFTIMEVAR (TV_COMPLETE_UNROLL   , "complete unrolling")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
 DEFTIMEVAR (TV_TREE_VECTORIZATION, "tree vectorization")
Index: tree-pass.h
===
--- tree-pass.h (revision 231115)
+++ tree-pass.h (working copy)
@@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
 extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
Index: tree-ssa-loop-manip.h
===
--- tree-ssa-loop-manip.h   (revision 231115)
+++ tree-ssa-loop-manip.h   (working copy)
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
 
 extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
   bool, tree *, tree *);
+extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
+   struct loop *);
 extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
 extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
 extern void verify_loop_closed_ssa (bool);
Index: Makefile.in
==

Re: Gimple loop splitting v2

2015-12-01 Thread Jeff Law

On 12/01/2015 09:46 AM, Michael Matz wrote:

Hi,

So, okay for trunk?

-ENOPATCH

Jeff



Gimple loop splitting v2

2015-12-01 Thread Michael Matz
Hi,

On Mon, 16 Nov 2015, Jeff Law wrote:

> OK, if you want to keep them, then have a consistent way to turn them 
> on/off for future debugging.  if0/if1 doesn't provide much of a clue to 
> someone else what to turn on/off if they need to debug this stuff.

> > > I don't see any negative tests -- ie tests that should not be split 
> > > due to boundary conditions.  Do you have any from development?
> > 
> > Good point, I had some but only ones where I was able to extend the 
> > splitters to cover them.  I'll think of some that really shouldn't be 
> > split.
> If you've got them, certainly add them.  Though I realize they may get 
> lost over time.

Actually, thinking a bit more about this, I don't have any that wouldn't 
be merely restrictions in the implementation that couldn't be lifted in 
the future (e.g. unequal step sizes), so I've added no additional ones.

> But in that case, the immediate dominator of pre2 & join is still the 
> initial if statement.  So I think we're OK.  That was the conclusion I 
> was starting to come to yesterday, having the ascii art makes it pretty 
> clear.  I'm just not good at conceptualizing a CFG.  I have to see it 
> explicitly and then everything seems so clear and simple.

So, this second version should reflect the review.  I've moved everything 
to a new file, split the long function into several logically separate 
ones, and even included ascii art in the comments :)  The testcase got a 
comment about what to #define for debugging.  I've included the pass to 
-O3 or alternatively if profile-use is on, similar to funswitch-loops.  
I've also added a proper -fsplit-loops option.

There's two functional changes in v2: a bugfix to not try splitting a 
non-iterating loop (irritatingly such a look returns true from 
number_of_iterations_exit, but with an ERROR_MARK comparator), and a 
limitation to avoid combinatorical explosion in artificial testcases: Once 
we have done a splitting, we don't do any in that loops parents (we may 
still do splitting in siblings or childs of siblings).

I've also done some measurements: first, bootstrap time is unaffected, and 
regstrapping succeeds without regressions when I activate the pass by 
default.  Then SPECcpu2006: build times are unaffected, everything builds 
and works also with -fsplit-loops, performance is mostly unaffected, base 
is -Ofast -funroll-loops -fpeel-loops, peak adds -fsplit-loops.

  Estimated   Estimated
Base Base   BasePeak Peak   Peak
Benchmarks  Ref.   Run Time Ratio   Ref.   Run Time Ratio
-- --  -  ---  -  
-
400.perlbench9770325   30.1 *9770323   30.3 *  
401.bzip29650382   25.2 *9650382   25.3 *  
403.gcc  8050242   33.3 *8050241   33.4 *  
429.mcf  9120311   29.3 *9120311   29.3 *  
445.gobmk   10490392   26.8 *   10490391   26.8 *  
456.hmmer9330345   27.0 *9330342   27.3 *  
458.sjeng   12100422   28.7 *   12100420   28.8 *  
462.libquantum  20720308   67.3 *   20720308   67.3 *  
464.h264ref 22130423   52.3 *   22130423   52.3 *  
471.omnetpp  6250273   22.9 *6250273   22.9 *  
473.astar7020311   22.6 *7020311   22.6 *  
483.xalancbmk6900191   36.2 *6900190   36.2 *  
 Est. SPECint_base2006 31.7
 Est. SPECint2006  31.7

  Estimated   Estimated
Base Base   BasePeak Peak   Peak
Benchmarks  Ref.   Run Time Ratio   Ref.   Run Time Ratio
-- --  -  ---  -  
-
410.bwaves  13590235   57.7 *   13590235   57.8 *  
416.gamess  NR  NR 
433.milc 9180347   26.5 *9180345   26.6 *  
434.zeusmp   9100269   33.9 *9100268   33.9 *  
435.gromacs  7140260   27.4 *7140262   27.3 *  
436.cactusADM   11950237   50.5 *   11950240   49.9 *  
437.leslie3d 9400228   41.3 *9400228   41.2 *  
444.namd 8020312   25.7 *8020311   25.7 *  
447.dealII  11440254   45.0 *   11440254   45.0 *  
450.soplex   8340201   41.4 *8340202   41.4 *  
453.povray  NR  NR 
454.calculix 

Re: Gimple loop splitting

2015-11-16 Thread Jeff Law

On 11/16/2015 09:05 AM, Michael Matz wrote:

It's a prerequisite indeed, but not enough in itself.

Sigh.  OK.  One can always hope.





hmmer is actually quite interesting because it's a fairly isolated hot
loop posing quite challenging problems for us :)

Sounds like it.  Essentially, it's a TODO list :-)


Probably ought to be disabled when we're not optimizing for speed as well.


That should be dealt with by '!optimize_loop_for_size_p (loop)'.

Doh, must have missed that.



Please clean up the #if 0/#if 1 code in the new tests.


Actually I'd prefer if that test contains the by-hand code and the TRACE
stuff as well, I'd only change the #if 0 into some #if BYHAND or so ...


You might also want to clean out the TRACE stuff.  Essentially the tests
look like you just dropped in a test you'd been running by hand until
now :-)


... the reason being, that bugs in the splitter are somewhat unwieldy to
debug by just staring at the dumps, you only get a checksum mismatch, so
TRACE=1 is for finding out which of the params and loops is actually
miscompiled, TRACE=2 for finding the specific iteration that's broken, and
the #if0 code for putting that situation into a non-macroized and smaller
function than dotest.  (That's actually how I've run the testcase after I
had it basically working, extending dotest with a couple more lines, aka
example loop sitations, adjusting the checksum, and then making a face and
scratching my head and mucking with the TRACE and #if0 macros :) ).
OK, if you want to keep them, then  have a consistent way to turn them 
on/off for future debugging.  if0/if1 doesn't provide much of a clue to 
someone else what to turn on/off if they need to debug this stuff.





I don't see any negative tests -- ie tests that should not be split due
to boundary conditions.  Do you have any from development?


Good point, I had some but only ones where I was able to extend the
splitters to cover them.  I'll think of some that really shouldn't be
split.
If you've got them, certainly add them.  Though I realize they may get 
lost over time.




Only with such situations:

   for (int i = start; i < end; i++) {
 if (i + offset < bound)
   ...
   }

Here the condition-IV is not directly defined by a PHI node.  If it
happens often I don't know, I guess the usual situation is testing the
control IV directly.  The deficiency is not hard to fix.

I'm comfortable waiting until we see the need.


I think I convinced myself on paper that the dominator tree is correct due
to our helpers doing the right thing (loop_version() for the initial
loop copying and split_edge for the above diddling).  Let's see if I can
paint some ASCII art.  So, after loop_version (which updates dom) we
have:
OK.  I was worried about the next step -- where we insert the 
conditional on the exit from pre1 to have it transfer to join or pre2.


But in that case, the immediate dominator of pre2 & join is still the 
initial if statement.  So I think we're OK.  That was the conclusion I 
was starting to come to yesterday, having the ascii art makes it pretty 
clear.  I'm just not good at conceptualizing a CFG.  I have to see it 
explicitly and then everything seems so clear and simple.


jeff



Re: Gimple loop splitting

2015-11-16 Thread Michael Matz
Hi,

On Thu, 12 Nov 2015, Jeff Law wrote:

> > this new pass implements loop iteration space splitting for loops that
> > contain a conditional that's always true for one part of the iteration
> > space and false for the other, i.e. such situations:
> FWIW, Ajit suggested the same transformation earlier this year.  During that
> discussion Richi indicated that for hmmer this transformation would enable
> vectorization.

It's a prerequisite indeed, but not enough in itself.  The next problem 
will be that only parts of access chains inside the hot loop are 
vectorizable, but for that those parts need to be disambiguated.  ICC is 
doing that by a massive chain of conditionals testing non-overlapping of 
the respective arrays at runtime.  Our vectorizer could also do that 
(possibly by increasing the allowed number of conditionals), but the next 
problem then is that one of these (then indeed separated) parts is not 
vectorizable by our vectorizer: it's a 'a[i] = f(a[i-1])' dependency that 
can't yet be handled by us.  If the separation of parts would be done by 
loop distribution that would be fine (we'd have separate loops for the 
parts, some of them vectorizable), but our loop distribution can't do 
runtime disambiguation, only our vectorizer.

hmmer is actually quite interesting because it's a fairly isolated hot 
loop posing quite challenging problems for us :)

> 
>   It does increase code size, when the loop body contains
> > also unconditional code (that one is duplicated), so we only transform hot
> > loops.
> 
> Probably ought to be disabled when we're not optimizing for speed as well.

That should be dealt with by '!optimize_loop_for_size_p (loop)'.

> > I've regstrapped this pass enabled with -O2 on x86-64-linux, without
> > regressions.  I've also checked cpu2006 (the non-fortran part) for
> > correctness, not yet for performance.  In the end it should probably only
> > be enabled for -O3+ (although if the whole loop body is conditional it
> > makes sense to also have it with -O2 because code growth is very small
> > then).
> 
> Very curious on the performance side, so if you could get some #s on that,
> it'd be greatly appreciated.

My test machine misbehaved over the weekend, but as soon as I have them 
I'll update here.

> > testsuite/
> > * gcc.dg/loop-split.c: New test.
> 
> Please clean up the #if 0/#if 1 code in the new tests.

Actually I'd prefer if that test contains the by-hand code and the TRACE 
stuff as well, I'd only change the #if 0 into some #if BYHAND or so ...

> You might also want to clean out the TRACE stuff.  Essentially the tests 
> look like you just dropped in a test you'd been running by hand until 
> now :-)

... the reason being, that bugs in the splitter are somewhat unwieldy to 
debug by just staring at the dumps, you only get a checksum mismatch, so 
TRACE=1 is for finding out which of the params and loops is actually 
miscompiled, TRACE=2 for finding the specific iteration that's broken, and 
the #if0 code for putting that situation into a non-macroized and smaller 
function than dotest.  (That's actually how I've run the testcase after I 
had it basically working, extending dotest with a couple more lines, aka 
example loop sitations, adjusting the checksum, and then making a face and 
scratching my head and mucking with the TRACE and #if0 macros :) ).

> I don't see any negative tests -- ie tests that should not be split due 
> to boundary conditions.  Do you have any from development?

Good point, I had some but only ones where I was able to extend the 
splitters to cover them.  I'll think of some that really shouldn't be 
split.

> > Index: tree-ssa-loop-unswitch.c
> > ===
> > --- tree-ssa-loop-unswitch.c(revision 229763)
> > +++ tree-ssa-loop-unswitch.c(working copy)
> Given the amount of new code, unless there's a strong need, I'd prefer this
> transformation to be implemented in its own file.

Okay.

> > +/* Give an induction variable GUARD_IV, and its affine descriptor IV,
> > +   find the loop phi node in LOOP defining it directly, or create
> > +   such phi node.  Return that phi node.  */
> > +
> > +static gphi *
> > +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv *
> > /*iv*/)
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
> > +  gphi *phi;
> > +  if ((phi = dyn_cast  (def))
> > +  && gimple_bb (phi) == loop->header)
> > +return phi;
> > +
> > +  /* XXX Create the PHI instead.  */
> > +  return NULL;
> 
> So right now we just punt if we need to create the PHI?  Does that 
> happen with any kind of regularity in practice?

Only with such situations:

  for (int i = start; i < end; i++) {
if (i + offset < bound)
  ...
  }

Here the condition-IV is not directly defined by a PHI node.  If it 
happens often I don't know, I guess the usual situation is testing the 
control IV directly.  The deficiency is not hard to fix.

> > 

Re: Gimple loop splitting

2015-11-12 Thread Jeff Law

On 11/12/2015 09:52 AM, Michael Matz wrote:

Hello,

this new pass implements loop iteration space splitting for loops that
contain a conditional that's always true for one part of the iteration
space and false for the other, i.e. such situations:
FWIW, Ajit suggested the same transformation earlier this year.  During 
that discussion Richi indicated that for hmmer this transformation would 
enable vectorization.




This transformation is in itself a good one but can also be an enabler for
the vectorizer.

Agreed.


  It does increase code size, when the loop body contains

also unconditional code (that one is duplicated), so we only transform hot
loops.

Probably ought to be disabled when we're not optimizing for speed as well.




  I'm a bit unsure of the placement of the new pass, or if it should

be an own pass at all.  Right now I've placed it after unswitching and
scev_cprop, before loop distribution.  Ideally I think all three, together
with loop fusion and an gimple unroller should be integrated into one loop
nest optimizer, alas, we aren't there yet.
Given its impact on the looping structure, I'd think early in the loop 
optimizer.  Given the similarities with unswitching, I think 
before/after unswitching is a natural first cut.  We can always iterate 
if it looks like putting it elsewhere would make sense.





I've regstrapped this pass enabled with -O2 on x86-64-linux, without
regressions.  I've also checked cpu2006 (the non-fortran part) for
correctness, not yet for performance.  In the end it should probably only
be enabled for -O3+ (although if the whole loop body is conditional it
makes sense to also have it with -O2 because code growth is very small
then).
Very curious on the performance side, so if you could get some #s on 
that, it'd be greatly appreciated.


I'd be comfortable with this at -O2, but won't object if you'd prefer -O3.




So, okay for trunk?


Ciao,
Michael.
* passes.def (pass_loop_split): Add.
* timevar.def (TV_LOOP_SPLIT): Add.
* tree-pass.h (make_pass_loop_split): Declare.
* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
(split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
split_loop, tree_ssa_split_loops,
make_pass_loop_split): New functions.
(pass_data_loop_split): New.
(pass_loop_split): New.

testsuite/
* gcc.dg/loop-split.c: New test.
Please clean up the #if 0/#if 1 code in the new tests.  You might also 
want to clean out the TRACE stuff.  Essentially the tests look like you 
just dropped in a test you'd been running by hand until now :-)


I don't see any negative tests -- ie tests that should not be split due 
to boundary conditions.  Do you have any from development?  If so it'd 
be good to have those too.




Index: tree-ssa-loop-manip.h
===
--- tree-ssa-loop-manip.h   (revision 229763)
+++ tree-ssa-loop-manip.h   (working copy)
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc

  extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator 
*,
   bool, tree *, tree *);
+extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
+   struct loop *);
  extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
  extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
  extern void verify_loop_closed_ssa (bool);
Index: tree-ssa-loop-unswitch.c
===
--- tree-ssa-loop-unswitch.c(revision 229763)
+++ tree-ssa-loop-unswitch.c(working copy)
Given the amount of new code, unless there's a strong need, I'd prefer 
this transformation to be implemented in its own file.





+
+/* Give an induction variable GUARD_IV, and its affine descriptor IV,
+   find the loop phi node in LOOP defining it directly, or create
+   such phi node.  Return that phi node.  */
+
+static gphi *
+find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
+{
+  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
+  gphi *phi;
+  if ((phi = dyn_cast  (def))
+  && gimple_bb (phi) == loop->header)
+return phi;
+
+  /* XXX Create the PHI instead.  */
+  return NULL;
So right now we just punt if we need to create the PHI?  Does that 
happen with any kind of regularity in practice?




+}
+
+/* Checks if LOOP contains an conditional block whose condition
+   depends on which side in the iteration space it is, and if so
+   splits the iteration space into two loops.  Returns true if the
+   loop was split.  NITER must contain the iteration descriptor for the
+   single exit of LOOP.  */
+
+static bool
+split_loop (struct loop *loop, s

Gimple loop splitting

2015-11-12 Thread Michael Matz
Hello,

this new pass implements loop iteration space splitting for loops that 
contain a conditional that's always true for one part of the iteration 
space and false for the other, i.e. such situations:

  for (i = beg; i < end; i++)
if (i < p)
  dothis();
else
  dothat();

this is transformed into roughly:

  for (i = beg; i < p; i++)
dothis();
  for (; i < end; i++)
dothat();

Of course, not quite the above as there needs to be provisions for the 
border conditions, if e.g. 'p' is outside the original iteration space, or 
the conditional doesn't directly use the control IV, but some other, or 
the IV runs backwards.  The testcase checks many of these border 
conditions.

This transformation is in itself a good one but can also be an enabler for 
the vectorizer.  It does increase code size, when the loop body contains 
also unconditional code (that one is duplicated), so we only transform hot 
loops.  I'm a bit unsure of the placement of the new pass, or if it should 
be an own pass at all.  Right now I've placed it after unswitching and 
scev_cprop, before loop distribution.  Ideally I think all three, together 
with loop fusion and an gimple unroller should be integrated into one loop 
nest optimizer, alas, we aren't there yet.

I'm planning to work on loop fusion in the future as well, but that's not 
for GCC 6.

I've regstrapped this pass enabled with -O2 on x86-64-linux, without 
regressions.  I've also checked cpu2006 (the non-fortran part) for 
correctness, not yet for performance.  In the end it should probably only 
be enabled for -O3+ (although if the whole loop body is conditional it 
makes sense to also have it with -O2 because code growth is very small 
then).

So, okay for trunk?


Ciao,
Michael.
* passes.def (pass_loop_split): Add.
* timevar.def (TV_LOOP_SPLIT): Add.
* tree-pass.h (make_pass_loop_split): Declare.
* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
(split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
split_loop, tree_ssa_split_loops,
make_pass_loop_split): New functions.
(pass_data_loop_split): New.
(pass_loop_split): New.

testsuite/
* gcc.dg/loop-split.c: New test.

Index: passes.def
===
--- passes.def  (revision 229763)
+++ passes.def  (working copy)
@@ -233,6 +233,7 @@ along with GCC; see the file COPYING3.
  NEXT_PASS (pass_dce);
  NEXT_PASS (pass_tree_unswitch);
  NEXT_PASS (pass_scev_cprop);
+ NEXT_PASS (pass_loop_split);
  NEXT_PASS (pass_record_bounds);
  NEXT_PASS (pass_loop_distribution);
  NEXT_PASS (pass_copy_prop);
Index: timevar.def
===
--- timevar.def (revision 229763)
+++ timevar.def (working copy)
@@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM   , "
 DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
 DEFTIMEVAR (TV_SCEV_CONST, "scev constant prop")
 DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH, "tree loop unswitching")
+DEFTIMEVAR (TV_LOOP_SPLIT, "loop splitting")
 DEFTIMEVAR (TV_COMPLETE_UNROLL   , "complete unrolling")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
 DEFTIMEVAR (TV_TREE_VECTORIZATION, "tree vectorization")
Index: tree-pass.h
===
--- tree-pass.h (revision 229763)
+++ tree-pass.h (working copy)
@@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n
 extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
Index: tree-ssa-loop-manip.h
===
--- tree-ssa-loop-manip.h   (revision 229763)
+++ tree-ssa-loop-manip.h   (working copy)
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
 
 extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
   bool, tree *, tree *);
+extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
+   struct loop *);
 extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
 extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
 extern void verify_loop_closed_ssa (bool);
Index: tree-ssa-loop-unswitch