[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression

2014-12-11 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

--- Comment #65 from rguenther at suse dot de rguenther at suse dot de ---
On Wed, 10 Dec 2014, burnus at gcc dot gnu.org wrote:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
 
 --- Comment #64 from Tobias Burnus burnus at gcc dot gnu.org ---
 (In reply to Richard Biener from comment #63)
  Unfortunately for the testcase this doesn't allow moving the division at all
  and we are lucky that we have range information at all because of the 
  fortran
  frontend casting 'n' to unsigned before dividing by it.
 
 If it helps and the semantic is preserved, there is no reason not to 
 completely change what tree code the Fortran FE generates for loops.

The attached patch helps, but only sofar as enabling moving the division
out one loop level - if you add another nest the entry condition on it
will prevent moving the division further.

 [I think one reason for the odd way tree code for loops is generated is: The
 current code makes it simple to permit loops which are always run once, as 
 some
 Fortran 66 compilers did. For instance, DO i = 2, 1 would then be executed
 once. (Such loops are not permitted in F66 - and some compilers executed them
 once others zero times; since F77, such loops are permitted and executed zero
 times. Unsurprisingly, some old code from the 60s relies on the execute once
 feature.)
 
 g77 and some commercial compilers have a compile flag like -f66, gfortran
 hasn't and I don't think it ever will.]

I wondered if

  DO i = 1, 1, 0

is valid (a step of zero).  Note that DO i = 2, 1 wouldn't be executed
once with the current generated code, only DO i = 1, 1 is, but with
step == 0 it would divide by zero.


[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression

2014-12-11 Thread sfilippone at uniroma2 dot it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

--- Comment #66 from Salvatore Filippone sfilippone at uniroma2 dot it ---
As far as I remember(In reply to rguent...@suse.de from comment #65)
 On Wed, 10 Dec 2014, burnus at gcc dot gnu.org wrote:
 
  Fortran 66 compilers did. For instance, DO i = 2, 1 would then be executed
  once. (Such loops are not permitted in F66 - and some compilers executed 
  them
  once others zero times; since F77, such loops are permitted and executed 
  zero
  times. Unsurprisingly, some old code from the 60s relies on the execute once
  feature.)
  
  g77 and some commercial compilers have a compile flag like -f66, gfortran
  hasn't and I don't think it ever will.]
 
 I wondered if
 
   DO i = 1, 1, 0
 
 is valid (a step of zero).  Note that DO i = 2, 1 wouldn't be executed
 once with the current generated code, only DO i = 1, 1 is, but with
 step == 0 it would divide by zero.

Step 0 is not allowed (see e.g. F2003 Handbook, sect. 8.7.2.1.1). 
Salvatore


[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression

2014-12-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

--- Comment #61 from Richard Biener rguenth at gcc dot gnu.org ---
(In reply to Richard Biener from comment #60)
 Ok, so I have a patch that teaches LIM to move the division by using the
 value-range information we now store.

Which can't be done this way ... (the value-range is only valid in the place
it was computed, not where LIM may move it to)


[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression

2014-12-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

--- Comment #62 from Richard Biener rguenth at gcc dot gnu.org ---
Created attachment 34238
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=34238action=edit
frontend patch

So the issue is that the division is executed conditionally because it is
placed
after the loop entry check.  The frontend generates

  if (step  0)
{
 if (to  from)
   goto exit_label;
 countm1 = (to - from) / step;
}
  else
{
 if (to  from)
   goto exit_label;
 countm1 = (from - to) / -step;
}

which ends up optimized to (as step is known to be positive here):

if (to  from)
 {
   countm1 = (to - from) / step;
   ...loop...

if we instead generate

  if (step  0)
{
 countm1 = (to - from) / step;
 if (to  from)
   goto exit_label;
}
  else
{
 countm1 = (from - to) / -step;
 if (to  from)
   goto exit_label;
}

the division can be hoisted out of the outer loop (at the cost of computing
countm1 even when the loop is not entered -- but that would have happened
if the desired transform happened anyway).

Note that if step  0 cannot be optimized we still have the same issue,
so in theory we could generate

   if (step  0)
 {
   tem1 = (to - from);
   tem2 = step;
 }
   else
 {
   tem1 = (from - to);
   tem2 = -step;
 }
   countm1 = tem1 / tem2;
   if (step  0)
{
  if (to  from)
goto exit_label;
}
   else
{
  if (to  from)
goto exit_label;
}

but hoisting the division with conditional computed operands is a lot more
expensive (but possible for exactly two ops, and likely done).  The
runtime overhead from the above is one extra branch.

If we do that then unfortunately jump threading undoes that change creating
the first variant again.


[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression

2014-12-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

--- Comment #63 from Richard Biener rguenth at gcc dot gnu.org ---
Created attachment 34239
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=34239action=edit
LIM patch

This is a patch to loop invariant motion I was playing with to use range
information to move a stmt up to a place where we can determine the range
information is still correct.  Unfortunately for the testcase this doesn't
allow moving the division at all and we are lucky that we have range
information
at all because of the fortran frontend casting 'n' to unsigned before dividing
by it.

Another possibility is to (finally) teach LIM to hoist code which conditional
execution status needs to be preserved by guarding the hoisted code with that
conditional code.  Thus hoist a / b as either b != 0 ? a / b : 0(?) or
as execution condition ? a / b : 0.  Where for this case the condition
of execution is probably more complex than testing the denominator for zero
(thus testing the condition under which we have to preserve execution).
Followup optimizations then eventually can get rid of that condition.


[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression

2014-12-10 Thread burnus at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

--- Comment #64 from Tobias Burnus burnus at gcc dot gnu.org ---
(In reply to Richard Biener from comment #63)
 Unfortunately for the testcase this doesn't allow moving the division at all
 and we are lucky that we have range information at all because of the fortran
 frontend casting 'n' to unsigned before dividing by it.

If it helps and the semantic is preserved, there is no reason not to completely
change what tree code the Fortran FE generates for loops.

[I think one reason for the odd way tree code for loops is generated is: The
current code makes it simple to permit loops which are always run once, as some
Fortran 66 compilers did. For instance, DO i = 2, 1 would then be executed
once. (Such loops are not permitted in F66 - and some compilers executed them
once others zero times; since F77, such loops are permitted and executed zero
times. Unsurprisingly, some old code from the 60s relies on the execute once
feature.)

g77 and some commercial compilers have a compile flag like -f66, gfortran
hasn't and I don't think it ever will.]


[Bug tree-optimization/42108] [4.8/4.9/5 Regression] 50% performance regression

2014-12-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org

--- Comment #60 from Richard Biener rguenth at gcc dot gnu.org ---
Ok, so I have a patch that teaches LIM to move the division by using the
value-range information we now store.