[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-05 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #37 from rguenther at suse dot de rguenther at suse dot de 
2011-12-05 08:18:00 UTC ---
On Fri, 2 Dec 2011, burnus at gcc dot gnu.org wrote:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
 
 --- Comment #34 from Tobias Burnus burnus at gcc dot gnu.org 2011-12-02 
 17:06:57 UTC ---

[...]

 * * *
 
 Back to the comment 0 issue: I still do not quite understand what the double
 evaluation (on tree level) of __builtin_pow in
   D.1959_82 = ((D.2115_81));
   D.1960_83 = __builtin_pow (D.1959_82, 2.0e+0);
   D.1978_168 = __builtin_pow (D.2115_81, 2.0e+0);
 has to do with the -Ofast slow down. If I have understood it correctly, on tree
 level, there is no reason for it while the slow-down happens on RTL level.

Indeed I can find no other difference on the tree level (thus, no
invariant motion missed optimization that isn't present with both
-f[no-]protect-parens).

 That -fprotect-parens makes it faster is a mere coincidence. Is that a correct
 rough summary?

Yes.

Thus, I think if at the RTL level we see a missed invariant motion then
this is an RTL-level bug (esp. if it only triggers with
-fno-protect-parens).

Richard.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-05 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #38 from rguenther at suse dot de rguenther at suse dot de 
2011-12-05 08:27:08 UTC ---
On Fri, 2 Dec 2011, ebotcazou at gcc dot gnu.org wrote:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
 
 --- Comment #35 from Eric Botcazou ebotcazou at gcc dot gnu.org 2011-12-02 
 21:21:15 UTC ---
  One thing I notice (and that's the only difference I can spot at the tree
  level) is that we do not CSE the **2s of
 
 There are many missed hoisting opportunities, with or without the switch. 
 There are just a few more with the switch, hence the performance regression.

Most of them (not sure if you mean those) are because they are
considered cheap by LIM and thus are not moved:

vect_px2gauss.123_641 = x2gauss;
  invariant up to level 1, cost 1.

vect_cst_.126_659 = { 1.0e+0, 1.0e+0 };
  invariant up to level 1, cost 1.

vect_cst_.128_661 = {D.2126_109, D.2126_109};
  invariant up to level 1, cost 1.
...

vect_px2gauss.120_20 = vect_px2gauss.123_641 + 16;
  invariant up to level 1, cost 2.
...

ivtmp.176_899 = 1;
  invariant up to level 1, cost 1.
...

vect_px2gauss.120_649 = vect_px2gauss.120_336;
  invariant up to level 1, cost 5.
...

ISTR discussing removing all cost considerations from tree-level
loop invariant motion and simply moving everything possible
(PRE, for example, doesn't consider any costs and moves all
invariants).

If you use --param lim-expensive=1 you get all invariants moved
on the tree level - does that solve the slowdown issue?
The issue is of course that this might increase register pressure,
as we are not good at rematerializing, for example, constants
inside a loop.

I'll give --param lim-expensive=1 a try on SPEC 2k6
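
The cost argument above can be illustrated with a small Python sketch. It is a toy model, not GCC's implementation: the statements and costs are copied from the LIM dump quoted in this comment, while the rule "hoist only when cost >= threshold" and the threshold value 20 are simplifying assumptions introduced here for illustration.

```python
# Toy model of a cost-thresholded loop-invariant-motion decision.
# Statement text and costs come from the LIM dump quoted in the thread;
# the decision rule itself is a simplification, not GCC's actual logic.

INVARIANTS = [
    ("vect_px2gauss.123_641 = x2gauss", 1),
    ("vect_cst_.126_659 = { 1.0e+0, 1.0e+0 }", 1),
    ("vect_cst_.128_661 = {D.2126_109, D.2126_109}", 1),
    ("vect_px2gauss.120_20 = vect_px2gauss.123_641 + 16", 2),
    ("ivtmp.176_899 = 1", 1),
    ("vect_px2gauss.120_649 = vect_px2gauss.120_336", 5),
]

# Assumed threshold for illustration only; not a value stated in the thread.
ASSUMED_DEFAULT_THRESHOLD = 20


def hoisted(invariants, lim_expensive):
    """Return the statements whose cost reaches the given threshold."""
    return [stmt for stmt, cost in invariants if cost >= lim_expensive]


if __name__ == "__main__":
    for threshold in (ASSUMED_DEFAULT_THRESHOLD, 1):
        moved = hoisted(INVARIANTS, threshold)
        print(f"threshold={threshold}: {len(moved)} of {len(INVARIANTS)} hoisted")
```

Under this model a threshold of 1 moves every listed invariant, which matches the expectation stated for --param lim-expensive=1; it says nothing about register pressure, which is the trade-off raised above.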


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-05 Thread ebotcazou at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #39 from Eric Botcazou ebotcazou at gcc dot gnu.org 2011-12-05 
09:21:15 UTC ---
 Thus, I think if at the RTL level we see a missed invariant motion then
 this is a RTL level bug (esp. if it only triggers with  -fno-protect-parens).

Well, how can the RTL level invent load hoisting opportunities?  They are of
course already present at the Tree level, see the .optimized dump:

vect_var_.124_350 = MEM[(real(kind=8)[9] *)x2gauss];

vect_var_.133_823 = MEM[(real(kind=8)[9] *)y2gauss];

vect_var_.157_586 = MEM[(real(kind=8)[9] *)w2gauss];

vect_var_.124_357 = MEM[(real(kind=8)[9] *)x2gauss + 16B];

vect_var_.133_363 = MEM[(real(kind=8)[9] *)y2gauss + 16B];

vect_var_.157_874 = MEM[(real(kind=8)[9] *)w2gauss + 16B];

vect_var_.124_405 = MEM[(real(kind=8)[9] *)x2gauss + 32B];

vect_var_.133_594 = MEM[(real(kind=8)[9] *)y2gauss + 32B];

vect_var_.157_610 = MEM[(real(kind=8)[9] *)w2gauss + 32B];

vect_var_.124_651 = MEM[(real(kind=8)[9] *)x2gauss + 48B];

vect_var_.133_680 = MEM[(real(kind=8)[9] *)y2gauss + 48B];

vect_var_.157_805 = MEM[(real(kind=8)[9] *)w2gauss + 48B];


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-05 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #40 from rguenther at suse dot de rguenther at suse dot de 
2011-12-05 09:55:47 UTC ---
On Mon, 5 Dec 2011, ebotcazou at gcc dot gnu.org wrote:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
 
 --- Comment #39 from Eric Botcazou ebotcazou at gcc dot gnu.org 2011-12-05 
 09:21:15 UTC ---
  Thus, I think if at the RTL level we see a missed invariant motion then
  this is a RTL level bug (esp. if it only triggers with  
  -fno-protect-parens).
 
 Well, how can the RTL level invent load hoisting opportunities?  They are of
 course already present at the Tree level, see the .optimized dump:
 
 vect_var_.124_350 = MEM[(real(kind=8)[9] *)x2gauss];

They are considered dependent because they are still decomposed as

  vect_px2gauss.123_680 = x2gauss;
...
  vect_var_.124_350 = MEM[(real(kind=8)[9] *)vect_px2gauss.123_680];

during LIM3.  Let me check why we don't fix that up in LIM dependence
checking.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-05 Thread dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #41 from Dominique d'Humieres dominiq at lps dot ens.fr 
2011-12-05 10:12:39 UTC ---
Using --param lim-expensive=1 when compiling induct.f90 does not change the
timing, as of today (r181994):

gfc -Ofast induct.f90                                            - 14.62s
gfc -Ofast induct.f90 --param lim-expensive=1                    - 14.61s
gfc -fprotect-parens -Ofast induct.f90                           - 14.11s
gfc -fprotect-parens -Ofast induct.f90 --param lim-expensive=1   - 14.12s

(a ~0.15s improvement over the timing in comment #1).


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-05 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #42 from Richard Guenther rguenth at gcc dot gnu.org 2011-12-05 
10:19:11 UTC ---
Argh.  It seems LIM didn't get proper lifting both at tuplification and
alias-improvements time.  So its memory handling (everything it does
with VOPs) is very conservative (read: it doesn't really work).

I'll look into this.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-03 Thread dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #36 from Dominique d'Humieres dominiq at lps dot ens.fr 
2011-12-03 14:54:40 UTC ---
 Kind of, though, -ffast-math by itself already is on the verge of violating the
 standard.

I disagree with this statement, at least for codes that do not use IEEE
intrinsic modules (not yet implemented in gfortran).  Indeed the interaction
between -ffast-math and IEEE intrinsic modules will have to be discussed and
documented when these modules are implemented (see pr50724 for the kind of
problems).

 I think -fno-protect-parens could be enabled by -ffast-math as that
 means that one does not really care about the exact value.

PLEEEASE DON'T. The situation is bad enough with -Ofast to not make it worse:
with -fno-protect-parens gfortran no longer complies with the Fortran standard
(7.1.5.2.4 quoted in comment #23) and IMO SHOULD NOT be part of any compound
option.

Note that if I am using -ffast-math, it is not because I do not really care
about the exact value. It is mostly because I KNOW that exceptions will be the
signature of either a bug in my code and/or a bad choice of the parameters
leading to numerical instabilities. On top of that, I think that the concept of
exact value for floating-point numbers is ill-posed and as a consequence I do
accept that the least significant digits may depend on the way I write the code
or on how it is optimized (small fluctuations for well-posed methods, large ones
otherwise).

 If we want to add extra protection for
  tem = x + 2.d0**52
  x = tem - 2.d0**52
 we probably need to add yet another flag as there are surely users, which want
 to have protected parentheses but allow for optimizations in the 'tmp' case.

If I need an extra protection, I'll put parentheses and I don't need yet
another flag. However, since the logic of the optimization is surprising the
first time you hit it, it could (should) be documented.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #20 from rguenther at suse dot de rguenther at suse dot de 
2011-12-02 09:49:39 UTC ---
On Thu, 1 Dec 2011, ebotcazou at gcc dot gnu.org wrote:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
 
 Eric Botcazou ebotcazou at gcc dot gnu.org changed:
 
           What|Removed                      |Added
 
         Status|NEW                          |ASSIGNED
             CC|ebotcazou at gcc dot gnu.org |
      Component|tree-optimization            |rtl-optimization
     AssignedTo|unassigned at gcc dot gnu.org|ebotcazou at gcc dot gnu.org
 
 --- Comment #19 from Eric Botcazou ebotcazou at gcc dot gnu.org 2011-12-01 
 19:53:15 UTC ---
  lim3 was added as a hack, now yes, cunroll needs ccp after it (but it's
  there in the form of DOM and VRP).  It's a pass ordering issue that we
  cannot ever solve.
 
 OK, but that doesn't explain why LIM isn't able to hoist the loads...

If the expressions only become invariant after unrolling then the issue
is that without CCP LIM does not see they are invariant I suppose.
I'll have a closer look.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread ebotcazou at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #21 from Eric Botcazou ebotcazou at gcc dot gnu.org 2011-12-02 
10:54:45 UTC ---
 If the expressions only become invariant after unrolling then the issue
 is that without CCP LIM does not see they are invariant I suppose.

No, adding a CCP pass doesn't help (at least immediately).

 I'll have a closer look.

Thanks in advance.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #22 from Richard Guenther rguenth at gcc dot gnu.org 2011-12-02 
11:50:39 UTC ---
One thing I notice (and that's the only difference I can spot at the tree
level) is that we do not CSE the **2s of


  a = sqrt((rect_inductor%v2%x - rect_inductor%v4%x)**2 + &
           (rect_inductor%v2%y - rect_inductor%v4%y)**2 + &
           (rect_inductor%v2%z - rect_inductor%v4%z)**2)

and

  xxvec = rect_inductor%v2%x - rect_inductor%v4%x
  xyvec = rect_inductor%v2%y - rect_inductor%v4%y
  xzvec = rect_inductor%v2%z - rect_inductor%v4%z
  magnitude = sqrt(xxvec**2 + xyvec**2 + xzvec**2)

because while the former has PAREN_EXPRs the latter does not and we do not
consider

bb 8:
  D.2113_79 = rect_inductor_78(D)-v2.x;
  D.2114_80 = rect_inductor_78(D)-v4.x;
  D.2115_81 = D.2113_79 - D.2114_80;
  D.1959_82 = ((D.2115_81));
  D.1960_83 = __builtin_pow (D.1959_82, 2.0e+0);
...
  D.1978_168 = __builtin_pow (D.2115_81, 2.0e+0);

D.1960_83 and D.1978_168 as equivalent (they are, value-wise, but we cannot
easily replace one with the other using our current value-numbering
machinery).  We could cleverly see that ((x))**2 is equal to ((x**2))
but that would not help for seeing the CSE opportunity of the following
sum and sqrt either.

I wonder if Fortran, with -fprotect-parens, really has different
semantics for

 tem = 2 * a;
 c = b / tem;

vs.

 c = b / (2 * a);

?  Thus, is not every statement supposed to be wrapped in parens with
-fprotect-parens?  So that

 tem = 2 * a;

becomes

 tem = ( 2 * a );

implicitly?  I see that placing ()s at the toplevel of the relevant
stmts in the source has the desired effect of enabling CSE and
the tree level optimization differences vanish.

Thus, this is a question of 1) correctness of the -fprotect-parens
implementation in the frontend, 2) a question on what optimizations
we want to perform on protected expressions.

The relevant transform is CSE of

 sqrt (x*x + y*y + z*z)

and

 sqrt (((x))*((x)) + ((y))*((y)) + ((z))*((z)))
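
Why the two __builtin_pow calls are not unified can be sketched with a toy hash-based value numbering in Python. This is a minimal illustration, not GCC's value-numbering code: the class, the operation names and the paren_is_transparent switch are all hypothetical. The switch exists only to show which node blocks the match; it is not a proposed fix, since looking through the paren would discard the protection it is meant to provide.

```python
# Toy hash-based value numbering. A paren is modelled as an ordinary unary
# operation, so paren(t) receives a value number different from t itself.

class ValueNumbering:
    def __init__(self, paren_is_transparent=False):
        self.table = {}  # expression key -> value number
        self.paren_is_transparent = paren_is_transparent

    def number(self, op, *operands):
        """Return the value number of `op` applied to the given operands."""
        if op == "paren" and self.paren_is_transparent:
            return operands[0]  # look through the paren (illustration only)
        key = (op,) + operands
        if key not in self.table:
            self.table[key] = len(self.table)
        return self.table[key]


def pow_pair(vn):
    """Value-number the two pow calls from the dump in this comment."""
    d2115 = vn.number("sub", vn.number("load", "v2.x"), vn.number("load", "v4.x"))
    d1959 = vn.number("paren", d2115)      # D.1959_82 = ((D.2115_81))
    two = vn.number("const", 2.0)
    d1960 = vn.number("pow", d1959, two)   # D.1960_83
    d1978 = vn.number("pow", d2115, two)   # D.1978_168
    return d1960, d1978


if __name__ == "__main__":
    print(pow_pair(ValueNumbering()))
    print(pow_pair(ValueNumbering(paren_is_transparent=True)))
```

With the paren treated as an opaque operation the two results get different value numbers, so nothing downstream (the sum, the sqrt) can match either.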


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Tobias Burnus burnus at gcc dot gnu.org changed:

   What|Removed |Added

 CC||burnus at gcc dot gnu.org,
   ||kargl at gcc dot gnu.org,
   ||pault at gcc dot gnu.org

--- Comment #23 from Tobias Burnus burnus at gcc dot gnu.org 2011-12-02 
14:03:41 UTC ---
(In reply to comment #22)
 I wonder if Fortran [...]

Well, let's start with the Fortran standard (Fortran 2008):

7.1.5.2.4 Evaluation of numeric intrinsic operations

The execution of any numeric operation whose result is not defined by the
arithmetic used by the processor is prohibited. Raising a negative-valued
primary of type real to a real power is prohibited.

Once the interpretation of a numeric intrinsic operation is established, the
processor may evaluate any mathematically equivalent expression, provided that
the integrity of parentheses is not violated.

Two expressions of a numeric type are mathematically equivalent if, for all
possible values of their primaries, their mathematical values are equal.
However, mathematically equivalent expressions of numeric type may produce
different computational results.

[The section then contains a few non-normative notes; cf.
http://gcc.gnu.org/wiki/GFortranStandards#Fortran_2008 ]

And for the assignment:

7.2.1 Assignment statement [...]
R732  assignment-stmt  is  variable = expr
[...]
Execution of an intrinsic assignment causes, in effect, the evaluation of the
expression expr and all expressions within variable (7.1), the possible
conversion of expr to the type and type parameters of the variable (Table 7.9),
and the definition of the variable with the resulting value. The execution of
the assignment shall have the same effect as if the evaluation of expr and the
evaluation of all expressions in variable occurred before any portion of the
variable is defined by the assignment. The evaluation of expressions within
variable shall neither affect nor be affected by the evaluation of expr.


 with -fprotect-parens, really has different semantics for
  tem = 2 * a;
  c = b / tem;
 vs.
  c = b / (2 * a);
 ?

 Thus, is not every statement supposed to be wrapped in parens with
 -fprotect-parens?  So that
  tem = 2 * a;
 becomes
  tem = ( 2 * a );
 implicitely?
[...]
 Thus, this is a question of 1) correctness of the -fprotect-parens
 implementation in the frontend, 2) a question on what optimizations
 we want to perform on protected expressions.

It somehow looks as if one needs to add parentheses implicitly; this gets more
complicated if one takes the scalarizer or inlining into account.

Contrary to the explicit parentheses, I am not aware of a program which breaks
with the extra temporary, but that does not tell much. (Side note: I think
the majority of users don't care [or know] about the protection of either
parentheses or the separate assignment statements - and are happy as long as the
result is mathematically the same. Though, some users do care, as with unprotected
parentheses their program breaks.)


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #24 from rguenther at suse dot de rguenther at suse dot de 
2011-12-02 14:31:27 UTC ---
On Fri, 2 Dec 2011, burnus at gcc dot gnu.org wrote:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
 
 Tobias Burnus burnus at gcc dot gnu.org changed:
 
    What|Removed |Added
 
  CC||burnus at gcc dot gnu.org,
    ||kargl at gcc dot gnu.org,
    ||pault at gcc dot gnu.org
 
 --- Comment #23 from Tobias Burnus burnus at gcc dot gnu.org 2011-12-02 
 14:03:41 UTC ---
 (In reply to comment #22)
  I wonder if Fortran [...]
 
 Well, let's start with the Fortran standard (Fortran 2008):
 
 7.1.5.2.4 Evaluation of numeric intrinsic operations
 
 The execution of any numeric operation whose result is not defined by the
 arithmetic used by the processor is prohibited. Raising a negative-valued
 primary of type real to a real power is prohibited.
 
 Once the interpretation of a numeric intrinsic operation is established, the
 processor may evaluate any mathematically equivalent expression, provided that
 the integrity of parentheses is not violated.
 
 Two expressions of a numeric type are mathematically equivalent if, for all
 possible values of their primaries, their mathematical values are equal.
 However, mathematically equivalent expressions of numeric type may produce
 different computational results.
 
 [The section then contains a few non-normative notes; cf.
 http://gcc.gnu.org/wiki/GFortranStandards#Fortran_2008 ]
 
 And for the assignment:
 
 7.2.1 Assignment statement [...]
 R732  assignment-stmt  is  variable = expr
 [...]
 Execution of an intrinsic assignment causes, in effect, the evaluation of the
 expression expr and all expressions within variable (7.1), the possible
 conversion of expr to the type and type parameters of the variable (Table 
 7.9),
 and the definition of the variable with the resulting value. The execution of
 the assignment shall have the same effect as if the evaluation of expr and the
 evaluation of all expressions in variable occurred before any portion of the
 variable is defined by the assignment. The evaluation of expressions within
 variable shall neither affect nor be affected by the evaluation of expr.
 
 
  with -fprotect-parens, really has different semantics for
   tem = 2 * a;
   c = b / tem;
  vs.
   c = b / (2 * a);
  ?
 
  Thus, is not every statement supposed to be wrapped in parens with
  -fprotect-parens?  So that
   tem = 2 * a;
  becomes
   tem = ( 2 * a );
  implicitely?
 [...]
  Thus, this is a question of 1) correctness of the -fprotect-parens
  implementation in the frontend, 2) a question on what optimizations
  we want to perform on protected expressions.
 
 It somehow looks as if one needs to add implicitly parentheses; this gets more
 complicated, if one takes the scalarizer or inlining into account.
 
 Contrary to the explicit parentheses, I am not aware of a program which breaks
 with the extra temporary, but that's does not tell much. (Side note: I think
 the majority of users doesn't care [or know] about the protection of either
 parentheses or the separate assignment statements - and is happy as long the
 result is mathematical the same. Though, some users do care as with 
 unprotected
 parentheses their program breaks.)

Every program that would break when explicit parentheses are not honored
would also break if the bracketed expression were explicitly
computed into a temporary (without explicit parentheses).  So it
should be easy to construct a testcase if you have one that breaks
without -fno-protect-parens.

Richard.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Tobias Burnus burnus at gcc dot gnu.org changed:

   What|Removed |Added

 CC|bur...@net-b.de |dominiq at lps dot ens.fr

--- Comment #25 from Tobias Burnus burnus at gcc dot gnu.org 2011-12-02 
14:40:48 UTC ---
(In reply to comment #24)
 Every program that would break with non honoring explicit parantheses
 would also break if the bracketed expression would be explicitely
 computed into a temporary (without explicit parantheses).  So it
 should be easy to construct a testcase if you have one that breaks
 without -fno-protect-parens.

I vaguely recall that one of the Polyhedron benchmarks gets minutely out of the
correctness-check tolerance range with -fno-protect-parens while it stays
within without. I think Dominique has a program where the effect is more
disastrous.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #26 from rguenther at suse dot de rguenther at suse dot de 
2011-12-02 15:02:27 UTC ---
On Fri, 2 Dec 2011, burnus at gcc dot gnu.org wrote:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
 
 Tobias Burnus burnus at gcc dot gnu.org changed:
 
    What|Removed         |Added
 
      CC|bur...@net-b.de |dominiq at lps dot ens.fr
 
 --- Comment #25 from Tobias Burnus burnus at gcc dot gnu.org 2011-12-02 
 14:40:48 UTC ---
 (In reply to comment #24)
  Every program that would break with non honoring explicit parantheses
  would also break if the bracketed expression would be explicitely
  computed into a temporary (without explicit parantheses).  So it
  should be easy to construct a testcase if you have one that breaks
  without -fno-protect-parens.
 
 I vaguely recall that one of the Polyhedron benchmarks gets minutely out of 
 the
 correctness-check tolerance range with -fno-protect-parens while it stays
 within without. I think Dominique has a program where the effect is more
 disastrous.

The trivial example is (x + 2**52) - 2**52 which rounds x to
an integer.  Without parens we optimize away that rounding effect.
Thus,

  real*8 x, tem
  x = 1.3d
  tem = x + 2.d**52
  x = tem - 2.d**52
  if (x.ne.1.0d)
call abort

should not fail (minus my fortran coding errors ;)) with
-fprotect-parens
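
The rounding effect described here can be checked numerically. The following Python sketch relies only on IEEE double arithmetic (which Python floats use on common platforms) and on the fact that Python never reassociates floating-point expressions; the function names are made up for illustration. It assumes a non-negative x well below 2**52.

```python
MAGIC = 2.0 ** 52  # doubles in [2**52, 2**53) are spaced exactly 1 apart


def as_written(x):
    """Evaluate (x + 2**52) - 2**52 in the order written."""
    tem = x + MAGIC   # the sum is rounded to the nearest integer (ties to even)
    return tem - MAGIC


def reassociated(x):
    """Evaluate x + (2**52 - 2**52), the mathematically equivalent form."""
    return x + (MAGIC - MAGIC)


if __name__ == "__main__":
    print(as_written(1.3), reassociated(1.3))  # prints: 1.0 1.3
```

The first function corresponds to honoring the intermediate rounding; the second shows what a reassociating optimizer effectively computes.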


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #27 from Tobias Burnus burnus at gcc dot gnu.org 2011-12-02 
16:02:45 UTC ---
(In reply to comment #26)
 The trivial example is (x + 2**52) - 2**52 which rounds x to
 an integer.  Without parens we optimize away that rounding effect.

Corrected example. The result I get with other compilers matches the current
behaviour of GCC/gfortran:

- GCC: Gives (of course independent of -fno-protect-parens): 1.3 with -O1
-ffast-math, 1.0 without -ffast-math.

- Intel ifort 12.2: -O1 has 1.0, -O2 has 1.3, -assume protect_parens does not
help but -fp-model strict does (with -O2: 1.0).

- PGI pgf95 11.5-0: 1.0 with up to -O4.

- Crayftn 7.1.4.111: 1.0 for -O0, 1.3 for -O1. Option -O fp0 gives 1.0 while
already -O fp1 gives 1.3.

- PathScale pathf95 3.2.99: 1.0 for up to -O3, -Ofast prints 1.3. As with GCC,
-OPT:fast_math={on,off} toggles between 1.0 and 1.3

- NAG f95: 1.0 for up to -O4, 1.3 with -Ounsafe.

- Sun Fortran 95 8.3: 1.0 for -O4, 1.3 for -fast.

program test
  implicit none
  real(8), volatile :: y
  y = 1.3d0
  call sub(y)
  print *, y
! if (y /= 1.0d0) 
!   call abort
contains
  subroutine sub(x)
    real*8 x, tem
    tem = x + 2.d0**52
    x = tem - 2.d0**52
  end subroutine sub
end program test


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread howarth at nitro dot med.uc.edu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Jack Howarth howarth at nitro dot med.uc.edu changed:

   What|Removed |Added

 CC||howarth at nitro dot
   ||med.uc.edu

--- Comment #28 from Jack Howarth howarth at nitro dot med.uc.edu 2011-12-02 
16:10:59 UTC ---
The failing polyhedron 2005 benchmark is linpk which can be seen with -Ofast on
x86_64-apple-darwin11...

 Value= 25.114499300 Target= 23.1 Tolerance= 2.00 FAIL
 Value=0.27880142600E-10 Target=0.27858826400E-10 Tolerance=0.100E-09
 Value=0.22204460500E-15 Target=0.22204460500E-15 Tolerance=0.100E-14
 Value= 1.00 Target= 1.00 Tolerance=0.100E-07
 Value= 1.00 Target= 1.00 Tolerance=0.100E-07

linpk FAILED 1 fails and 4 passes
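
The pass/fail summary can be reproduced from the five printed lines under one assumption: that the harness compares the absolute deviation from the target against the tolerance. That rule is a guess made for this sketch, not something stated in the thread, and the printed digits may be truncated in this copy of the output. The function names are hypothetical.

```python
# (value, target, tolerance) triples copied from the linpk output as printed.
RESULTS = [
    (25.114499300, 23.1, 2.00),
    (0.27880142600e-10, 0.27858826400e-10, 0.100e-09),
    (0.22204460500e-15, 0.22204460500e-15, 0.100e-14),
    (1.00, 1.00, 0.100e-07),
    (1.00, 1.00, 0.100e-07),
]


def passes(value, target, tolerance):
    """Assumed rule: pass when the absolute deviation is within tolerance."""
    return abs(value - target) <= tolerance


def summarize(results):
    """Return (number of fails, number of passes)."""
    outcomes = [passes(*r) for r in results]
    return outcomes.count(False), outcomes.count(True)


if __name__ == "__main__":
    fails, ok = summarize(RESULTS)
    print(f"{fails} fails and {ok} passes")
```

Under that assumption only the first line fails, by about 0.014 beyond its tolerance of 2.00.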


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #29 from rguenther at suse dot de rguenther at suse dot de 
2011-12-02 16:13:25 UTC ---
On Fri, 2 Dec 2011, burnus at gcc dot gnu.org wrote:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
 
 --- Comment #27 from Tobias Burnus burnus at gcc dot gnu.org 2011-12-02 
 16:02:45 UTC ---
 (In reply to comment #26)
  The trivial example is (x + 2**52) - 2**52 which rounds x to
  an integer.  Without parens we optimize away that rounding effect.
 
 Corrected example. The result I get with other compilers matches the current
 behaviour of GCC/gfortran:
 
 - GCC: Gives (of course independent of -fno-protect-parens): 1.3 with -O1
 -ffast-math, 1.0 without -ffast-math.

Indeed GCC does not perform FP association without some sub-flags
enabled by -ffast-math (it assumes then intermediate rounding is
to be preserved).

 - Intel ifort 12.2: -O1 has 1.0, -O2 has 1.3, -assume protect_parens does not
 help but -fp-model strict does (with -O2: 1.0).
 
 - PGI pgf95 11.5-0: 1.0 with up to -O4.
 
 - Crayftn 7.1.4.111: 1.0 for -O0, 1.3 for -O1. Option -O fp0 gives 1.0 while
 already -O fp1 gives 1.3.
 
 - PathScale pathf95 3.2.99: 1.0 for up to -O3, -Ofast prints 1.3. As with GCC,
 -OPT:fast_math={on,off} toggles between 1.0 and 1.3
 
 - NAG f95: 1.0 for up to -O4, 1.3 with -Ounsafe.
 
 - Sun Fortran 95 8.3: 1.0 for -O4, 1.3 for -fast.
 
 program test
   implicit none
   real(8), volatile :: y
   y = 1.3d0
   call sub(y)
   print *, y
 ! if (y /= 1.0d0) 
 !   call abort
 contains
   subroutine sub(x)
 real*8 x, tem
 tem = x + 2.d0**52
 x = tem - 2.d0**52
   end subroutine sub
 end program test

And for the sake of completeness the evaluation of sub above and

   subroutine sub2(x)
 real*8 x
 x = (x + 2.d0**52) - 2.d0**52
   end subroutine sub2

should behave consistently if I read your Fortran standard
quotations correctly.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #30 from Tobias Burnus burnus at gcc dot gnu.org 2011-12-02 
16:29:46 UTC ---
(In reply to comment #29)
 And for the sake of completeness the evaluation of sub above and
  x = (x + 2.d0**52) - 2.d0**52
 should behave consistently if I read your Fortran standard
 quotations correctly.

Well, it kind of does; only when mixing (in GCC) -funsafe-math-optimizations
with -fprotect-parens or (in ifort) -assume protect_parens with a non-strict
-fp-model do you get different results: 1.0 with the () version and 1.3 with
the 'tmp' version.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #31 from rguenther at suse dot de rguenther at suse dot de 
2011-12-02 16:32:52 UTC ---
On Fri, 2 Dec 2011, burnus at gcc dot gnu.org wrote:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
 
 --- Comment #30 from Tobias Burnus burnus at gcc dot gnu.org 2011-12-02 
 16:29:46 UTC ---
 (In reply to comment #29)
  And for the sake of completeness the evaluation of sub above and
   x = (x + 2.d0**52) - 2.d0**52
  should behave consistently if I read your Fortran standard
  quotations correctly.
 
 Well, it kind of does, only when mixing (in GCC) -funsafe-math-optimizations
 with -fprotect-parens or (in ifort) -assume protect_parens with a non-strict
 -fp-model, you get a different results: 1.0 with the () version and 1.3 with
 the 'tmp' version.

Ok, which is, I suppose, a bug in both compilers.

Richard.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #32 from Dominique d'Humieres dominiq at lps dot ens.fr 
2011-12-02 16:37:37 UTC ---
 And for the sake of completeness the evaluation of sub above and

subroutine sub2(x)
  real*8 x
  x = (x + 2.d0**52) - 2.d0**52
end subroutine sub2

 should behave consistently if I read your Fortran standard
 quotations correctly.

According to my reading of

 Once the interpretation of a numeric intrinsic operation is established, the
 processor may evaluate any mathematically equivalent expression, provided that
 the integrity of parentheses is not violated.

this is different from

   subroutine sub(x)
 real*8 x, tem
 tem = x + 2.d0**52
 x = tem - 2.d0**52
   end subroutine sub

where 'x=tem-2.d0**52' can be evaluated as 'x=x+2.d0**52-2.d0**52' then as 'x'
(as long as x and tem are of the same kind(?)), while in the former case the
standard prohibits evaluating '(x + 2.d0**52) - 2.d0**52' as 'x'.

Note that if I replace 'tem = x + 2.d0**52' with 'tem = (x + 2.d0**52)', I get
1.0 unless I use -fno-protect-parens.

All this has been discussed previously, but the only PR I have been able to
find is pr32172.
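
The distinction drawn in this comment can be sketched as a toy expression simplifier in Python. It encodes only the reading given here (a forwarded temporary may be simplified, a parenthesized subexpression may not); whether that reading is the right one is exactly what the thread disputes. The tuple encoding and function names are hypothetical.

```python
# Expressions are nested tuples:
#   ("var", name) | ("const", value) | ("add", a, b) | ("sub", a, b) | ("paren", e)

def substitute(expr, name, replacement):
    """Replace every use of variable `name` in expr by `replacement`."""
    if expr[0] == "var":
        return replacement if expr[1] == name else expr
    if expr[0] == "const":
        return expr
    return (expr[0],) + tuple(substitute(e, name, replacement) for e in expr[1:])


def simplify(expr):
    """Apply (a + c) - c -> a, treating every paren node as an opaque unit."""
    if expr[0] in ("var", "const", "paren"):
        return expr  # this toy never looks inside a paren
    op, lhs, rhs = expr[0], simplify(expr[1]), simplify(expr[2])
    if op == "sub" and lhs[0] == "add" and lhs[2] == rhs:
        return lhs[1]
    return (op, lhs, rhs)


X = ("var", "x")
C = ("const", 2.0 ** 52)

# sub:  tem = x + c ; x = tem - c   (the temporary is forwarded into its use)
SUB = substitute(("sub", ("var", "tem"), C), "tem", ("add", X, C))

# sub2: x = (x + c) - c
SUB2 = ("sub", ("paren", ("add", X, C)), C)

if __name__ == "__main__":
    print(simplify(SUB))   # collapses to the variable x
    print(simplify(SUB2))  # left unchanged because of the paren
```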


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #33 from Dominique d'Humieres dominiq at lps dot ens.fr 
2011-12-02 16:45:24 UTC ---
 The failing polyhedron 2005 benchmark is linpk which can be seen with -Ofast 
 on
 x86_64-apple-darwin11...

  Value= 25.114499300 Target= 23.1 Tolerance= 2.00
 F

I think this test is not relevant: the target is already a residual error,
hence very sensitive to the way the computation is performed and a 10%
tolerance cannot be used to evaluate the accuracy of the residual.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #34 from Tobias Burnus burnus at gcc dot gnu.org 2011-12-02 
17:06:57 UTC ---
(In reply to comment #31)
 Ok, which is, I suppose, a bug in both compilers.

Kind of, though, -ffast-math by itself already is on the verge of violating the
standard. I think -fno-protect-parens could be enabled by -ffast-math as that
means that one does not really care about the exact value.

However, there are users who want to have one or the other. Namely, most
users are happy with a default -fno-protect-parens but don't dare to use
-ffast-math, while others want to have -ffast-math optimizations but with
honored parentheses.

If we want to add extra protection for
 tem = x + 2.d0**52
 x = tem - 2.d0**52
we probably need to add yet another flag, as there are surely users who want
to have protected parentheses but allow for optimizations in the 'tmp' case.
[Even if, as this PR shows, the extra optimization opportunity might lead to a
missed opportunity.] In any case, handling that well for function calls,
inlining and the scalarizer seems to be difficult. And frankly, I am not sure
whether there is any user; -ffast-math plus -fprotect-parens is already special
(cf. comment 32 for one user). Having -ffast-math plus parentheses plus
protected assignments might have even fewer users.

I believe most users simply use -O2, -O3 [-ffast-math], or -Ofast without
thinking (very) much about the options. [I also typically use either -O2, -O3
or -Ofast.]

* * *

Back to the comment 0 issue: I still do not quite understand what the double
evaluation (on tree level) of __builtin_pow in
  D.1959_82 = ((D.2115_81));
  D.1960_83 = __builtin_pow (D.1959_82, 2.0e+0);
  D.1978_168 = __builtin_pow (D.2115_81, 2.0e+0);
has to do with the -Ofast slow down. If I have understood it correctly, on tree
level, there is no reason for it while the slow-down happens on RTL level. That
-fprotect-parens makes it faster is a mere coincidence. Is that a correct rough
summary?
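
As an illustration only, the following toy local value numbering (in Python)
shows how two pow calls like the ones in the dump above can both survive when
a parenthesized copy is treated as an opaque operation, and how they collapse
into one when the copy is looked through. It is a sketch under that
assumption, not GCC's actual value-numbering code, and it says nothing about
the RTL-level slowdown. The statement encoding and the function name are
invented for the example.

```python
def value_number(stmts, paren_is_opaque):
    """Toy local value numbering over (dest, op, args) triples.

    Returns {dest: earlier_dest} for every statement that recomputes
    a value already computed by an earlier statement.
    """
    vn = {}         # name -> value number
    table = {}      # (op, argument value numbers) -> (value number, first dest)
    redundant = {}
    counter = 0

    def vn_of(name):
        nonlocal counter
        if name not in vn:          # first use of an input or constant
            vn[name] = counter
            counter += 1
        return vn[name]

    for dest, op, args in stmts:
        arg_vns = tuple(vn_of(a) for a in args)
        if op == "paren" and not paren_is_opaque:
            vn[dest] = arg_vns[0]   # look through the copy
            continue
        key = (op, arg_vns)
        if key in table:
            number, first = table[key]
            vn[dest] = number
            redundant[dest] = first
        else:
            vn[dest] = counter
            counter += 1
            table[key] = (vn[dest], dest)
    return redundant


stmts = [
    ("D.1959_82", "paren", ("D.2115_81",)),
    ("D.1960_83", "pow", ("D.1959_82", "2.0")),
    ("D.1978_168", "pow", ("D.2115_81", "2.0")),
]
opaque = value_number(stmts, paren_is_opaque=True)
transparent = value_number(stmts, paren_is_opaque=False)
print(opaque)       # {}
print(transparent)  # {'D.1978_168': 'D.1960_83'}
```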


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-02 Thread ebotcazou at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #35 from Eric Botcazou ebotcazou at gcc dot gnu.org 2011-12-02 
21:21:15 UTC ---
 One thing I notice (and that's the only difference I can spot at the tree
 level) is that we do not CSE the **2s of

There are many missed hoisting opportunities, with or without the switch. 
There are just a few more with the switch, hence the performance regression.
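
To make "hoisting" concrete, here is a small hand-written sketch in Python of
the transformation being discussed: a computation that does not depend on the
loop variable is moved in front of the loop. The functions and the evaluation
counter are invented for illustration and do not come from the induct
benchmark.

```python
def unhoisted(a, b, n):
    """Recompute the loop-invariant expression on every iteration."""
    total = 0.0
    evaluations = 0
    for i in range(n):
        inv = a ** 2 + b ** 2   # does not depend on i
        evaluations += 1
        total += inv * i
    return total, evaluations


def hoisted(a, b, n):
    """Same loop with the invariant expression computed once, up front."""
    inv = a ** 2 + b ** 2
    evaluations = 1
    total = 0.0
    for i in range(n):
        total += inv * i
    return total, evaluations


print(unhoisted(3.0, 4.0, 10))  # (1125.0, 10)
print(hoisted(3.0, 4.0, 10))    # (1125.0, 1)
```

Both versions return the same sum; only the number of evaluations of the
invariant expression differs, which is the cost a missed hoist leaves behind.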


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-12-01 Thread ebotcazou at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Eric Botcazou ebotcazou at gcc dot gnu.org changed:

      What|Removed                      |Added
----------------------------------------------------------------------
    Status|NEW                          |ASSIGNED
        CC|ebotcazou at gcc dot gnu.org |
 Component|tree-optimization            |rtl-optimization
AssignedTo|unassigned at gcc dot gnu.org|ebotcazou at gcc dot gnu.org

--- Comment #19 from Eric Botcazou ebotcazou at gcc dot gnu.org 2011-12-01 
19:53:15 UTC ---
 lim3 was added as a hack, now yes, cunroll needs ccp after it (but it's
 there in the form of DOM and VRP).  It's a pass ordering issue that we
 cannot ever solve.

OK, but that doesn't explain why LIM isn't able to hoist the loads...

 Please - it seems like a missed optimization there, too.

More of an acknowledged limitation I'd say.  And RTL passes aren't supposed to
be enhanced to plug holes in the Tree passes, but let's try anyway.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-11-11 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #14 from Venkataramanan Kumar venkataramanan.kumar.gnu at gmail 
dot com 2011-11-11 22:58:01 UTC ---
I ran polyhedron benchmarks with -march=bdver1 and -Ofast. Induct run time was
brought down to 53.45 sec from 70.93 sec. Other benchmarks are not affected
much.

I am planning to test on an older machine.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-11-09 Thread ebotcazou at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Eric Botcazou ebotcazou at gcc dot gnu.org changed:

                         What|Removed|Added
---------------------------------------------
Attachment #25748 is obsolete|0      |1

--- Comment #12 from Eric Botcazou ebotcazou at gcc dot gnu.org 2011-11-09 
08:57:37 UTC ---
Created attachment 25764
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25764
Tentative fix (3)

Final version.  Can someone try it on his favorite Fortran benchmark?


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-11-09 Thread venkataramanan.kumar.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #13 from Venkataramanan Kumar venkataramanan.kumar.gnu at gmail 
dot com 2011-11-09 10:22:39 UTC ---
(In reply to comment #12)
 Created attachment 25764 [details]
 Tentative fix (3)
 Final version.  Can someone try it on his favorite Fortran benchmark?

Ok I will check and let you know the results.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-11-07 Thread ebotcazou at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Eric Botcazou ebotcazou at gcc dot gnu.org changed:

                         What|Removed|Added
---------------------------------------------
Attachment #25731 is obsolete|0      |1

--- Comment #11 from Eric Botcazou ebotcazou at gcc dot gnu.org 2011-11-08 
00:33:24 UTC ---
Created attachment 25748
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25748
Tentative fix (2)

This one has a small glitch corrected.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-11-06 Thread ebotcazou at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #10 from Eric Botcazou ebotcazou at gcc dot gnu.org 2011-11-07 
00:32:49 UTC ---
Created attachment 25731
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25731
Tentative fix

This enhances RTL PRE.  You need an up-to-date tree to apply it.

The other approaches are TER throttling and machine description fiddling.


[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast

2011-11-05 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Richard Guenther rguenth at gcc dot gnu.org changed:

            What|Removed|Added
--------------------------------
        Priority|P3     |P2
Target Milestone|---    |4.7.0

--- Comment #9 from Richard Guenther rguenth at gcc dot gnu.org 2011-11-05 
11:52:55 UTC ---
Fortran enabling -fno-protect-parens would be the regression, the RTL opt
problem likely isn't.  Keeping at P2 for now.