[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Richard Guenther changed:

           What    |Removed |Added
           Priority|P3      |P2
   Target Milestone|---     |4.7.0

--- Comment #9 from Richard Guenther 2011-11-05 11:52:55 UTC ---
Fortran enabling -fno-protect-parens would be the regression, the RTL opt problem likely isn't. Keeping at P2 for now.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #10 from Eric Botcazou 2011-11-07 00:32:49 UTC ---
Created attachment 25731
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25731
Tentative fix

This enhances RTL PRE. You need an up-to-date tree to apply it. The other approaches are TER throttling and machine description fiddling.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Eric Botcazou changed:

           What    |Removed |Added
 Attachment #25731 |0       |1
        is obsolete|        |

--- Comment #11 from Eric Botcazou 2011-11-08 00:33:24 UTC ---
Created attachment 25748
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25748
Tentative fix (2)

This one has a small glitch corrected.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Eric Botcazou changed:

           What    |Removed |Added
 Attachment #25748 |0       |1
        is obsolete|        |

--- Comment #12 from Eric Botcazou 2011-11-09 08:57:37 UTC ---
Created attachment 25764
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25764
Tentative fix (3)

Final version. Can someone try it on his favorite Fortran benchmark?
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #13 from Venkataramanan Kumar 2011-11-09 10:22:39 UTC ---
(In reply to comment #12)
> Created attachment 25764 [details]
> Tentative fix (3)
> Final version. Can someone try it on his favorite Fortran benchmark?

OK, I will check and let you know the results.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #14 from Venkataramanan Kumar 2011-11-11 22:58:01 UTC ---
I ran the Polyhedron benchmarks with -march=bdver1 and -Ofast. The induct run time was brought down to 53.45 sec from 70.93 sec. The other benchmarks are not affected much. I am planning to test on an older machine.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Eric Botcazou changed:

           What    |Removed                      |Added
             Status|NEW                          |ASSIGNED
                 CC|ebotcazou at gcc dot gnu.org |
          Component|tree-optimization            |rtl-optimization
         AssignedTo|unassigned at gcc dot gnu.org|ebotcazou at gcc dot gnu.org

--- Comment #19 from Eric Botcazou 2011-12-01 19:53:15 UTC ---
> lim3 was added as a "hack", now yes, cunroll needs ccp after it (but it's
> there in the form of DOM and VRP). It's a pass ordering issue that we
> cannot ever solve.

OK, but that doesn't explain why LIM isn't able to hoist the loads...

> Please - it seems like a missed optimization there, too.

More of an acknowledged limitation I'd say. And RTL passes aren't supposed to be enhanced to plug holes in the Tree passes, but let's try anyway.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #20 from rguenther at suse dot de 2011-12-02 09:49:39 UTC ---
On Thu, 1 Dec 2011, ebotcazou at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
>
> Eric Botcazou changed:
>
>            What    |Removed                      |Added
>              Status|NEW                          |ASSIGNED
>                  CC|ebotcazou at gcc dot gnu.org |
>           Component|tree-optimization            |rtl-optimization
>          AssignedTo|unassigned at gcc dot gnu.org|ebotcazou at gcc dot gnu.org
>
> --- Comment #19 from Eric Botcazou 2011-12-01 19:53:15 UTC ---
> > lim3 was added as a "hack", now yes, cunroll needs ccp after it (but it's
> > there in the form of DOM and VRP). It's a pass ordering issue that we
> > cannot ever solve.
>
> OK, but that doesn't explain why LIM isn't able to hoist the loads...

If the expressions only become invariant after unrolling then the issue is that without CCP LIM does not see they are invariant, I suppose.

I'll have a closer look.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #21 from Eric Botcazou 2011-12-02 10:54:45 UTC ---
> If the expressions only become invariant after unrolling then the issue
> is that without CCP LIM does not see they are invariant I suppose.

No, adding a CCP pass doesn't help (at least immediately).

> I'll have a closer look.

Thanks in advance.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #22 from Richard Guenther 2011-12-02 11:50:39 UTC ---
One thing I notice (and that's the only difference I can spot at the tree level) is that we do not CSE the **2s of

      a = sqrt((rect_inductor%v2%x - rect_inductor%v4%x)**2 + (rect_inductor%v2%y - &
               rect_inductor%v4%y)**2 + (rect_inductor%v2%z - rect_inductor%v4%z)**2)

and

      xxvec = rect_inductor%v2%x - rect_inductor%v4%x
      xyvec = rect_inductor%v2%y - rect_inductor%v4%y
      xzvec = rect_inductor%v2%z - rect_inductor%v4%z
      magnitude = sqrt(xxvec**2 + xyvec**2 + xzvec**2)

because while the former has PAREN_EXPRs the latter does not and we do not consider

  D.2113_79 = rect_inductor_78(D)->v2.x;
  D.2114_80 = rect_inductor_78(D)->v4.x;
  D.2115_81 = D.2113_79 - D.2114_80;
  D.1959_82 = ((D.2115_81));
  D.1960_83 = __builtin_pow (D.1959_82, 2.0e+0);
  ...
  D.1978_168 = __builtin_pow (D.2115_81, 2.0e+0);

D.1960_83 and D.1978_168 as equivalent (they are, value-wise, but we cannot easily replace one with the other using our current value-numbering machinery).

We could cleverly see that ((x))**2 is equal to ((x**2)) but that would not help for seeing the CSE opportunity of the following sum and sqrt either.

I wonder if Fortran, with -fprotect-parens, really has different semantics for

  tem = 2 * a;
  c = b / tem;

vs.

  c = b / (2 * a);

? Thus, is not every statement supposed to be wrapped in parens with -fprotect-parens? So that

  tem = 2 * a;

becomes

  tem = ( 2 * a );

implicitly?

I see that placing ()s at the toplevel of the relevant stmts in the source has the desired effect of enabling CSE and the tree level optimization differences vanish.

Thus, this is a question of 1) correctness of the -fprotect-parens implementation in the frontend, 2) a question on what optimizations we want to perform on protected expressions. The relevant transform is to CSE

  sqrt (x*x + y*y + z*z)

and

  sqrt (((x))*((x)) + ((y))*((y)) + ((z))*((z)))
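To illustrate why the two __builtin_pow calls in comment #22 end up with different value numbers, here is a minimal toy sketch of hash-based value numbering. It is written in Python purely for brevity and is not GCC's implementation: the tuple encoding of statements, the `value_numbers` helper and the `paren_transparent` switch are all hypothetical, and only the SSA names are taken from the dump. The "transparent" mode models value equality only; it says nothing about how a real pass would keep the reassociation barrier while replacing one name with the other.

```python
def value_numbers(stmts, paren_transparent=False):
    """Assign a value number to every SSA name in a straight-line statement list.

    Each statement is (lhs, op, operands). Two names share a value number
    when they are computed by the same op from operands with the same
    value numbers. A "paren" statement (a stand-in for PAREN_EXPR) gets a
    value of its own unless paren_transparent is set, in which case it is
    treated as a plain copy of its operand.
    """
    table = {}  # (op, operand value numbers...) -> value number
    vn = {}     # SSA name or constant -> value number

    def vn_of(operand):
        # Names without a defining statement and constants number themselves.
        return vn.setdefault(operand, "leaf:%s" % (operand,))

    for lhs, op, operands in stmts:
        if op == "paren" and paren_transparent:
            vn[lhs] = vn_of(operands[0])
            continue
        key = (op,) + tuple(vn_of(o) for o in operands)
        vn[lhs] = table.setdefault(key, "v%d" % len(table))
    return vn


# Hypothetical encoding of the statements quoted in the dump.
stmts = [
    ("D.2115_81", "minus", ("D.2113_79", "D.2114_80")),
    ("D.1959_82", "paren", ("D.2115_81",)),
    ("D.1960_83", "pow", ("D.1959_82", 2.0)),
    ("D.1978_168", "pow", ("D.2115_81", 2.0)),
]

opaque = value_numbers(stmts)
transparent = value_numbers(stmts, paren_transparent=True)

# With an opaque paren the two pow results get distinct value numbers.
print(opaque["D.1960_83"] == opaque["D.1978_168"])
# Looking through the paren makes them value-equal.
print(transparent["D.1960_83"] == transparent["D.1978_168"])
```

In this model the missed CSE is simply a consequence of the paren statement introducing a new value; everything computed from it (the sum and the sqrt) inherits the mismatch.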
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Tobias Burnus changed:

           What    |Removed |Added
                 CC|        |burnus at gcc dot gnu.org,
                   |        |kargl at gcc dot gnu.org,
                   |        |pault at gcc dot gnu.org

--- Comment #23 from Tobias Burnus 2011-12-02 14:03:41 UTC ---
(In reply to comment #22)
> I wonder if Fortran [...]

Well, let's start with the Fortran standard (Fortran 2008):

"7.1.5.2.4 Evaluation of numeric intrinsic operations

"The execution of any numeric operation whose result is not defined by the arithmetic used by the processor is prohibited. Raising a negative-valued primary of type real to a real power is prohibited.

"Once the interpretation of a numeric intrinsic operation is established, the processor may evaluate any mathematically equivalent expression, provided that the integrity of parentheses is not violated.

"Two expressions of a numeric type are mathematically equivalent if, for all possible values of their primaries, their mathematical values are equal. However, mathematically equivalent expressions of numeric type may produce different computational results."

[The section then contains a few non-normative notes; cf. http://gcc.gnu.org/wiki/GFortranStandards#Fortran_2008 ]

And for the assignment:

"7.2.1 Assignment statement" [...]
"R732 assignment-stmt is variable = expr"
[...]
"Execution of an intrinsic assignment causes, in effect, the evaluation of the expression expr and all expressions within variable (7.1), the possible conversion of expr to the type and type parameters of the variable (Table 7.9), and the definition of the variable with the resulting value. The execution of the assignment shall have the same effect as if the evaluation of expr and the evaluation of all expressions in variable occurred before any portion of the variable is defined by the assignment. The evaluation of expressions within variable shall neither affect nor be affected by the evaluation of expr."

> with -fprotect-parens, really has different semantics for
>   tem = 2 * a;
>   c = b / tem;
> vs.
>   c = b / (2 * a);
> ?
>
> Thus, is not every statement supposed to be wrapped in parens with
> -fprotect-parens? So that
>   tem = 2 * a;
> becomes
>   tem = ( 2 * a );
> implicitly?
[...]
> Thus, this is a question of 1) correctness of the -fprotect-parens
> implementation in the frontend, 2) a question on what optimizations
> we want to perform on protected expressions.

It somehow looks as if one needs to add parentheses implicitly; this gets more complicated if one takes the scalarizer or inlining into account.

Contrary to the explicit parentheses, I am not aware of a program which breaks with the extra temporary, but that does not tell much. (Side note: I think the majority of users doesn't care [or know] about the protection of either parentheses or the separate assignment statements, and is happy as long as the result is mathematically the same. Though, some users do care, as with unprotected parentheses their program breaks.)
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #24 from rguenther at suse dot de 2011-12-02 14:31:27 UTC ---
On Fri, 2 Dec 2011, burnus at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
>
> Tobias Burnus changed:
>
>            What    |Removed |Added
>                  CC|        |burnus at gcc dot gnu.org,
>                    |        |kargl at gcc dot gnu.org,
>                    |        |pault at gcc dot gnu.org
>
> --- Comment #23 from Tobias Burnus 2011-12-02 14:03:41 UTC ---
> (In reply to comment #22)
> > I wonder if Fortran [...]
>
> Well, let's start with the Fortran standard (Fortran 2008):
>
> "7.1.5.2.4 Evaluation of numeric intrinsic operations
>
> "The execution of any numeric operation whose result is not defined by the
> arithmetic used by the processor is prohibited. Raising a negative-valued
> primary of type real to a real power is prohibited.
>
> "Once the interpretation of a numeric intrinsic operation is established, the
> processor may evaluate any mathematically equivalent expression, provided that
> the integrity of parentheses is not violated.
>
> "Two expressions of a numeric type are mathematically equivalent if, for all
> possible values of their primaries, their mathematical values are equal.
> However, mathematically equivalent expressions of numeric type may produce
> different computational results."
>
> [The section then contains a few non-normative notes; cf.
> http://gcc.gnu.org/wiki/GFortranStandards#Fortran_2008 ]
>
> And for the assignment:
>
> "7.2.1 Assignment statement" [...]
> "R732 assignment-stmt is variable = expr"
> [...]
> "Execution of an intrinsic assignment causes, in effect, the evaluation of the
> expression expr and all expressions within variable (7.1), the possible
> conversion of expr to the type and type parameters of the variable (Table 7.9),
> and the definition of the variable with the resulting value. The execution of
> the assignment shall have the same effect as if the evaluation of expr and the
> evaluation of all expressions in variable occurred before any portion of the
> variable is defined by the assignment. The evaluation of expressions within
> variable shall neither affect nor be affected by the evaluation of expr."
>
> > with -fprotect-parens, really has different semantics for
> >   tem = 2 * a;
> >   c = b / tem;
> > vs.
> >   c = b / (2 * a);
> > ?
> >
> > Thus, is not every statement supposed to be wrapped in parens with
> > -fprotect-parens? So that
> >   tem = 2 * a;
> > becomes
> >   tem = ( 2 * a );
> > implicitly?
> [...]
> > Thus, this is a question of 1) correctness of the -fprotect-parens
> > implementation in the frontend, 2) a question on what optimizations
> > we want to perform on protected expressions.
>
> It somehow looks as if one needs to add implicitly parentheses; this gets more
> complicated, if one takes the scalarizer or inlining into account.
>
> Contrary to the explicit parentheses, I am not aware of a program which breaks
> with the extra temporary, but that's does not tell much. (Side note: I think
> the majority of users doesn't care [or know] about the protection of either
> parentheses or the separate assignment statements - and is happy as long the
> result is mathematical the same. Though, some users do care as with
> unprotected parentheses their program breaks.)

Every program that would break when explicit parentheses are not honored would also break if the bracketed expression were explicitly computed into a temporary (without explicit parentheses). So it should be easy to construct a testcase if you have one that breaks without -fno-protect-parens.

Richard.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Tobias Burnus changed:

           What    |Removed         |Added
                 CC|bur...@net-b.de |dominiq at lps dot ens.fr

--- Comment #25 from Tobias Burnus 2011-12-02 14:40:48 UTC ---
(In reply to comment #24)
> Every program that would break with non honoring explicit parentheses
> would also break if the bracketed expression would be explicitly
> computed into a temporary (without explicit parentheses). So it
> should be easy to construct a testcase if you have one that breaks
> without -fno-protect-parens.

I vaguely recall that one of the Polyhedron benchmarks gets minutely out of the correctness-check tolerance range with -fno-protect-parens while it stays within it without. I think Dominique has a program where the effect is more disastrous.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #26 from rguenther at suse dot de 2011-12-02 15:02:27 UTC ---
On Fri, 2 Dec 2011, burnus at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
>
> Tobias Burnus changed:
>
>            What    |Removed         |Added
>                  CC|bur...@net-b.de |dominiq at lps dot ens.fr
>
> --- Comment #25 from Tobias Burnus 2011-12-02 14:40:48 UTC ---
> (In reply to comment #24)
> > Every program that would break with non honoring explicit parentheses
> > would also break if the bracketed expression would be explicitly
> > computed into a temporary (without explicit parentheses). So it
> > should be easy to construct a testcase if you have one that breaks
> > without -fno-protect-parens.
>
> I vaguely recall that one of the Polyhedron benchmarks gets minutely out of
> the correctness-check tolerance range with -fno-protect-parens while it stays
> within without. I think Dominique has a program where the effect is more
> disastrous.

The trivial example is (x + 2**52) - 2**52 which rounds x to an integer. Without parens we optimize away that rounding effect. Thus,

  real*8 x, tem
  x = 1.3d0
  tem = x + 2.d0**52
  x = tem - 2.d0**52
  if (x.ne.1.0d0) call abort

should not fail (minus my fortran coding errors ;)) with -fprotect-parens
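For readers who want to see the rounding effect from comment #26 without a Fortran compiler at hand, the following is a small sketch in Python, whose floats are IEEE binary64 (the same format real*8 maps to on the usual targets). The function names are made up for this illustration, and `round_reassociated` is only a model of what a reassociating optimizer is allowed to produce, not a trace of what GCC actually emits. The trick assumes round-to-nearest and a non-negative x well below 2**52.

```python
TWO_52 = 2.0 ** 52  # from 2**52 up to 2**53 adjacent binary64 values are 1.0 apart


def round_as_written(x):
    """Evaluate (x + 2**52) - 2**52, rounding the intermediate sum to binary64."""
    tem = x + TWO_52  # the fractional part of x is lost in this rounding step
    return tem - TWO_52


def round_reassociated(x):
    """Model of a reassociated evaluation: 2**52 - 2**52 is folded to zero first."""
    return x + (TWO_52 - TWO_52)


print(round_as_written(1.3))    # 1.0
print(round_reassociated(1.3))  # 1.3
```

The two functions are mathematically equivalent in the sense of the standard text quoted in comment #23, yet they give different computational results, which is exactly what the parentheses (or the temporary) are meant to pin down.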
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #27 from Tobias Burnus 2011-12-02 16:02:45 UTC ---
(In reply to comment #26)
> The trivial example is (x + 2**52) - 2**52 which rounds x to
> an integer. Without parens we optimize away that rounding effect.

Corrected example. The result I get with other compilers matches the current behaviour of GCC/gfortran:

- GCC: Gives (of course independent of -fno-protect-parens): 1.3 with "-O1 -ffast-math", 1.0 without -ffast-math.

- Intel ifort 12.2: -O1 has 1.0, -O2 has 1.3, -assume protect_parens does not help but -fp-model strict does (with -O2: 1.0).

- PGI pgf95 11.5-0: 1.0 with up to -O4.

- Crayftn 7.1.4.111: 1.0 for -O0, 1.3 for -O1. Option "-O fp0" gives 1.0 while already "-O fp1" gives 1.3.

- PathScale pathf95 3.2.99: 1.0 for up to -O3, -Ofast prints 1.3. As with GCC, -OPT:fast_math={on,off} toggles between 1.0 and 1.3.

- NAG f95: 1.0 for up to -O4, 1.3 with -Ounsafe.

- Sun Fortran 95 8.3: 1.0 for -O4, 1.3 for -fast.

program test
  implicit none
  real(8), volatile :: y
  y = 1.3d0
  call sub(y)
  print *, y
!  if (y /= 1.0d0) &
!    call abort
contains
  subroutine sub(x)
    real*8 x, tem
    tem = x + 2.d0**52
    x = tem - 2.d0**52
  end subroutine sub
end program test
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

Jack Howarth changed:

           What    |Removed |Added
                 CC|        |howarth at nitro dot med.uc.edu

--- Comment #28 from Jack Howarth 2011-12-02 16:10:59 UTC ---
The failing polyhedron 2005 benchmark is linpk which can be seen with -Ofast on x86_64-apple-darwin11...

> Value= 25.114499300 Target= 23.1 Tolerance= 2.00 FAIL
> Value=0.27880142600E-10 Target=0.27858826400E-10 Tolerance=0.100E-09
> Value=0.22204460500E-15 Target=0.22204460500E-15 Tolerance=0.100E-14
> Value= 1.00 Target= 1.00 Tolerance=0.100E-07
> Value= 1.00 Target= 1.00 Tolerance=0.100E-07

linpk FAILED 1 fails and 4 passes
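As a rough cross-check of the report in comment #28, the sketch below replays the five rows under one assumption that is not confirmed anywhere in this thread: that the harness passes a row when the absolute difference between value and target does not exceed the tolerance. The helper name is made up, and the numbers are copied as printed, so any digits the report truncated are missing here too.

```python
# (value, target, tolerance) rows copied from the report.
rows = [
    (25.114499300, 23.1, 2.00),
    (0.27880142600e-10, 0.27858826400e-10, 0.100e-09),
    (0.22204460500e-15, 0.22204460500e-15, 0.100e-14),
    (1.00, 1.00, 0.100e-07),
    (1.00, 1.00, 0.100e-07),
]


def within_tolerance(value, target, tolerance):
    # Assumed rule: pass when |value - target| <= tolerance.
    return abs(value - target) <= tolerance


results = [within_tolerance(*row) for row in rows]
print(results.count(False), "fails and", results.count(True), "passes")
```

Under that assumption only the first row fails, and only narrowly: the difference is about 2.014 against a tolerance of 2.00.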
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #29 from rguenther at suse dot de 2011-12-02 16:13:25 UTC ---
On Fri, 2 Dec 2011, burnus at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
>
> --- Comment #27 from Tobias Burnus 2011-12-02 16:02:45 UTC ---
> (In reply to comment #26)
> > The trivial example is (x + 2**52) - 2**52 which rounds x to
> > an integer. Without parens we optimize away that rounding effect.
>
> Corrected example. The result I get with other compilers matches the current
> behaviour of GCC/gfortran:
>
> - GCC: Gives (of course independent of -fno-protect-parens): 1.3 with "-O1
> -ffast-math", 1.0 without -ffast-math.

Indeed GCC does not perform FP association without some sub-flags enabled by -ffast-math (it assumes then intermediate rounding is to be preserved).

> - Intel ifort 12.2: -O1 has 1.0, -O2 has 1.3, -assume protect_parens does not
> help but -fp-model strict does (with -O2: 1.0).
>
> - PGI pgf95 11.5-0: 1.0 with up to -O4.
>
> - Crayftn 7.1.4.111: 1.0 for -O0, 1.3 for -O1. Option "-O fp0" gives 1.0 while
> already "-O fp1" gives 1.3.
>
> - PathScale pathf95 3.2.99: 1.0 for up to -O3, -Ofast prints 1.3. As with GCC,
> -OPT:fast_math={on,off} toggles between 1.0 and 1.3
>
> - NAG f95: 1.0 for up to -O4, 1.3 with -Ounsafe.
>
> - Sun Fortran 95 8.3: 1.0 for -O4, 1.3 for -fast.
>
> program test
>   implicit none
>   real(8), volatile :: y
>   y = 1.3d0
>   call sub(y)
>   print *, y
> !  if (y /= 1.0d0) &
> !    call abort
> contains
>   subroutine sub(x)
>     real*8 x, tem
>     tem = x + 2.d0**52
>     x = tem - 2.d0**52
>   end subroutine sub
> end program test

And for the sake of completeness the evaluation of sub above and

  subroutine sub2(x)
    real*8 x
    x = (x + 2.d0**52) - 2.d0**52
  end subroutine sub2

should behave consistently if I read your Fortran standard quotations correctly.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #30 from Tobias Burnus 2011-12-02 16:29:46 UTC ---
(In reply to comment #29)
> And for the sake of completeness the evaluation of sub above and
>   x = (x + 2.d0**52) - 2.d0**52
> should behave consistently if I read your Fortran standard
> quotations correctly.

Well, it kind of does. Only when mixing (in GCC) -funsafe-math-optimizations with -fprotect-parens or (in ifort) "-assume protect_parens" with a non-strict -fp-model do you get different results: 1.0 with the () version and 1.3 with the 'tmp' version.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #31 from rguenther at suse dot de 2011-12-02 16:32:52 UTC ---
On Fri, 2 Dec 2011, burnus at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
>
> --- Comment #30 from Tobias Burnus 2011-12-02 16:29:46 UTC ---
> (In reply to comment #29)
> > And for the sake of completeness the evaluation of sub above and
> >   x = (x + 2.d0**52) - 2.d0**52
> > should behave consistently if I read your Fortran standard
> > quotations correctly.
>
> Well, it kind of does, only when mixing (in GCC) -funsafe-math-optimizations
> with -fprotect-parens or (in ifort) "-assume protect_parens" with a non-strict
> -fp-model, you get a different results: 1.0 with the () version and 1.3 with
> the 'tmp' version.

Ok, which is, I suppose, a bug in both compilers.

Richard.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #32 from Dominique d'Humieres 2011-12-02 16:37:37 UTC ---
> And for the sake of completeness the evaluation of sub above and
>
> subroutine sub2(x)
>   real*8 x
>   x = (x + 2.d0**52) - 2.d0**52
> end subroutine sub2
>
> should behave consistently if I read your Fortran standard
> quotations correctly.

According to my reading of

> "Once the interpretation of a numeric intrinsic operation is established, the
> processor may evaluate any mathematically equivalent expression, provided that
> the integrity of parentheses is not violated."

this is different from

> subroutine sub(x)
>   real*8 x, tem
>   tem = x + 2.d0**52
>   x = tem - 2.d0**52
> end subroutine sub

where 'x=tem-2.d0**52' can be evaluated as 'x=x+2.d0**52-2.d0**52' then as 'x' (as long as x and tmp are of the same kind(?)), while in the former case the standard prohibits evaluating '(x + 2.d0**52) - 2.d0**52' as 'x'.

Note that if I replace 'tem = x + 2.d0**52' with 'tem = (x + 2.d0**52)', I get 1.0 unless I use -fno-protect-parens.

All this has been discussed previously, but the only pr I have been able to find is pr32172.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #33 from Dominique d'Humieres 2011-12-02 16:45:24 UTC ---
> The failing polyhedron 2005 benchmark is linpk which can be seen with -Ofast
> on x86_64-apple-darwin11...
>
> > Value= 25.114499300 Target= 23.1 Tolerance= 2.00
> F

I think this test is not relevant: the "target" is already a residual error, hence very sensitive to the way the computation is performed, and a 10% tolerance cannot be used to evaluate the "accuracy" of the residual.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #34 from Tobias Burnus 2011-12-02 17:06:57 UTC ---
(In reply to comment #31)
> Ok, which is, I suppose, a bug in both compilers.

Kind of, though -ffast-math by itself already is on the verge of violating the standard. I think -fno-protect-parens could be enabled by -ffast-math as that means that one does not really care about the exact value.

However, there are users who want to have one or the other. Namely, most users are happy with a default -fno-protect-parens but don't dare to use -ffast-math, while others want to have -ffast-math optimizations but with honored parentheses.

If we want to add extra protection for

  tem = x + 2.d0**52
  x = tem - 2.d0**52

we probably need to add yet another flag as there are surely users who want to have protected parentheses but allow for optimizations in the 'tmp' case. [Even if, as this PR shows, the extra optimization opportunity might lead to a missed opportunity.]

In any case, handling that well for function calls, inlining and the scalarizer seems to be difficult. And frankly, I am not sure whether there is any user; -ffast-math plus -fprotect-parens is already special (cf. comment 32 for one user). Having -ffast-math plus parentheses plus protected assignments might have even fewer users.

I believe most users simply use -O2, -O3 [-ffast-math], or -Ofast without thinking (very) much about the options. [I also typically use either -O2, -O3 or -Ofast.]

* * *

Back to the comment 0 issue: I still do not quite understand what the double evaluation (on tree level) of __builtin_pow in

  D.1959_82 = ((D.2115_81));
  D.1960_83 = __builtin_pow (D.1959_82, 2.0e+0);
  D.1978_168 = __builtin_pow (D.2115_81, 2.0e+0);

has to do with the -Ofast slow down. If I have understood it correctly, on tree level, there is no reason for it while the slow-down happens on RTL level. That -fprotect-parens makes it faster is a mere coincidence. Is that a correct rough summary?
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #35 from Eric Botcazou 2011-12-02 21:21:15 UTC ---
> One thing I notice (and that's the only difference I can spot at the tree
> level) is that we do not CSE the **2s of

There are many missed hoisting opportunities, with or without the switch. There are just a few more with the switch, hence the performance regression.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #36 from Dominique d'Humieres 2011-12-03 14:54:40 UTC ---
> Kind of, though, -ffast-math by itself already is on the verge of violating
> the standard.

I disagree with this statement, at least for codes that do not use IEEE intrinsic modules (not yet implemented in gfortran). Indeed the interaction between -ffast-math and IEEE intrinsic modules will have to be discussed and documented when these modules are implemented (see pr50724 for the kind of problems).

> I think -fno-protect-parens could be enabled by -ffast-math as that
> means that one does not really care about the exact value.

PLEEEASE DON'T. The situation is bad enough with -Ofast to not make it worse: with -fno-protect-parens gfortran no longer complies with the Fortran standard (7.1.5.2.4 quoted in comment #23) and IMO it SHOULD NOT be part of any compound option.

Note that if I am using -ffast-math, it is not because I do "not really care about the exact value". It is mostly because I KNOW that exceptions will be the signature of either a bug in my code and/or a bad choice of the parameters leading to numerical instabilities. On top of that, I think that the concept of "exact value" for floating-point numbers is ill-posed and as a consequence I do accept that the least significant digits may depend on the way I write the code or on how it is optimized (small fluctuations for well-posed methods, large ones otherwise).

> If we want to add extra protection for
>   tem = x + 2.d0**52
>   x = tem - 2.d0**52
> we probably need to add yet another flag as there are surely users, which want
> to have protected parentheses but allow for optimizations in the 'tmp' case.

If I need an extra protection, I'll put parentheses and I don't need yet another flag. However, since the logic of the optimization is surprising the first time you hit it, it could (should) be documented.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #37 from rguenther at suse dot de 2011-12-05 08:18:00 UTC ---
On Fri, 2 Dec 2011, burnus at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
>
> --- Comment #34 from Tobias Burnus 2011-12-02 17:06:57 UTC ---
[...]
> * * *
>
> Back to the comment 0 issue: I still do not quite understand what the double
> evaluation (on tree level) of __builtin_pow in
>   D.1959_82 = ((D.2115_81));
>   D.1960_83 = __builtin_pow (D.1959_82, 2.0e+0);
>   D.1978_168 = __builtin_pow (D.2115_81, 2.0e+0);
> has to do with the -Ofast slow down. If I have understood it correctly, on
> tree level, there is no reason for it while the slow-down happens on RTL level.

Indeed I can find no other difference on the tree level (thus, no invariant motion missed optimization that isn't present with both -f[no-]protect-parens).

> That -fprotect-parens makes it faster is a mere coincidence. Is that a
> correct rough summary?

Yes. Thus, I think if at the RTL level we see a missed invariant motion then this is an RTL level bug (esp. if it only triggers with -fno-protect-parens).

Richard.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #38 from rguenther at suse dot de 2011-12-05 08:27:08 UTC ---
On Fri, 2 Dec 2011, ebotcazou at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
>
> --- Comment #35 from Eric Botcazou 2011-12-02 21:21:15 UTC ---
> > One thing I notice (and that's the only difference I can spot at the tree
> > level) is that we do not CSE the **2s of
>
> There are many missed hoisting opportunities, with or without the switch.
> There are just a few more with the switch, hence the performance regression.

Most of them (not sure if you mean those) are because they are considered "cheap" by LIM and thus are not moved:

vect_px2gauss.123_641 = &x2gauss;
  invariant up to level 1, cost 1.
vect_cst_.126_659 = { 1.0e+0, 1.0e+0 };
  invariant up to level 1, cost 1.
vect_cst_.128_661 = {D.2126_109, D.2126_109};
  invariant up to level 1, cost 1.
...
vect_px2gauss.120_20 = vect_px2gauss.123_641 + 16;
  invariant up to level 1, cost 2.
...
ivtmp.176_899 = 1;
  invariant up to level 1, cost 1.
...
vect_px2gauss.120_649 = vect_px2gauss.120_336;
  invariant up to level 1, cost 5.
...

ISTR discussing to remove all cost considerations for tree level loop invariant motion and simply move everything possible (PRE for example doesn't consider any costs and moves all invariants).

If you use --param lim-expensive=1 you get all invariants moved on the tree level - does that solve the slowdown issue?

The issue is of course that this might increase register pressure as we are not good at re-materializing, for example, constants inside a loop.

I'll give --param lim-expensive=1 a try on SPEC 2k6
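The effect of --param lim-expensive on the statements listed in comment #38 can be pictured with the toy filter below. It is a deliberately simplified sketch, not LIM's actual logic: it assumes that an invariant statement is moved only when its cost reaches the lim-expensive threshold, assumes a default threshold of 20, and ignores dependences between statements. The `hoisted` helper is hypothetical; the names and costs are the ones from the dump.

```python
DEFAULT_LIM_EXPENSIVE = 20  # assumed default value of --param lim-expensive


def hoisted(candidates, lim_expensive=DEFAULT_LIM_EXPENSIVE):
    """Return the invariant statements whose cost reaches the threshold."""
    return [name for name, cost in candidates if cost >= lim_expensive]


# (lhs, cost) pairs taken from the LIM dump.
candidates = [
    ("vect_px2gauss.123_641", 1),
    ("vect_cst_.126_659", 1),
    ("vect_cst_.128_661", 1),
    ("vect_px2gauss.120_20", 2),
    ("ivtmp.176_899", 1),
    ("vect_px2gauss.120_649", 5),
]

print(len(hoisted(candidates)))                   # none reach the assumed default
print(len(hoisted(candidates, lim_expensive=1)))  # all of them reach a threshold of 1
```

With the threshold lowered to 1 every listed statement qualifies, which matches the statement that --param lim-expensive=1 moves all invariants on the tree level.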
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #39 from Eric Botcazou 2011-12-05 09:21:15 UTC ---
> Thus, I think if at the RTL level we see a missed invariant motion then
> this is a RTL level bug (esp. if it only triggers with -fno-protect-parens).

Well, how can the RTL level invent load hoisting opportunities? They are of course already present at the Tree level, see the .optimized dump:

  vect_var_.124_350 = MEM[(real(kind=8)[9] *)&x2gauss];
  vect_var_.133_823 = MEM[(real(kind=8)[9] *)&y2gauss];
  vect_var_.157_586 = MEM[(real(kind=8)[9] *)&w2gauss];
  vect_var_.124_357 = MEM[(real(kind=8)[9] *)&x2gauss + 16B];
  vect_var_.133_363 = MEM[(real(kind=8)[9] *)&y2gauss + 16B];
  vect_var_.157_874 = MEM[(real(kind=8)[9] *)&w2gauss + 16B];
  vect_var_.124_405 = MEM[(real(kind=8)[9] *)&x2gauss + 32B];
  vect_var_.133_594 = MEM[(real(kind=8)[9] *)&y2gauss + 32B];
  vect_var_.157_610 = MEM[(real(kind=8)[9] *)&w2gauss + 32B];
  vect_var_.124_651 = MEM[(real(kind=8)[9] *)&x2gauss + 48B];
  vect_var_.133_680 = MEM[(real(kind=8)[9] *)&y2gauss + 48B];
  vect_var_.157_805 = MEM[(real(kind=8)[9] *)&w2gauss + 48B];
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #40 from rguenther at suse dot de 2011-12-05 09:55:47 UTC ---
On Mon, 5 Dec 2011, ebotcazou at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
>
> --- Comment #39 from Eric Botcazou 2011-12-05 09:21:15 UTC ---
> > Thus, I think if at the RTL level we see a missed invariant motion then
> > this is a RTL level bug (esp. if it only triggers with
> > -fno-protect-parens).
>
> Well, how can the RTL level invent load hoisting opportunities? They are of
> course already present at the Tree level, see the .optimized dump:
>
>   vect_var_.124_350 = MEM[(real(kind=8)[9] *)&x2gauss];

They are considered dependent because they are still decomposed as

  vect_px2gauss.123_680 = &x2gauss;
  ...
  vect_var_.124_350 = MEM[(real(kind=8)[9] *)vect_px2gauss.123_680];

during LIM3. Let me check why we don't fix that up in LIM dependence checking.
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #41 from Dominique d'Humieres 2011-12-05 10:12:39 UTC ---
Using --param lim-expensive=1 when compiling induct.f90 does not change the timing, as of today (r181994):

gfc -Ofast induct.f90                                           -> 14.62s
gfc -Ofast induct.f90 --param lim-expensive=1                   -> 14.61s
gfc -fprotect-parens -Ofast induct.f90                          -> 14.11s
gfc -fprotect-parens -Ofast induct.f90 --param lim-expensive=1  -> 14.12s

(a ~0.15s improvement over the timing in comment #1).
[Bug rtl-optimization/50904] [4.7 regression] pessimization when -fno-protect-parens is enabled by -Ofast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #42 from Richard Guenther 2011-12-05 10:19:11 UTC ---
Argh. It seems LIM didn't get proper lifting both at tuplification and alias-improvements time. So its memory handling (everything it does with VOPs) is a little very much conservative (read: it doesn't really work).

I'll look into this.