[Bug tree-optimization/54073] [4.7 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases

2014-06-12 rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
      Known to work|                            |4.8.0
         Resolution|---                         |FIXED
   Target Milestone|4.7.4                       |4.8.0
      Known to fail|4.8.0                       |4.7.4

--- Comment #19 from Richard Biener ---
Fixed for 4.8.0.


2013-04-11 rguenth at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.7.3                       |4.7.4

--- Comment #18 from Richard Biener 2013-04-11 07:59:21 UTC ---
GCC 4.7.3 is being released, adjusting target milestone.


2013-02-17 ubizjak at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

--- Comment #17 from Uros Bizjak 2013-02-17 08:40:52 UTC ---
(In reply to comment #16)
> I have done quite a bit of analysis on cmov performance across x86
> architectures, so I will share here in case it helps:

I have moved this discussion to PR56309. Let's keep this PR open for an eventual
backport of the patch in Comment #13 to the 4.7 branch.


2013-02-16 jake.stine at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Jake Stine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jake.stine at gmail dot com

--- Comment #16 from Jake Stine 2013-02-16 19:12:05 UTC ---
Hi,

I have done quite a bit of analysis on cmov performance across x86
architectures, so I will share here in case it helps:

Quick summary: Conditional moves on Intel Core/Xeon and AMD Bulldozer
architectures should probably be avoided "as a rule."

History: Conditional moves were beneficial for the Intel Pentium 4, and also
(but less so) for AMD Athlon/Phenom chips. In the AMD Athlon/Phenom case, the
performance of cmov vs cmp+branch is determined more by the alignment of the
branch target than by the prediction rate of the branch. The instruction
decoders would incur penalties on certain types of unaligned branch targets
(when taken), or when decoding sequences of instructions that contained
multiple branches within a 16-byte "fetch" window (taken or not). cmov was
sometimes handy for avoiding those.

With regard to the more current Intel Core and AMD Bulldozer/Bobcat
architectures:

I have found that use of conditional moves (cmov) is only beneficial if the
branch that the move is replacing is badly mis-predicted. In my tests, the
cmov only became clearly "optimal" when the branch was predicted correctly less
than 92% of the time, which is abysmal by modern branch predictor standards and
rarely occurs in practice. Above 97% prediction rates, cmov is typically
slower than cmp+branch. Inside loops that contain branches with prediction
rates approaching 100% (as is the case presented by the OP), cmov becomes a
severe performance bottleneck. This holds true for both Core and Bulldozer.
Bulldozer has less efficient branching than the i7, but is also severely
bottlenecked by its limited fetch/decode. Cmov requires executing more total
instructions, and that makes Bulldozer very unhappy.
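
To make the comparison concrete, here is a minimal C sketch of the two code shapes being discussed. It uses a hypothetical counting loop, and the function names are invented for illustration. The first form keeps a compare-and-branch around the increment. The second adds the 0/1 comparison result unconditionally, which is the kind of branch-free code that if-conversion aims to produce. Whether GCC actually emits a jump, setcc or cmov for either form depends on the compiler version, -march and optimization flags. Treat this as something to inspect in the generated assembly (e.g. with -S), not as a benchmark.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy form: a compare plus a conditional jump around the increment.
   Cheap when (v[i] > t) is almost always predicted correctly. */
static size_t count_greater_branch(const int32_t *v, size_t n, int32_t t)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > t)
            count++;
    }
    return count;
}

/* Branch-free form: the comparison yields 0 or 1 and is always added.
   Nothing can be mispredicted, but every iteration executes the extra
   instructions and each update of count waits on the comparison result. */
static size_t count_greater_select(const int32_t *v, size_t n, int32_t t)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(v[i] > t);
    return count;
}
```

Fed an array where the comparison almost always comes out the same way, the first form corresponds to the well-predicted case described above. With random data, the second form avoids the mispredictions.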

Note that my tests involved relatively simple loops that did not suffer from
the added register pressure that cmov introduces. In practice, the prognosis
for cmov being "optimal" is even worse than what I've observed in a controlled
environment. Furthermore, to my knowledge the status of cmov vs. branch
performance on x86 will not be changing anytime soon. cmov will continue to be
a liability well into the next couple of architecture releases from Intel and
AMD. Piledriver will have added fetch/decode resources but should also have a
smaller mispredict penalty, so it's doubtful cmov will gain much advantage
there either.

Therefore I would recommend setting -fno-tree-loop-if-convert for all -march
values matching the Intel Core and AMD Bulldozer/Bobcat families.
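
Besides passing -fno-tree-loop-if-convert on the command line, GCC's optimize function attribute can request the same option for a single function. The sketch below applies the recommendation more narrowly in that way. The function is hypothetical. GCC's manual describes the optimize attribute as intended for debugging rather than production code, so the command-line flag remains the primary route.

```c
#include <stddef.h>

/* Hypothetical hot function excluded from tree-level loop if-conversion.
   GCC reads the string "no-tree-loop-if-convert" as -fno-tree-loop-if-convert
   for this function only; other compilers see a plain function. */
#if defined(__GNUC__) && !defined(__clang__)
__attribute__((optimize("no-tree-loop-if-convert")))
#endif
static long sum_positive(const int *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > 0)   /* assumed to be a well-predicted branch */
            sum += v[i];
    }
    return sum;
}
```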

There is one good use case for cmov on x86: mis-predicted conditions inside
loops. Currently there's no way to force that behavior in situations where I,
the programmer, am fully aware that the condition is chaotic/random. A builtin
cmov or condition hint would be nice. For now I'm forced to address those
(fortunately infrequent) situations via inline asm.
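
Here is a minimal sketch of that inline-asm workaround. It assumes x86-64 and GCC-style extended asm in AT&T syntax. select_cmov and count_greater_cmov are hypothetical names, not part of any GCC API. On other targets the sketch falls back to a plain conditional expression, which the compiler may still compile to a branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Return a if cond is nonzero, else b, forcing a CMOV on x86-64.
   Sketch only: GCC-style extended inline asm, AT&T operand order. */
static inline int64_t select_cmov(int cond, int64_t a, int64_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    int64_t result = b;                /* start from the "false" value */
    __asm__("test %1, %1\n\t"          /* ZF set iff cond == 0 */
            "cmovne %2, %0"            /* if cond != 0: result = a */
            : "+r"(result)
            : "r"(cond), "r"(a)
            : "cc");
    return result;
#else
    return cond ? a : b;               /* portable fallback, may branch */
#endif
}

/* Hypothetical use: count elements greater than t when the comparison is
   known to be effectively random, so a branch would mispredict often. */
static int64_t count_greater_cmov(const int32_t *v, size_t n, int32_t t)
{
    int64_t count = 0;
    for (size_t i = 0; i < n; i++)
        count = select_cmov(v[i] > t, count + 1, count);
    return count;
}
```

The compiler treats the asm statement as opaque, so it cannot turn the select back into a branch, which is the intent. The flip side is that the asm likely also blocks vectorization and other optimizations of the surrounding loop.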


2012-12-31 pinskia at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dominiq at lps dot ens.fr

--- Comment #15 from Andrew Pinski 2012-12-31 09:40:29 UTC ---
*** Bug 53346 has been marked as a duplicate of this bug. ***


2012-11-16 jakub at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.7/4.8 Regression]        |[4.7 Regression] SciMark
                   |SciMark Monte Carlo test    |Monte Carlo test
                   |performance has seriously   |performance has seriously
                   |decreased in recent GCC     |decreased in recent GCC
                   |releases                    |releases

--- Comment #14 from Jakub Jelinek 2012-11-16 14:50:32 UTC ---
Hopefully fixed on the trunk, not planning to backport it right now.