http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54776

             Bug #: 54776
           Summary: [4.8 Regression] tramp3d-v4: 20% performance regression using -O3
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: mar...@trippelsdorf.de

With gcc-4.8 (--enable-checking=release):

markus@x4 ~ % time c++ -w -O3 tramp3d-v4.cpp
c++ -w -O3 tramp3d-v4.cpp  24.87s user 0.34s system 99% cpu 25.293 total
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 7.35642

With gcc-4.7.2:

markus@x4 ~ % time c++ -w -O3 tramp3d-v4.cpp
c++ -w -O3 tramp3d-v4.cpp  25.15s user 0.33s system 99% cpu 25.568 total
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 5.81199

LTO doesn't help much (gcc-4.8):

markus@x4 ~ % time c++ -w -O3 -flto tramp3d-v4.cpp
c++ -w -O3 -flto tramp3d-v4.cpp  45.78s user 0.95s system 99% cpu 47.012 total
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 7.2111

(For comparison here are some clang results:

markus@x4 ~ % time clang++ -w -O3 tramp3d-v4.cpp
clang++ -w -O3 tramp3d-v4.cpp  14.67s user 0.12s system 99% cpu 14.874 total
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 6.1923

markus@x4 ~ % time clang++ -w -O3 -flto tramp3d-v4.cpp
clang++ -w -O3 -flto tramp3d-v4.cpp  20.28s user 0.16s system 99% cpu 20.535 total
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 4.47936

That's an almost 28% improvement due to -flto)
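
The percentages quoted in this report can be checked from the "Time spent in iteration" figures. The sketch below is not part of the original report. The helper names `reduction_pct` and `increase_pct` and the dictionary keys are made up for illustration, and only the timing values are copied from the transcripts. It assumes "20%" and "28%" are measured relative to the slower run, which is one plausible reading.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percent of run time saved going from `before` to `after`."""
    return (1.0 - after / before) * 100.0


def increase_pct(before: float, after: float) -> float:
    """Percent of extra run time going from `before` to `after`."""
    return (after / before - 1.0) * 100.0


# Iteration times in seconds, copied from the report (keys are illustrative).
times = {
    "gcc-4.7.2 -O3": 5.81199,
    "gcc-4.8 -O3": 7.35642,
    "gcc-4.8 -O3 -flto": 7.2111,
    "clang -O3": 6.1923,
    "clang -O3 -flto": 4.47936,
}

# gcc-4.7.2 relative to gcc-4.8: time saved by the older compiler.
print(f"4.7.2 vs 4.8 (time saved): "
      f"{reduction_pct(times['gcc-4.8 -O3'], times['gcc-4.7.2 -O3']):.1f}%")

# The same regression expressed as extra time taken by gcc-4.8.
print(f"4.8 vs 4.7.2 (extra time): "
      f"{increase_pct(times['gcc-4.7.2 -O3'], times['gcc-4.8 -O3']):.1f}%")

# Effect of -flto for each compiler.
print(f"gcc-4.8 -flto gain: "
      f"{reduction_pct(times['gcc-4.8 -O3'], times['gcc-4.8 -O3 -flto']):.1f}%")
print(f"clang -flto gain: "
      f"{reduction_pct(times['clang -O3'], times['clang -O3 -flto']):.1f}%")
```

Under that reading, the gcc-4.7.2 binary needs about 21% less time than the gcc-4.8 one (equivalently, gcc-4.8 takes about 27% longer), -flto gains about 2% with gcc-4.8, and clang gains just under 28% from -flto.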