------- Comment #16 from jv244 at cam dot ac dot uk 2009-09-01 09:13 -------
(In reply to comment #15)
> Please try -O2 and -O2 -funroll-loops too, since -O3 is not always good for
> speed. (It would be even better if -O2 is not slower and you can find out
> what the culprit is at -O3; this is not necessarily possible though).

You're right that, without -fschedule-insns, -O2 (with -funroll-loops) is faster than -O3 in this case, but nothing comes close to 4.3 performance. Adding -fschedule-insns to the fastest -O2 choice makes it about 17% slower (4.032 vs. 4.712).

All numbers with trunk:

-O2 -march=native -funroll-loops -ffast-math:                    4.032
-O2 -march=native -funroll-loops -ffast-math -fschedule-insns:   4.712
-O3 -march=native -funroll-loops -ffast-math:                    4.408
-O2 -march=native -ffast-math:                                  11.373
-O2 -march=native -ffast-math -fschedule-insns:                 11.409
-O3 -march=native -ffast-math:                                   4.296
-O3 -march=native -ffast-math -fschedule-insns:                  4.656

I can test other flags if you have a hint.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306