http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25621
--- Comment #13 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Joost VandeVondele from comment #12)
> Both compilers fail to notice that S32 is basically the same code
> hand-unrolled.

with gcc 4.9

> ./a.out
 default loop   0.54291800000000001
 hand optimized loop   0.54291700000000009

so, some progress: both versions of the loop now give the same performance.
Still not quite as good as ifort, however.
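
For readers without the attachment at hand, here is a minimal sketch in C of what "the same code hand-unrolled" can mean for a reduction loop. It is not the testcase from this PR: the function names `dot_default` and `dot_unrolled`, the unroll factor of 4, and the dot-product kernel are all assumptions chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical "default loop": one accumulator, one element per iteration. */
double dot_default(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical "hand optimized loop": unrolled by 4 with independent
 * partial sums, so the additions no longer form one serial chain. */
double dot_unrolled(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main body: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Remainder: up to three leftover elements. */
    for (; i < n; ++i)
        s0 += a[i] * b[i];

    return (s0 + s1) + (s2 + s3);
}
```

The unrolled form adds the terms in a different order than the plain loop. Floating-point addition is not associative, so a compiler may only turn the first form into the second when reassociation is allowed (for example under -ffast-math); in general the two results can differ in the last bits.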