http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #42 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-01-27 16:30:52 UTC ---

Comparing -O3 -ffast-math -funroll-loops -fno-inline -fno-partial-inlining
(thus generic arch, without prefetching):

trunk:
 df live regs         :   4.22 ( 6%) usr   0.04 ( 2%) sys   4.11 ( 5%) wall       0 kB ( 0%) ggc
 tree iv optimization :   3.92 ( 5%) usr   0.13 ( 5%) sys   4.29 ( 6%) wall   91066 kB (11%) ggc
 integrated RA        :   5.57 ( 8%) usr   0.10 ( 4%) sys   5.93 ( 8%) wall   26408 kB ( 3%) ggc
 scheduling 2         :   3.73 ( 5%) usr   0.04 ( 2%) sys   3.85 ( 5%) wall     939 kB ( 0%) ggc
 TOTAL                :  73.68             2.37            76.91             852775 kB

4.5:
 df live regs         :   4.60 ( 7%) usr   0.02 ( 1%) sys   4.62 ( 6%) wall       0 kB ( 0%) ggc
 expand               :   3.94 ( 6%) usr   0.17 ( 8%) sys   3.94 ( 6%) wall   62218 kB ( 8%) ggc
 integrated RA        :   5.73 ( 8%) usr   0.02 ( 1%) sys   5.76 ( 8%) wall   22920 kB ( 3%) ggc
 reload               :   3.78 ( 5%) usr   0.08 ( 4%) sys   3.86 ( 5%) wall    9291 kB ( 1%) ggc
 TOTAL                :  68.98             2.01            71.22             828137 kB

It would be nice to confirm that we are indeed much better with optimizing
bounds-checking code. The prefetching issue is tracked as PR44688.

So I'd close this either as a dup or as wontfix (it's a feature that we
optimize loops with bounds-checking).