http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56295

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2013-02-12
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |
   Target Milestone|---                         |4.8.0
     Ever Confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-12 12:21:19 UTC ---
With -O3 -flto the loop is not peeled before loop-distribution and thus
the inner loop is identified as memcpy.  With -O3 the inner loop is peeled.

That is because at cunrolli we see with -flto

Statement _8 = b[i_1][j_2];
 is probably executed at most 3 (bounded by 3) + 1 times in loop 2.

vs. without -flto

Statement _8 = b[i_1][j_2];
 is executed at most 3 (bounded by 3) + 1 times in loop 1.

With -flto we run into

          /* For arrays at the end of the structure, we are not guaranteed that
             they do not really extend over their declared size.  However, for
             arrays of size greater than one, this is unlikely to be intended.  */
          if (array_at_struct_end_p (base))
            {
              at_end = true;
              upper = false;

for b[i_1][j_2] (where it is really a MEM[&b][i_1][j_2]).  That's because
array_at_struct_end_p doesn't consider MEM_REFs and we (in this case)
needlessly wrap b in a MEM_REF during streaming out (so that at input time
prevailing decl replacement does not change aliasing / tree code validity).
We should probably undo this at streaming in time where possible.

I have a patch that does this.
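
For readers without the PR's testcase at hand, here is a minimal sketch of the kind of loop nest the dump lines above describe.  It is NOT the testcase from this PR: the destination array `a`, the bound `N`, and the helper names are assumptions made for illustration; only the access `b[i][j]` and the "at most 3 + 1" trip count are taken from the comment.  Building it with and without -flto at -O3 and comparing the cunrolli dumps (e.g. via -fdump-tree-cunrolli-details) may show the same "is executed" vs. "is probably executed" difference, but that is not guaranteed for this reduced form.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bound: the index runs 0..3, i.e. "at most 3 + 1" iterations.  */
enum { N = 4 };

/* `b` mirrors the array named in the dump; `a` is an assumed destination.  */
int a[N][N];
int b[N][N];

/* Give `b` distinct values so that a copy can be verified afterwards.  */
void fill_source (void)
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      b[i][j] = i * N + j;
}

/* The loop nest of interest.  The inner loop reads b[i][j]; its trip count
   is bounded by the declared size of `b`, so it is a candidate both for
   peeling / complete unrolling and for being recognized as a memcpy.  */
void copy_nest (void)
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      a[i][j] = b[i][j];
}

/* Sanity check: whichever transformation is applied, the result must be
   a bytewise copy of `b`.  */
void check_copy (void)
{
  assert (memcmp (a, b, sizeof a) == 0);
}
```

Calling fill_source (), copy_nest () and check_copy () in that order from main exercises the nest; the interesting part is the optimizer's view of the inner loop's iteration bound, not the runtime result.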