http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49442
Summary: [4.5/4.6/4.7 Regression] Misaligned store support pessimization
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ja...@gcc.gnu.org
CC: i...@gcc.gnu.org, revit...@gcc.gnu.org
Target: x86_64-linux

__attribute__((noinline, noclone)) void
baz (double *out1, double *out2, double *out3, double *in1, double *in2, int len)
{
  for (int i = 0; i < len; ++i)
    {
      out1[i] = in1[i] * in2[i];
      out2[i] = in1[i] + in2[i];
      out3[i] = in1[i] - in2[i];
    }
}

double a[50000] __attribute__((aligned (32)));

int
main ()
{
  int i;
  for (i = 0; i < 500000; i++)
    baz (a + 0, a + 10000, a + 20000, a + 30000, a + 40000, 10000);
  return 0;
}

is measurably slower in 4.6 compared to 4.4 with -m64 -O3 -mtune=generic,
apparently starting with
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=148211
at least on Intel CPUs.

with r148210:
Strip out best and worst realtime result
minimum: 6.603036509 sec real / 0.000086529 sec CPU
maximum: 6.720307841 sec real / 0.000159148 sec CPU
average: 6.629486345 sec real / 0.000133896 sec CPU
stdev  : 0.024886889 sec real / 0.000020014 sec CPU

with r148211:
Strip out best and worst realtime result
minimum: 6.969550715 sec real / 0.000072647 sec CPU
maximum: 7.564913575 sec real / 0.000162211 sec CPU
average: 7.192333688 sec real / 0.000135634 sec CPU
stdev  : 0.101616835 sec real / 0.000022659 sec CPU