http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49442

           Summary: [4.5/4.6/4.7 Regression] Misaligned store support
                    pessimization
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ja...@gcc.gnu.org
                CC: i...@gcc.gnu.org, revit...@gcc.gnu.org
            Target: x86_64-linux


__attribute__((noinline, noclone))
void baz (double *out1, double *out2, double *out3, double *in1, double *in2,
          int len)
{
  for (int i = 0; i < len; ++i)
    {
      out1[i] = in1[i] * in2[i];
      out2[i] = in1[i] + in2[i];
      out3[i] = in1[i] - in2[i];
    }
}

double a[50000] __attribute__((aligned (32)));
int
main ()
{
  int i;
  for (i = 0; i < 500000; i++)
    baz (a + 0, a + 10000, a + 20000, a + 30000, a + 40000, 10000);
  return 0;
}

This testcase is measurably slower when compiled with 4.6 than with 4.4
(-m64 -O3 -mtune=generic), at least on Intel CPUs. The slowdown apparently
starts with
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=148211
with r148210:
Strip out best and worst realtime result
minimum: 6.603036509 sec real / 0.000086529 sec CPU
maximum: 6.720307841 sec real / 0.000159148 sec CPU
average: 6.629486345 sec real / 0.000133896 sec CPU
stdev  : 0.024886889 sec real / 0.000020014 sec CPU
with r148211:
Strip out best and worst realtime result
minimum: 6.969550715 sec real / 0.000072647 sec CPU
maximum: 7.564913575 sec real / 0.000162211 sec CPU
average: 7.192333688 sec real / 0.000135634 sec CPU
stdev  : 0.101616835 sec real / 0.000022659 sec CPU
