[Bug ada/46006] New: vectorization outside of loops

jakub at gcc dot gnu.org Wed, 13 Oct 2010 07:24:06 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46006


           Summary: vectorization outside of loops
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ada
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ja...@gcc.gnu.org
                CC: i...@gcc.gnu.org


Are there any plans to try to vectorize parts of code like:
struct A
{
  double x, y, z;
};

struct B
{
  struct A a, b;
};

struct C
{
  struct A c;
  double d;
};

__attribute__((noinline, noclone)) int
foo (const struct C *u, struct B v)
{
  double a, b, c, d;

  a = v.b.x * v.b.x + v.b.y * v.b.y + v.b.z * v.b.z;
  b = 2.0 * v.b.x * (v.a.x - u->c.x)
      + 2.0 * v.b.y * (v.a.y - u->c.y) + 2.0 * v.b.z * (v.a.z - u->c.z);
  c = u->c.x * u->c.x + u->c.y * u->c.y + u->c.z * u->c.z
      + v.a.x * v.a.x + v.a.y * v.a.y + v.a.z * v.a.z
      + 2.0 * (-u->c.x * v.a.x - u->c.y * v.a.y - u->c.z * v.a.z)
      - u->d * u->d;
  if ((d = b * b - 4.0 * a * c) < 0.0)
    return 0;
  return d;
}

int
main (void)
{
  int i, j;
  struct C c = { { 1.0, 1.0, 1.0 }, 1.0 };
  struct B b = { { 1.0, 1.0, 1.0 }, { 1.0, 1.0, 1.0 } };
  for (i = 0; i < 100000000; i++)
    {
      asm volatile ("" : : "r" (&c), "r" (&b) : "memory");
      j = foo (&c, b);
      asm volatile ("" : : "r" (j));
    }
  return 0;
}
(This is the hot spot from the c-ray benchmark. The function is actually
larger, but at least according to callgrind the early return on < 0.0 happens
in most cases. As the function is large and called from multiple spots, it
isn't inlined.)
I'd say (though I haven't tried to code it by hand using intrinsics) that
doing many of the multiplications/additions in parallel (especially with AVX)
could give significant speedups at -O3 -ffast-math.
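
For illustration only, here is a rough sketch of what straight-line
vectorization of this computation might look like if written by hand with
GCC's generic vector extensions rather than target intrinsics. The names v4df
and foo_vec are made up for this sketch and are not part of the testcase. The
fourth lane is padded with zero, and the additions end up in a different order
than in the scalar code, so this only matches what the compiler would be
allowed to do under -ffast-math. It is not measured and not a claim about how
the vectorizer would actually transform foo.

```c
/* Hypothetical 4 x double vector type (GCC/clang vector extension). */
typedef double v4df __attribute__ ((vector_size (32)));

struct A { double x, y, z; };
struct B { struct A a, b; };
struct C { struct A c; double d; };

/* Hand-vectorized variant of foo from the testcase (assumed name).
   Each 3-component struct A is packed into one vector with lane 3 = 0.0,
   so that lane contributes nothing to the sums below.  */
int
foo_vec (const struct C *u, struct B v)
{
  const v4df two = { 2.0, 2.0, 2.0, 2.0 };
  v4df va = { v.a.x, v.a.y, v.a.z, 0.0 };
  v4df vb = { v.b.x, v.b.y, v.b.z, 0.0 };
  v4df uc = { u->c.x, u->c.y, u->c.z, 0.0 };

  /* Element-wise products; each statement is one or two vector ops
     instead of three scalar multiplies.  */
  v4df ta = vb * vb;
  v4df tb = two * vb * (va - uc);
  v4df tc = uc * uc + va * va - two * (uc * va);

  /* Horizontal sums of the three live lanes.  The summation order differs
     from the scalar code, which is only valid under -ffast-math.  */
  double a = ta[0] + ta[1] + ta[2];
  double b = tb[0] + tb[1] + tb[2];
  double c = tc[0] + tc[1] + tc[2] - u->d * u->d;

  double d = b * b - 4.0 * a * c;
  if (d < 0.0)
    return 0;
  return (int) d;
}
```

With the inputs from main above (all components 1.0) this gives a = 3, b = 0,
c = -1, so the discriminant is 12, the same as the scalar foo computes.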
