http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58280

            Bug ID: 58280
           Summary: Missed Opportunity for Aligned Vectorized Load
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: freddie at witherden dot org

Consider the following function:

void foo(int nr, int nc, int ldim,
         double *__restrict a, double *__restrict b)
{
    a = __builtin_assume_aligned(a, 32);
    b = __builtin_assume_aligned(b, 32);

    ldim = (ldim >> 5) << 5;

    for (int i = 0; i < nr; i++)
        for (int j = 0; j < nc; j++)
            a[i*ldim + j] += b[i*ldim + j];
}

Both GCC 4.7 and 4.8, on an AVX-capable system with -march=native and -O3,
vectorize the inner loop but use unaligned loads and stores.  Since "a" and "b"
are 32-byte aligned and ldim is rounded down to a multiple of 32 elements (256
bytes for double), it should be possible to infer that "a + i*ldim" and
"b + i*ldim" are also 32-byte aligned.  This would permit the inner loop to be
vectorized with aligned loads and stores.
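
The alignment argument above can be checked at run time with a small sketch.
It is only an illustration of the arithmetic, not part of the original test
case: the helper names rows_are_aligned and check_aligned_rows are
hypothetical, and the buffer is assumed to come from C11 aligned_alloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if every row start p + i*ldim is
   32-byte aligned, 0 otherwise.  ldim is rounded exactly as in foo(). */
static int rows_are_aligned(const double *p, int nr, int ldim)
{
    ldim = (ldim >> 5) << 5;    /* multiple of 32 elements = 256 bytes */

    for (int i = 0; i < nr; i++)
        if ((uintptr_t)(p + (size_t)i * ldim) % 32 != 0)
            return 0;
    return 1;
}

/* Hypothetical driver: allocates a 32-byte aligned buffer large enough
   for nr rows and checks every row start.  Returns -1 if allocation
   fails. */
static int check_aligned_rows(int nr, int ldim)
{
    int rounded = (ldim >> 5) << 5;
    size_t bytes = (size_t)nr * rounded * sizeof(double);

    if (bytes == 0)
        bytes = 32;             /* keep the request non-empty */

    double *buf = aligned_alloc(32, bytes);
    if (!buf)
        return -1;

    /* Precondition assumed by the bug report: the base is aligned. */
    assert((uintptr_t)buf % 32 == 0);

    int ok = rows_are_aligned(buf, nr, ldim);
    free(buf);
    return ok;
}
```

Because the base pointer is a multiple of 32 bytes and each row offset is a
multiple of 256 bytes, every row start is itself a multiple of 32 bytes, so
check_aligned_rows(4, 100) should report that all rows are aligned.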
