http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58280
Bug ID: 58280
Summary: Missed Opportunity for Aligned Vectorized Load
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: freddie at witherden dot org

Consider

void foo(int nr, int nc, int ldim,
         double *__restrict a, double *__restrict b)
{
    a = __builtin_assume_aligned(a, 32);
    b = __builtin_assume_aligned(b, 32);

    ldim = (ldim >> 5) << 5;

    for (int i = 0; i < nr; i++)
        for (int j = 0; j < nc; j++)
            a[i*ldim + j] += b[i*ldim + j];
}

Both GCC 4.7 and 4.8, on an AVX-capable system with -march=native and -O3,
vectorize the inner loop but use unaligned loads and stores.

Since "a" and "b" are 32-byte aligned and ldim is rounded down to a multiple
of 32 elements (256 bytes for double), it should be possible to deduce that
"a + i*ldim" and "b + i*ldim" are also 32-byte aligned. This would permit
the inner loop to be vectorized with aligned loads.
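
The alignment argument can be checked with a small sketch. This is an illustration and not part of the original report. The names round_down_32, all_rows_aligned and foo_row_hint are hypothetical. The first two confirm that every row start lands on a 32-byte boundary once ldim has been masked. The third restates the alignment for each row pointer, which is one conceivable way to pass the information to the compiler. Whether a given GCC version then emits aligned loads has not been verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round ldim down to a multiple of 32 elements, as the test case does. */
static int round_down_32(int ldim)
{
    return (ldim >> 5) << 5;
}

/* Hypothetical helper: returns 1 if the start of every row, base + i*ldim,
   falls on a 32-byte boundary. Addresses are computed as integers, so no
   out-of-bounds pointer is ever formed. */
static int all_rows_aligned(const double *base, int nr, int ldim)
{
    uintptr_t addr = (uintptr_t)base;

    for (int i = 0; i < nr; i++) {
        uintptr_t row = addr + (uintptr_t)i * (uintptr_t)ldim * sizeof(double);
        if (row % 32 != 0)
            return 0;
    }
    return 1;
}

/* Hypothetical variant of foo that restates the alignment per row.
   Only valid when a and b really are 32-byte aligned; the asserts check
   that precondition in debug builds. */
static void foo_row_hint(int nr, int nc, int ldim,
                         double *__restrict a, double *__restrict b)
{
    assert((uintptr_t)a % 32 == 0);
    assert((uintptr_t)b % 32 == 0);

    ldim = round_down_32(ldim);

    for (int i = 0; i < nr; i++) {
        /* Each row start is a multiple of 256 bytes past an aligned base. */
        double *ar = __builtin_assume_aligned(a + (size_t)i * ldim, 32);
        double *br = __builtin_assume_aligned(b + (size_t)i * ldim, 32);

        for (int j = 0; j < nc; j++)
            ar[j] += br[j];
    }
}
```

With an unmasked ldim such as 3, the second row would start 24 bytes past the base, so all_rows_aligned returns 0. That is why the masking step matters to the argument.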