Hello, I have been playing with gcc's new (to me) auto-vectorization optimizations. I have a particular loop for which I have made external provisions to ensure that the data is 16-byte aligned. I have tried everything I can think of to tell gcc that it is operating on aligned data, but the vectorizer still warns that the data is unaligned and generates the less efficient MOVLPS/MOVUPS instead of MOVAPS.
The code is like this:

    #define SSE __attribute__((aligned (16)))

    typedef float matrix_t[100][1024];

    matrix_t aa SSE, bb SSE, cc SSE;

    void calc(float *a, float *b, float *c)
    {
        int i, n = 1024;

        for (i=0; i<n; ++i) {
            a[i] = b[i] / (b[i] + c[i]);
        }
    }

    int main(void)
    {
        int i, n = 100;

        for (i=0; i<n; ++i) {
            calc(aa[i], bb[i], cc[i]);
        }
        return 0;
    }

gcc rejects alignment attributes on the formal parameters (obviously it was dumb to even try that), and there does not seem to be a way to make the alignment hint apply to the object referenced by the pointer rather than to the pointer itself.

In my application it is important that the function doing the computations remains abstracted away from the data definitions, as there is over 1 GB of data, dynamically arranged, and the actual alignment provisions are made with mmap().

Does anyone have a suggestion?

Regards,
Michael James
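
For reference, the mmap() arrangement mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the application's actual code: map_rows, row_at, is_aligned16 and ROW_LEN are hypothetical names, and an anonymous private mapping is assumed. It relies on mmap() returning page-aligned memory and on each row of 1024 floats being 4096 bytes, so every row starts on a 16-byte boundary even though calc() only ever sees plain float pointers.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

enum { ROW_LEN = 1024 };         /* floats per row, as in the loop above */

/* Returns 1 if p lies on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: map `rows` rows of ROW_LEN floats.
   mmap() hands back page-aligned memory; returns NULL on failure. */
float *map_rows(size_t rows)
{
    void *p = mmap(NULL, rows * ROW_LEN * sizeof(float),
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (float *)p;
}

/* Hypothetical helper: row i of a mapping returned by map_rows().
   The assert documents the invariant that the compiler cannot see. */
float *row_at(float *base, size_t i)
{
    float *row = base + i * ROW_LEN;
    assert(is_aligned16(row));
    return row;
}

/* The computation stays unaware of how the data was allocated. */
void calc(float *a, float *b, float *c)
{
    int i;

    for (i = 0; i < ROW_LEN; ++i) {
        a[i] = b[i] / (b[i] + c[i]);
    }
}
```

A caller would obtain rows with row_at() and pass them straight to calc(); the alignment guarantee exists at run time, but nothing in calc()'s signature conveys it to the vectorizer, which is exactly the problem described.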