On Wed, Jan 09, 2019 at 11:56:03AM +0100, Kay F. Jahnke wrote:
> The above is a typical example. So, to give a complete source 'vec_sqrt.cc':
>
> #include <cmath>
>
> extern float data [ 32768 ] ;
>
> extern void vf1()
> {
> #pragma vectorize enable
>   for ( int i = 0 ; i < 32768 ; i++ )
>     data [ i ] = std::sqrt ( data [ i ] ) ;
> }
>
> This has a large trip count, the loop is trivial. It's an ideal candidate
> for autovectorization. When I compile this source, using
>
> g++ -O3 -mavx2 -S -o sqrt.s sqrt_gcc.cc
Generally you want -Ofast or -ffast-math, or at least some of its suboptions, if you want to vectorize floating point loops. Vectorization in many cases changes where FPU exceptions would be generated, can affect precision by reordering the operations, etc.

In the above case, the reason is simply that glibc declares its vector math functions only under #ifdef __FAST_MATH__, as they have worse precision.

Note that gcc doesn't recognize #pragma vectorize. Instead you can use e.g. #pragma omp simd or #pragma GCC ivdep if you want to assert properties of the loop that would help the vectorization but that the compiler can't easily prove by itself.

	Jakub
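For illustration, here is a minimal sketch of the quoted loop rewritten with the two pragmas GCC does understand. It assumes a compile line along the lines of the advice above, e.g. `g++ -O3 -mavx2 -ffast-math -fopenmp-simd`; GCC only honours `#pragma omp simd` when -fopenmp or -fopenmp-simd is given, and otherwise ignores it. The names `N`, `vf_omp_simd` and `vf_ivdep` are hypothetical, and `data` is defined here rather than declared `extern` so the example is self-contained.

```cpp
// Sketch only: assumes something like
//   g++ -O3 -mavx2 -ffast-math -fopenmp-simd -S vec_sqrt.cc
#include <cmath>

// Hypothetical stand-in for the quoted 'extern float data [ 32768 ]'.
constexpr int N = 32768;
float data[N];

// Variant 1: '#pragma omp simd' asserts the loop may be executed with SIMD
// lanes. GCC ignores it unless -fopenmp or -fopenmp-simd is enabled.
void vf_omp_simd()
{
#pragma omp simd
    for (int i = 0; i < N; i++)
        data[i] = std::sqrt(data[i]);
}

// Variant 2: '#pragma GCC ivdep' asserts there are no loop-carried
// dependencies that would prevent vectorization.
void vf_ivdep()
{
#pragma GCC ivdep
    for (int i = 0; i < N; i++)
        data[i] = std::sqrt(data[i]);
}
```

Both pragmas only assert properties of the loop; they do not by themselves relax the floating point semantics discussed above, so whether the loop is actually vectorized still depends on the math flags in use. Checking the generated `.s` file for packed instructions is one way to confirm the result.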