On Wed, Jan 09, 2019 at 11:56:03AM +0100, Kay F. Jahnke wrote:
> The above is a typical example. So, to give a complete source 'vec_sqrt.cc':
> 
> #include <cmath>
> 
> extern float data [ 32768 ] ;
> 
> extern void vf1()
> {
>   #pragma vectorize enable
>   for ( int i = 0 ; i < 32768 ; i++ )
>     data [ i ] = std::sqrt ( data [ i ] ) ;
> }
> 
> This has a large trip count, the loop is trivial. It's an ideal candidate
> for autovectorization. When I compile this source, using
> 
> g++ -O3 -mavx2 -S -o sqrt.s sqrt_gcc.cc

Generally you want -Ofast or -ffast-math, or at least some suboptions of
those, if you want floating point loops to be vectorized. Vectorization in
many cases changes where FPU exceptions would be generated, and it can
affect precision by reordering the operations, etc. In the above case it is
just that glibc declares the vector math functions only under
#ifdef __FAST_MATH__, as they have worse precision.
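
To make the flag dependence concrete, here is a minimal sketch of the same
loop together with a check of __FAST_MATH__, the macro GCC predefines under
-ffast-math (and therefore under -Ofast). The array name sqrt_data and the
helper fast_math_enabled() are made up for illustration; the array is defined
here rather than declared extern so the file links on its own. Compiling it
once with "g++ -O3 -mavx2" and once with "g++ -O3 -mavx2 -ffast-math" and
comparing the generated assembly should show whether the loop was vectorized.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the 'data' array of the quoted example.
float sqrt_data[32768];

// Reports whether this translation unit was compiled with fast-math
// semantics; GCC predefines __FAST_MATH__ for -ffast-math and -Ofast.
constexpr bool fast_math_enabled()
{
#ifdef __FAST_MATH__
  return true;
#else
  return false;
#endif
}

// Same loop as in the quoted source, without the unrecognized pragma.
void vf1()
{
  for (std::size_t i = 0; i < 32768; i++)
    sqrt_data[i] = std::sqrt(sqrt_data[i]);
}
```

The results are the same either way for non-negative inputs; only the
generated code (and the handling of errno and FPU exceptions) differs.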

Note that GCC doesn't recognize #pragma vectorize. You can use e.g.
#pragma omp simd
or
#pragma GCC ivdep
if you want to assert some properties of the loop that the compiler can't
easily prove by itself and that would help the vectorization.
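
As a rough illustration of the two pragmas, here is a sketch on a loop where
the compiler cannot know whether the two pointers overlap. The function names
add_ivdep and add_omp_simd are invented for this example. Note that
#pragma omp simd only takes effect with -fopenmp or -fopenmp-simd; without
one of those it is ignored.

```cpp
#include <cstddef>

// '#pragma GCC ivdep' asserts that the loop has no loop-carried
// dependencies, so GCC may vectorize without proving that dst and src
// do not overlap. The caller is responsible for that being true.
void add_ivdep(float *dst, const float *src, std::size_t n)
{
#pragma GCC ivdep
  for (std::size_t i = 0; i < n; i++)
    dst[i] += src[i];
}

// The same loop with the OpenMP simd directive, which likewise tells
// the compiler that the iterations may be executed in SIMD fashion.
void add_omp_simd(float *dst, const float *src, std::size_t n)
{
#pragma omp simd
  for (std::size_t i = 0; i < n; i++)
    dst[i] += src[i];
}
```

Both pragmas are promises made by the programmer. If dst and src do overlap
in a way that creates a dependency between iterations, the result is
undefined.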

        Jakub
