"Du, Frank" <frank...@intel.com> writes:
> The PR I committed provides basic support for runtime dispatching. I
> agree that the compiler should generate good vectorized code for the
> non-null data part, but in fact it didn't. jedbrown pointed out that
> the compiler can be forced to use SIMD with some additional pragmas,
> something like "#pragma omp simd reduction(+:sum)". I will try this
> pragma later, but I need to figure out whether it requires linking
> against OpenMP.

It does not require linking OpenMP. You just compile with -fopenmp-simd (gcc/clang) or -qopenmp-simd (icc) so that the compiler interprets the "omp simd" pragmas. (These can be captured in macros using _Pragma.)

Note that you get automatic vectorization for this sort of thing without any OpenMP if you add -funsafe-math-optimizations (included in -ffast-math). https://gcc.godbolt.org/z/8thgru

Vectorizing a floating-point reduction means reordering the additions, which can change the rounded result, so the compiler will not do it unless it is told that reordering is acceptable. Many projects don't want -funsafe-math-optimizations because there are places where it can hurt numerical stability. ICC includes unsafe math in normal optimization levels while GCC and Clang are more conservative.
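To make the _Pragma suggestion concrete, here is a minimal sketch. The macro name PRAGMA_SIMD_SUM and the function sum_doubles are hypothetical, made up for illustration, and are not taken from the PR. It assumes a C99 compiler and that the reduction variable is named "sum".

```c
#include <stddef.h>

/* Hypothetical macro: wraps the directive with the C99 _Pragma operator so
   it can live in a header. A compiler that is not asked to honor OpenMP
   SIMD directives ignores the pragma (possibly with a warning). */
#define PRAGMA_SIMD_SUM _Pragma("omp simd reduction(+:sum)")

/* Hypothetical helper: sums n doubles. The directive tells the compiler
   that the additions into "sum" may be reordered across SIMD lanes. */
static double sum_doubles(const double *values, size_t n)
{
    double sum = 0.0;
    PRAGMA_SIMD_SUM
    for (size_t i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum;
}
```

Built with optimization plus -fopenmp-simd (gcc/clang) or -qopenmp-simd (icc), the pragma takes effect and no OpenMP runtime is linked. Without the flag the loop is plain scalar code with the same meaning, up to floating-point rounding differences from the reordered additions.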