https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363
--- Comment #24 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 26 May 2014, vincenzo.innocente at cern dot ch wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363
>
> --- Comment #23 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
> Which Syntax?
> I want to reuse the same code for the various architecture and let gcc deal
> with vectorization details.
> The best I manage to do to share code is something like this
>
> namespace {
>   inline
>   float _sum0(float const * x,
>               float const * y, float const * z) {
>     float sum=0;
>     for (int i=0; i!=1024; ++i)
>       sum += z[i]+x[i]*y[i];
>     return sum;
>   }
> }
>
>
> float __attribute__ ((__target__ ("arch=haswell")))
> sum1(float const * x,
>      float const * y, float const * z) {
>   return _sum0(x,y,z);
> }
>
> float __attribute__ ((__target__ ("arch=nehalem")))
> sum1(float const * x,
>      float const * y, float const * z) {
>   return _sum0(x,y,z);
> }

I think that's the desired interface (it was designed with the expectation
you'd use intrinsics in the special functions, not simply let the
autovectorizer do its work IIRC).
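
For readers who want to try the interface discussed here, the following is a minimal sketch of function multiversioning with the target attribute. It is an illustration under stated assumptions, not the code from the report: the "default" version, the kCount constant, the renamed helper sum0, the check_sum1 self-check and the preprocessor guard are all additions. The guard assumes GCC on x86-64 Linux, where the runtime dispatcher can rely on ifunc; anywhere else a single plain version is built. Whether the shared loop is actually inlined into each version and vectorized for that architecture is left to the compiler, so it is worth checking the generated assembly before relying on it.

```cpp
#include <cassert>

// Assumption: fixed trip count, as in the quoted example.
constexpr int kCount = 1024;

namespace {
// Shared kernel, written once. Plain `inline` is only a hint: the compiler
// decides whether each sum1 version gets its own copy of this loop.
inline float sum0(float const* x, float const* y, float const* z) {
    float sum = 0;
    for (int i = 0; i != kCount; ++i)
        sum += z[i] + x[i] * y[i];
    return sum;
}
}  // namespace

#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__linux__)
// Fallback version, an addition to the quoted code: selected at run time
// when the CPU matches none of the specialised versions below.
__attribute__((target("default")))
float sum1(float const* x, float const* y, float const* z) {
    return sum0(x, y, z);
}

__attribute__((target("arch=nehalem")))
float sum1(float const* x, float const* y, float const* z) {
    return sum0(x, y, z);
}

__attribute__((target("arch=haswell")))
float sum1(float const* x, float const* y, float const* z) {
    return sum0(x, y, z);
}
#else
// Other compilers or targets: one ordinary version, no dispatch.
float sum1(float const* x, float const* y, float const* z) {
    return sum0(x, y, z);
}
#endif

// Hypothetical self-check: every term is 3 + 1*2 = 5, so the total is
// 5 * 1024 = 5120, which is exact in float whichever version runs.
inline void check_sum1() {
    static float x[kCount], y[kCount], z[kCount];
    for (int i = 0; i != kCount; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
        z[i] = 3.0f;
    }
    assert(sum1(x, y, z) == 5120.0f);
}
```

Callers simply call sum1(x, y, z); on the guarded path the version is picked at run time according to the CPU the program runs on. The duplicated one-line wrappers are the price of this interface when the body is meant to come from the autovectorizer rather than from hand-written intrinsics.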