https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363

--- Comment #24 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 26 May 2014, vincenzo.innocente at cern dot ch wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363
> 
> --- Comment #23 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
> Which syntax?
> I want to reuse the same code for the various architectures and let gcc deal
> with the vectorization details.
> The best I have managed to do to share code is something like this:
> 
> namespace {
> inline
> float _sum0(float const *  x,
>            float const *  y, float const *  z) {
>   float sum=0;
>   for (int i=0; i!=1024; ++i)
>     sum += z[i]+x[i]*y[i];
>   return sum;
> }
> }
> 
> 
> float  __attribute__ ((__target__ ("arch=haswell")))
> sum1(float const *  x,
>      float const *  y, float const *  z) {
>   return _sum0(x,y,z);
> }
> 
> float  __attribute__ ((__target__ ("arch=nehalem")))
> sum1(float const *  x,
>      float const *  y, float const *  z) {
>   return _sum0(x,y,z);
> }

I think that's the desired interface (it was designed with the
expectation you'd use intrinsics in the special functions, not
simply let the autovectorizer do its work IIRC).
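For reference, here is one way the quoted pattern could be completed so that it compiles and dispatches. This is a sketch under assumptions and does not come from this thread. It assumes x86 with GCC (or another compiler that supports C++ function multiversioning) on a system with ifunc support, such as glibc on Linux. It also adds a target("default") version, which the dispatcher needs as the fallback for CPUs that match neither named architecture.

```cpp
// Sketch (assumption: x86, GCC-style C++ function multiversioning, ifunc
// support). The loop body is written once in _sum0; each sum1 version just
// forwards to it, and the dispatcher picks a version based on the running CPU.
namespace {
// Shared kernel: identical source for every architecture.
inline float _sum0(float const* x, float const* y, float const* z) {
  float sum = 0;
  for (int i = 0; i != 1024; ++i)
    sum += z[i] + x[i] * y[i];
  return sum;
}
}  // namespace

// Fallback version, used when neither named architecture matches.
__attribute__((__target__("default")))
float sum1(float const* x, float const* y, float const* z) {
  return _sum0(x, y, z);
}

__attribute__((__target__("arch=nehalem")))
float sum1(float const* x, float const* y, float const* z) {
  return _sum0(x, y, z);
}

__attribute__((__target__("arch=haswell")))
float sum1(float const* x, float const* y, float const* z) {
  return _sum0(x, y, z);
}
```

Callers simply call sum1(x, y, z), and the version is selected at run time. There are two caveats. First, the loop is a float reduction, so GCC will normally vectorize it only when reassociation is allowed (for example with -ffast-math). Second, whether the default-target helper is actually inlined into each arch-specific version, and therefore compiled for that architecture, depends on the compiler's rules for inlining across differing target options. It is worth checking the generated assembly.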
