On Wed, Feb 26, 2020 at 3:31 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Wed, Feb 26, 2020 at 07:55:53AM -0600, Bill Schmidt wrote:
> > The hope is that we can create a vectorized version that returns values
> > in registers rather than the by-ref parameters, and add code to GCC to
> > copy things around correctly following the call.  Ideally the signature of
> > the vectorized version would be something like
> >
> >   struct retval { vector double sinv; vector double cosv; };
> >   struct retval vecsincos (vector double);
> >
> > In the typical case where calls to sincos are of the form
> >
> >   sincos (val[i], &sinval[i], &cosval[i]);
> >
> > this would allow us to only store the values in the caller upon return,
> > rather than store them in the callee and potentially reload them
> > immediately in the caller.  On some Power CPUs, the latter behavior can
> > result in somewhat costly stalls if the consecutive accesses hit a timing
> > window.
>
> But can't you do
> #pragma omp declare simd linear(sinp, cosp)
> void sincos (double x, double *sinp, double *cosp);
> ?
> That is something the vectorizer code could handle and for
>   for (int i = 0; i < 1024; i++)
>     sincos (val[i], &sinval[i], &cosval[i]);
> just vectorize it as
>   for (int i = 0; i < 1024; i += vf)
>     _ZGVbN8vl8l8_sincos (*(vector double *)&val[i], &sinval[i], &cosval[i]);
> Anything else will need specialized code to handle sincos specially in the
> vectorizer.
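To read a clone name like `_ZGVbN8vl8l8_sincos`, here is a sketch of a decoder for a small subset of the x86 Vector Function ABI mangling, `_ZGV <isa> <mask> <vlen> <parameters> _ <name>`. It handles only `v` (vector), `u` (uniform) and `l` (linear, optional constant step) parameters; `struct simd_clone_name` and `parse_simd_clone_name` are hypothetical. In this name the `l8` steps appear to be in bytes, i.e. one `double` per lane.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARAMS 8

struct simd_clone_name
{
  char isa;              /* 'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512 */
  bool masked;           /* 'M' = masked clone, 'N' = unmasked */
  int vlen;              /* number of lanes */
  int nparams;
  char kind[MAX_PARAMS]; /* 'v' vector, 'u' uniform, 'l' linear */
  long step[MAX_PARAMS]; /* linear step (1 if omitted), 0 for v/u */
  const char *name;      /* points into the input string */
};

/* Returns true on success; any other parameter kind makes it fail.  */
static bool
parse_simd_clone_name (const char *s, struct simd_clone_name *out)
{
  char *end;
  if (strncmp (s, "_ZGV", 4) != 0 || s[4] == '\0')
    return false;
  s += 4;
  out->isa = *s++;
  if (*s != 'M' && *s != 'N')
    return false;
  out->masked = (*s++ == 'M');
  if (!isdigit ((unsigned char) *s))
    return false;
  out->vlen = (int) strtol (s, &end, 10);
  s = end;
  out->nparams = 0;
  while (*s != '_')
    {
      if (*s == '\0' || out->nparams == MAX_PARAMS)
        return false;
      char k = *s++;
      if (k != 'v' && k != 'u' && k != 'l')
        return false;
      long step = 0;
      if (k == 'l')
        {
          step = 1;
          if (isdigit ((unsigned char) *s))
            {
              step = strtol (s, &end, 10);
              s = end;
            }
        }
      out->kind[out->nparams] = k;
      out->step[out->nparams] = step;
      out->nparams++;
    }
  out->name = s + 1;
  return *out->name != '\0';
}
```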

I guess we'll need special code in the vectorizer anyway because in
GIMPLE we'll have

  for (int i = 0; i < 1024; i++)
    {
      _Complex double tem = __builtin_cexpi (val[i]);
      sinval[i] = __imag tem;
      cosval[i] = __real tem;
    }

we'd have to promote tem back to memory and turn the call back into
sincos (val[i], &__imag tem, &__real tem), either virtually or
explicitly.  The vectorizer is currently not happy seeing
_Complex (but dataref analysis would not be happy to see
sincos either).  So we do need changes to the vectorizer.

> > If you feel it isn't possible to do this, then we can abandon it.  Right
> > now my understanding is that GCC doesn't vectorize calls to sincos yet
> > for any targets, so it would be moot except that we really should define
> > what happens for the future.
> >
> > This calling convention would also be useful in the future for vectorizing
> > functions that return complex values either by value or by reference.
>
> Only by value; if something is passed by reference, you really don't know
> what the code does with it: whether it is read, written, or both.
> And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
> pass them; GCC just isn't able to do that right now.

Ah, ok.  So what's missing is a standard cexpi function that both GCC and
libmvec can use.

> > Well, as a matter of practicality, we don't have any of that implemented
> > in the rs6000 back end, and we don't have any free resources to do that
> > in GCC 11.  Is there any documentation about what needs to be done to
> > support this?  I've always been under the impression that vectorizing for
> > masking when there isn't any hardware support is a losing proposition, so
> > we've not investigated it.
>
> You don't need to do much of anything except set
> clonei->mask_mode = VOIDmode; I think the generic code should handle
> everything beyond that, in particular adding the mask argument and using it
> both on the caller side and in the expansion of the to-be-vectorized clone.
>
>         Jakub
>
