On Wed, Feb 26, 2020 at 3:31 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Wed, Feb 26, 2020 at 07:55:53AM -0600, Bill Schmidt wrote:
> > The hope is that we can create a vectorized version that returns values
> > in registers rather than the by-ref parameters, and add code to GCC to
> > copy things around correctly following the call.  Ideally the signature of
> > the vectorized version would be sth like
> >
> >    struct retval {vector double, vector double};
> >    retval vecsincos (vector double);
> >
> > In the typical case where calls to sincos are of the form
> >
> >    sincos (val[i], &sinval[i], &cosval[i]);
> >
> > this would allow us to only store the values in the caller upon return,
> > rather than store them in the callee and potentially reload them
> > immediately in the caller.  On some Power CPUs, the latter behavior can
> > result in somewhat costly stalls if the consecutive accesses hit a timing
> > window.
>
> But can't you do
> #pragma omp declare simd linear(sinp, cosp)
> void sincos (double x, double *sinp, double *cosp);
> ?
> That is something the vectorizer code could handle and for
>   for (int i = 0; i < 1024; i++)
>     sincos (val[i], &sinval[i], &cosval[i]);
> just vectorize it as
>   for (int i = 0; i < 1024; i += vf)
>     _ZGVbN8vl8l8_sincos (*(vector double *)&val[i], &sinval[i], &cosval[i]);
> Anything else will need specialized code to handle sincos specially in the
> vectorizer.

I guess we'll need special code in the vectorizer anyway because in
GIMPLE we'll have

  for (int i = 0; i < 1024; i++)
    {
      _Complex double tem = __builtin_cexpi (val[i]);
      sinval[i] = __imag tem;
      cosval[i] = __real tem;
    }

so we'd have to promote tem back to memory and restore the call
sincos (val[i], &__imag tem, &__real tem), either virtually or
explicitly.  The vectorizer is currently not happy seeing _Complex
(but dataref analysis would not be happy to see sincos either).

So we do need changes to the vectorizer.

> > If you feel it isn't possible to do this, then we can abandon it.  Right
> > now my understanding is that GCC doesn't vectorize calls to sincos yet
> > for any targets, so it would be moot except that we really should define
> > what happens for the future.
> >
> > This calling convention would also be useful in the future for vectorizing
> > functions that return complex values either by value or by reference.
>
> Only by value, you really don't know what the code does if something is
> passed by reference, whether it is read, written into, or both etc.
> And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
> pass them, just GCC isn't able to do that right now.

Ah, ok.  So what's missing is the standard function cexpi both GCC
and libmvec can use.

> > Well, as a matter of practicality, we don't have any of that implemented
> > in the rs6000 back end, and we don't have any free resources to do that
> > in GCC 11.  Is there any documentation about what needs to be done to
> > support this?  I've always been under the impression that vectorizing for
> > masking when there isn't any hardware support is a losing proposition, so
> > we've not investigated it.
>
> You don't need to do pretty much anything, except set
> clonei->mask_mode = VOIDmode, I think the generic code should handle
> everything beyond that, in particular add the mask argument and use it
> both on the caller side and on the expansion of the to be vectorized clone.
>
>         Jakub
>