On 2/26/20 2:18 AM, Jakub Jelinek wrote:
On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:
The reason that homogeneous aggregates matter (at least somewhat) is that
the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
mangling formula that includes the length of parameters and return values.
Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
has a superior one, we chose to use ELFv2's calling convention and make use
of homogeneous aggregates for return values in registers for the case of
vectorized sincos.
Can you please explain how you want to pass the
void sincos (double, double *, double *);
arguments? I must say it isn't entirely clear from the document.
You talk there about double[2], but sincos certainly doesn't have such an
argument.
The hope is that we can create a vectorized version that returns values
in registers rather than the by-ref parameters, and add code to GCC to
copy things around correctly following the call. Ideally the signature of
the vectorized version would be something like
struct retval {vector double, vector double};
retval vecsincos (vector double);
In the typical case where calls to sincos are of the form
sincos (val[i], &sinval[i], &cosval[i]);
this would allow us to only store the values in the caller upon return,
rather than store them in the callee and potentially reload them
immediately in the caller. On some Power CPUs, the latter behavior can
result in somewhat costly stalls if the consecutive accesses hit a timing
window.
If you feel it isn't possible to do this, then we can abandon it. Right
now my understanding is that GCC doesn't vectorize calls to sincos yet
for any targets, so it would be moot except that we really should define
what happens for the future.
This calling convention would also be useful in the future for vectorizing
functions that return complex values either by value or by reference.
Also, I'd say ignoring the masked variants is a mistake. Are you going to
warn any time the user uses inbranch or even doesn't specify notinbranch?
Masking can be implemented even without highly specialized instructions.
E.g. on x86 only AVX512F has full masking support; for older ISAs all that
is there is conditional store, and for integral operations that can't
trap/raise exceptions, blend-like operations (or even and/or) are all that
is needed. Just let the vectorizer do its job.
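As an illustration of the and/or approach, here is a minimal scalar C sketch of a masked ("inbranch"-style) integer operation done without any masking hardware. The names blend_u32 and masked_triple_plus_one are hypothetical, and the sketch assumes each mask lane is either all-ones or all-zeros. The operation is computed unconditionally for every lane, which is safe because it cannot trap, and the mask only decides which result is kept.

```c
#include <stdint.h>

/* Per-lane select using only and/or/not: a mask lane of all-ones
   takes a, a mask lane of all-zeros takes b. */
static inline uint32_t blend_u32(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}

/* Hypothetical masked variant of f(x) = 3*x + 1 over n lanes.
   Inactive lanes keep their previous value in out[]. */
static void masked_triple_plus_one(const uint32_t *x, const uint32_t *mask,
                                   uint32_t *out, int n)
{
    for (int i = 0; i < n; i++) {
        uint32_t computed = x[i] * 3u + 1u;  /* computed for all lanes */
        out[i] = blend_u32(mask[i], computed, out[i]);
    }
}
```

Operations that can trap or raise exceptions need more care than this, since computing them for inactive lanes is not always harmless.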
Well, as a matter of practicality, we don't have any of that implemented
in the rs6000 back end, and we don't have any free resources to do that
in GCC 11. Is there any documentation about what needs to be done to
support this? I've always been under the impression that vectorizing for
masking when there isn't any hardware support is a losing proposition, so
we've not investigated it.
Thanks,
Bill
Even if you don't want it for libmvec, just use
__attribute__((simd ("notinbranch")))
for those, but allow the user to use it where it makes sense.
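A minimal sketch of what that could look like on a declaration, assuming GCC. The function scale_and_shift, the caller apply, and the SIMD_NOTINBRANCH macro are hypothetical names used only for illustration; the macro expands to nothing on other compilers so the code still builds.

```c
/* Only GCC is assumed to accept this attribute spelling; elsewhere the
   macro is empty and the function is an ordinary scalar function. */
#if defined(__GNUC__) && !defined(__clang__)
#define SIMD_NOTINBRANCH __attribute__((simd ("notinbranch")))
#else
#define SIMD_NOTINBRANCH
#endif

/* Hypothetical function for which only unmasked simd clones are requested. */
SIMD_NOTINBRANCH double scale_and_shift(double x);

double scale_and_shift(double x)
{
    return 2.0 * x + 1.0;
}

/* A loop the vectorizer may turn into calls to an unmasked clone,
   since the call is not under a condition. */
void apply(const double *in, double *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = scale_and_shift(in[i]);
}
```

Whether clones are actually generated and used depends on the target and the optimization options; the scalar behavior is the same either way.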
Jakub