On 2/27/20 4:52 AM, Segher Boessenkool wrote:
> On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:
> > The reason that homogeneous aggregates matter (at least somewhat) is that
> > the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
> > mangling formula that includes the length of parameters and return values.
> > Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
> > has a superior one, we chose to use ELFv2's calling convention and make use
> > of homogeneous aggregates for return values in registers for the case of
> > vectorized sincos.
> >
> > Please look at the document to see the constraints we're under to fit into
> > the different OpenMP clauses and attributes.  It seems to me that we can
> > only define this for both powerpc64 and powerpc64le by establishing two
> > different calling conventions, which provides two different vector length
> > calculations for the sincos return value, and therefore requires two
> > different function implementations with different mangled names.  (Either
> > that, or we cripple vectorized sincos by requiring it to return values
> > through memory.)
>
> I still don't see it.  For all ABIs the length of the arguments and
> return value is the same, and homogeneous aggregates doesn't factor
> in at all; that is just a detail whether something is passed in
> registers or memory (as we have with many other ABIs as well, fwiw).
>
> So why make this part of the mangling rules?
>
> It is perfectly fine to design this with ELFv2 in mind, of course, but
> making a dependency on the (current!) (very complex!) ELFv2 rules for
> absolutely no reason at all is a mistake, in my opinion.

Upon reflection, I agree.  Bert, we need to make changes to the document to
reflect this:

(1) "Calling convention" should refer to ELFv1 for powerpc64 and ELFv2 for
powerpc64le.
(2) "Vector Length" should remove bullet 3 and, in bullet 4, strike the
word "nonhomogeneous" and the parenthetical clause.
(3) "Ordering of Vector Arguments" should remove the example involving
homogeneous aggregates.

It also occurs to me that for bullets 4 and 5 in "Vector Length", the
CDT should be long long, not int, since we pass aggregates in pieces in
64-bit registers and/or chunks of memory.

Other small bugs:
 - Bullet 4 says "the CDT determine by a) or b) above", but the referents
should be "(1) or (2)" instead.
 - First line of "Compiler generated variants of vector functions" has
a typo ("umasked").

Segher, thanks for smacking my recalcitrant head until it understands...

Thanks,
Bill

> Segher