Re: GLIBC libmvec status

2020-03-05 Thread Segher Boessenkool
On Fri, Feb 28, 2020 at 05:31:56PM +0100, Jakub Jelinek wrote:
> On Fri, Feb 28, 2020 at 04:23:03PM +0000, GT wrote:
> > Do we want to change the name and title of the document, since Segher
> > doesn't believe it is an ABI? My initial suggestion: "POWER Architecture
> > Specification of Scalar Function to Vector Function Mapping".
> 
> It is an ABI, in the same way that e.g. the Itanium C++ ABI is an ABI: it
> specifies the mangling of certain functions and how the function argument
> types and return types are transformed.

It does not say anything about the machine code generated, or about the
binary format generated, other than the naming of symbols.  It is
confusing to call this an "ABI": you still need to have an actual ABI
underneath, and this itself is not a "binary interface".  In some other
contexts similar things are called "binding", but that is not a very
good name either :-/


Segher


Re: GLIBC libmvec status

2020-03-04 Thread GT
‐‐‐ Original Message ‐‐‐
On Monday, March 2, 2020 12:14 PM, Bill Schmidt  wrote:

> On 3/2/20 11:10 AM, Tulio Magno Quites Machado Filho wrote:
>
> > Bill Schmidt  writes:
> >
> > > One tiny nit on the document:  For the "b" <isa> value, let's just say
> > > "VSX" rather than "VSX as defined in PowerISA v2.07."  We will plan to
> > > only change <isa> values in case a different vector length is defined
> > > in future.
> >
> > That change would have more implications: all libmvec functions would
> > have to work on Power ISA v2.06 HW too.  But half of the functions do
> > use v2.07 instructions now.
>
> Ah, I see.  Well, then language such as "VSX defined at least at the level
> of PowerISA v2.07" would be appropriate.  We want to define a minimum
> subset without further implied constraint.  (Higher levels can be handled
> with ifunc without needing to reference this in the ABI, as previously
> discussed.)
>

Changed the description of the 'b' ISA value to exactly the quoted sentence above.

Bert.


Re: GLIBC libmvec status

2020-03-03 Thread GT
‐‐‐ Original Message ‐‐‐
On Monday, March 2, 2020 4:59 PM, Jakub Jelinek  wrote:

> Indeed, there aren't any yet on the vectorizer side; I thought I had
> implemented it already in the vectorizer but apparently didn't, just the
> omp-simd-clone.c part is implemented (the more important part, as it
> matters for the ABI).

What is in omp-simd-clone.c? What is missing from the vectorizer? My
assumption was that the implementation of vector function masking was
complete, but no test was created to verify the functionality.

> A testcase could be something along the lines of
> #pragma omp declare simd
> int foo (int, int);
>
> void
> bar (int *a, int *b, int *c)
> {
>   #pragma omp simd
>   for (int i = 0; i < 1024; i++)
>     {
>       int d = b[i], e = c[i], f;
>       if (b[i] < 20)
>         f = foo (d, e);
>       else
>         f = d + e;
>       a[i] = f;
>     }
> }

I thought the test would be more like:

#pragma omp declare simd
void
foo (int *x, int *y)
{
  *y = *x + 2;
}

void
bar (int *a, int *b, int *c)
{
#pragma omp simd
  for (int i = 0; i < 1024; i++)
    {
      if (i % 2)
        foo (&b[i], &c[i]);
    }
}

The point being that only items at odd indices are updated. That would require
masking to avoid altering items at even indices.
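
For reference, a masked ("inbranch") clone carries an extra mask argument
alongside the vectorized parameters. A sketch of what that might look like
for a 4-lane integer clone (the mangled name and the mask representation
here are assumptions, following the x86-style mangling used elsewhere in
this thread):

#pragma omp declare simd inbranch
int foo (int x, int y);

/* Hypothetical masked variant: 'M' in place of 'N' marks the inbranch
   version; mask lanes are -1 (active) or 0 (inactive), and the clone must
   leave inactive lanes untouched.  */
vector int _ZGVbM4vv_foo (vector int x, vector int y, vector int mask);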

Bert.


Re: GLIBC libmvec status

2020-03-02 Thread Jakub Jelinek
On Mon, Mar 02, 2020 at 09:40:59PM +0000, GT wrote:
> Searching openmp.org located the document "OpenMP API Examples". The
> relevant example for inbranch/notinbranch shows very simple functions
> (SIMD.6.c). GCC testsuite functions are similarly simple.
> Wouldn't the same effect be achieved by letting GCC inline such functions
> and having the loop autovectorizer handle the resulting code?

If it is defined in headers and inlinable, sure, then you don't need to
mark it in any way.
The pragmas are mainly for functions that aren't inlinable for whatever
reason.
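
A minimal sketch of that out-of-line case (file names illustrative):

/* foo.h -- the pragma is what lets callers vectorize calls to a function
   whose body the compiler cannot see at the call site.  */
#pragma omp declare simd notinbranch
double foo (double x);

/* foo.c -- scalar definition, compiled separately; GCC emits the mangled
   vector clones from this definition.  */
double foo (double x) { return x * x + 1.0; }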

> > There are various tests that cover it, look e.g. at tests that require
> > the vect_simd_clones effective target (e.g. in
> > gcc/testsuite/*/{vect,gomp}/ and libgomp/testsuite/*/).
> >
> 
> Sorry, I can't identify any test that ensures a masked vector function
> variant produces expected results. I'll check again but I need more help
> here.

Indeed, there aren't any yet on the vectorizer side; I thought I had
implemented it already in the vectorizer but apparently didn't, just the
omp-simd-clone.c part is implemented (the more important part, as it
matters for the ABI).  A testcase could be something along the lines of
#pragma omp declare simd
int foo (int, int);

void
bar (int *a, int *b, int *c)
{
  #pragma omp simd
  for (int i = 0; i < 1024; i++)
    {
      int d = b[i], e = c[i], f;
      if (b[i] < 20)
        f = foo (d, e);
      else
        f = d + e;
      a[i] = f;
    }
}
To make this work, one would need to tweak tree-if-conv.c (invent some way
to represent the conditional calls in the IL during the vect pass, probably
some new internal function) and then handle that in
vectorizable_simd_clone_call.
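
In pseudo-GIMPLE, the transformation being described might look roughly
like this (a sketch; the .COND_CALL internal-function name is hypothetical):

  /* Before if-conversion: */
  if (b[i] < 20)
    f = foo (d, e);
  else
    f = d + e;

  /* After if-conversion (sketch): the conditional call is kept in the IL
     behind an internal function carrying the mask, which the vectorizer
     can later expand to the masked clone in vectorizable_simd_clone_call: */
  mask_1 = b[i] < 20;
  f_1 = .COND_CALL (mask_1, foo, d, e);
  f_2 = d + e;
  f_3 = mask_1 ? f_1 : f_2;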

Jakub



Re: GLIBC libmvec status

2020-03-02 Thread GT
‐‐‐ Original Message ‐‐‐
On Monday, March 2, 2020 3:31 PM, Jakub Jelinek  wrote:

> On Mon, Mar 02, 2020 at 08:20:01PM +0000, GT wrote:
>
> > Which raises the question: what use-case motivated allowing the compiler
> > to auto-vectorize user defined functions? From having manually created 
> > vector
>
> The feature made it into the OpenMP standard (as early as OpenMP 4.0) and
> so was implemented as part of GCC's OpenMP 4.0 support.
>
> > versions of sin, cos and other libmvec functions, I'm wondering how GCC
> > is able to autovectorize a non-trivial user defined function.
>

Searching openmp.org located the document "OpenMP API Examples". The
relevant example for inbranch/notinbranch shows very simple functions
(SIMD.6.c). GCC testsuite functions are similarly simple.
Wouldn't the same effect be achieved by letting GCC inline such functions
and having the loop autovectorizer handle the resulting code?


> There are various tests that cover it, look e.g. at tests that require
> the vect_simd_clones effective target (e.g. in gcc/testsuite/*/{vect,gomp}/
> and libgomp/testsuite/*/).
>

Sorry, I can't identify any test that ensures a masked vector function
variant produces expected results. I'll check again but I need more help
here.

Bert.


Re: GLIBC libmvec status

2020-03-02 Thread GT
‐‐‐ Original Message ‐‐‐
On Thursday, February 27, 2020 9:52 AM, Jakub Jelinek  wrote:

> On Thu, Feb 27, 2020 at 08:47:19AM -0600, Bill Schmidt wrote:
>
> > But is this actually a good idea? It seems to me this will generate lousy
> > code in the absence of hardware support. Won't we be better off warning and
> > ignoring the directive, leaving the code in scalar form?
>
> Depends on the exact code, I think sometimes it will be just fine and will
> allow vectorizing something that really couldn't be otherwise.
> Isn't it better to leave it for the user to decide?
> They can always ask for it not to be generated (add notinbranch) if it isn't
> worthwhile.
>

I'm trying to understand what the x86_64 implementation does w.r.t. masked
versions of user-defined functions. I haven't found any test under the
testsuite directory which verifies that compiler-generated versions (from
inbranch being specified) produce expected results.

Which raises the question: what use-case motivated allowing the compiler
to auto-vectorize user-defined functions? From having manually created
vector versions of sin, cos and other libmvec functions, I'm wondering how
GCC is able to autovectorize a non-trivial user-defined function.

Any pointers to relevant tests and documentation will be really appreciated.

Thanks.
Bert.


Re: GLIBC libmvec status

2020-03-02 Thread Jakub Jelinek
On Mon, Mar 02, 2020 at 08:20:01PM +0000, GT wrote:
> Which raises the question: what use-case motivated allowing the compiler
> to auto-vectorize user defined functions? From having manually created vector

The feature made it into the OpenMP standard (as early as OpenMP 4.0) and
so was implemented as part of GCC's OpenMP 4.0 support.

> versions of sin, cos and other libmvec functions, I'm wondering how GCC
> is able to autovectorize a non-trivial user defined function.

There are various tests that cover it, look e.g. at tests that require the
vect_simd_clones effective target (e.g. in gcc/testsuite/*/{vect,gomp}/ and
libgomp/testsuite/*/).

In the OpenMP standard, see e.g.
https://www.openmp.org/spec-html/5.0/openmpsu42.html#x65-1390002.9.3

Jakub



Re: GLIBC libmvec status

2020-03-02 Thread Bill Schmidt

On 3/2/20 11:10 AM, Tulio Magno Quites Machado Filho wrote:

Bill Schmidt  writes:


One tiny nit on the document:  For the "b" <isa> value, let's just say
"VSX" rather than "VSX as defined in PowerISA v2.07."  We will plan to
only change <isa> values in case a different vector length is defined in
future.

That change would have more implications: all libmvec functions would have
to work on Power ISA v2.06 HW too.  But half of the functions do use v2.07
instructions now.


Ah, I see.  Well, then language such as "VSX defined at least at the level
of PowerISA v2.07" would be appropriate.  We want to define a minimum
subset without further implied constraint.  (Higher levels can be handled
with ifunc without needing to reference this in the ABI, as previously
discussed.)
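
The ifunc mechanism mentioned here selects an implementation at load time
without introducing new mangled names. A minimal sketch (function names
invented; the hwcap bit is the real PPC_FEATURE2_ARCH_3_00 from
<bits/hwcap.h>):

#include <sys/auxv.h>

double my_sin_p8 (double);   /* ISA v2.07 (POWER8) implementation */
double my_sin_p9 (double);   /* ISA v3.0  (POWER9) implementation */

/* The resolver runs once at load time and picks the best variant.  */
static void *
my_sin_resolver (void)
{
  if (getauxval (AT_HWCAP2) & PPC_FEATURE2_ARCH_3_00)
    return (void *) my_sin_p9;
  return (void *) my_sin_p8;
}

double my_sin (double) __attribute__ ((ifunc ("my_sin_resolver")));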

Thanks,
Bill



Re: GLIBC libmvec status

2020-03-02 Thread Tulio Magno Quites Machado Filho
Bill Schmidt  writes:

> One tiny nit on the document:  For the "b" <isa> value, let's just say
> "VSX" rather than "VSX as defined in PowerISA v2.07."  We will plan to
> only change <isa> values in case a different vector length is defined in
> future.

That change would have more implications: all libmvec functions would have
to work on Power ISA v2.06 HW too.  But half of the functions do use v2.07
instructions now.

-- 
Tulio Magno


Re: GLIBC libmvec status

2020-03-02 Thread Bill Schmidt

On 2/28/20 10:31 AM, Jakub Jelinek wrote:

On Fri, Feb 28, 2020 at 04:23:03PM +0000, GT wrote:

Do we want to change the name and title of the document, since Segher
doesn't believe it is an ABI? My initial suggestion: "POWER Architecture
Specification of Scalar Function to Vector Function Mapping".

It is an ABI, in the same way that e.g. the Itanium C++ ABI is an ABI: it
specifies the mangling of certain functions and how the function argument
types and return types are transformed.


Agreed, let's leave that as is.

One tiny nit on the document:  For the "b" <isa> value, let's just say
"VSX" rather than "VSX as defined in PowerISA v2.07."  We will plan to
only change <isa> values in case a different vector length is defined in
future.

Looks good otherwise!

Thanks,
Bill



Jakub



Re: GLIBC libmvec status

2020-02-28 Thread GT
‐‐‐ Original Message ‐‐‐
On Thursday, February 27, 2020 4:32 PM, Bill Schmidt  wrote:

> On 2/27/20 2:21 PM, Bill Schmidt wrote:
>
> > On 2/27/20 12:48 PM, GT wrote:
> >
> > > Done.
> > >
> > > The updated document is at:
> > > https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html
>
> Looks good.  Can you please also remove the 'c' ABI from the mangling, as
> earlier agreed?
>

1. Reference to the 'c' ABI deleted.
2. In the final paragraph of section "Vector Function ABI Overview", removed
the reference to the ELFv2 specification and replaced it with a reference
to the OpenPOWER IBM Power ISA v2.07.
3. Cleaned up the display of angle brackets in section "Vector Function
Name Mangling".

Question:
Do we want to change the name and title of the document, since Segher
doesn't believe it is an ABI? My initial suggestion: "POWER Architecture
Specification of Scalar Function to Vector Function Mapping".

Bert.


Re: GLIBC libmvec status

2020-02-28 Thread Jakub Jelinek
On Fri, Feb 28, 2020 at 04:23:03PM +0000, GT wrote:
> Do we want to change the name and title of the document, since Segher
> doesn't believe it is an ABI? My initial suggestion: "POWER Architecture
> Specification of Scalar Function to Vector Function Mapping".

It is an ABI, in the same way that e.g. the Itanium C++ ABI is an ABI: it
specifies the mangling of certain functions and how the function argument
types and return types are transformed.
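
For concreteness, the mangling in question follows the
_ZGV<isa><mask><len><parameters>_<name> pattern of the existing x86 Vector
Function ABI; a sketch with the POWER letters used in this thread (the
exact clone shown is illustrative):

#pragma omp declare simd notinbranch
double sin (double x);

/* Mangled vector variant: _ZGV + 'b' (VSX) + 'N' (unmasked) + '2' (two
   double lanes in a 128-bit vector) + 'v' (one vector parameter), then
   the scalar name: */
vector double _ZGVbN2v_sin (vector double x);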

Jakub



Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt

On 2/27/20 2:21 PM, Bill Schmidt wrote:



On 2/27/20 12:48 PM, GT wrote:


Done.

The updated document is at:
https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html


Looks good.  Can you please also remove the 'c' ABI from the mangling, as
earlier agreed?

Thanks!
Bill



Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt



On 2/27/20 12:48 PM, GT wrote:

‐‐‐ Original Message ‐‐‐
On Thursday, February 27, 2020 9:26 AM, Bill Schmidt  wrote:


Upon reflection, I agree.  Bert, we need to make changes to the document to
reflect this:

(1) "Calling convention" should refer to ELFv1 for powerpc64 and ELFv2 for
powerpc64le.

Done. I have provided names of and links to the respective ABI documents
but no longer explicitly refer to the ELF version.


(2) "Vector Length" should remove bullet 3, strike the word
"nonhomogeneous" in bullet 4, and strike the parenthetical clause in
bullet 4.
(3) "Ordering of Vector Arguments" should remove the example involving
homogeneous aggregates.


Done.


It also occurs to me that for bullets 4 and 5 in "Vector Length", the
CDT should be long long, not int, since we pass aggregates in pieces in
64-bit registers and/or chunks of memory.


That determination of Vector Length is common to all architectures and is
implemented in function simd_clone_compute_base_data_type. If we really do
need PPC64 to be different, we'll have to allow the function to be replaced
by architecture-specific versions. Before we do that, do you have an
example of code which ends up with incorrect vectorization with the
existing CDT of int?


No, and I'll withdraw the suggestion.  It seems rather arbitrary in any event.

Thanks for the updates!

Bill




Other small bugs:
  - Bullet 4 says "the CDT determine by a) or b) above", but the referents
should be "(1) or (2)" instead.
  - First line of "Compiler generated variants of vector functions" has
a typo ("umasked").


Done.

The updated document is at:
https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html


Re: GLIBC libmvec status

2020-02-27 Thread GT
‐‐‐ Original Message ‐‐‐
On Thursday, February 27, 2020 9:26 AM, Bill Schmidt  wrote:

>
> Upon reflection, I agree.  Bert, we need to make changes to the document to
> reflect this:
>
> (1) "Calling convention" should refer to ELFv1 for powerpc64 and ELFv2 for
> powerpc64le.

Done. I have provided names of and links to the respective ABI documents
but no longer explicitly refer to the ELF version.

> (2) "Vector Length" should remove bullet 3, strike the word
> "nonhomogeneous" in bullet 4, and strike the parenthetical clause in
> bullet 4.
> (3) "Ordering of Vector Arguments" should remove the example involving
> homogeneous aggregates.
>

Done.

> It also occurs to me that for bullets 4 and 5 in "Vector Length", the
> CDT should be long long, not int, since we pass aggregates in pieces in
> 64-bit registers and/or chunks of memory.
>

That determination of Vector Length is common to all architectures and is
implemented in function simd_clone_compute_base_data_type. If we really do
need PPC64 to be different, we'll have to allow the function to be replaced
by architecture-specific versions. Before we do that, do you have an
example of code which ends up with incorrect vectorization with the
existing CDT of int?
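
For context, the rule simd_clone_compute_base_data_type implements (per the
Intel Vector Function ABI that the generic code follows) picks the
characteristic data type (CDT) roughly as: the return type if non-void,
otherwise the type of the first non-uniform, non-linear parameter,
otherwise int. A sketch of how that plays out, assuming 128-bit vectors:

#pragma omp declare simd
void foo (double x, double *sinp, double *cosp);
/* void return; x is the first vector parameter, so CDT = double and two
   double lanes fit per vector, giving simdlen = 2.  */

#pragma omp declare simd linear(p)
void bar (int *p);
/* void return and the only parameter is linear, so the CDT falls back
   to int; four int lanes fit per vector, giving simdlen = 4.  */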

> Other small bugs:
>  - Bullet 4 says "the CDT determine by a) or b) above", but the referents
> should be "(1) or (2)" instead.
>  - First line of "Compiler generated variants of vector functions" has
> a typo ("umasked").
>

Done.

The updated document is at:
https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html


Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt



On 2/27/20 9:30 AM, Jakub Jelinek wrote:

On Thu, Feb 27, 2020 at 09:19:25AM -0600, Bill Schmidt wrote:

On 2/27/20 8:52 AM, Jakub Jelinek wrote:

On Thu, Feb 27, 2020 at 08:47:19AM -0600, Bill Schmidt wrote:

But is this actually a good idea?  It seems to me this will generate lousy
code in the absence of hardware support.  Won't we be better off warning and
ignoring the directive, leaving the code in scalar form?

Depends on the exact code, I think sometimes it will be just fine and will
allow vectorizing something that really couldn't be otherwise.
Isn't it better to leave it for the user to decide?
They can always ask for it not to be generated (add notinbranch) if it isn't
worthwhile.

You need a high ratio of unguarded code to guarded code in order to pay for all
those vector extract and reconstruct operations.  Sure, some code will be fine,
but a lot of code will be lousy.  This will be particularly true on older
hardware with a less exhaustive set of vector operations.

Why?  E.g. for integral code other than division or memory loads/stores where
nothing will really trap, you can just perform it unguarded.
Just use whatever the vectorizer does right now for conditional code, and if
that isn't as efficient as it could be given a particular HW/ISA, try to improve
it?


If that's how the vectorizer is working today, then my concerns are
certainly lessened.  It's been a while since I've seen how the vectorizer
and if-conversion interact, so my perspective is probably outdated.  We'll
take a look at it.

Thanks for the discussion!

Bill



I really don't see how it is different, say, from SSE2 on x86 or even AVX.

Jakub



Re: GLIBC libmvec status

2020-02-27 Thread Jakub Jelinek
On Thu, Feb 27, 2020 at 09:19:25AM -0600, Bill Schmidt wrote:
> On 2/27/20 8:52 AM, Jakub Jelinek wrote:
> > On Thu, Feb 27, 2020 at 08:47:19AM -0600, Bill Schmidt wrote:
> > > But is this actually a good idea?  It seems to me this will generate
> > > lousy code in the absence of hardware support.  Won't we be better off
> > > warning and ignoring the directive, leaving the code in scalar form?
> > Depends on the exact code, I think sometimes it will be just fine and will
> > allow vectorizing something that really couldn't be otherwise.
> > Isn't it better to leave it for the user to decide?
> > They can always ask for it not to be generated (add notinbranch) if it isn't
> > worthwhile.
> 
> You need a high ratio of unguarded code to guarded code in order to pay
> for all those vector extract and reconstruct operations.  Sure, some code
> will be fine, but a lot of code will be lousy.  This will be particularly
> true on older hardware with a less exhaustive set of vector operations.

Why?  E.g. for integral code other than division or memory loads/stores where
nothing will really trap, you can just perform it unguarded.
Just use whatever the vectorizer does right now for conditional code, and if
that isn't as efficient as it could be given a particular HW/ISA, try to improve
it?

I really don't see how it is different, say, from SSE2 on x86 or even AVX.

Jakub



Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt

On 2/27/20 8:52 AM, Jakub Jelinek wrote:

On Thu, Feb 27, 2020 at 08:47:19AM -0600, Bill Schmidt wrote:

But is this actually a good idea?  It seems to me this will generate lousy
code in the absence of hardware support.  Won't we be better off warning and
ignoring the directive, leaving the code in scalar form?

Depends on the exact code, I think sometimes it will be just fine and will
allow vectorizing something that really couldn't be otherwise.
Isn't it better to leave it for the user to decide?
They can always ask for it not to be generated (add notinbranch) if it isn't
worthwhile.


You need a high ratio of unguarded code to guarded code in order to pay for all
those vector extract and reconstruct operations.  Sure, some code will be fine,
but a lot of code will be lousy.  This will be particularly true on older
hardware with a less exhaustive set of vector operations.

In the lousy-code case, my concern is that the user won't be savvy enough to
understand they should add notinbranch.  They'll just notice that their code
runs badly on Power and either complain (good, then we can explain it) or
abandon porting existing code to Power (very bad, and we may never know).
I don't like the downside, and the upside is quite unpredictable.

Bill



Jakub



Re: GLIBC libmvec status

2020-02-27 Thread Jakub Jelinek
On Thu, Feb 27, 2020 at 08:47:19AM -0600, Bill Schmidt wrote:
> But is this actually a good idea?  It seems to me this will generate lousy
> code in the absence of hardware support.  Won't we be better off warning and
> ignoring the directive, leaving the code in scalar form?

Depends on the exact code, I think sometimes it will be just fine and will
allow vectorizing something that really couldn't be otherwise.
Isn't it better to leave it for the user to decide?
They can always ask for it not to be generated (add notinbranch) if it isn't
worthwhile.

Jakub



Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt

On 2/26/20 8:31 AM, Jakub Jelinek wrote:

On Wed, Feb 26, 2020 at 07:55:53AM -0600, Bill Schmidt wrote:

The hope is that we can create a vectorized version that returns values
in registers rather than the by-ref parameters, and add code to GCC to
copy things around correctly following the call.  Ideally the signature of
the vectorized version would be sth like

   struct retval { vector double sinval; vector double cosval; };
   struct retval vecsincos (vector double);

In the typical case where calls to sincos are of the form

   sincos (val[i], &sinval[i], &cosval[i]);

this would allow us to only store the values in the caller upon return,
rather than store them in the callee and potentially reload them
immediately in the caller.  On some Power CPUs, the latter behavior can
result in somewhat costly stalls if the consecutive accesses hit a timing
window.

But can't you do
#pragma omp declare simd linear(sinp, cosp)
void sincos (double x, double *sinp, double *cosp);
?
That is something the vectorizer code could handle and for
   for (int i = 0; i < 1024; i++)
 sincos (val[i], &sinval[i], &cosval[i]);
just vectorize it as
   for (int i = 0; i < 1024; i += vf)
 _ZGVbN8vl8l8_sincos (*(vector double *)&val[i], &sinval[i], &cosval[i]);
Anything else will need specialized code to handle sincos specially in the
vectorizer.


After reading all the discussion on this thread, yes, I agree for now.
It will be good for everybody if we can get the vectorized cexpi sorted
out at some point, which will give us a superior interface.


If you feel it isn't possible to do this, then we can abandon it.  Right
now my understanding is that GCC doesn't vectorize calls to sincos yet
for any targets, so it would be moot except that we really should define
what happens for the future.

This calling convention would also be useful in the future for vectorizing
functions that return complex values either by value or by reference.

Only by value, you really don't know what the code does if something is
passed by reference, whether it is read, written into, or both etc.
And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
pass them, just GCC isn't able to do that right now.


Per the fork of the thread with Segher, I've cried uncle on the specifics
of the calling convention. :)




Well, as a matter of practicality, we don't have any of that implemented
in the rs6000 back end, and we don't have any free resources to do that
in GCC 11.  Is there any documentation about what needs to be done to
support this?  I've always been under the impression that vectorizing for
masking when there isn't any hardware support is a losing proposition, so
we've not investigated it.

You don't need to do pretty much anything, except set
clonei->mask_mode = VOIDmode; I think the generic code should handle
everything beyond that, in particular add the mask argument and use it
both on the caller side and in the expansion of the to-be-vectorized clone.


But is this actually a good idea?  It seems to me this will generate lousy
code in the absence of hardware support.  Won't we be better off warning and
ignoring the directive, leaving the code in scalar form?

If and when we have hardware support for vector masking, I'll be happy to
remove this restriction, but I need more convincing to do it now.

Thanks,
Bill



Jakub



Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt



On 2/27/20 4:52 AM, Segher Boessenkool wrote:

On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:

The reason that homogeneous aggregates matter (at least somewhat) is that
the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
mangling formula that includes the length of parameters and return values.
Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
has a superior one, we chose to use ELFv2's calling convention and make use
of homogeneous aggregates for return values in registers for the case of
vectorized sincos.

Please look at the document to see the constraints we're under to fit into
the different OpenMP clauses and attributes.  It seems to me that we can
only define this for both powerpc64 and powerpc64le by establishing two
different calling conventions, which provides two different vector length
calculations for the sincos return value, and therefore requires two
different function implementations with different mangled names.  (Either
that, or we cripple vectorized sincos by requiring it to return values
through memory.)

I still don't see it.  For all ABIs the length of the arguments and
return value is the same, and homogeneous aggregates don't factor
in at all; that is just a detail of whether something is passed in
registers or memory (as we have with many other ABIs as well, fwiw).

So why make this part of the mangling rules?

It is perfectly fine to design this with ELFv2 in mind, of course, but
making a dependency on the (current!) (very complex!) ELFv2 rules for
absolutely no reason at all is a mistake, in my opinion.


Upon reflection, I agree.  Bert, we need to make changes to the document to
reflect this:

(1) "Calling convention" should refer to ELFv1 for powerpc64 and ELFv2 for
powerpc64le.
(2) "Vector Length" should remove bullet 3, strike the word
"nonhomogeneous" in bullet 4, and strike the parenthetical clause in
bullet 4.
(3) "Ordering of Vector Arguments" should remove the example involving
homogeneous aggregates.

It also occurs to me that for bullets 4 and 5 in "Vector Length", the
CDT should be long long, not int, since we pass aggregates in pieces in
64-bit registers and/or chunks of memory.

Other small bugs:
 - Bullet 4 says "the CDT determine by a) or b) above", but the referents
should be "(1) or (2)" instead.
 - First line of "Compiler generated variants of vector functions" has
a typo ("umasked").

Segher, thanks for smacking my recalcitrant head until it understands...

Thanks,
Bill




Segher


Re: GLIBC libmvec status

2020-02-27 Thread Jakub Jelinek
On Thu, Feb 27, 2020 at 11:56:49AM +0100, Richard Biener wrote:
> > > This calling convention would also be useful in the future for vectorizing
> > > functions that return complex values either by value or by reference.
> >
> > Only by value, you really don't know what the code does if something is
> > passed by reference, whether it is read, written into, or both etc.
> > And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
> > pass them, just GCC isn't able to do that right now.
> 
> Ah, ok.  So what's missing is the standard function cexpi both GCC and
> libmvec can use.

That, plus adjusting omp-simd-clone.c and the backends so that they do
support the complex modes and essentially transform those into
passing/returning either a vector of the complex elts with twice as many
subparts, or twice as many vectors, like e.g. the Intel ABI specifies.
E.g. for return type adjustment, right now we have:
  t = TREE_TYPE (TREE_TYPE (fndecl));
  if (INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
veclen = node->simdclone->vecsize_int;
  else
veclen = node->simdclone->vecsize_float;
  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
  if (veclen > node->simdclone->simdlen)
veclen = node->simdclone->simdlen;
  if (POINTER_TYPE_P (t))
t = pointer_sized_int_node;
  if (veclen == node->simdclone->simdlen)
t = build_vector_type (t, node->simdclone->simdlen);
  else
{
  t = build_vector_type (t, veclen);
  t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
}
and we'd need to deal with the complex types accordingly.
And of course then to teach the vectorizer.
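
A hedged sketch of the extra case being described (not committed code; the
macros are real GCC tree accessors, but the placement and logic here are
assumptions):

  /* Treat _Complex T as two elements of T for the length computation, so
     e.g. a _Complex double return with simdlen 2 becomes either one vector
     of four doubles or two vectors of two doubles.  */
  bool cplx = false;
  if (TREE_CODE (t) == COMPLEX_TYPE)
    {
      t = TREE_TYPE (t);   /* the element type */
      cplx = true;
    }
  ...
  if (cplx)
    veclen /= 2;   /* half as many complex elements fit per vector */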

The Intel ABI e.g. for SSE2 (their 'x' letter, which roughly matches our 'b'
letter) has:

                sizeof  VLEN=2   VLEN=4   VLEN=8   VLEN=16
float           4       1*MS128  1*MS128  2*MS128  4*MS128
double          8       1*MD128  2*MD128  4*MD128  8*MD128
float complex   8       1*MS128  2*MS128  4*MS128  8*MS128
double complex  16      2*MD128  4*MD128  8*MD128  16*MD128

where MS128 is __m128 and MD128 is __m128d, i.e. float
__attribute__((vector_size (16))) and double __attribute__((vector_size (16))).

I'll need to check ICC on godbolt to see how they actually pass the
complex, whether it is real0 imag0 real1 imag1 real2 imag2 real3 imag3 or
real0 real1 real2 real3 imag0 imag1 imag2 imag3.

Jakub



Re: GLIBC libmvec status

2020-02-27 Thread Richard Biener
On Wed, Feb 26, 2020 at 3:31 PM Jakub Jelinek  wrote:
>
> On Wed, Feb 26, 2020 at 07:55:53AM -0600, Bill Schmidt wrote:
> > The hope is that we can create a vectorized version that returns values
> > in registers rather than the by-ref parameters, and add code to GCC to
> > copy things around correctly following the call.  Ideally the signature of
> > the vectorized version would be sth like
> >
> >   struct retval { vector double sinval; vector double cosval; };
> >   struct retval vecsincos (vector double);
> >
> > In the typical case where calls to sincos are of the form
> >
> >   sincos (val[i], &sinval[i], &cosval[i]);
> >
> > this would allow us to only store the values in the caller upon return,
> > rather than store them in the callee and potentially reload them
> > immediately in the caller.  On some Power CPUs, the latter behavior can
> > result in somewhat costly stalls if the consecutive accesses hit a timing
> > window.
>
> But can't you do
> #pragma omp declare simd linear(sinp, cosp)
> void sincos (double x, double *sinp, double *cosp);
> ?
> That is something the vectorizer code could handle and for
>   for (int i = 0; i < 1024; i++)
> sincos (val[i], &sinval[i], &cosval[i]);
> just vectorize it as
>   for (int i = 0; i < 1024; i += vf)
> _ZGVbN8vl8l8_sincos (*(vector double *)&val[i], &sinval[i], &cosval[i]);
> Anything else will need specialized code to handle sincos specially in the
> vectorizer.

I guess we'll need special code in the vectorizer anyway because in
GIMPLE we'll have

  for (int i = 0; i < 1024; i++)
   {
  _Complex double tem = __builtin_cexpi (val[i]);
  sinval[i] = __real tem;
  cosval[i] = __imag tem;
   }

we'd have to promote tem back to memory and form the call
sincos (val[i], &__real tem, &__imag tem) virtually or
explicitly.  The vectorizer is currently not happy seeing
_Complex (but dataref analysis would not be happy to see
sincos).  So we do need changes to the vectorizer.

> > If you feel it isn't possible to do this, then we can abandon it.  Right
> > now my understanding is that GCC doesn't vectorize calls to sincos yet
> > for any targets, so it would be moot except that we really should define
> > what happens for the future.
> >
> > This calling convention would also be useful in the future for vectorizing
> > functions that return complex values either by value or by reference.
>
> Only by value, you really don't know what the code does if something is
> passed by reference, whether it is read, written into, or both etc.
> And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
> pass them, just GCC isn't able to do that right now.

Ah, ok.  So what's missing is the standard function cexpi both GCC and
libmvec can use.

> > Well, as a matter of practicality, we don't have any of that implemented
> > in the rs6000 back end, and we don't have any free resources to do that
> > in GCC 11.  Is there any documentation about what needs to be done to
> > support this?  I've always been under the impression that vectorizing for
> > masking when there isn't any hardware support is a losing proposition, so
> > we've not investigated it.
>
> You don't need to do pretty much anything, except set
> clonei->mask_mode = VOIDmode; I think the generic code should handle
> everything beyond that, in particular add the mask argument and use it
> both on the caller side and in the expansion of the to-be-vectorized clone.
>
> Jakub
>


Re: GLIBC libmvec status

2020-02-27 Thread Segher Boessenkool
On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:
> The reason that homogeneous aggregates matter (at least somewhat) is that
> the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
> mangling formula that includes the length of parameters and return values.
> Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
> has a superior one, we chose to use ELFv2's calling convention and make use
> of homogeneous aggregates for return values in registers for the case of
> vectorized sincos.
> 
> Please look at the document to see the constraints we're under to fit into
> the different OpenMP clauses and attributes.  It seems to me that we can
> only define this for both powerpc64 and powerpc64le by establishing two
> different calling conventions, which provides two different vector length
> calculations for the sincos return value, and therefore requires two
> different function implementations with different mangled names.  (Either
> that, or we cripple vectorized sincos by requiring it to return values
> through memory.)

I still don't see it.  For all ABIs the length of the arguments and
return value is the same, and homogeneous aggregates don't factor
in at all; that is just a detail of whether something is passed in
registers or memory (as we have with many other ABIs as well, fwiw).

So why make this part of the mangling rules?

It is perfectly fine to design this with ELFv2 in mind, of course, but
making a dependency on the (current!) (very complex!) ELFv2 rules for
absolutely no reason at all is a mistake, in my opinion.


Segher


Re: GLIBC libmvec status

2020-02-26 Thread Segher Boessenkool
Hi!

On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:
> On 2/25/20 12:45 PM, Segher Boessenkool wrote:
> >I don't agree we should have a new ABI, and an API (which this *is* as
> >far as I can tell) works fine on *any* ABI.  Homogeneous aggregates has
> >nothing to do with anything either.
> >
> >It is fine to only *support* powerpc64le-linux, sure.  But don't fragment
> >the implementation, it only hurts, never helps -- we will end up having
> >to support ten or twenty different compilers, instead of one compiler
> >with a few (mostly) orthogonal variations.  And yes, we should also test
> >everything everywhere, whenever reasonable.
> 
> Thanks, Segher.  Let me ask for some clarification here on how you'd like
> us to proceed.
> 
> The reason that homogeneous aggregates matter (at least somewhat) is that
> the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
> mangling formula that includes the length of parameters and return values.

I don't see how that matters?  A function that returns a struct works
fine on any implementation.  Sure, for small structs it can be returned
in just registers on ELFv2, while some other ABIs push stuff through
memory.  But that is just an implementation detail of the *actual* ABI.
It is good to know about this when designing the mvec stuff, sure, but
this will *work* on *any* ABI.

> Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
> has a superior one, we chose to use ELFv2's calling convention and make use
> of homogeneous aggregates for return values in registers for the case of
> vectorized sincos.

I don't understand this.  You designed this API with the ELFv2 ABI in
mind, sure, but that does not magically make homogeneous aggregates
appear on other ABIs, nor do you need them there.

I'll read the docs again; someone is missing something here, and it
probably is me ;-)


Segher


Re: GLIBC libmvec status

2020-02-26 Thread Jakub Jelinek
On Wed, Feb 26, 2020 at 07:55:53AM -0600, Bill Schmidt wrote:
> The hope is that we can create a vectorized version that returns values
> in registers rather than the by-ref parameters, and add code to GCC to
> copy things around correctly following the call.  Ideally the signature of
> the vectorized version would be sth like
> 
>   struct retval { vector double sinval; vector double cosval; };
>   struct retval vecsincos (vector double);
> 
> In the typical case where calls to sincos are of the form
> 
>   sincos (val[i], &sinval[i], &cosval[i]);
> 
> this would allow us to only store the values in the caller upon return,
> rather than store them in the callee and potentially reload them
> immediately in the caller.  On some Power CPUs, the latter behavior can
> result in somewhat costly stalls if the consecutive accesses hit a timing
> window.

But can't you do
#pragma omp declare simd linear(sinp, cosp)
void sincos (double x, double *sinp, double *cosp);
?
That is something the vectorizer code could handle and for
  for (int i = 0; i < 1024; i++)
sincos (val[i], &sinval[i], &cosval[i]);
just vectorize it as
  for (int i = 0; i < 1024; i += vf)
_ZGVbN8vl8l8_sincos (*(vector double *)&val[i], &sinval[i], &cosval[i]);
Anything else will need specialized code to handle sincos specially in the
vectorizer.

> If you feel it isn't possible to do this, then we can abandon it.  Right
> now my understanding is that GCC doesn't vectorize calls to sincos yet
> for any targets, so it would be moot except that we really should define
> what happens for the future.
> 
> This calling convention would also be useful in the future for vectorizing
> functions that return complex values either by value or by reference.

Only by value, you really don't know what the code does if something is
passed by reference, whether it is read, written into, or both etc.
And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
pass them, just GCC isn't able to do that right now.

> Well, as a matter of practicality, we don't have any of that implemented
> in the rs6000 back end, and we don't have any free resources to do that
> in GCC 11.  Is there any documentation about what needs to be done to
> support this?  I've always been under the impression that vectorizing for
> masking when there isn't any hardware support is a losing proposition, so
> we've not investigated it.

You don't need to do pretty much anything, except set
clonei->mask_mode = VOIDmode; I think the generic code should handle
everything beyond that, in particular add the mask argument and use it
both on the caller side and in the expansion of the to-be-vectorized clone.
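
Concretely, that setting would live in the target's simd-clone hook; a
sketch of the relevant fragment (the hook name and signature follow the
i386/aarch64 pattern; the rs6000 body shown is assumed, not existing code):

static int
rs6000_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
                                               struct cgraph_simd_clone *clonei,
                                               tree base_type, int num)
{
  ...
  /* No mask-register mode on this target: ask the generic code to pass
     the mask as an ordinary vector of integers and do the blending in
     the clone itself.  */
  clonei->mask_mode = VOIDmode;
  ...
}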

Jakub



Re: GLIBC libmvec status

2020-02-26 Thread Bill Schmidt

On 2/26/20 2:18 AM, Jakub Jelinek wrote:

On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:

The reason that homogeneous aggregates matter (at least somewhat) is that
the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
mangling formula that includes the length of parameters and return values.
Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
has a superior one, we chose to use ELFv2's calling convention and make use
of homogeneous aggregates for return values in registers for the case of
vectorized sincos.

Can you please explain how you want to pass the
void sincos (double, double *, double *);
arguments?  I must say it isn't entirely clear from the document.
You talk there about double[2], but sincos certainly doesn't have such an
argument.


The hope is that we can create a vectorized version that returns values
in registers rather than the by-ref parameters, and add code to GCC to
copy things around correctly following the call.  Ideally the signature of
the vectorized version would be sth like

  struct retval { vector double sinval; vector double cosval; };
  struct retval vecsincos (vector double);

In the typical case where calls to sincos are of the form

  sincos (val[i], &sinval[i], &cosval[i]);

this would allow us to only store the values in the caller upon return,
rather than store them in the callee and potentially reload them
immediately in the caller.  On some Power CPUs, the latter behavior can
result in somewhat costly stalls if the consecutive accesses hit a timing
window.

If you feel it isn't possible to do this, then we can abandon it.  Right
now my understanding is that GCC doesn't vectorize calls to sincos yet
for any targets, so it would be moot except that we really should define
what happens for the future.

This calling convention would also be useful in the future for vectorizing
functions that return complex values either by value or by reference.



Also, I'd say ignoring the masked variants is a mistake; are you going to
warn any time the user uses inbranch or even doesn't specify notinbranch?
The masking can be implemented even without highly specialized instructions.
E.g. on x86 only AVX512F has full masking support; for older ISAs all that
is there is conditional store, or, for integral operations that can't
trap/raise exceptions, just doing blend-like operations (or even and/or) is
all that is needed; just let the vectorizer do its job.


Well, as a matter of practicality, we don't have any of that implemented
in the rs6000 back end, and we don't have any free resources to do that
in GCC 11.  Is there any documentation about what needs to be done to
support this?  I've always been under the impression that vectorizing for
masking when there isn't any hardware support is a losing proposition, so
we've not investigated it.

Thanks,
Bill



Even if you don't want it for libmvec, just use
__attribute__((simd ("notinbranch")))
for those, but allow the user to use it where it makes sense.

Jakub



Re: GLIBC libmvec status

2020-02-26 Thread Richard Biener
On Wed, Feb 26, 2020 at 2:46 PM Jakub Jelinek  wrote:
>
> On Wed, Feb 26, 2020 at 10:32:17AM -0300, Tulio Magno Quites Machado
> Filho wrote:
> > Jakub Jelinek  writes:
> >
> > > Can you please explain how do you want to pass the
> > > void sincos (double, double *, double *);
> > > arguments?  I must say it isn't entirely clear from the document.
> > > You talk there about double[2], but sincos certainly doesn't have such an
> > > argument.
> >
> > The plan [1] is to return a struct instead, i.e.:
> >
> > struct sincosret _ZGVbN2v_sincos (vector double);
> > struct sincosretf _ZGVbN4v_sincosf (vector float);
>
> Ugh, then certainly it shouldn't be mangled as a simd variant of sincos;
> it needs to be called something else.
> The ABI can't be written for a single, even if commonly used, function, but
> needs to be generic.
> And if I have a
> #pragma omp declare simd
> void foo (double x, double *y, double *z);
> then from the prototype there is no way to find out that it only uses the
> second two arguments to store a single double through them.  It could very
> well do
> void foo (double x, double *y, double *z) {
>   y[0] = y[1] + x;
>   z[0] = z[1] + x;
> }
> or anything else and then you can't transform it to something like that.

Yeah.  I think you have to approach the sincos case from our internal
__builtin_cexpi which means

_Complex double foo (double);

and how that represents itself with OpenMP SIMD.

Richard.


> Jakub
>


Re: GLIBC libmvec status

2020-02-26 Thread Jakub Jelinek
On Wed, Feb 26, 2020 at 10:32:17AM -0300, Tulio Magno Quites Machado Filho wrote:
> Jakub Jelinek  writes:
> 
> > Can you please explain how you want to pass the
> > void sincos (double, double *, double *);
> > arguments?  I must say it isn't entirely clear from the document.
> > You talk there about double[2], but sincos certainly doesn't have such an
> > argument.
> 
> The plan [1] is to return a struct instead, i.e.:
> 
> struct sincosret _ZGVbN2v_sincos (vector double);
> struct sincosretf _ZGVbN4v_sincosf (vector float);

Ugh, then certainly it shouldn't be mangled as a simd variant of sincos;
it needs to be called something else.
The ABI can't be written for a single, even if commonly used, function, but
needs to be generic.
And if I have a
#pragma omp declare simd
void foo (double x, double *y, double *z);
then from the prototype there is no way to find out that it only uses the
second two arguments to store a single double through them.  It could very
well do
void foo (double x, double *y, double *z) {
  y[0] = y[1] + x;
  z[0] = z[1] + x;
}
or anything else and then you can't transform it to something like that.

Jakub



Re: GLIBC libmvec status

2020-02-26 Thread Tulio Magno Quites Machado Filho
Jakub Jelinek  writes:

> Can you please explain how you want to pass the
> void sincos (double, double *, double *);
> arguments?  I must say it isn't entirely clear from the document.
> You talk there about double[2], but sincos certainly doesn't have such an
> argument.

The plan [1] is to return a struct instead, i.e.:

struct sincosret _ZGVbN2v_sincos (vector double);
struct sincosretf _ZGVbN4v_sincosf (vector float);
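
For concreteness, the return structs would presumably be along these lines
(the member names here are assumptions, not taken from the patches):

struct sincosret  { vector double sinval; vector double cosval; };
struct sincosretf { vector float  sinval; vector float  cosval; };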

Notice, however, that this change is still missing [2] from the libmvec
patch series [3].

[1] https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
[2] 
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/powerpc/powerpc64/fpu/multiarch/vec_s_sincosf4_vsx.c;hb=refs/heads/tuliom/libmvec
[3] https://sourceware.org/git/?p=glibc.git;a=log;h=refs/heads/tuliom/libmvec

-- 
Tulio Magno


Re: GLIBC libmvec status

2020-02-26 Thread Jakub Jelinek
On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:
> The reason that homogeneous aggregates matter (at least somewhat) is that
> the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
> mangling formula that includes the length of parameters and return values.
> Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
> has a superior one, we chose to use ELFv2's calling convention and make use
> of homogeneous aggregates for return values in registers for the case of
> vectorized sincos.

Can you please explain how you want to pass the
void sincos (double, double *, double *);
arguments?  I must say it isn't entirely clear from the document.
You talk there about double[2], but sincos certainly doesn't have such an
argument.

Also, I'd say ignoring the masked variants is a mistake; are you going to
warn any time the user uses inbranch or even doesn't specify notinbranch?
The masking can be implemented even without highly specialized instructions.
E.g. on x86 only AVX512F has full masking support; for older ISAs all that
is there is conditional store, or, for integral operations that can't
trap/raise exceptions, just doing blend-like operations (or even and/or) is
all that is needed; just let the vectorizer do its job.
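
The and/or blending mentioned above needs nothing beyond ordinary vector
logic operations; a self-contained sketch using GCC vector extensions
(names illustrative):

typedef int v4si __attribute__ ((vector_size (16)));

/* Mask lanes are -1 (active) or 0 (inactive); inactive lanes keep b.  */
static inline v4si
blend (v4si mask, v4si a, v4si b)
{
  return (mask & a) | (~mask & b);
}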

Even if you don't want it for libmvec, just use
__attribute__((simd ("notinbranch")))
for those, but allow the user to use it where it makes sense.

Jakub



Re: GLIBC libmvec status

2020-02-25 Thread Bill Schmidt



On 2/25/20 12:45 PM, Segher Boessenkool wrote:

Hi!

On Tue, Feb 25, 2020 at 04:53:17PM +0000, GT wrote:

‐‐‐ Original Message ‐‐‐
On Sunday, February 23, 2020 11:45 AM, Bill Schmidt  wrote:

As I just wrote on gcc-patches, we should disable libmvec for powerpc64.
The vector ABI as written isn't compatible with ELFv1.  We would need
a modified ABI that doesn't allow homogeneous aggregates of vectors to
be returned in registers in order to support ELFv1.  I do not believe
that is worth pursuing until and unless there is demand for it (which
I do not expect).

Are we all agreed that the POWER Vector Function ABI will be implemented only
for powerpc64le?

I do not agree.

I don't agree we should have a new ABI, and an API (which this *is* as
far as I can tell) works fine on *any* ABI.  Homogeneous aggregates have
nothing to do with anything either.

It is fine to only *support* powerpc64le-linux, sure.  But don't fragment
the implementation, it only hurts, never helps -- we will end up having
to support ten or twenty different compilers, instead of one compiler
with a few (mostly) orthogonal variations.  And yes, we should also test
everything everywhere, whenever reasonable.


Thanks, Segher.  Let me ask for some clarification here on how you'd like
us to proceed.

The reason that homogeneous aggregates matter (at least somewhat) is that
the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
mangling formula that includes the length of parameters and return values.
Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
has a superior one, we chose to use ELFv2's calling convention and make use
of homogeneous aggregates for return values in registers for the case of
vectorized sincos.

Please look at the document to see the constraints we're under to fit into
the different OpenMP clauses and attributes.  It seems to me that we can
only define this for both powerpc64 and powerpc64le by establishing two
different calling conventions, which provides two different vector length
calculations for the sincos return value, and therefore requires two
different function implementations with different mangled names.  (Either
that, or we cripple vectorized sincos by requiring it to return values
through memory.)
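
The concrete difference, as a sketch: under ELFv2 a homogeneous aggregate
of up to eight vector members is returned in vector registers, while under
ELFv1 the same struct comes back through a hidden memory pointer.

/* Member names illustrative; the ELFv1/ELFv2 contrast is the point.  */
struct retval { vector double sinval; vector double cosval; };

/* ELFv2: sinval/cosval are returned in two vector registers.
   ELFv1: the caller passes a hidden pointer and the callee stores the
   struct to memory -- exactly the store/reload this thread wants to
   avoid for vectorized sincos.  */
struct retval vecsincos (vector double x);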

Now, we can either write a document that handles both cases now (describes
both calling conventions), and force glibc to have two different functions at
least for the sincos case; or we can restrict this particular document to
ELFv2 and leave open the possibility of writing a very similar but slightly
different document for ELFv1 at such time as someone wants to use ELFv1 for
libmvec.  I'd personally rather push that extra work out until we know
there's a market for it.  That is, I don't want to preclude its use for
ELFv1, but this *particular* API is specific to ELFv2, so we need to
acknowledge that in the code.

Ultimately it's your call, but if we need to rewrite the ABI/API we're
going to need concrete proposals for how to do that.

Thanks,
Bill



For the glibc side I have no opinion.


Segher


Re: GLIBC libmvec status

2020-02-25 Thread Joseph Myers
On Tue, 25 Feb 2020, GT wrote:

> 2. In GCC making SIMD clones available only for powerpc64le should be 
> sufficient to guarantee that the Vector Function ABI is applied only for 
> systems implementing the ELFv2 ABI. Right? Then, which macro is to be 
> tested for in rs6000_simd_clone_usable? I expect that TARGET_VSX, 
> TARGET_P8_VECTOR or TARGET_P9_VECTOR are not specific enough.

I have no advice on whether you should restrict it to ELFv2 in GCC or not.  
But if you do restrict it to ELFv2, that's "DEFAULT_ABI == ABI_ELFv2" 
(not sure if you should also test TARGET_64BIT).
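
Applied to the hook being discussed, the guard might read as follows (a
sketch; only the DEFAULT_ABI test comes from this message, and the exact
return-value convention of the hook should be checked against i386's
implementation):

static int
rs6000_simd_clone_usable (struct cgraph_node *node)
{
  /* Restrict SIMD clones to ELFv2 with VSX available.  */
  if (DEFAULT_ABI != ABI_ELFv2 || !TARGET_VSX)
    return -1;   /* clone not usable on this target */
  return 0;
}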

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: GLIBC libmvec status

2020-02-25 Thread Tulio Magno Quites Machado Filho
GT  writes:

> Are we all agreed that the POWER Vector Function ABI will be implemented only
> for powerpc64le?
>
> If so, here are a few more questions:
>
> 1. The GLIBC implementation has files Makefile, Versions, configure,
> configure.ac among others in directory sysdeps/powerpc/powerpc64/fpu. Do
> we need to create a new directory as
> sysdeps/powerpc/powerpc64/powerpc64le/fpu and into it move the
> aforementioned files?

No, the directory already exists as sysdeps/powerpc/powerpc64/le/fpu/.

If we end up agreeing to restrict it to powerpc64le, we only need to move a
couple of files or contents of files, not all of them.
That would require changing most, if not all, of the libmvec patches.
I can change them.

Anyway, IMHO glibc needs to follow the GCC decision here.

-- 
Tulio Magno


Re: GLIBC libmvec status

2020-02-25 Thread Segher Boessenkool
Hi!

On Tue, Feb 25, 2020 at 04:53:17PM +0000, GT wrote:
> ‐‐‐ Original Message ‐‐‐
> On Sunday, February 23, 2020 11:45 AM, Bill Schmidt  wrote:
> > As I just wrote on gcc-patches, we should disable libmvec for powerpc64.
> > The vector ABI as written isn't compatible with ELFv1.  We would need
> > a modified ABI that doesn't allow homogeneous aggregates of vectors to
> > be returned in registers in order to support ELFv1.  I do not believe
> > that is worth pursuing until and unless there is demand for it (which
> > I do not expect).
> 
> Are we all agreed that the POWER Vector Function ABI will be implemented only
> for powerpc64le?

I do not agree.

I don't agree we should have a new ABI, and an API (which this *is* as
far as I can tell) works fine on *any* ABI.  Homogeneous aggregates have
nothing to do with anything either.

It is fine to only *support* powerpc64le-linux, sure.  But don't fragment
the implementation, it only hurts, never helps -- we will end up having
to support ten or twenty different compilers, instead of one compiler
with a few (mostly) orthogonal variations.  And yes, we should also test
everything everywhere, whenever reasonable.

For the glibc side I have no opinion.


Segher


Re: GLIBC libmvec status

2020-02-25 Thread GT
‐‐‐ Original Message ‐‐‐
On Sunday, February 23, 2020 11:45 AM, Bill Schmidt  wrote:

> On 2/21/20 6:49 AM, Tulio Magno Quites Machado Filho wrote:
>
> > +Bill, +Segher
> >
> > GT  writes:
> >
> > > Can I have until tomorrow morning to figure out exactly where/how to
> > > link the Power Vector Function ABI page? My first quick attempt
> > > resulted in the HTML tags being rendered on the page verbatim.
> >
> > Sure!
> > Let me know when you update the wiki and I'll send the patches to
> > libc-alpha.
> >
> > Meanwhile, let me clarify another point...
> >
> > Bert, Bill, Segher,
> >
> > In the GCC discussion, Jakub pointed out [1] that the new vector ABI
> > targets ELFv2, but there is nothing preventing it from being used on
> > powerpc64-linux or powerpc-linux.
> > On the other hand, the glibc patches enable libmvec on powerpc64le and
> > powerpc64.
> >
> > IMHO, regardless of the decision, GCC and glibc should be in sync.
> >
> > Bert, did you get a chance to test the GCC patches on powerpc64-linux?
> > I've been testing the glibc patches and they work fine, but they require
> > POWER8 (the vector ABI also requires P8).
> >
> > Bill, Segher,
> > What do you think is the best solution from the GCC point of view?
>
> As I just wrote on gcc-patches, we should disable libmvec for powerpc64.
> The vector ABI as written isn't compatible with ELFv1.  We would need
> a modified ABI that doesn't allow homogeneous aggregates of vectors to
> be returned in registers in order to support ELFv1.  I do not believe
> that is worth pursuing until and unless there is demand for it (which
> I do not expect).
>

Are we all agreed that the POWER Vector Function ABI will be implemented only
for powerpc64le?

If so, here are a few more questions:

1. The GLIBC implementation has files Makefile, Versions, configure,
configure.ac among others in directory sysdeps/powerpc/powerpc64/fpu. Do we
need to create a new directory as
sysdeps/powerpc/powerpc64/powerpc64le/fpu and into it move the
aforementioned files?

2. In GCC, making SIMD clones available only for powerpc64le should be
sufficient to guarantee that the Vector Function ABI is applied only for
systems implementing the ELFv2 ABI. Right?
Then, which macro is to be tested for in rs6000_simd_clone_usable? I expect
that TARGET_VSX, TARGET_P8_VECTOR or TARGET_P9_VECTOR are not specific
enough.

Bert.