Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-11 Thread Tejas Belagod

On 7/10/24 4:37 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
 wrote:


Tejas Belagod  writes:

On 7/10/24 2:38 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  wrote:


On 7/9/24 4:22 PM, Richard Biener wrote:

On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:


On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate the vector lane sizes from
which it was derived. Consider this simple, albeit unrealistic, example:

   bool foo (svint32_t a, svint32_t b)
   {
 svbool_t p = a > b;

 // Here p[2] is not the same as a[2] > b[2].
 return p[2];
   }

In the above example, because svbool_t has a fixed layout of one lane per
byte, p[i] does not return the bool value corresponding to a[i] > b[i].
This necessitates a 'typed' vector boolean value that unambiguously
represents the result of operations on a given base vector type.

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which, when applied to a base data vector
type, produces a new boolean vector type representing the result of
operations on the corresponding base vector type. The following is the
syntax:

   typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
   typedef v8si v8sib __attribute__((vector_mask));
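
For illustration, a hypothetical use of the proposed type, assuming the
front end accepts the attribute and lane-indexing of masks (a sketch of
the intended semantics, not an existing implementation):

   bool bar (v8si a, v8si b)
   {
     v8sib m = a > b;   /* mask tied to the 32-bit lanes of v8si */
     /* Unlike svbool_t in the motivating example, m[2] here would
        unambiguously correspond to a[2] > b[2].  */
     return m[2];
   }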

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same
number of elements and lane width have the same layout and size.

• Consequently, two boolean vector types whose base data vector types have
a different number of elements or a different lane size have different
layouts.

This aligns with the GNU vector extensions, which generate integer vectors
as the result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
  auto r = a < b;
}

produces, with C23:

  vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;
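
For illustration, the resulting signed integer vector can then be consumed
directly with ordinary bitwise operators (a minimal sketch using the same
v4si type; this is standard GNU vector extension behaviour):

v4si c;

void bar()
{
  v4si m = a < b;                /* 0 or -1 per lane, same width as operands */
  v4si sel = (m & a) | (~m & c); /* per-lane select via bitwise ops */
}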

In fact, on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types, and the above could either produce an AVX mask with
32-bit elements or an AVX512 mask with 1-bit elements.

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.
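
For illustration of that point, AVX-512 comparison intrinsics return plain
integer masks (__mmask16 is an unsigned short), which generic GNU vector
comparison results cannot interoperate with directly (a minimal sketch):

#include <immintrin.h>

__mmask16 cmp (__m512i a, __m512i b)
{
  return _mm512_cmplt_epi32_mask (a, b);  /* the mask is just an integer */
}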

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCCs generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the
target vector mask layout can pose a challenge - maybe making it a
generic GNU vector extension was too ambitious. I wonder if there's
value in pursuing these alternate paths?

1. Can implementing this extension in a 'generic' way, i.e. possibly not
implementing it with a target mask but just with a generic int vector, still
maintain the consistency of GNU predicate vectors within the compiler? I
know it may not seem very different from how boolean vectors are
currently implemented (as in your above example), but, having the
__attribute__((vector_mask)) as a 'property' of the object makes it
useful to optimize its uses to target predicates in subsequent stages of
the compiler.

2. Restricting __attribute__((vector_mask)) to app

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Tejas Belagod

On 7/10/24 2:38 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  wrote:


On 7/9/24 4:22 PM, Richard Biener wrote:

On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:


On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate the vector lane sizes from
which it was derived. Consider this simple, albeit unrealistic, example:

  bool foo (svint32_t a, svint32_t b)
  {
svbool_t p = a > b;

// Here p[2] is not the same as a[2] > b[2].
return p[2];
  }

In the above example, because svbool_t has a fixed layout of one lane per
byte, p[i] does not return the bool value corresponding to a[i] > b[i].
This necessitates a 'typed' vector boolean value that unambiguously
represents the result of operations on a given base vector type.

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which, when applied to a base data vector
type, produces a new boolean vector type representing the result of
operations on the corresponding base vector type. The following is the
syntax:

  typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
  typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same
number of elements and lane width have the same layout and size.

• Consequently, two boolean vector types whose base data vector types have
a different number of elements or a different lane size have different
layouts.

This aligns with the GNU vector extensions, which generate integer vectors
as the result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
   https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
 auto r = a < b;
}

produces, with C23:

 vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;

In fact, on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types, and the above could either produce an AVX mask with
32-bit elements or an AVX512 mask with 1-bit elements.

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCCs generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the
target vector mask layout can pose a challenge - maybe making it a
generic GNU vector extension was too ambitious. I wonder if there's
value in pursuing these alternate paths?

1. Can implementing this extension in a 'generic' way, i.e. possibly not
implementing it with a target mask but just with a generic int vector, still
maintain the consistency of GNU predicate vectors within the compiler? I
know it may not seem very different from how boolean vectors are
currently implemented (as in your above example), but, having the
__attribute__((vector_mask)) as a 'property' of the object makes it
useful to optimize its uses to target predicates in subsequent stages of
the compiler.

2. Restricting __attribute__((vector_mask)) to apply only to target
intrinsic types? Eg.

On SVE something like:
typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.

On AVX, someth

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-10 Thread Tejas Belagod

On 7/9/24 4:22 PM, Richard Biener wrote:

On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:


On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate the vector lane sizes from
which it was derived. Consider this simple, albeit unrealistic, example:

 bool foo (svint32_t a, svint32_t b)
 {
   svbool_t p = a > b;

   // Here p[2] is not the same as a[2] > b[2].
   return p[2];
 }

In the above example, because svbool_t has a fixed layout of one lane per
byte, p[i] does not return the bool value corresponding to a[i] > b[i].
This necessitates a 'typed' vector boolean value that unambiguously
represents the result of operations on a given base vector type.

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which, when applied to a base data vector
type, produces a new boolean vector type representing the result of
operations on the corresponding base vector type. The following is the
syntax:

 typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
 typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same
number of elements and lane width have the same layout and size.

• Consequently, two boolean vector types whose base data vector types have
a different number of elements or a different lane size have different
layouts.

This aligns with the GNU vector extensions, which generate integer vectors
as the result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
  https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
auto r = a < b;
}

produces, with C23:

vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;

In fact, on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types, and the above could either produce an AVX mask with
32-bit elements or an AVX512 mask with 1-bit elements.

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCCs generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the
target vector mask layout can pose a challenge - maybe making it a
generic GNU vector extension was too ambitious. I wonder if there's
value in pursuing these alternate paths?

1. Can implementing this extension in a 'generic' way, i.e. possibly not
implementing it with a target mask but just with a generic int vector, still
maintain the consistency of GNU predicate vectors within the compiler? I
know it may not seem very different from how boolean vectors are
currently implemented (as in your above example), but, having the
__attribute__((vector_mask)) as a 'property' of the object makes it
useful to optimize its uses to target predicates in subsequent stages of
the compiler.

2. Restricting __attribute__((vector_mask)) to apply only to target
intrinsic types? Eg.

On SVE something like:
typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.

On AVX, something like:
typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though
this would require more f

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-09 Thread Tejas Belagod

On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate the vector lane sizes from
which it was derived. Consider this simple, albeit unrealistic, example:

bool foo (svint32_t a, svint32_t b)
{
  svbool_t p = a > b;

  // Here p[2] is not the same as a[2] > b[2].
  return p[2];
}

In the above example, because svbool_t has a fixed layout of one lane per
byte, p[i] does not return the bool value corresponding to a[i] > b[i].
This necessitates a 'typed' vector boolean value that unambiguously
represents the result of operations on a given base vector type.

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which, when applied to a base data vector
type, produces a new boolean vector type representing the result of
operations on the corresponding base vector type. The following is the
syntax:

typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same
number of elements and lane width have the same layout and size.

• Consequently, two boolean vector types whose base data vector types have
a different number of elements or a different lane size have different
layouts.

This aligns with the GNU vector extensions, which generate integer vectors
as the result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
 https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
   auto r = a < b;
}

produces, with C23:

   vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;

In fact, on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types, and the above could either produce an AVX mask with
32-bit elements or an AVX512 mask with 1-bit elements.

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCCs generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the 
target vector mask layout can pose a challenge - maybe making it a 
generic GNU vector extension was too ambitious. I wonder if there's 
value in pursuing these alternate paths?


1. Can implementing this extension in a 'generic' way, i.e. possibly not
implementing it with a target mask but just with a generic int vector, still
maintain the consistency of GNU predicate vectors within the compiler? I
know it may not seem very different from how boolean vectors are 
currently implemented (as in your above example), but, having the 
__attribute__((vector_mask)) as a 'property' of the object makes it 
useful to optimize its uses to target predicates in subsequent stages of 
the compiler.


2. Restricting __attribute__((vector_mask)) to apply only to target 
intrinsic types? Eg.


On SVE something like:
typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.

On AVX, something like:
typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though
this would require a more fine-grained definition of the lane-size to mask-bits mapping.


Would not be allowed on GNU Vector Extensio

Re: [PING^2] [PATCH 00/11] AArch64/OpenMP: Test SVE ACLE types with various OpenMP constructs.

2024-07-08 Thread Tejas Belagod

Ping^2 on the series please.

Thanks,
Tejas.

On 5/27/24 10:36 AM, Tejas Belagod wrote:

Note: This patch series is based on Richard's initial patch
   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606741.html
and Jakub's suggestion
   https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611892.html

The following patch series handles various scenarios with OpenMP and SVE types.
The starting point for the series follows a suggestion from Jakub to cover all
the possible scenarios that could arise when OMP constructs/clauses etc are
used with SVE ACLE types. Here are a few instances that this patch series tests
and in some cases fixes the expected output.  This patch series does not follow
a formal definition or a spec of how OMP interacts with SVE ACLE types, so it's
more of a proposed behaviour.  Comments and discussion welcome.

This list is not exhaustive, but covers most scenarios of how SVE ACLE types
ought to interact with OMP constructs/clauses.

1. Poly-int structures that represent variable-sized objects and OMP runtime.

Currently poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc.  This patch improves on this by passing
around poly-int structures by address to avoid copy-overhead.

2. SVE ACLE types in OMP Shared clauses.

We test the behaviour where SVE ACLE type objects are shared in the following
methods into an OMP region:
   a. Explicit Shared clause on SVE ACLE type objects.
   b. Implicit shared clause.
   c. Implicit shared with default clause.
   d. SVE ACLE types in the presence of predetermined (static) shared objects.

The associated tests ensure that all such shared objects are passed by address
into the OMP runtime.  There are runtime tests to verify the functional
correctness of the change.
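
For illustration (names are illustrative, not from the series), a minimal
sketch of the kind of code these tests exercise; the point of the change is
that the variable-length SVE objects below are passed into the outlined OMP
child function by address rather than copied:

#include <arm_sve.h>

svint32_t
shared_add (svint32_t x, svint32_t y)
{
  svint32_t res;
  #pragma omp parallel shared (x, y, res)
  #pragma omp single
  res = svadd_s32_z (svptrue_b32 (), x, y);
  return res;
}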

3. Offloading and SVE ACLE types.

The target clause in OpenMP is used to offload loop kernels to accelerator
peripherals.  target's 'map' clause is used to move data from and to the
accelerator.  When the data is of SVE type, it may not be suitable because of
various reasons, e.g. the two SVE targets may not agree on vector size or
some targets don't support variable vector size.  This makes SVE unsuitable
for use in OMP's 'map' clause.  We diagnose all such cases and issue errors
where appropriate (a short sketch of one rejected use follows the list).  The cases we cover in this patch are:

   a. Implicitly-mapped SVE ACLE types in OMP target regions are diagnosed.
   b. Explicitly-mapped SVE ACLE types in OMP target regions using map clause
  are diagnosed.
   c. Explicitly-mapped SVE ACLE types of various directions - to, from, tofrom
  in the map clause are diagnosed.
   d. target enter and exit data clauses with map on SVE ACLE types are
  diagnosed.
   e. target data map with alloc on SVE ACLE types are diagnosed.
   f. target update from clause on SVE ACLE types are diagnosed.
   g. target private firstprivate with SVE ACLE types are diagnosed.
   h. All combinations of target with work-sharing constructs like parallel,
  loop, simd, teams, distribute etc are also diagnosed when SVE ACLE types
  are involved.
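
For illustration, a sketch of one such rejected construct (function and
variable names are illustrative; the diagnostic wording is taken from the
offloading patch later in this series):

#include <arm_sve.h>

int64_t
offload (svint32_t v)
{
  int64_t r;
  /* error: SVE type 'svint32_t' not allowed in map clause.  */
  #pragma omp target map(to: v) map(from: r)
  r = svaddv_s32 (svptrue_b32 (), v);
  return r;
}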

4. Lastprivate and SVE ACLE types.

Various OpenMP lastprivate clause scenarios with SVE object types are
diagnosed.  Worksharing constructs like sections, for, distribute bind to an
implicit outer parallel region in whose scope SVE ACLE types are declared and
are therefore default private.  The lastprivate clause list with SVE ACLE type
object items are diagnosed in this scenario.

5. Threadprivate on SVE ACLE type objects.

We ensure threadprivate SVE ACLE type objects are supported. We also ensure
copyin clause is also supported.

6. User-Defined Reductions on SVE ACLE types.

We define a reduction using OMP declare reduction using SVE ACLE intrinsics and
ensure its functional correctness with various work-sharing constructs like
for, simd, parallel, task, taskloop.
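
For illustration, the shape of such a user-defined reduction (this mirrors
the declare reduction used by the udr-sve.c test added later in this
series; the helper function is illustrative):

#include <arm_sve.h>

#pragma omp declare reduction (+:svint32_t: omp_out = svadd_s32_z (svptrue_b32 (), omp_in, omp_out))

int64_t
reduce (int *a)
{
  svint32_t va = svdup_n_s32 (0);
  #pragma omp parallel for reduction (+:va)
  for (int j = 0; j < 8; j++)
    va = svadd_s32_z (svptrue_b32 (), va, svld1_s32 (svptrue_b32 (), a));
  return svaddv_s32 (svptrue_b32 (), va);
}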

7. Uniform and Aligned Clause with SVE ACLE

We ensure the uniform clause's functional correctness with simd construct and
associated SVE ACLE intrinsics in the simd region.  There is no direct
interaction between uniform and SVE ACLE type objects, but we ensure the uniform
clause applies correctly to a region where SVE ACLE intrinsics are present.
Similarly for the aligned clause.

8. Linear clause and SVE ACLE type.

We diagnose if a linear clause list item has SVE ACLE type objects present.
It doesn't mean much if the linear clause is applied to SVE ACLE types.

9. Depend clause and SVE ACLE objects.

We test many combinations of dependencies between shared SVE ACLE type objects
in parallel regions for functional correctness.  We test if in, out dependencies and
anti-dependencies are supported for SVE ACLE type objects using the depend
clause with work-sharing constructs like task.

10. 'doacross' clause and SVE ACLE object types.

doacross is mainly supported for scalars and loop iteration variables.  We
diagnose cases where SVE ACLE objects are used in doacross list items.

Tejas Belagod (11

[RFC] Proposal to support Packed Boolean Vector masks.

2024-07-08 Thread Tejas Belagod

Hi,

Sorry to have dropped the ball on 
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but 
here I've tried to pick it up again and write up a strawman proposal for 
elevating __attribute__((vector_mask)) to the FE from GIMPLE.



Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate the vector lane sizes from
which it was derived. Consider this simple, albeit unrealistic, example:


  bool foo (svint32_t a, svint32_t b)
  {
svbool_t p = a > b;

// Here p[2] is not the same as a[2] > b[2].
return p[2];
  }

In the above example, because svbool_t has a fixed layout of one lane per
byte, p[i] does not return the bool value corresponding to a[i] > b[i].
This necessitates a 'typed' vector boolean value that unambiguously
represents the result of operations on a given base vector type.

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which, when applied to a base data vector
type, produces a new boolean vector type representing the result of
operations on the corresponding base vector type. The following is the
syntax:

  typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
  typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types whose base data vector types have the same
number of elements and lane width have the same layout and size.

• Consequently, two boolean vector types whose base data vector types have
a different number of elements or a different lane size have different
layouts.

This aligns with the GNU vector extensions, which generate integer vectors
as the result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
   https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.

Producers and Consumers of PBV
--

With GNU vector extensions, comparisons produce boolean vectors;
conditional and bitwise operators consume them. Comparison producers
generate signed integer vectors of the same lane-width as the operands
of the comparison operator. This means conditionals and bitwise
operators cannot be applied to mixed vectors that result from operands
of different widths. E.g.


  v8hi foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f)
  {
return a > b || c > d; // error!
return a > b || e < f; // OK - no explicit conversion needed.
return a > b || __builtin_convertvector (c > d, v8si); // OK.
return a | b && c | d; // error!
return a | b && __builtin_convertvector (c | d, v8si); // OK.
  }

__builtin_convertvector () needs to be applied to convert vectors to the 
type one wants to do the comparison in. IoW, the integer vectors that 
represent boolean vectors are 'strictly-typed'. If we extend these rules 
to vector_mask, this will look like:


  typedef v8si v8sib __attribute__((vector_mask));
  typedef v8hi v8hib __attribute__((vector_mask));
  typedef v8sf v8sfb __attribute__((vector_mask));

  v8sib foo (v8si a, v8si b, v8hi c, v8hi d, v8sf e, v8sf f)
  {
v8sib psi = a > b;
v8hib phi = c > d;
v8sfb psf = e < f;

return psi || phi; // error!
return psi || psf; // OK - no explicit conversion needed.
return psi || __builtin_convertvector (phi, v8sib); // OK.
return psi | phi; // error!
return psi | __builtin_convertvector (phi, v8sib); // OK.
return psi | psf; // OK - no explicit conversion needed.
  }

Now according to the rules explained above, v8sib and v8hib will have
different layouts (which is why they can't be used directly, without
conversion, as operands of the same operation). OTOH, the same rules
dictate that v8sib and v8sfb - where v8sfb is the mask type of the float
base data vector corresponding to v8sib - have the same layout and hence
can be used as operands of operators without explicit conversion. This
aligns with the GNU vector extension rules, where comparison of two v8sf
vectors results in a v8si of the same lane-width and number of elements
as would result from comparison of two v8si vectors.


Application of vector_mask to sizeless types


__attribute__((vector_mask)) has the advantage that it can be applied to 
sizeless types seamlessly.  When __attribute__((vector_mask)) is applied 
to a data vector that is a sizeless type, the resulting vector mask also 
becomes a sizeless type.

Eg.

  typedef 
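
A hypothetical sketch of what such a sizeless mask typedef and its use
could look like under this proposal (the typedef name and exact semantics
here are assumptions of this sketch):

  typedef svint32_t svint32_mask_t __attribute__((vector_mask));

  bool foo (svint32_t a, svint32_t b)
  {
    svint32_mask_t p = a > b;
    /* Unlike svbool_t, p[2] here would unambiguously be a[2] > b[2].  */
    return p[2];
  }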

Re: [PATCH v6] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-28 Thread Tejas Belagod

On 6/28/24 6:18 AM, Pengxuan Zheng wrote:

This patch improves GCC’s vectorization of __builtin_popcount for aarch64 target
by adding popcount patterns for vector modes besides QImode, i.e., HImode,
SImode and DImode.

With this patch, we now generate the following for V8HI:
   cnt v1.16b, v0.16b
   uaddlp  v2.8h, v1.16b

For V4HI, we generate:
   cnt v1.8b, v0.8b
   uaddlp  v2.4h, v1.8b

For V4SI, we generate:
   cnt v1.16b, v0.16b
   uaddlp  v2.8h, v1.16b
   uaddlp  v3.4s, v2.8h

For V4SI with TARGET_DOTPROD, we generate the following instead:
   movi v0.4s, #0
   movi v1.16b, #1
   cnt v3.16b, v2.16b
   udot v0.4s, v3.16b, v1.16b

For V2SI, we generate:
   cnt v1.8b, v0.8b
   uaddlp  v2.4h, v1.8b
   uaddlp  v3.2s, v2.4h

For V2SI with TARGET_DOTPROD, we generate the following instead:
   movi v0.8b, #0
   movi v1.8b, #1
   cnt v3.8b, v2.8b
   udot v0.2s, v3.8b, v1.8b

For V2DI, we generate:
   cnt v1.16b, v0.16b
   uaddlp  v2.8h, v1.16b
   uaddlp  v3.4s, v2.8h
   uaddlp  v4.2d, v3.4s

For V2DI with TARGET_DOTPROD, we generate the following instead:
   movi v0.4s, #0
   movi v1.16b, #1
   cnt v3.16b, v2.16b
   udot v0.4s, v3.16b, v1.16b
   uaddlp  v0.2d, v0.4s
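
For illustration (not part of the patch), a loop of the kind that now
vectorizes for the wider element modes; the V4SI body expands to a byte CNT
followed by the UADDLP (or UDOT) reduction sequences shown above:

#include <stdint.h>

void
popcount_u32 (uint32_t *restrict dst, const uint32_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = __builtin_popcount (src[i]);
}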

PR target/113859

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_addlp): Rename to...
(@aarch64_addlp): ... This.
(popcount2): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt-udot.c: New test.
* gcc.target/aarch64/popcnt-vec.c: New test.

Signed-off-by: Pengxuan Zheng 
---
  gcc/config/aarch64/aarch64-simd.md| 41 ++-
  .../gcc.target/aarch64/popcnt-udot.c  | 58 
  gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 69 +++
  3 files changed, 167 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-udot.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 01b084d8ccb..afdf3ec7873 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3461,7 +3461,7 @@ (define_insn 
"*aarch64_addlv_ze"
[(set_attr "type" "neon_reduc_add")]
  )
  
-(define_expand "aarch64_addlp"

+(define_expand "@aarch64_addlp"
[(set (match_operand: 0 "register_operand")
(plus:
  (vec_select:
@@ -3517,6 +3517,45 @@ (define_insn "popcount2"
[(set_attr "type" "neon_cnt")]
  )
  
+(define_expand "popcount2"

+  [(set (match_operand:VDQHSD 0 "register_operand")
+(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
+  "TARGET_SIMD"
+  {
+/* Generate a byte popcount. */


A couple of formatting nits. Two spaces before end of comment.


+machine_mode mode =  == 64 ? V8QImode : V16QImode;
+rtx tmp = gen_reg_rtx (mode);
+auto icode = optab_handler (popcount_optab, mode);
+emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode, operands[1])));
+
+if (TARGET_DOTPROD
+&& (mode == SImode || mode == DImode))
+  {
+/* For V4SI and V2SI, we can generate a UDOT with a 0 accumulator and a
+   1 multiplicand. For V2DI, another UAADDLP is needed. */


Likewise.


+rtx ones = force_reg (mode, CONST1_RTX (mode));
+auto icode = optab_handler (udot_prod_optab, mode);
+mode =  == 64 ? V2SImode : V4SImode;
+rtx dest = mode == mode ? operands[0] : gen_reg_rtx (mode);
+rtx zeros = force_reg (mode, CONST0_RTX (mode));
+emit_insn (GEN_FCN (icode) (dest, tmp, ones, zeros));
+tmp = dest;
+  }
+
+/* Use a sequence of UADDLPs to accumulate the counts. Each step doubles
+   the element size and halves the number of elements. */


Likewise. Also two spaces after the dot before a new sentence.

You could run your patch through gcc/contrib/check_GNU_style.sh to check 
for formatting nits.


Thanks,
Tejas.


+while (mode != mode)
+  {
+auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE (tmp));
+mode = insn_data[icode].operand[0].mode;
+rtx dest = mode == mode ? operands[0] : gen_reg_rtx (mode);
+emit_insn (GEN_FCN (icode) (dest, tmp));
+tmp = dest;
+  }
+DONE;
+  }
+)
+
  ;; 'across lanes' max and min ops.
  
  ;; Template for outputting a scalar, so we can create __builtins which can be

diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-udot.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt-udot.c
new file mode 100644
index 000..f6a968dae95
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt-udot.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.2-a+dotprod -fno-vect-cost-model 
-fno-schedule-insns -fno-schedule-insns2" } */
+
+/*
+** bar:
+** movi v([0-9]+).16b, 0x1
+** movi v([0-9]+).4s, 0
+** ldr q([0-9]+), \[x0\]
+** cnt v([0-9]+).16b, v\3.16b
+** 

Re: [PATCH 00/11] AArch64/OpenMP: Test SVE ACLE types with various OpenMP constructs.

2024-06-19 Thread Tejas Belagod

PING for the series.

Thanks,
Tejas.

On 5/27/24 10:36 AM, Tejas Belagod wrote:

Note: This patch series is based on Richard's initial patch
   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606741.html
and Jakub's suggestion
   https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611892.html

The following patch series handles various scenarios with OpenMP and SVE types.
The starting point for the series follows a suggestion from Jakub to cover all
the possible scenarios that could arise when OMP constructs/clauses etc are
used with SVE ACLE types. Here are a few instances that this patch series tests
and in some cases fixes the expected output.  This patch series does not follow
a formal definition or a spec of how OMP interacts with SVE ACLE types, so it's
more of a proposed behaviour.  Comments and discussion welcome.

This list is not exhaustive, but covers most scenarios of how SVE ACLE types
ought to interact with OMP constructs/clauses.

1. Poly-int structures that represent variable-sized objects and OMP runtime.

Currently poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc.  This patch improves on this by passing
around poly-int structures by address to avoid copy-overhead.

2. SVE ACLE types in OMP Shared clauses.

We test the behaviour where SVE ACLE type objects are shared in the following
methods into an OMP region:
   a. Explicit Shared clause on SVE ACLE type objects.
   b. Implicit shared clause.
   c. Implicit shared with default clause.
   d. SVE ACLE types in the presence of predetermined (static) shared objects.

The associated tests ensure that all such shared objects are passed by address
into the OMP runtime.  There are runtime tests to verify the functional
correctness of the change.

3. Offloading and SVE ACLE types.

The target clause in OpenMP is used to offload loop kernels to accelerator
peripherals.  target's 'map' clause is used to move data from and to the
accelerator.  When the data is of SVE type, it may not be suitable because of
various reasons, e.g. the two SVE targets may not agree on vector size or
some targets don't support variable vector size.  This makes SVE unsuitable
for use in OMP's 'map' clause.  We diagnose all such cases and issue errors
where appropriate.  The cases we cover in this patch are:

   a. Implicitly-mapped SVE ACLE types in OMP target regions are diagnosed.
   b. Explicitly-mapped SVE ACLE types in OMP target regions using map clause
  are diagnosed.
   c. Explicitly-mapped SVE ACLE types of various directions - to, from, tofrom
  in the map clause are diagnosed.
   d. target enter and exit data clauses with map on SVE ACLE types are
  diagnosed.
   e. target data map with alloc on SVE ACLE types are diagnosed.
   f. target update from clause on SVE ACLE types are diagnosed.
   g. target private firstprivate with SVE ACLE types are diagnosed.
   h. All combinations of target with work-sharing constructs like parallel,
  loop, simd, teams, distribute etc are also diagnosed when SVE ACLE types
  are involved.

4. Lastprivate and SVE ACLE types.

Various OpenMP lastprivate clause scenarios with SVE object types are
diagnosed.  Worksharing constructs like sections, for, distribute bind to an
implicit outer parallel region in whose scope SVE ACLE types are declared and
are therefore default private.  The lastprivate clause list with SVE ACLE type
object items are diagnosed in this scenario.

5. Threadprivate on SVE ACLE type objects.

We ensure threadprivate SVE ACLE type objects are supported. We also ensure
copyin clause is also supported.

6. User-Defined Reductions on SVE ACLE types.

We define a reduction using OMP declare reduction using SVE ACLE intrinsics and
ensure its functional correctness with various work-sharing constructs like
for, simd, parallel, task, taskloop.

7. Uniform and Aligned Clause with SVE ACLE

We ensure the uniform clause's functional correctness with simd construct and
associated SVE ACLE intrinsics in the simd region.  There is no direct
interaction between uniform and SVE ACLE type objects, but we ensure the uniform
clause applies correctly to a region where SVE ACLE intrinsics are present.
Similarly for the aligned clause.

8. Linear clause and SVE ACLE type.

We diagnose if a linear clause list item has SVE ACLE type objects present.
It doesn't mean much if the linear clause is applied to SVE ACLE types.

9. Depend clause and SVE ACLE objects.

We test many combinations of dependencies between shared SVE ACLE type objects
in parallel regions for functional correctness.  We test if in, out dependencies and
anti-dependencies are supported for SVE ACLE type objects using the depend
clause with work-sharing constructs like task.

10. 'doacross' clause and SVE ACLE object types.

doacross is mainly supported for scalars and loop iteration variables.  We
diagnose cases where SVE ACLE objects are used in doacross list items.

Tejas Belagod (11

Re: [PATCH 02/11] AArch64: Add test cases for SVE types in OpenMP shared clause.

2024-05-31 Thread Tejas Belagod

On 5/30/24 6:08 PM, Richard Sandiford wrote:

Tejas Belagod  writes:

This patch tests various shared clauses with SVE types.  It also adds a test
scaffold to run OpenMP tests in under the gcc.target testsuite.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/omp/aarch64-sve-omp.exp: New scaffold.


Hopefully Jakub can comment on whether we should test this in the
GCC testsuite or libgomp testsuite.

On the test:


[...]
+int
+main ()
+{
+  svint32_t x = svindex_s32 (0 ,1);
+  svint32_t y = svindex_s32 (8, 1);
+  svint32_t a, b;
+  svbool_t p;
+
+  /* Implicit shared.  */
+  a = foo (x, y, p);
+  b = implicit_shared_default (x, y, p);


It looks like p is used uninitialised here.  Can you check locally
that using svptrue_b8 () (or whatever) as an initialiser allows the
test to pass while svpfalse_b () causes it to fail?



Oops, thanks for spotting that. Now verified - will wait for Jakub's 
comment on tests' home before I respin.
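
For reference, a sketch of the fix being verified - initialising the
predicate before use (whether svptrue_b8 or svptrue_b32 is used is
immaterial for an all-true predicate; the choice here is illustrative):

  svint32_t x = svindex_s32 (0, 1);
  svint32_t y = svindex_s32 (8, 1);
  svbool_t p = svptrue_b32 ();  /* previously left uninitialised */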


Thanks,
Tejas.


Thanks,
Richard


+  compare_vec (a, b);
+
+  /* Explicit shared.  */
+  a = foo (x ,y, p);
+  b = explicit_shared (x, y, p);
+  compare_vec (a, b);
+
+  /* Implicit shared with no default clause.  */
+  a = foo (x ,y, p);
+  b = implicit_shared_no_default (x, y, p);
+  compare_vec (a, b);
+
+  /* Mix shared.  */
+  a = foo (x ,y, p);
+  b = mix_shared (y, p);
+  compare_vec (a, b);
+
+  /* Predetermined shared.  */
+  predetermined_shared_static (true);
+  predetermined_shared_static (false);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "value-expr: \*.omp_data_i->a" 10 
"ompexp" } } */




Re: [PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-05-31 Thread Tejas Belagod

On 5/30/24 6:28 PM, Richard Sandiford wrote:

Tejas Belagod  writes:

Currently poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc.  This patch improves on this by passing
around poly-int structures by address to avoid copy-overhead.

gcc/ChangeLog
* omp-low.cc (use_pointer_for_field): Use pointer if the OMP data
structure's field type is a poly-int.
---
  gcc/omp-low.cc | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 1a65229cc37..b15607f4ef5 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -466,7 +466,8 @@ static bool
  use_pointer_for_field (tree decl, omp_context *shared_ctx)
  {
if (AGGREGATE_TYPE_P (TREE_TYPE (decl))
-  || TYPE_ATOMIC (TREE_TYPE (decl)))
+  || TYPE_ATOMIC (TREE_TYPE (decl))
+  || POLY_INT_CST_P (DECL_SIZE (decl)))
  return true;
  
/* We can only use copy-in/copy-out semantics for shared variables




Thanks for the reviews.


Realise this is also true of my original patch, but:

I suppose a question here is whether this function is only ever used for
local interfaces between code generated by the same source code function,
or whether it's ABI in a more general sense.  


I'm not 100% sure, but AFAICS, 'use_pointer_for_field' seems to be
used only for the local interface between the source and generated functions. I
don't see any backend hooks into this or backends hooking into this
function for general ABI purposes. Of course, I'm not the expert on OMP lowering,
so it would be great to get an expert opinion on this.



If the latter, I suppose
we should make sure to handle ACLE types the same way regardless of
whether the SVE vector size is known.



When you say 'the same way', do you mean the way the SVE ABI defines the rules
for SVE types?


Thanks,
Tejas.


(At the moment, the vector size is fixed for a TU, not just a function,
but we should probably plan for relaxing that in future.)

Thanks,
Richard




[PATCH 11/11] AArch64: Diagnose SVE type objects when applied to OpenMP doacross clause.

2024-05-26 Thread Tejas Belagod
This patch tests whether SVE type objects applied to the doacross clause are
correctly diagnosed.

gcc/testsuite/ChangeLog

* gcc.target/aarch64/sve/omp/doacross.c: New test.
---
 .../gcc.target/aarch64/sve/omp/doacross.c | 22 +++
 1 file changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c
new file mode 100644
index 000..a311887926b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/doacross.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
-fdump-tree-ompexp" } */
+
+#include <arm_sve.h>
+
+int a[256];
+
+__attribute__((noinline, noclone)) int
+f1 (svint32_t va)
+{
+  int j;
+  #pragma omp for ordered (1)
+  for (j = 16; j < 64; j++)
+{
+  #pragma omp ordered doacross(sink: va) /* { dg-error {variable 'va' is 
not an iteration of outermost loop 1, expected 'j'} } */
+  a[j - 1] = j + svaddv_s32 (svptrue_b32 (), va);
+  #pragma omp ordered doacross(source: omp_cur_iteration)
+  j += 4;
+  va = svindex_s32 (0,1);
+}
+  return j;
+}
-- 
2.25.1



[PATCH 06/11] AArch64: Test OpenMP user-defined reductions with SVE types.

2024-05-26 Thread Tejas Belagod
This patch tests user-defined reductions on various constructs with objects
of SVE type.

gcc/testsuite/ChangeLog

* gcc.target/aarch64/sve/omp/udr-sve.c: New test.
---
 .../gcc.target/aarch64/sve/omp/udr-sve.c  | 166 ++
 1 file changed, 166 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c
new file mode 100644
index 000..049fbee9056
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/udr-sve.c
@@ -0,0 +1,166 @@
+/* { dg-do run } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
-fdump-tree-ompexp" } */
+
+#include <arm_sve.h>
+
+#pragma omp declare reduction (+:svint32_t: omp_out = svadd_s32_z 
(svptrue_b32(), omp_in, omp_out))
+
+int parallel_reduction ()
+{
+  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
+  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
+  svint32_t va = svld1_s32 (svptrue_b32 (), b);
+  int i = 0;
+  int64_t res;
+
+  #pragma omp parallel reduction (+:va, i)
+{
+  va = svld1_s32 (svptrue_b32 (), a);
+  i++;
+}
+
+  res = svaddv_s32 (svptrue_b32 (), va);
+
+  if (res != i * 8)
+__builtin_abort ();
+
+  return 0;
+}
+
+int for_reduction ()
+{
+  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
+  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
+  svint32_t va = svld1_s32 (svptrue_b32 (), b);
+  int i = 0;
+  int j;
+  int64_t res;
+
+  #pragma omp parallel for reduction (+:va, i)
+  for (j = 0; j < 8; j++)
+{
+  va = svld1_s32 (svptrue_b32 (), a);
+  i++;
+}
+
+  res = svaddv_s32 (svptrue_b32 (), va);
+
+  if (res != i * 8)
+__builtin_abort ();
+
+  return 0;
+}
+
+int simd_reduction ()
+{
+  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
+  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
+  svint32_t va = svld1_s32 (svptrue_b32 (), b);
+  int i = 0;
+  int j;
+  int64_t res;
+
+  /* The list includes va that is already vectorized, so the only impact here
+ is on the scalar variable i.  OMP spec says only scalar variables are
+ allowed in the list.  Should non-scalars be diagnosed?  */
+  #pragma omp simd reduction (+:va, i)
+  for (j = 0; j < 8; j++)
+{
+  va = svld1_s32 (svptrue_b32 (), a);
+  i++;
+}
+
+  res = svaddv_s32 (svptrue_b32 (), va);
+
+  if (res != i)
+__builtin_abort ();
+
+  return 0;
+}
+
+int taskloop_reduction ()
+{
+  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
+  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
+  svint32_t va = svld1_s32 (svptrue_b32 (), b);
+  int i = 0;
+  int j;
+  int64_t res;
+
+  #pragma omp taskloop reduction (+:va, i)
+  for (j = 0; j < 8; j++)
+{
+  svint32_t tva = svld1_s32 (svptrue_b32 (), a);
+  #pragma omp in_reduction (+: va)
+  va = svadd_s32_z (svptrue_b32 (), tva, va);
+  i++;
+}
+
+  res = svaddv_s32 (svptrue_b32 (), va);
+
+  if (res != i * 8)
+__builtin_abort ();
+
+  return 0;
+}
+
+int task_reduction ()
+{
+  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
+  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
+  svint32_t va = svld1_s32 (svptrue_b32 (), b);
+  int i = 0;
+  int j;
+  int64_t res;
+
+  #pragma omp parallel reduction (task,+:va)
+  {
+va = svadd_s32_z (svptrue_b32 (), svld1_s32 (svptrue_b32 (), a), va);
+i++;
+  }
+
+  res = svaddv_s32 (svptrue_b32 (), va);
+
+  if (res != i * 8)
+__builtin_abort ();
+
+  return 0;
+}
+
+int inscan_reduction_incl ()
+{
+  int a[8] = {1 ,1, 1, 1, 1, 1, 1, 1};
+  int b[8] = {0 ,0, 0, 0, 0, 0, 0, 0};
+  svint32_t va = svld1_s32 (svptrue_b32 (), b);
+  int j;
+  int i = 0;
+  int64_t res = 0;
+
+  #pragma omp parallel
+  #pragma omp for reduction (inscan,+:va, i)
+  for (j = 0; j < 8; j++)
+{
+  va = svld1_s32 (svptrue_b32 (), a);
+  i++;
+  #pragma omp scan inclusive (va, i)
+  res += svaddv_s32 (svptrue_b32 (), va);
+}
+
+  if (res != i * 8)
+__builtin_abort ();
+
+  return 0;
+}
+
+int
+main()
+{
+  parallel_reduction ();
+  task_reduction ();
+  inscan_reduction_incl ();
+  taskloop_reduction ();
+  simd_reduction ();
+  for_reduction ();
+
+  return 0;
+}
-- 
2.25.1



[PATCH 10/11] AArch64: Test OpenMP depend clause and its variations on SVE types

2024-05-26 Thread Tejas Belagod
This patch adds a test for the depend clause and its various dependency
variations with SVE type objects.

gcc/testsuite/ChangeLog

* gcc.target/aarch64/sve/omp/depend-1.c: New test.
---
 .../gcc.target/aarch64/sve/omp/depend-1.c | 223 ++
 1 file changed, 223 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/depend-1.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/depend-1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/depend-1.c
new file mode 100644
index 000..734c20fb9ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/depend-1.c
@@ -0,0 +1,223 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
-fdump-tree-ompexp" } */
+
+#include <arm_sve.h>
+
+int zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0};
+int ones[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };
+int twos[8] = { 2, 2, 2, 2, 2, 2, 2, 2 };
+
+void
+dep (void)
+{
+  svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+
+  #pragma omp parallel
+  #pragma omp single
+  {
+#pragma omp task shared (x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+#pragma omp task shared (x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort  ();
+  }
+}
+
+void
+dep2 (void)
+{
+  #pragma omp parallel
+  #pragma omp single
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp task shared (x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+#pragma omp task shared (x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort  ();
+#pragma omp taskwait
+  }
+}
+
+void
+dep3 (void)
+{
+  #pragma omp parallel
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp single
+{
+  #pragma omp task shared (x) depend(out: x)
+  x = svld1_s32 (svptrue_b32 (), twos);
+  #pragma omp task shared (x) depend(in: x)
+  if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+   __builtin_abort  ();
+}
+  }
+}
+
+void
+firstpriv (void)
+{
+  #pragma omp parallel
+  #pragma omp single
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp task depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+#pragma omp task depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 1)))
+  __builtin_abort  ();
+  }
+}
+
+void
+antidep (void)
+{
+  svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+  #pragma omp parallel
+  #pragma omp single
+  {
+#pragma omp task shared(x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 1)))
+  __builtin_abort  ();
+#pragma omp task shared(x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+  }
+}
+
+void
+antidep2 (void)
+{
+  #pragma omp parallel
+  #pragma omp single
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp taskgroup
+{
+  #pragma omp task shared(x) depend(in: x)
+  if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 1)))
+   __builtin_abort  ();
+  #pragma omp task shared(x) depend(out: x)
+  x = svld1_s32 (svptrue_b32 (), twos);
+}
+  }
+}
+
+void
+antidep3 (void)
+{
+  #pragma omp parallel
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp single
+{
+  #pragma omp task shared(x) depend(in: x)
+  if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 1)))
+   __builtin_abort  ();
+  #pragma omp task shared(x) depend(out: x)
+  x = svld1_s32 (svptrue_b32 (), twos);
+}
+  }
+}
+
+
+void
+outdep (void)
+{
+  #pragma omp parallel
+  #pragma omp single
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), zeros);
+#pragma omp task shared(x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp task shared(x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+#pragma omp taskwait
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort  ();
+  }
+}
+
+void
+concurrent (void)
+{
+  svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+  #pragma omp parallel
+  #pragma omp single
+  {
+#pragma omp task shared (x) depend(out: x)
+x = svld1_s32 (svptrue_b32 (), twos);
+#pragma omp task shared (x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort  ();
+#pragma omp task shared (x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort  ();
+#pragma omp task shared (x) depend(in: x)
+if (svptest_any (svptrue_b32(), svcmpne_n_s32 (svptrue_b32 (), x, 2)))
+  __builtin_abort  ();
+  }
+}
+
+void
+concurrent2 (void)
+{
+  #pragma omp parallel
+  #pragma omp single
+  {
+svint32_t x = svld1_s32 (svptrue_b32 (), ones);
+#pragma omp task shared (x) depend(out: x)
+  

[PATCH 07/11] AArch64: Test OpenMP uniform clause on SVE types.

2024-05-26 Thread Tejas Belagod
This patch tests if simd uniform clause works with SVE types in simd regions.

gcc/testsuite/ChangeLog

* gcc.target/aarch64/sve/omp/simd-uniform.c: New test.
---
 .../gcc.target/aarch64/sve/omp/simd-uniform.c | 71 +++
 1 file changed, 71 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/simd-uniform.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-uniform.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-uniform.c
new file mode 100644
index 000..6256ce9fdc1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-uniform.c
@@ -0,0 +1,71 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
-fdump-tree-ompexp" } */
+
+#include <arm_sve.h>
+
+#define N 256
+
+void init(int *a, int *a_ref, int *b, int n)
+{
+   int i;
+   for ( i=0; i

[PATCH 03/11] AArch64: Diagnose OpenMP offloading when SVE types involved.

2024-05-26 Thread Tejas Belagod
The target clause in OpenMP is used to offload loop kernels to accelerator
peripherals.  target's 'map' clause is used to move data from and to the
accelerator.  When the data is of SVE type, it may not be suitable because of
various reasons, e.g. the two SVE targets may not agree on vector size or
some targets don't support variable vector size.  This makes SVE unsuitable
for use in OMP's 'map' clause.  This patch diagnoses all such cases and issues
an error where SVE types are not suitable.
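
For illustration, a sketch of an implicitly-mapped case that is now
diagnosed (function and variable names are illustrative):

#include <arm_sve.h>

int64_t
implicit_map (svint32_t v)
{
  int64_t r;
  /* 'v' is referenced in the target region without an explicit map clause,
     so it is implicitly mapped - diagnosed for SVE types by this patch.  */
  #pragma omp target map(from: r)
  r = svaddv_s32 (svptrue_b32 (), v);
  return r;
}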

Co-authored-by: Andrea Corallo 

gcc/ChangeLog:

* target.h (type_context_kind): Add new context kinds for target 
clauses.
* config/aarch64/aarch64-sve-builtins.cc (verify_type_context): Diagnose
SVE types for a given OpenMP context.
* gimplify.cc (omp_notice_variable):  Diagnose implicitly-mapped SVE
objects in OpenMP regions.
(gimplify_scan_omp_clauses): Diagnose SVE types for various target
clauses.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/omp/offload-1.c: New test.
* gcc.target/aarch64/sve/omp/offload-2.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-parallel-loop.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-parallel.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-simd.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-teams-distribute-simd.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-teams-distribute.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-teams-loop.c: Likewise.
* gcc.target/aarch64/sve/omp/offload-teams.c: Likewise.
* gcc.target/aarch64/sve/omp/target-device.c: Likewise.
* gcc.target/aarch64/sve/omp/target-link.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve-builtins.cc|  31 +++
 gcc/gimplify.cc   |  34 ++-
 gcc/target.h  |  19 +-
 .../gcc.target/aarch64/sve/omp/offload-1.c| 237 ++
 .../gcc.target/aarch64/sve/omp/offload-2.c| 198 +++
 .../aarch64/sve/omp/offload-parallel-loop.c   | 236 +
 .../aarch64/sve/omp/offload-parallel.c| 195 ++
 .../gcc.target/aarch64/sve/omp/offload-simd.c | 236 +
 .../sve/omp/offload-teams-distribute-simd.c   | 237 ++
 .../sve/omp/offload-teams-distribute.c| 236 +
 .../aarch64/sve/omp/offload-teams-loop.c  | 237 ++
 .../aarch64/sve/omp/offload-teams.c   | 195 ++
 .../aarch64/sve/omp/target-device.c   |  97 +++
 .../gcc.target/aarch64/sve/omp/target-link.c  |  48 
 14 files changed, 2234 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-2.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/omp/offload-parallel-loop.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-parallel.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-simd.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams-distribute-simd.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams-distribute.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams-loop.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/target-device.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/target-link.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index f3983a123e3..ee1064c3bb7 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -5000,6 +5000,29 @@ bool
 verify_type_context (location_t loc, type_context_kind context,
 const_tree type, bool silent_p)
 {
+  if (aarch64_sve::builtin_type_p (type)
+  || (POINTER_TYPE_P (type)
+	 && aarch64_sve::builtin_type_p (TREE_TYPE (type))))
+switch (context)
+{
+  case TCTX_OMP_MAP:
+   error_at (loc, "SVE type %qT not allowed in map clause", type);
+   return false;
+  case TCTX_OMP_MAP_IMP_REF:
+   return false;
+  case TCTX_OMP_PRIVATE:
+   error_at (loc, "SVE type %qT not allowed in target private clause", 
type);
+   return false;
+  case TCTX_OMP_FIRSTPRIVATE:
+   error_at (loc, "SVE type %qT not allowed in target firstprivate 
clause", type);
+   return false;
+  case TCTX_OMP_DEVICE_ADDR:
+   error_at (loc, "SVE type %qT not allowed in target device clauses", 
type);
+   return false;
+  default:
+   break;
+}
+
   if (!sizeless_type_p (type))
 return true;
 
@@ -5060,6 +5083,14 @@ verify_type_context (location_t loc, type_context_kind 
context,
   if (!silent_p)
error_at (loc, "capture by copy of SVE type %qT", 

[PATCH 08/11] AArch64: Test OpenMP simd aligned clause with SVE types.

2024-05-26 Thread Tejas Belagod
This patch tests the simd aligned clause and its interaction with SVE types.

gcc/testsuite/ChangeLog

* gcc.target/aarch64/sve/omp/simd-aligned.c: New test.
---
 .../gcc.target/aarch64/sve/omp/simd-aligned.c | 50 +++
 1 file changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/simd-aligned.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-aligned.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-aligned.c
new file mode 100644
index 000..6c75bb5a714
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/simd-aligned.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
-fdump-tree-ompexp" } */
+#include <arm_sve.h>
+
+#define N 256
+
+int a[N] __attribute__((aligned (64)));
+int b[N] __attribute__((aligned (64)));
+
+
+__attribute((noipa))
+void foo (int *p, int *q)
+{
+   svint32_t va, vb, vc;
+   int i;
+   uint64_t sz = svcntw ();
+
+#pragma omp simd aligned(p, q : 64) private (va, vb, vc) nontemporal (va, vb, 
vc)
+  for (i = 0; i < N; i++)
+{
+  if (i % sz == 0)
+   {
+ va = svld1_s32 (svptrue_b32 (), p);
+ vb = svindex_s32 (1, 0);
+ vc = svadd_s32_z (svptrue_b32 (), va, vb);
+ svst1_s32 (svptrue_b32 (), q, vc);
+ q += sz;
+   }
+}
+
+  return;
+}
+
+int main ()
+{
+
+  for (int i = 0;i < N; i++)
+{
+  a[i] = 1;
+  b[i] = 0;
+}
+
+  foo (a, b);
+
+  for (int i = 0;i < N; i++)
+if (b[i] != 2)
+  __builtin_abort ();
+
+  return 0;
+}
-- 
2.25.1



[PATCH 09/11] AArch64: Diagnose OpenMP linear clause for SVE type objects.

2024-05-26 Thread Tejas Belagod
This patch tests that SVE type objects applied to the linear clause are
diagnosed as expected.

gcc/testsuite/ChangeLog

* gcc.target/aarch64/sve/omp/linear.c: New test.
---
 .../gcc.target/aarch64/sve/omp/linear.c   | 33 +++
 1 file changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c
new file mode 100644
index 000..77b823a73d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/linear.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
-fdump-tree-ompexp" } */
+
+#include <arm_sve.h>
+
+int a[256];
+
+__attribute__((noinline, noclone)) int
+f1 (svint32_t va, int i)
+{
+  #pragma omp parallel for linear (va: 8) linear (i: 4) /* { dg-error {linear 
clause applied to non-integral non-pointer variable with type 'svint32_t'} } */
+  for (int j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+  va = svindex_s32 (0,1);
+}
+  return i;
+}
+
+__attribute__((noinline, noclone)) int
+f2 (svbool_t p, int i)
+{
+  #pragma omp parallel for linear (p: 0) linear (i: 4) /* { dg-error {linear 
clause applied to non-integral non-pointer variable with type 'svbool_t'} } */
+  for (int j = 16; j < 64; j++)
+{
+  a[i] = j;
+  i += 4;
+  p = svptrue_b32 ();
+}
+  return i;
+}
+
-- 
2.25.1



[PATCH 05/11] AArch64: Test OpenMP threadprivate clause on SVE type.

2024-05-26 Thread Tejas Belagod
This patch adds a test ensuring the threadprivate clause works for SVE type
objects.

gcc/testsuite/ChangeLog

* gcc.target/aarch64/sve/omp/threadprivate.c: New test.
---
 .../aarch64/sve/omp/threadprivate.c   | 44 +++
 1 file changed, 44 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/threadprivate.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/threadprivate.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/threadprivate.c
new file mode 100644
index 000..0a46b0a7770
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/threadprivate.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
-fdump-tree-ompexp" } */
+
+#include <arm_sve.h>
+
+typedef __SVInt32_t v8si __attribute__((arm_sve_vector_bits(256)));
+
+v8si vec1;
+#pragma omp threadprivate (vec1)
+
+int main()
+{
+  int64_t res = 0;
+
+#pragma omp parallel firstprivate (res) num_threads(10)
+  {
+vec1 = svindex_s32 (1, 0);
+res = svaddv_s32 (svptrue_b32 (), vec1);
+
+#pragma omp barrier
+if (res != 8LL)
+  __builtin_abort ();
+  }
+
+  return 0;
+}
+
+int foo ()
+{
+  int64_t res = 0;
+
+  vec1 = svindex_s32 (1, 0);
+
+#pragma omp parallel copyin (vec1) firstprivate (res) num_threads(10)
+  {
+res = svaddv_s32 (svptrue_b32 (), vec1);
+
+#pragma omp barrier
+if (res != 8LL)
+  __builtin_abort ();
+  }
+
+  return 0;
+}
-- 
2.25.1



[PATCH 04/11] AArch64: Test OpenMP lastprivate clause for various constructs.

2024-05-26 Thread Tejas Belagod
This patch tests the OpenMP lastprivate clause with SVE type objects in
various construct contexts.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/omp/lastprivate.c: New test.
---
 .../gcc.target/aarch64/sve/omp/lastprivate.c  | 121 ++
 1 file changed, 121 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/lastprivate.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/lastprivate.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/lastprivate.c
new file mode 100644
index 000..e4ecc58a9c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/lastprivate.c
@@ -0,0 +1,121 @@
+/* { dg-do compile } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
-fdump-tree-ompexp" } */
+
+#include <arm_sve.h>
+
+#define N 8
+
+#ifndef CONSTRUCT
+#define CONSTRUCT
+#endif
+
+svint32_t __attribute__ ((noinline))
+omp_lastprivate_sections ()
+{
+
+  int a[N], b[N], c[N];
+  svint32_t va, vb, vc;
+  int i;
+
+#pragma omp parallel for
+  for (i = 0; i < N; i++)
+{
+  b[i] = i;
+  c[i] = i + 1;
+}
+
+/* This worksharing construct binds to an implicit outer parallel region in
+whose scope va is declared and therefore is default private.  This causes
+the lastprivate clause list item va to be diagnosed as private in the outer
+context.  Similarly for constructs for and distribute.  */
+#pragma omp sections lastprivate (va) /* { dg-error {lastprivate variable 'va' 
is private in outer context} } */
+{
+  #pragma omp section
+  vb = svld1_s32 (svptrue_b32 (), b);
+  #pragma omp section
+  vc = svld1_s32 (svptrue_b32 (), c);
+  #pragma omp section
+  va = svadd_s32_z (svptrue_b32 (), vb, vc);
+}
+
+  return va;
+}
+
+
+svint32_t __attribute__ ((noinline))
+omp_lastprivate_for ()
+{
+
+  int a[N], b[N], c[N];
+  svint32_t va, vb, vc;
+  int i;
+
+#pragma omp parallel for
+  for (i = 0; i < N; i++)
+{
+  b[i] = i;
+  c[i] = i + 1;
+}
+
+#pragma omp for lastprivate (va) /* { dg-error {lastprivate variable 'va' is 
private in outer context} } */
+  for (i = 0; i < 1; i++)
+{
+  vb = svld1_s32 (svptrue_b32 (), b);
+  vc = svld1_s32 (svptrue_b32 (), c);
+  va = svadd_s32_z (svptrue_b32 (), vb, vc);
+}
+
+  return va;
+}
+
+svint32_t __attribute__ ((noinline))
+omp_lastprivate_simd ()
+{
+
+  int a[N], b[N], c[N];
+  svint32_t va, vb, vc;
+  int i;
+
+#pragma omp parallel for
+  for (i = 0; i < N; i++)
+{
+  b[i] = i;
+  c[i] = i + 1;
+}
+
+#pragma omp simd lastprivate (va)
+  for (i = 0; i < 1; i++)
+{
+  vb = svld1_s32 (svptrue_b32 (), b);
+  vc = svld1_s32 (svptrue_b32 (), c);
+  va = svadd_s32_z (svptrue_b32 (), vb, vc);
+}
+
+  return va;
+}
+
+svint32_t __attribute__ ((noinline))
+omp_lastprivate_distribute ()
+{
+
+  int a[N], b[N], c[N];
+  svint32_t va, vb, vc;
+  int i;
+
+#pragma omp parallel for
+  for (i = 0; i < N; i++)
+{
+  b[i] = i;
+  c[i] = i + 1;
+}
+
+#pragma omp distribute lastprivate (va) /* { dg-error {lastprivate variable 
'va' is private in outer context} } */
+  for (i = 0; i < 1; i++)
+{
+  vb = svld1_s32 (svptrue_b32 (), b);
+  vc = svld1_s32 (svptrue_b32 (), c);
+  va = svadd_s32_z (svptrue_b32 (), vb, vc);
+}
+
+  return va;
+}
-- 
2.25.1



[PATCH 02/11] AArch64: Add test cases for SVE types in OpenMP shared clause.

2024-05-26 Thread Tejas Belagod
This patch tests various shared clauses with SVE types.  It also adds a test
scaffold to run OpenMP tests under the gcc.target testsuite.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/omp/aarch64-sve-omp.exp: New scaffold.
* gcc.target/aarch64/sve/omp/shared.c: New test.
---
 .../aarch64/sve/omp/aarch64-sve-omp.exp   |  80 
 .../gcc.target/aarch64/sve/omp/shared.c   | 186 ++
 2 files changed, 266 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/aarch64-sve-omp.exp
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/shared.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/aarch64-sve-omp.exp 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/aarch64-sve-omp.exp
new file mode 100644
index 000..1997c80c334
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/aarch64-sve-omp.exp
@@ -0,0 +1,80 @@
+# Copyright (C) 2006-2024 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Initialize `dg'.
+dg-init
+
+if ![check_effective_target_fopenmp] {
+  return
+}
+
+proc omp_link_flags { } {
+global ld_library_path
+global TOOL_OPTIONS
+
+set flags ""
+
+if ![is_remote host] {
+   if [info exists TOOL_OPTIONS] {
+   set gccpath "[get_multilibs ${TOOL_OPTIONS}]"
+   } else {
+   set gccpath "[get_multilibs]"
+   }
+}
+
+if { $gccpath != "" } {
+  if [file exists "${gccpath}/libgomp/libgomp.spec"] {
+ append flags "-B${gccpath}/libgomp/ -L${gccpath}/libgomp/.libs 
-I${gccpath}/libgomp/"
+ append ld_library_path ":${gccpath}/libgomp/.libs"
+  }
+} else {
+  global tool_root_dir
+
+  set libgomp [lookfor_file ${tool_root_dir} libgomp]
+  if { $libgomp != "" } {
+  append flags "-L${libgomp} -B${libgomp}"
+  append ld_library_path ":${libgomp}"
+  }
+}
+
+set_ld_library_path_env_vars
+
+return "$flags"
+}
+
+if { [check_effective_target_aarch64_sve] } {
+set sve_flags ""
+} else {
+set sve_flags "-march=armv8.2-a+sve"
+}
+
+# Main loop.
+dg-runtest [lsort [find $srcdir/$subdir *.c]] "[omp_link_flags] $sve_flags 
-fopenmp" ""
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/omp/shared.c 
b/gcc/testsuite/gcc.target/aarch64/sve/omp/shared.c
new file mode 100644
index 000..3f380d95da4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/shared.c
@@ -0,0 +1,186 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
-fdump-tree-ompexp" } */
+
+#include 
+#include 
+#include 
+#include 
+
+svint32_t
+__attribute__ ((noinline))
+explicit_shared (svint32_t a, svint32_t b, svbool_t p)
+{
+
+#pragma omp parallel shared (a, b, p) num_threads (1)
+  {
+/* 'a', 'b' and 'p' are explicitly shared.  */
+a = svadd_s32_z (p, a, b);
+  }
+
+#pragma omp parallel shared (a, b, p) num_threads (1)
+  {
+a = svadd_s32_z (p, a, b);
+  }
+
+  return a;
+}
+
+svint32_t
+__attribute__ ((noinline))
+implicit_shared_default (svint32_t a, svint32_t b, svbool_t p)
+{
+
+#pragma omp parallel default (shared) num_threads (1)
+  {
+/* 'a', 'b' and 'p' are implicitly shared.  */
+a = svadd_s32_z (p, a, b);
+  }
+
+#pragma omp parallel default (shared) num_threads (1)
+  {
+a = svadd_s32_z (p, a, b);
+  }
+
+  return a;
+}
+
+svint32_t
+__attribute__ ((noinline))
+implicit_shared_no_default (svint32_t a, svint32_t b, svbool_t p)
+{
+
+#pragma omp parallel num_threads (1)
+  {
+/* 'a', 'b' and 'p' are implicitly shared without default clause.  */
+a = svadd_s32_z (p, a, b);
+  }
+
+#pragma omp parallel num_threads (1)
+  {
+a = svadd_s32_z (p, a, b);
+  }
+
+  return a;
+}
+
+svint32_t
+__attribute__ ((noinline))
+mix_shared (svint32_t b, svbool_t p)
+{
+
+  svint32_t a;
+  int32_t *m = (int32_t *)malloc (8 * sizeof (int32_t));
+  int i;
+
+#pragma omp parallel for
+  for (i = 0; i < 8; i++)
+m[i] = i;
+
+#pragma omp parallel
+  {
+/* 'm' is predetermined shared here.  'a' is implicitly shared here.  */
+a = svld1_s32 

[PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-05-26 Thread Tejas Belagod
Currently poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc.  This patch improves on this by passing
around poly-int structures by address to avoid copy-overhead.
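
As an illustration (this example is not part of the patch; names are made up),
consider a variable-length SVE object shared into a parallel region:

  #include <arm_sve.h>

  void accumulate (svint32_t va, svint32_t vb, svbool_t p)
  {
  #pragma omp parallel shared (va, vb, p) num_threads (2)
    {
  #pragma omp critical
      va = svadd_s32_z (p, va, vb);
    }
  }

With this change, use_pointer_for_field sees that DECL_SIZE of 'va' is a
POLY_INT_CST and so records a pointer to 'va' in the generated .omp_data_s
record instead of copying the whole variable-length object into it.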

gcc/ChangeLog
* omp-low.cc (use_pointer_for_field): Use pointer if the OMP data
structure's field type is a poly-int.
---
 gcc/omp-low.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 1a65229cc37..b15607f4ef5 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -466,7 +466,8 @@ static bool
 use_pointer_for_field (tree decl, omp_context *shared_ctx)
 {
   if (AGGREGATE_TYPE_P (TREE_TYPE (decl))
-  || TYPE_ATOMIC (TREE_TYPE (decl)))
+  || TYPE_ATOMIC (TREE_TYPE (decl))
+  || POLY_INT_CST_P (DECL_SIZE (decl)))
 return true;
 
   /* We can only use copy-in/copy-out semantics for shared variables
-- 
2.25.1



[PATCH 00/11] AArch64/OpenMP: Test SVE ACLE types with various OpenMP constructs.

2024-05-26 Thread Tejas Belagod
Note: This patch series is based on Richard's initial patch
  https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606741.html
and Jakub's suggestion
  https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611892.html

The following patch series handles various scenarios with OpenMP and SVE types.
The starting point for the series follows a suggestion from Jakub to cover all 
the possible scenarios that could arise when OMP constructs/clauses etc are 
used with SVE ACLE types. Here are a few instances that this patch series tests
and in some cases fixes the expected output.  This patch series does not follow
a formal definition or a spec of how OMP interacts with SVE ACLE types, so it's 
more of a proposed behaviour.  Comments and discussion welcome.

This list is not exhaustive, but covers most scenarios of how SVE ACLE types
ought to interact with OMP constructs/clauses.

1. Poly-int structures that represent variable-sized objects and OMP runtime.

Currently poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc.  This patch improves on this by passing
around poly-int structures by address to avoid copy-overhead.

2. SVE ACLE types in OMP Shared clauses.

We test the behaviour where SVE ACLE type objects are shared in the following
methods into an OMP region:
  a. Explicit Shared clause on SVE ACLE type objects.
  b. Implicit shared clause.
  c. Implicit shared with default clause.
  d. SVE ACLE types in the presence of predetermined (static) shared objects.

The associated tests ensure that all such shared objects are passed by address
into the OMP runtime.  There are runtime tests to verify the functional
correctness of the change.

3. Offloading and SVE ACLE types.

The target clause in OpenMP is used to offload loop kernels to accelerator
peripherals.  target's 'map' clause is used to move data to and from the
accelerator.  When the data is an SVE type, it may not be suitable for
various reasons, e.g. the two SVE targets may not agree on vector size, or
some targets don't support variable vector sizes.  This makes SVE unsuitable
for use in OMP's 'map' clause.  We diagnose all such cases and issue errors
where appropriate.  The cases we cover in this patch are:

  a. Implicitly-mapped SVE ACLE types in OMP target regions are diagnosed.
  b. Explicitly-mapped SVE ACLE types in OMP target regions using map clause
 are diagnosed.
  c. Explicitly-mapped SVE ACLE types of various directions - to, from, tofrom
 in the map clause are diagnosed.
  d. target enter and exit data clauses with map on SVE ACLE types are 
 diagnosed.
  e. target data map with alloc on SVE ACLE types is diagnosed.
  f. target update from clause on SVE ACLE types is diagnosed.
  g. target private and firstprivate with SVE ACLE types are diagnosed.
  h. All combinations of target with work-sharing constructs like parallel,
 loop, simd, teams, distribute etc are also diagnosed when SVE ACLE types
 are involved.
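
As an example of case (g) above (illustrative only; the function and variable
names are made up, and the error wording is the one added by the corresponding
diagnostics patch in this series):

  #include <arm_sve.h>

  void kernel (svint32_t va)
  {
    /* error: SVE type 'svint32_t' not allowed in target firstprivate clause */
  #pragma omp target firstprivate (va)
    {
      va = svadd_s32_z (svptrue_b32 (), va, va);
    }
  }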

4. Lastprivate and SVE ACLE types.

Various OpenMP lastprivate clause scenarios with SVE object types are 
diagnosed.  Worksharing constructs like sections, for, distribute bind to an
implicit outer parallel region in whose scope SVE ACLE types are declared and 
are therefore default private.  Lastprivate clause lists containing SVE ACLE
type objects are diagnosed in this scenario.

5. Threadprivate on SVE ACLE type objects.

We ensure threadprivate SVE ACLE type objects are supported.  We also ensure
the copyin clause is supported.

6. User-Defined Reductions on SVE ACLE types.

We define a reduction with OMP declare reduction using SVE ACLE intrinsics and
ensure its functional correctness with various constructs like for, simd,
parallel, task and taskloop.
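
A minimal sketch of what such a reduction can look like (illustrative only --
the reduction identifier, function name and tail-predication scheme here are
assumptions for this example, not necessarily what the tests use):

  #include <arm_sve.h>

  /* Declare '+' over svint32_t in terms of the ACLE add intrinsic.  */
  #pragma omp declare reduction (addsv : svint32_t : \
        omp_out = svadd_s32_z (svptrue_b32 (), omp_in, omp_out)) \
        initializer (omp_priv = svdup_s32 (0))

  int64_t
  reduce (int32_t *a, int n)
  {
    svint32_t vsum = svdup_s32 (0);
    int step = svcntw ();

  #pragma omp parallel for reduction (addsv : vsum)
    for (int i = 0; i < n; i += step)
      {
        svbool_t pg = svwhilelt_b32 (i, n);       /* mask off the loop tail */
        vsum = svadd_s32_m (pg, vsum, svld1_s32 (pg, a + i));
      }

    return svaddv_s32 (svptrue_b32 (), vsum);
  }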

7. Uniform and Aligned Clauses with SVE ACLE

We ensure the uniform clause's functional correctness with simd construct and
associated SVE ACLE intrinsics in the simd region.  There is no direct
interaction between uniform and SVE ACLE type objects, but we ensure the uniform
clause applies correctly to a region where SVE ACLE intrinsics are present.
Similarly for the aligned clause.

8. Linear clause and SVE ACLE types.

We diagnose linear clause list items that are SVE ACLE type objects.
It doesn't mean much for the linear clause to be applied to SVE ACLE types.

9. Depend clause and SVE ACLE objects.

We test many combinations of dependencies on shared SVE ACLE type objects in
parallel regions for functional correctness.  We test that in and out
dependencies and anti-dependencies are supported for SVE ACLE type objects
using the depend clause with constructs like task.
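
A sketch of the in/out dependency pattern being exercised (illustrative only;
the function name and buffer handling are assumptions for this example):

  #include <arm_sve.h>

  void producer_consumer (int32_t *buf)
  {
    svint32_t x;

  #pragma omp parallel
  #pragma omp single
    {
  #pragma omp task shared (x) depend(out: x)   /* producer */
      x = svld1_s32 (svptrue_b32 (), buf);

  #pragma omp task shared (x) depend(in: x)    /* consumer runs after the producer */
      svst1_s32 (svptrue_b32 (), buf, svadd_s32_z (svptrue_b32 (), x, x));
    }
  }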

10. 'doacross' clause and SVE ACLE object types.

doacross is mainly supported for scalars and loop iteration variables.  We
diagnose cases where SVE ACLE objects are used in doacross list items.

Tejas Belagod (11):
  OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.
  AArch64: Add test cases for SVE

[gcc r14-9487] vect: Call vect_convert_output with the right vecitype [PR114108]

2024-03-15 Thread Tejas Belagod via Gcc-cvs
https://gcc.gnu.org/g:81f3d963e05de8b17d4ccc7667ead9ed156193a4

commit r14-9487-g81f3d963e05de8b17d4ccc7667ead9ed156193a4
Author: Tejas Belagod 
Date:   Wed Mar 6 15:30:26 2024 +0530

vect: Call vect_convert_output with the right vecitype [PR114108]

This patch fixes a bug where vect_recog_abd_pattern called 
vect_convert_output
with the incorrect vecitype for the corresponding pattern_stmt.
vect_convert_output expects vecitype to be the vector form of the scalar 
type
of the LHS of pattern_stmt, but we were passing in the vector form of the 
LHS
of the new impending conversion statement.  This caused a skew in ABD's
pattern_stmt having the vectype of the following gimple pattern_stmt.

2024-03-06  Tejas Belagod  

gcc/ChangeLog:

PR middle-end/114108
* tree-vect-patterns.cc (vect_recog_abd_pattern): Call
vect_convert_output with the correct vecitype.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr114108.c: New test.

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr114108.c | 19 +++
 gcc/tree-vect-patterns.cc|  5 ++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr114108.c 
b/gcc/testsuite/gcc.dg/vect/pr114108.c
new file mode 100644
index 000..b3075d41398
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr114108.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+#include "tree-vect.h"
+
+typedef signed char schar;
+
+__attribute__((noipa, noinline, optimize("O3")))
+void foo(const schar *a, const schar *b, schar *c, int n)
+{
+  for (int i = 0; i < n; i++)
+{   
+  unsigned u = __builtin_abs (a[i] - b[i]);
+  c[i] = u <= 7U ? u : 7U; 
+}   
+}
+
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target aarch64*-*-* 
} } } */
+/* { dg-final { scan-tree-dump "vect_recog_abd_pattern: detected" "vect" { 
target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index d562f57920f..4f491c6b833 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1576,9 +1576,8 @@ vect_recog_abd_pattern (vec_info *vinfo,
   && !TYPE_UNSIGNED (abd_out_type))
 {
   tree unsign = unsigned_type_for (abd_out_type);
-  tree unsign_vectype = get_vectype_for_scalar_type (vinfo, unsign);
-  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt,
- unsign_vectype);
+  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt, 
vectype_out);
+  vectype_out = get_vectype_for_scalar_type (vinfo, unsign);
 }
 
   return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, vectype_out);


Re: [PATCH] vect: Call vect_convert_output with the right vecitype [PR114108]

2024-03-14 Thread Tejas Belagod



Ping.

Thanks,
Tejas.

On 3/13/24 6:07 PM, Tejas Belagod wrote:

Ping!

On 3/7/24 4:14 PM, Tejas Belagod wrote:
This patch fixes a bug where vect_recog_abd_pattern called 
vect_convert_output

with the incorrect vecitype for the corresponding pattern_stmt.
vect_convert_output expects vecitype to be the vector form of the 
scalar type
of the LHS of pattern_stmt, but we were passing in the vector form of 
the LHS

of the new impending conversion statement.  This caused a skew in ABD's
pattern_stmt having the vectype of the following gimple pattern_stmt.

2024-03-06  Tejas Belagod  

gcc/ChangeLog:

PR middle-end/114108
* tree-vect-patterns.cc (vect_recog_abd_pattern): Call
vect_convert_output with the correct vecitype.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr114108.c: New test.
---
  gcc/testsuite/gcc.dg/vect/pr114108.c | 19 +++
  gcc/tree-vect-patterns.cc    |  5 ++---
  2 files changed, 21 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/vect/pr114108.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr114108.c 
b/gcc/testsuite/gcc.dg/vect/pr114108.c

new file mode 100644
index 000..b3075d41398
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr114108.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+#include "tree-vect.h"
+
+typedef signed char schar;
+
+__attribute__((noipa, noinline, optimize("O3")))
+void foo(const schar *a, const schar *b, schar *c, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+  unsigned u = __builtin_abs (a[i] - b[i]);
+  c[i] = u <= 7U ? u : 7U;
+    }
+}
+
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target 
aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump "vect_recog_abd_pattern: detected" 
"vect" { target aarch64*-*-* } } } */

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index d562f57920f..4f491c6b833 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1576,9 +1576,8 @@ vect_recog_abd_pattern (vec_info *vinfo,
    && !TYPE_UNSIGNED (abd_out_type))
  {
    tree unsign = unsigned_type_for (abd_out_type);
-  tree unsign_vectype = get_vectype_for_scalar_type (vinfo, unsign);
-  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt,
-  unsign_vectype);
+  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt, 
vectype_out);

+  vectype_out = get_vectype_for_scalar_type (vinfo, unsign);
  }
    return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, 
vectype_out);






Re: [PATCH] vect: Call vect_convert_output with the right vecitype [PR114108]

2024-03-13 Thread Tejas Belagod

Ping!

On 3/7/24 4:14 PM, Tejas Belagod wrote:

This patch fixes a bug where vect_recog_abd_pattern called vect_convert_output
with the incorrect vecitype for the corresponding pattern_stmt.
vect_convert_output expects vecitype to be the vector form of the scalar type
of the LHS of pattern_stmt, but we were passing in the vector form of the LHS
of the new impending conversion statement.  This caused a skew in ABD's
pattern_stmt having the vectype of the following gimple pattern_stmt.

2024-03-06  Tejas Belagod  

gcc/ChangeLog:

PR middle-end/114108
* tree-vect-patterns.cc (vect_recog_abd_pattern): Call
vect_convert_output with the correct vecitype.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr114108.c: New test.
---
  gcc/testsuite/gcc.dg/vect/pr114108.c | 19 +++
  gcc/tree-vect-patterns.cc|  5 ++---
  2 files changed, 21 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/vect/pr114108.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr114108.c 
b/gcc/testsuite/gcc.dg/vect/pr114108.c
new file mode 100644
index 000..b3075d41398
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr114108.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+#include "tree-vect.h"
+
+typedef signed char schar;
+
+__attribute__((noipa, noinline, optimize("O3")))
+void foo(const schar *a, const schar *b, schar *c, int n)
+{
+  for (int i = 0; i < n; i++)
+{
+  unsigned u = __builtin_abs (a[i] - b[i]);
+  c[i] = u <= 7U ? u : 7U;
+}
+}
+
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target aarch64*-*-* 
} } } */
+/* { dg-final { scan-tree-dump "vect_recog_abd_pattern: detected" "vect" { 
target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index d562f57920f..4f491c6b833 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1576,9 +1576,8 @@ vect_recog_abd_pattern (vec_info *vinfo,
&& !TYPE_UNSIGNED (abd_out_type))
  {
tree unsign = unsigned_type_for (abd_out_type);
-  tree unsign_vectype = get_vectype_for_scalar_type (vinfo, unsign);
-  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt,
- unsign_vectype);
+  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt, 
vectype_out);
+  vectype_out = get_vectype_for_scalar_type (vinfo, unsign);
  }
  
return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, vectype_out);




[PATCH] vect: Call vect_convert_output with the right vecitype [PR114108]

2024-03-07 Thread Tejas Belagod
This patch fixes a bug where vect_recog_abd_pattern called vect_convert_output
with the incorrect vecitype for the corresponding pattern_stmt.
vect_convert_output expects vecitype to be the vector form of the scalar type
of the LHS of pattern_stmt, but we were passing in the vector form of the LHS
of the new impending conversion statement.  This caused a skew in ABD's
pattern_stmt having the vectype of the following gimple pattern_stmt.

2024-03-06  Tejas Belagod  

gcc/ChangeLog:

PR middle-end/114108
* tree-vect-patterns.cc (vect_recog_abd_pattern): Call
vect_convert_output with the correct vecitype.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr114108.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr114108.c | 19 +++
 gcc/tree-vect-patterns.cc|  5 ++---
 2 files changed, 21 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr114108.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr114108.c 
b/gcc/testsuite/gcc.dg/vect/pr114108.c
new file mode 100644
index 000..b3075d41398
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr114108.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+#include "tree-vect.h"
+
+typedef signed char schar;
+
+__attribute__((noipa, noinline, optimize("O3")))
+void foo(const schar *a, const schar *b, schar *c, int n)
+{
+  for (int i = 0; i < n; i++)
+{   
+  unsigned u = __builtin_abs (a[i] - b[i]);
+  c[i] = u <= 7U ? u : 7U; 
+}   
+}
+
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target aarch64*-*-* 
} } } */
+/* { dg-final { scan-tree-dump "vect_recog_abd_pattern: detected" "vect" { 
target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index d562f57920f..4f491c6b833 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1576,9 +1576,8 @@ vect_recog_abd_pattern (vec_info *vinfo,
   && !TYPE_UNSIGNED (abd_out_type))
 {
   tree unsign = unsigned_type_for (abd_out_type);
-  tree unsign_vectype = get_vectype_for_scalar_type (vinfo, unsign);
-  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt,
- unsign_vectype);
+  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt, 
vectype_out);
+  vectype_out = get_vectype_for_scalar_type (vinfo, unsign);
 }
 
   return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, vectype_out);
-- 
2.25.1



Re: [PATCH] Arm: Fix incorrect tailcall-generation for indirect calls [PR113780]

2024-02-15 Thread Tejas Belagod

On 2/14/24 3:55 PM, Richard Earnshaw (lists) wrote:

On 14/02/2024 09:20, Tejas Belagod wrote:

On 2/7/24 11:41 PM, Richard Earnshaw (lists) wrote:

On 07/02/2024 07:59, Tejas Belagod wrote:

This patch fixes a bug that causes indirect calls in PAC-enabled functions
to be tailcalled incorrectly when all argument registers R0-R3 are used.

Tested on arm-none-eabi for armv8.1-m.main. OK for trunk?

2024-02-07  Tejas Belagod  

 PR target/113780
 * gcc/config/arm.cc (arm_function_ok_for_sibcall): Don't allow tailcalls
   for indirect calls with 4 or more arguments in pac-enabled functions.

 * gcc.target/arm/pac-sibcall.c: New.
---
   gcc/config/arm/arm.cc  | 12 
   gcc/testsuite/gcc.target/arm/pac-sibcall.c | 11 +++
   2 files changed, 19 insertions(+), 4 deletions(-)
   create mode 100644 gcc/testsuite/gcc.target/arm/pac-sibcall.c

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index c44047c377a..c1f8286a4d4 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -7980,10 +7980,14 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
     && DECL_WEAK (decl))
   return false;
   -  /* We cannot do a tailcall for an indirect call by descriptor if all the
- argument registers are used because the only register left to load the
- address is IP and it will already contain the static chain.  */
-  if (!decl && CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+  /* We cannot do a tailcall for an indirect call by descriptor or for an
+ indirect call in a pac-enabled function if all the argument registers
+ are used because the only register left to load the address is IP and
+ it will already contain the static chain or the PAC signature in the
+ case of PAC-enabled functions.  */


This comment is becoming a bit unwieldy.  I suggest restructuring it as:

We cannot tailcall an indirect call by descriptor if all the call-clobbered
general registers are live (r0-r3 and ip).  This can happen when:
    - IP contains the static chain, or
    - IP is needed for validating the PAC signature.



+  if (!decl
+  && ((CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+  || arm_current_function_pac_enabled_p()))
   {
     tree fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
     CUMULATIVE_ARGS cum;
diff --git a/gcc/testsuite/gcc.target/arm/pac-sibcall.c 
b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
new file mode 100644
index 000..c57bf7a952c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
@@ -0,0 +1,11 @@
+/* Testing return address signing.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
+/* { dg-options " -mcpu=cortex-m85 -mbranch-protection=pac-ret+leaf -O2" } */


No, you can't just add options like this, you need to first check that they 
won't result in conflicts with other options on the command line.  See 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/644077.html for an 
example of how to handle this.


Thanks for the review, Richard. Respin attached.

Thanks,
Tejas.


+
+void fail(void (*f)(int, int, int, int))
+{
+  f(1, 2, 3, 4);
+}
+
+/* { dg-final { scan-assembler-not "bx\tip\t@ indirect register sibling call" 
} } */


R.


+++ b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
@@ -0,0 +1,14 @@
+/* If all call-clobbered general registers are live (r0-r3, ip), disable
+   indirect tail-call for a PAC-enabled function.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
This only checks if -mbranch-protection can work with the existing 
architecture/cpu; not with the flags you're about to add below.  You should 
check for arm_arch_v8_1m_main_pacbti_ok instead; then you can assume that 
-mbranch-protection can be added.



Indeed! Thanks for catching that.


+/* { dg-add-options arm_arch_v8_1m_main_pacbti } */
+/* { dg-additional-options "-mbranch-protection=pac-ret+leaf -O2" } */

Otherwise this is OK if you fix the above.



Thanks Richard. Respin attached. Will apply.

Thanks,
Tejas.


R.
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 
c44047c377a802d0c1dc1406df1b88a6b079607b..1cd69268ee986a0953cc85ab259355d2191250ac
 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -7980,10 +7980,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   && DECL_WEAK (decl))
 return false;
 
-  /* We cannot do a tailcall for an indirect call by descriptor if all the
- argument registers are used because the only register left to load the
- address is IP and it will already contain the static chain.  */
-  if (!decl && CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+  /* We cannot tailcall an indirect call by descriptor if all the 
call-clobbered
+ general registers are live (r0-r3 and ip).  This can happen when:
+ 

Re: [PATCH] Arm: Fix incorrect tailcall-generation for indirect calls [PR113780]

2024-02-14 Thread Tejas Belagod

On 2/7/24 11:41 PM, Richard Earnshaw (lists) wrote:

On 07/02/2024 07:59, Tejas Belagod wrote:

This patch fixes a bug that causes indirect calls in PAC-enabled functions
to be tailcalled incorrectly when all argument registers R0-R3 are used.

Tested on arm-none-eabi for armv8.1-m.main. OK for trunk?

2024-02-07  Tejas Belagod  

PR target/113780
* gcc/config/arm.cc (arm_function_ok_for_sibcall): Don't allow tailcalls
for indirect calls with 4 or more arguments in pac-enabled functions.

* gcc.target/arm/pac-sibcall.c: New.
---
  gcc/config/arm/arm.cc  | 12 
  gcc/testsuite/gcc.target/arm/pac-sibcall.c | 11 +++
  2 files changed, 19 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/arm/pac-sibcall.c

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index c44047c377a..c1f8286a4d4 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -7980,10 +7980,14 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
&& DECL_WEAK (decl))
  return false;
  
-  /* We cannot do a tailcall for an indirect call by descriptor if all the

- argument registers are used because the only register left to load the
- address is IP and it will already contain the static chain.  */
-  if (!decl && CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+  /* We cannot do a tailcall for an indirect call by descriptor or for an
+ indirect call in a pac-enabled function if all the argument registers
+ are used because the only register left to load the address is IP and
+ it will already contain the static chain or the PAC signature in the
+ case of PAC-enabled functions.  */


This comment is becoming a bit unwieldy.  I suggest restructuring it as:

We cannot tailcall an indirect call by descriptor if all the call-clobbered
general registers are live (r0-r3 and ip).  This can happen when:
   - IP contains the static chain, or
   - IP is needed for validating the PAC signature.



+  if (!decl
+  && ((CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+ || arm_current_function_pac_enabled_p()))
  {
tree fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
CUMULATIVE_ARGS cum;
diff --git a/gcc/testsuite/gcc.target/arm/pac-sibcall.c 
b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
new file mode 100644
index 000..c57bf7a952c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
@@ -0,0 +1,11 @@
+/* Testing return address signing.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
+/* { dg-options " -mcpu=cortex-m85 -mbranch-protection=pac-ret+leaf -O2" } */


No, you can't just add options like this, you need to first check that they 
won't result in conflicts with other options on the command line.  See 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/644077.html for an 
example of how to handle this.


Thanks for the review, Richard. Respin attached.

Thanks,
Tejas.


+
+void fail(void (*f)(int, int, int, int))
+{
+  f(1, 2, 3, 4);
+}
+
+/* { dg-final { scan-assembler-not "bx\tip\t@ indirect register sibling call" 
} } */


R.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 
c44047c377a802d0c1dc1406df1b88a6b079607b..1cd69268ee986a0953cc85ab259355d2191250ac
 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -7980,10 +7980,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   && DECL_WEAK (decl))
 return false;
 
-  /* We cannot do a tailcall for an indirect call by descriptor if all the
- argument registers are used because the only register left to load the
- address is IP and it will already contain the static chain.  */
-  if (!decl && CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+  /* We cannot tailcall an indirect call by descriptor if all the 
call-clobbered
+ general registers are live (r0-r3 and ip).  This can happen when:
+  - IP contains the static chain, or
+  - IP is needed for validating the PAC signature.  */
+  if (!decl
+  && ((CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+ || arm_current_function_pac_enabled_p()))
 {
   tree fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
   CUMULATIVE_ARGS cum;
diff --git a/gcc/testsuite/gcc.target/arm/pac-sibcall.c 
b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
new file mode 100644
index 
..29686ad8ecbbb1eeff862d827c27ff3721bfa8ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
@@ -0,0 +1,14 @@
+/* If all call-clobbered general registers are live (r0-r3, ip), disable
+   indirect tail-call for a PAC-enabled function.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
+/* { dg-add-options arm_arch_v8_1m_main_pacbti } */
+/* { dg-additional-optio

[PATCH] Arm: Fix incorrect tailcall-generation for indirect calls [PR113780]

2024-02-06 Thread Tejas Belagod
This patch fixes a bug that causes indirect calls in PAC-enabled functions
to be tailcalled incorrectly when all argument registers R0-R3 are used.

Tested on arm-none-eabi for armv8.1-m.main. OK for trunk?

2024-02-07  Tejas Belagod  

PR target/113780
* config/arm/arm.cc (arm_function_ok_for_sibcall): Don't allow tailcalls
for indirect calls with 4 or more arguments in pac-enabled functions.

* gcc.target/arm/pac-sibcall.c: New.
---
 gcc/config/arm/arm.cc  | 12 
 gcc/testsuite/gcc.target/arm/pac-sibcall.c | 11 +++
 2 files changed, 19 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-sibcall.c

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index c44047c377a..c1f8286a4d4 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -7980,10 +7980,14 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   && DECL_WEAK (decl))
 return false;
 
-  /* We cannot do a tailcall for an indirect call by descriptor if all the
- argument registers are used because the only register left to load the
- address is IP and it will already contain the static chain.  */
-  if (!decl && CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+  /* We cannot do a tailcall for an indirect call by descriptor or for an
+ indirect call in a pac-enabled function if all the argument registers
+ are used because the only register left to load the address is IP and
+ it will already contain the static chain or the PAC signature in the
+ case of PAC-enabled functions.  */
+  if (!decl
+  && ((CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
+ || arm_current_function_pac_enabled_p()))
 {
   tree fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
   CUMULATIVE_ARGS cum;
diff --git a/gcc/testsuite/gcc.target/arm/pac-sibcall.c 
b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
new file mode 100644
index 000..c57bf7a952c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
@@ -0,0 +1,11 @@
+/* Testing return address signing.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
+/* { dg-options " -mcpu=cortex-m85 -mbranch-protection=pac-ret+leaf -O2" } */
+
+void fail(void (*f)(int, int, int, int))
+{
+  f(1, 2, 3, 4);
+}
+
+/* { dg-final { scan-assembler-not "bx\tip\t@ indirect register sibling call" 
} } */
-- 
2.25.1



Re: [PATCH] AArch64: aarch64_class_max_nregs mishandles 64-bit structure modes [PR112577]

2024-02-05 Thread Tejas Belagod

On 1/24/24 5:09 PM, Richard Sandiford wrote:

Tejas Belagod  writes:

The target hook aarch64_class_max_nregs returns the incorrect result for 64-bit
structure modes like V3x1DImode or V4x1DFmode etc.  The calculation of the nregs
is based on the size of AdvSIMD vector register for 64-bit modes which ought to
be UNITS_PER_VREG / 2.  This patch fixes the register size.

Existing tests like gcc.target/aarch64/advsimd-intrinsics/vld1x3.c cover this 
change.

Regression tested on aarch64-linux. Bootstrapped on aarch64-linux.

OK for trunk?

gcc/ChangeLog:

PR target/112577
* config/aarch64/aarch64.cc (aarch64_class_max_nregs): Handle 64-bit
vector structure modes correctly.
---
  gcc/config/aarch64/aarch64.cc | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a5a6b52730d..b9f00bdce3b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12914,10 +12914,12 @@ aarch64_class_max_nregs (reg_class_t regclass, 
machine_mode mode)
  && constant_multiple_p (GET_MODE_SIZE (mode),
  aarch64_vl_bytes (mode, vec_flags), ))
return nregs;
-  return (vec_flags & VEC_ADVSIMD
- ? CEIL (lowest_size, UNITS_PER_VREG)
- : CEIL (lowest_size, UNITS_PER_WORD));
-
+  if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL))
+   return GET_MODE_SIZE (mode).to_constant () / 8;
+  else
+   return (vec_flags & VEC_ADVSIMD
+   ? CEIL (lowest_size, UNITS_PER_VREG)
+   : CEIL (lowest_size, UNITS_PER_WORD));


Very minor, sorry, but I think it would be more usual style to add the
new condition as an early-out and so not add an "else", especially since
there's alreaedy an early-out for SVE above:

   if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL))
return GET_MODE_SIZE (mode).to_constant () / 8;
   return (vec_flags & VEC_ADVSIMD
  ? CEIL (lowest_size, UNITS_PER_VREG)
  : CEIL (lowest_size, UNITS_PER_WORD));

I think it's also worth keeping the blank line between this and the
following block of cases.

OK with that change, thanks.

Richard


Thanks for the review, Richard. Re-spin attached. Will apply.

Thanks,
Tejas.





  case PR_REGS:
  case PR_LO_REGS:
  case PR_HI_REGS:
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
a5a6b52730d6c5013346d128e89915883f1707ae..a7c624f8b7327ae8c1324959c3ab5dfb4e7ebc6c
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12914,6 +12914,8 @@ aarch64_class_max_nregs (reg_class_t regclass, 
machine_mode mode)
  && constant_multiple_p (GET_MODE_SIZE (mode),
  aarch64_vl_bytes (mode, vec_flags), ))
return nregs;
+  if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL))
+   return GET_MODE_SIZE (mode).to_constant () / 8;
   return (vec_flags & VEC_ADVSIMD
  ? CEIL (lowest_size, UNITS_PER_VREG)
  : CEIL (lowest_size, UNITS_PER_WORD));


[PATCH] AArch64: aarch64_class_max_nregs mishandles 64-bit structure modes [PR112577]

2024-01-16 Thread Tejas Belagod
The target hook aarch64_class_max_nregs returns the incorrect result for 64-bit
structure modes like V3x1DImode or V4x1DFmode etc.  The calculation of the nregs
is based on the size of AdvSIMD vector register for 64-bit modes which ought to
be UNITS_PER_VREG / 2.  This patch fixes the register size.
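
As a worked example of the problem (assuming the usual 128-bit Advanced SIMD
register, i.e. UNITS_PER_VREG == 16): V3x1DImode is 24 bytes, so
CEIL (24, UNITS_PER_VREG) yields 2, but the value actually needs three
D registers; dividing the mode size by 8 gives the expected 3.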

Existing tests like gcc.target/aarch64/advsimd-intrinsics/vld1x3.c cover this 
change.

Regression tested on aarch64-linux. Bootstrapped on aarch64-linux.

OK for trunk?

gcc/ChangeLog:

PR target/112577
* config/aarch64/aarch64.cc (aarch64_class_max_nregs): Handle 64-bit
vector structure modes correctly.
---
 gcc/config/aarch64/aarch64.cc | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a5a6b52730d..b9f00bdce3b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12914,10 +12914,12 @@ aarch64_class_max_nregs (reg_class_t regclass, 
machine_mode mode)
  && constant_multiple_p (GET_MODE_SIZE (mode),
  aarch64_vl_bytes (mode, vec_flags), ))
return nregs;
-  return (vec_flags & VEC_ADVSIMD
- ? CEIL (lowest_size, UNITS_PER_VREG)
- : CEIL (lowest_size, UNITS_PER_WORD));
-
+  if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL))
+   return GET_MODE_SIZE (mode).to_constant () / 8;
+  else
+   return (vec_flags & VEC_ADVSIMD
+   ? CEIL (lowest_size, UNITS_PER_VREG)
+   : CEIL (lowest_size, UNITS_PER_WORD));
 case PR_REGS:
 case PR_LO_REGS:
 case PR_HI_REGS:
-- 
2.25.1



Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-07-26 Thread Tejas Belagod via Gcc-patches

On 7/17/23 5:46 PM, Richard Biener wrote:

On Fri, Jul 14, 2023 at 12:18 PM Tejas Belagod  wrote:


On 7/13/23 4:05 PM, Richard Biener wrote:

On Thu, Jul 13, 2023 at 12:15 PM Tejas Belagod  wrote:


On 7/3/23 1:31 PM, Richard Biener wrote:

On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod  wrote:


On 6/29/23 6:55 PM, Richard Biener wrote:

On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod  wrote:






From: Richard Biener 
Date: Tuesday, June 27, 2023 at 12:58 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod  wrote:






From: Richard Biener 
Date: Monday, June 26, 2023 at 2:23 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
 wrote:


Hi,

Packed Boolean Vectors
--

I'd like to propose a feature addition to GNU Vector extensions to add packed
boolean vectors (PBV).  This has been discussed in the past here[1] and a 
variant has
been implemented in Clang recently[2].

With predication features being added to vector architectures (SVE, MVE, AVX),
it is a useful feature to have to model predication on targets.  This could
find its use in intrinsics or just used as is as a GNU vector extension being
mapped to underlying target features.  For example, the packed boolean vector
could directly map to a predicate register on SVE.

Also, this new packed boolean type GNU extension can be used with SVE ACLE
intrinsics to replace a fixed-length svbool_t.

Here are a few options to represent the packed boolean vector type.


The GIMPLE frontend uses a new 'vector_mask' attribute:

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));

it gets you a vector type that's the appropriate (dependent on the
target) vector
mask type for the vector data type (v8si in this case).



Thanks Richard.

Having had a quick look at the implementation, it does seem to tick the boxes.

I must admit I haven't dug deep, but if the target hook allows the mask to be

defined in way that is target-friendly (and I don't know how much effort it will

be to migrate the attribute to more front-ends), it should do the job nicely.

Let me go back and dig a bit deeper and get back with questions if any.



Let me add that the advantage of this is the compiler doesn't need
to support weird explicitly laid out packed boolean vectors that do
not match what the target supports and the user doesn't need to know
what the target supports (and thus have an #ifdef maze around explicitly
specified layouts).

Sorry for the delayed response – I spent a day experimenting with vector_mask.



Yeah, this is what option 4 in the RFC is trying to achieve – be portable enough

to avoid having to sprinkle the code with ifdefs.


It does remove some flexibility though, for example with -mavx512f -mavx512vl
you'll get AVX512 style masks for V4SImode data vectors but of course the
target still supports SSE2/AVX2 style masks as well, but those would not be
available as "packed boolean vectors", though they are of course in fact
equal to V4SImode data vectors with -1 or 0 values, so in this particular
case it might not matter.

That said, the vector_mask attribute will get you V4SImode vectors with
signed boolean elements of 32 bits for V4SImode data vectors with
SSE2/AVX2.



This sounds very much like what the scenario would be with NEON vs SVE. Coming 
to think

of it, vector_mask resembles option 4 in the proposal with ‘n’ implied by the 
‘base’ vector type

and a ‘w’ specified for the type.



Given its current implementation, if vector_mask is exposed to the CFE, would 
there be any

major challenges wrt implementation or defining behaviour semantics? I played 
around with a

few examples from the testsuite and wrote some new ones. I mostly tried 
operations that

the new type would have to support (unary, binary bitwise, initializations etc) 
– with a couple of exceptions

most of the ops seem to be supported. I also triggered a couple of ICEs in some 
tests involving

implicit conversions to wider/narrower vector_mask types (will raise reports 
for these). Correct me

if I’m wrong here, but we’d probably have to support a couple of new ops if 
vector_mask is exposed

to the CFE – initialization and subscript operations?


Yes, either that or restrict how the mask vectors can be used, thus
properly diagnose improper
uses.


Indeed.

 A question would be for example how to write common mask test

operations like
if (any (mask)) or if (all (mask)).


I see 2 options here. New builtins could support new types - they'd
provide a target independent way to test any and all conditions. Another
would be to let the target use its intrinsics to do them in the most
efficient way possible (which the builtins would get lowered down

Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-07-14 Thread Tejas Belagod via Gcc-patches

On 7/13/23 4:05 PM, Richard Biener wrote:

On Thu, Jul 13, 2023 at 12:15 PM Tejas Belagod  wrote:


On 7/3/23 1:31 PM, Richard Biener wrote:

On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod  wrote:


On 6/29/23 6:55 PM, Richard Biener wrote:

On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod  wrote:






From: Richard Biener 
Date: Tuesday, June 27, 2023 at 12:58 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod  wrote:






From: Richard Biener 
Date: Monday, June 26, 2023 at 2:23 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
 wrote:


Hi,

Packed Boolean Vectors
--

I'd like to propose a feature addition to GNU Vector extensions to add packed
boolean vectors (PBV).  This has been discussed in the past here[1] and a 
variant has
been implemented in Clang recently[2].

With predication features being added to vector architectures (SVE, MVE, AVX),
it is a useful feature to have to model predication on targets.  This could
find its use in intrinsics or just used as is as a GNU vector extension being
mapped to underlying target features.  For example, the packed boolean vector
could directly map to a predicate register on SVE.

Also, this new packed boolean type GNU extension can be used with SVE ACLE
intrinsics to replace a fixed-length svbool_t.

Here are a few options to represent the packed boolean vector type.


The GIMPLE frontend uses a new 'vector_mask' attribute:

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));

it gets you a vector type that's the appropriate (dependent on the
target) vector
mask type for the vector data type (v8si in this case).



Thanks Richard.

Having had a quick look at the implementation, it does seem to tick the boxes.

I must admit I haven't dug deep, but if the target hook allows the mask to be

defined in way that is target-friendly (and I don't know how much effort it will

be to migrate the attribute to more front-ends), it should do the job nicely.

Let me go back and dig a bit deeper and get back with questions if any.



Let me add that the advantage of this is the compiler doesn't need
to support weird explicitly laid out packed boolean vectors that do
not match what the target supports and the user doesn't need to know
what the target supports (and thus have an #ifdef maze around explicitly
specified layouts).

Sorry for the delayed response – I spent a day experimenting with vector_mask.



Yeah, this is what option 4 in the RFC is trying to achieve – be portable enough

to avoid having to sprinkle the code with ifdefs.


It does remove some flexibility though, for example with -mavx512f -mavx512vl
you'll get AVX512 style masks for V4SImode data vectors but of course the
target still supports SSE2/AVX2 style masks as well, but those would not be
available as "packed boolean vectors", though they are of course in fact
equal to V4SImode data vectors with -1 or 0 values, so in this particular
case it might not matter.

That said, the vector_mask attribute will get you V4SImode vectors with
signed boolean elements of 32 bits for V4SImode data vectors with
SSE2/AVX2.



This sounds very much like what the scenario would be with NEON vs SVE. Coming 
to think

of it, vector_mask resembles option 4 in the proposal with ‘n’ implied by the 
‘base’ vector type

and a ‘w’ specified for the type.



Given its current implementation, if vector_mask is exposed to the CFE, would
there be any major challenges wrt implementation or defining behaviour
semantics? I played around with a few examples from the testsuite and wrote
some new ones. I mostly tried operations that the new type would have to
support (unary, binary bitwise, initializations etc) – with a couple of
exceptions most of the ops seem to be supported. I also triggered a couple of
ICEs in some tests involving implicit conversions to wider/narrower
vector_mask types (will raise reports for these). Correct me if I’m wrong
here, but we’d probably have to support a couple of new ops if vector_mask is
exposed to the CFE – initialization and subscript operations?


Yes, either that or restrict how the mask vectors can be used, thus
properly diagnose improper
uses.


Indeed.

A question would be for example how to write common mask test operations like
if (any (mask)) or if (all (mask)).


I see 2 options here. New builtins could support new types - they'd
provide a target independent way to test any and all conditions. Another
would be to let the target use its intrinsics to do them in the most
efficient way possible (which the builtins would get lowered down to
anyway).


Likewise writing merge operations - do those as

  a = a | (mask ? b : 0);

thus use ternary ?: for this?

Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-07-13 Thread Tejas Belagod via Gcc-patches

On 7/3/23 1:31 PM, Richard Biener wrote:

On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod  wrote:


On 6/29/23 6:55 PM, Richard Biener wrote:

On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod  wrote:






From: Richard Biener 
Date: Tuesday, June 27, 2023 at 12:58 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod  wrote:






From: Richard Biener 
Date: Monday, June 26, 2023 at 2:23 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
 wrote:


Hi,

Packed Boolean Vectors
--

I'd like to propose a feature addition to GNU Vector extensions to add packed
boolean vectors (PBV).  This has been discussed in the past here[1] and a 
variant has
been implemented in Clang recently[2].

With predication features being added to vector architectures (SVE, MVE, AVX),
it is a useful feature to have to model predication on targets.  This could
find its use in intrinsics or just used as is as a GNU vector extension being
mapped to underlying target features.  For example, the packed boolean vector
could directly map to a predicate register on SVE.

Also, this new packed boolean type GNU extension can be used with SVE ACLE
intrinsics to replace a fixed-length svbool_t.

Here are a few options to represent the packed boolean vector type.


The GIMPLE frontend uses a new 'vector_mask' attribute:

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));

it gets you a vector type that's the appropriate (dependent on the
target) vector mask type for the vector data type (v8si in this case).



Thanks Richard.

Having had a quick look at the implementation, it does seem to tick the boxes.
I must admit I haven't dug deep, but if the target hook allows the mask to be
defined in a way that is target-friendly (and I don't know how much effort it will
be to migrate the attribute to more front-ends), it should do the job nicely.
Let me go back and dig a bit deeper and get back with questions if any.



Let me add that the advantage of this is the compiler doesn't need
to support weird explicitly laid out packed boolean vectors that do
not match what the target supports and the user doesn't need to know
what the target supports (and thus have an #ifdef maze around explicitly
specified layouts).

Sorry for the delayed response – I spent a day experimenting with vector_mask.

Yeah, this is what option 4 in the RFC is trying to achieve – be portable enough
to avoid having to sprinkle the code with ifdefs.


It does remove some flexibility though, for example with -mavx512f -mavx512vl
you'll get AVX512 style masks for V4SImode data vectors but of course the
target still supports SSE2/AVX2 style masks as well, but those would not be
available as "packed boolean vectors", though they are of course in fact
equal to V4SImode data vectors with -1 or 0 values, so in this particular
case it might not matter.

That said, the vector_mask attribute will get you V4SImode vectors with
signed boolean elements of 32 bits for V4SImode data vectors with
SSE2/AVX2.



This sounds very much like what the scenario would be with NEON vs SVE. Coming
to think of it, vector_mask resembles option 4 in the proposal with ‘n’ implied
by the ‘base’ vector type and a ‘w’ specified for the type.



Given its current implementation, if vector_mask is exposed to the CFE, would
there be any major challenges wrt implementation or defining behaviour
semantics? I played around with a few examples from the testsuite and wrote
some new ones. I mostly tried operations that the new type would have to
support (unary, binary bitwise, initializations etc) – with a couple of
exceptions most of the ops seem to be supported. I also triggered a couple of
ICEs in some tests involving implicit conversions to wider/narrower
vector_mask types (will raise reports for these). Correct me if I’m wrong
here, but we’d probably have to support a couple of new ops if vector_mask is
exposed to the CFE – initialization and subscript operations?


Yes, either that or restrict how the mask vectors can be used, thus
properly diagnose improper
uses.


Indeed.

A question would be for example how to write common mask test operations like
if (any (mask)) or if (all (mask)).


I see 2 options here. New builtins could support new types - they'd
provide a target independent way to test any and all conditions. Another
would be to let the target use its intrinsics to do them in the most
efficient way possible (which the builtins would get lowered down to
anyway).


Likewise writing merge operations - do those as

  a = a | (mask ? b : 0);

thus use ternary ?: for this?


Yes, like now, the ternary could just translate to

  {mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... }

Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-07-03 Thread Tejas Belagod via Gcc-patches

On 6/29/23 6:55 PM, Richard Biener wrote:

On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod  wrote:






From: Richard Biener 
Date: Tuesday, June 27, 2023 at 12:58 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod  wrote:






From: Richard Biener 
Date: Monday, June 26, 2023 at 2:23 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
 wrote:


Hi,

Packed Boolean Vectors
--

I'd like to propose a feature addition to GNU Vector extensions to add packed
boolean vectors (PBV).  This has been discussed in the past here[1] and a 
variant has
been implemented in Clang recently[2].

With predication features being added to vector architectures (SVE, MVE, AVX),
it is a useful feature to have to model predication on targets.  This could
find its use in intrinsics or just used as is as a GNU vector extension being
mapped to underlying target features.  For example, the packed boolean vector
could directly map to a predicate register on SVE.

Also, this new packed boolean type GNU extension can be used with SVE ACLE
intrinsics to replace a fixed-length svbool_t.

Here are a few options to represent the packed boolean vector type.


The GIMPLE frontend uses a new 'vector_mask' attribute:

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));

it gets you a vector type that's the appropriate (dependent on the
target) vector mask type for the vector data type (v8si in this case).



Thanks Richard.

Having had a quick look at the implementation, it does seem to tick the boxes.
I must admit I haven't dug deep, but if the target hook allows the mask to be
defined in a way that is target-friendly (and I don't know how much effort it will
be to migrate the attribute to more front-ends), it should do the job nicely.
Let me go back and dig a bit deeper and get back with questions if any.



Let me add that the advantage of this is the compiler doesn't need
to support weird explicitly laid out packed boolean vectors that do
not match what the target supports and the user doesn't need to know
what the target supports (and thus have an #ifdef maze around explicitly
specified layouts).

Sorry for the delayed response – I spent a day experimenting with vector_mask.

Yeah, this is what option 4 in the RFC is trying to achieve – be portable enough
to avoid having to sprinkle the code with ifdefs.


It does remove some flexibility though, for example with -mavx512f -mavx512vl
you'll get AVX512 style masks for V4SImode data vectors but of course the
target still supports SSE2/AVX2 style masks as well, but those would not be
available as "packed boolean vectors", though they are of course in fact
equal to V4SImode data vectors with -1 or 0 values, so in this particular
case it might not matter.

That said, the vector_mask attribute will get you V4SImode vectors with
signed boolean elements of 32 bits for V4SImode data vectors with
SSE2/AVX2.



This sounds very much like what the scenario would be with NEON vs SVE. Coming
to think of it, vector_mask resembles option 4 in the proposal with ‘n’ implied
by the ‘base’ vector type and a ‘w’ specified for the type.



Given its current implementation, if vector_mask is exposed to the CFE, would
there be any major challenges wrt implementation or defining behaviour
semantics? I played around with a few examples from the testsuite and wrote
some new ones. I mostly tried operations that the new type would have to
support (unary, binary bitwise, initializations etc) – with a couple of
exceptions most of the ops seem to be supported. I also triggered a couple of
ICEs in some tests involving implicit conversions to wider/narrower
vector_mask types (will raise reports for these). Correct me if I’m wrong
here, but we’d probably have to support a couple of new ops if vector_mask is
exposed to the CFE – initialization and subscript operations?


Yes, either that or restrict how the mask vectors can be used, thus
properly diagnose improper
uses. 


Indeed.

A question would be for example how to write common mask test operations like
if (any (mask)) or if (all (mask)).


I see 2 options here. New builtins could support new types - they'd 
provide a target independent way to test any and all conditions. Another 
would be to let the target use its intrinsics to do them in the most 
efficient way possible (which the builtins would get lowered down to 
anyway).
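
For concreteness, here is one way any/all can be written today with the GNU
vector extension, where a comparison produces a 0/-1 data vector per lane.
This is only a sketch of the semantics the new builtins (or target intrinsics)
would provide more directly on a packed boolean mask type:

  typedef int v4si __attribute__ ((vector_size (16)));

  /* 'm' is assumed to be a comparison result: each lane is 0 or -1.  */
  static inline int any_v4si (v4si m)
  {
    return (m[0] | m[1] | m[2] | m[3]) != 0;
  }

  static inline int all_v4si (v4si m)
  {
    return (m[0] & m[1] & m[2] & m[3]) == -1;
  }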



Likewise writing merge operations - do those as

  a = a | (mask ? b : 0);

thus use ternary ?: for this?


Yes, like now, the ternary could just translate to

  {mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... }
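
As a concrete sketch with today's 0/-1 data-vector masks (not the proposed
mask type), the same merge can be spelled with plain bitwise ops, since
(mask & b) gives exactly the per-lane (mask ? b : 0) select:

  typedef int v4si __attribute__ ((vector_size (16)));

  /* Merge b into a wherever the comparison-mask lane is -1.  */
  v4si merge (v4si a, v4si b, v4si mask)
  {
    return a | (mask & b);
  }

With a packed boolean mask type the same source form could map directly to a
predicated move on SVE or AVX-512 instead of a full-width data-vector select.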

One thing to flesh out is the semantics. Should we allow this operatio

Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-06-28 Thread Tejas Belagod via Gcc-patches


From: Richard Biener 
Date: Tuesday, June 27, 2023 at 12:58 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod  wrote:
>
>
>
>
>
> From: Richard Biener 
> Date: Monday, June 26, 2023 at 2:23 PM
> To: Tejas Belagod 
> Cc: gcc-patches@gcc.gnu.org 
> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
>
> On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > Packed Boolean Vectors
> > --
> >
> > I'd like to propose a feature addition to GNU Vector extensions to add 
> > packed
> > boolean vectors (PBV).  This has been discussed in the past here[1] and a 
> > variant has
> > been implemented in Clang recently[2].
> >
> > With predication features being added to vector architectures (SVE, MVE, 
> > AVX),
> > it is a useful feature to have to model predication on targets.  This could
> > find its use in intrinsics or just used as is as a GNU vector extension 
> > being
> > mapped to underlying target features.  For example, the packed boolean 
> > vector
> > could directly map to a predicate register on SVE.
> >
> > Also, this new packed boolean type GNU extension can be used with SVE ACLE
> > intrinsics to replace a fixed-length svbool_t.
> >
> > Here are a few options to represent the packed boolean vector type.
>
> The GIMPLE frontend uses a new 'vector_mask' attribute:
>
> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> typedef v8si v8sib __attribute__((vector_mask));
>
> it gets you a vector type that's the appropriate (dependent on the
> target) vector mask type for the vector data type (v8si in this case).
>
>
>
> Thanks Richard.
>
> Having had a quick look at the implementation, it does seem to tick the boxes.
>
> I must admit I haven't dug deep, but if the target hook allows the mask to be
>
> defined in a way that is target-friendly (and I don't know how much effort it 
> will
>
> be to migrate the attribute to more front-ends), it should do the job nicely.
>
> Let me go back and dig a bit deeper and get back with questions if any.


Let me add that the advantage of this is the compiler doesn't need
to support weird explicitly laid out packed boolean vectors that do
not match what the target supports and the user doesn't need to know
what the target supports (and thus have an #ifdef maze around explicitly
specified layouts).

Sorry for the delayed response – I spent a day experimenting with vector_mask.

Yeah, this is what option 4 in the RFC is trying to achieve – be portable enough
to avoid having to sprinkle the code with ifdefs.

It does remove some flexibility though, for example with -mavx512f -mavx512vl
you'll get AVX512 style masks for V4SImode data vectors but of course the
target still supports SSE2/AVX2 style masks as well, but those would not be
available as "packed boolean vectors", though they are of course in fact
equal to V4SImode data vectors with -1 or 0 values, so in this particular
case it might not matter.

That said, the vector_mask attribute will get you V4SImode vectors with
signed boolean elements of 32 bits for V4SImode data vectors with
SSE2/AVX2.

This sounds very much like what the scenario would be with NEON vs SVE. Coming 
to think
of it, vector_mask resembles option 4 in the proposal with ‘n’ implied by the 
‘base’ vector type
and a ‘w’ specified for the type.

Given its current implementation, if vector_mask is exposed to the CFE, would 
there be any
major challenges wrt implementation or defining behaviour semantics? I played 
around with a
few examples from the testsuite and wrote some new ones. I mostly tried 
operations that
the new type would have to support (unary, binary bitwise, initializations etc) 
– with a couple of exceptions
most of the ops seem to be supported. I also triggered a couple of ICEs in some 
tests involving
implicit conversions to wider/narrower vector_mask types (will raise reports 
for these). Correct me
if I’m wrong here, but we’d probably have to support a couple of new ops if 
vector_mask is exposed
to the CFE – initialization and subscript operations?
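
To illustrate the kind of operations in question, this is roughly what user
code might look like if the attribute were accepted by the C front end (the
v8sib typedef is the one from the GIMPLE-frontend example above; whether these
forms are allowed at all is exactly what would need to be specified):

  typedef int v8si __attribute__ ((vector_size (8 * sizeof (int))));
  typedef v8si v8sib __attribute__ ((vector_mask));

  int f (v8si a, v8si b)
  {
    v8sib m = a > b;                      /* comparison produces a mask */
    v8sib k = {1, 0, 1, 0, 1, 0, 1, 0};   /* initialization */
    return m[2] & k[2];                   /* subscripting individual lanes */
  }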


Thanks,
Tejas.



Richard.

>
>
> Thanks,
>
> Tejas.
>
>
>
>
>
>
>
> > 1. __attribute__((vector_size (n))) where n represents bytes
> >
> >   typedef bool vbool __attribute__ ((vector_size (1)));
> >
> > In this approach, the shape of the boolean vector is unclear. IoW, it is not
> > clear if each bit in 'n' controls a byte or an element. On targets
> > like SVE, it would be natural to have each bit control a byte of the target
> &

Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-06-27 Thread Tejas Belagod via Gcc-patches


From: Richard Biener 
Date: Monday, June 26, 2023 at 2:23 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
 wrote:
>
> Hi,
>
> Packed Boolean Vectors
> --
>
> I'd like to propose a feature addition to GNU Vector extensions to add packed
> boolean vectors (PBV).  This has been discussed in the past here[1] and a 
> variant has
> been implemented in Clang recently[2].
>
> With predication features being added to vector architectures (SVE, MVE, AVX),
> it is a useful feature to have to model predication on targets.  This could
> find its use in intrinsics or just used as is as a GNU vector extension being
> mapped to underlying target features.  For example, the packed boolean vector
> could directly map to a predicate register on SVE.
>
> Also, this new packed boolean type GNU extension can be used with SVE ACLE
> intrinsics to replace a fixed-length svbool_t.
>
> Here are a few options to represent the packed boolean vector type.

The GIMPLE frontend uses a new 'vector_mask' attribute:

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));

it gets you a vector type that's the appropriate (dependent on the
target) vector mask type for the vector data type (v8si in this case).


Thanks Richard.

Having had a quick look at the implementation, it does seem to tick the boxes.

I must admit I haven't dug deep, but if the target hook allows the mask to be

defined in a way that is target-friendly (and I don't know how much effort it will

be to migrate the attribute to more front-ends), it should do the job nicely.

Let me go back and dig a bit deeper and get back with questions if any.

Thanks,
Tejas.




> 1. __attribute__((vector_size (n))) where n represents bytes
>
>   typedef bool vbool __attribute__ ((vector_size (1)));
>
> In this approach, the shape of the boolean vector is unclear. IoW, it is not
> clear if each bit in 'n' controls a byte or an element. On targets
> like SVE, it would be natural to have each bit control a byte of the target
> vector (therefore resulting in an 'unpacked' layout of the PBV) and on AVX, 
> each
> bit would control one element/lane on the target vector (therefore resulting 
> in a
> 'packed' layout with all significant bits at the LSB).
>
> 2. __attribute__((vector_size (n))) where n represents num of lanes
>
>   typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));
>   typedef bool v4bi __attribute__ ((vector_size (sizeof (v4si) / sizeof 
> ((v4si){0}[0]))));
>
> Here the 'n' in the vector_size attribute represents the number of bits that
> is needed to represent a vector quantity.  In this case, this packed boolean
> vector can represent upto 'n' vector lanes. The size of the type is
> rounded up the nearest byte.  For example, the sizeof v4bi in the above
> example is 1.
>
> In this approach, because of the nature of the representation, the n bits 
> required
> to represent the n lanes of the vector are packed at the LSB. This does not 
> naturally
> align with the SVE approach of each bit representing a byte of the target 
> vector
> and PBV therefore having an 'unpacked' layout.
>
> More importantly, another drawback here is that the change in units for 
> vector_size
> might be confusing to programmers.  The units will have to be interpreted 
> based on the
> base type of the typedef. It does not offer any flexibility in terms of the 
> layout of
> the bool vector - it is fixed.
>
> 3. Combination of 1 and 2.
>
> Combining the best of 1 and 2, we can introduce extra parameters to 
> vector_size that will
> unambiguously represent the layout of the PBV. Consider
>
>   typedef bool vbool __attribute__((vector_size (s, n[, w])));
>
> where 's' is size in bytes, 'n' is the number of lanes and an optional 3rd 
> parameter 'w'
> is the number of bits of the PBV that represents a lane of the target vector. 
> 'w' would
> allow a target to force a certain layout of the PBV.
>
> The 2-parameter form of vector_size allows the target to have an
> implementation-defined layout of the PBV. The target is free to choose the 'w'
> if it is not specified to mirror the target layout of predicate registers. For
> eg. AVX would choose 'w' as 1 and SVE would choose s*8/n.
>
> As an example, to represent the result of a comparison on 2 int16x8_t, we'd 
> need
> 8 lanes of boolean which could be represented by
>
>   typedef bool v8b __attribute__ ((vector_size (2, 8)));
>
> SVE would implement v8b layout to make every 2nd bit significant i.e. w == 2
>
> and AVX would choose a layout where all 8 consecutive bit

[RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-06-26 Thread Tejas Belagod via Gcc-patches
Hi,

Packed Boolean Vectors
--

I'd like to propose a feature addition to GNU Vector extensions to add packed
boolean vectors (PBV).  This has been discussed in the past here[1] and a 
variant has
been implemented in Clang recently[2].

With predication features being added to vector architectures (SVE, MVE, AVX),
it is a useful feature to have to model predication on targets.  This could
find its use in intrinsics or just used as is as a GNU vector extension being
mapped to underlying target features.  For example, the packed boolean vector
could directly map to a predicate register on SVE.

Also, this new packed boolean type GNU extension can be used with SVE ACLE
intrinsics to replace a fixed-length svbool_t.

Here are a few options to represent the packed boolean vector type.

1. __attribute__((vector_size (n))) where n represents bytes

  typedef bool vbool __attribute__ ((vector_size (1)));

In this approach, the shape of the boolean vector is unclear. IoW, it is not
clear if each bit in 'n' controls a byte or an element. On targets
like SVE, it would be natural to have each bit control a byte of the target
vector (therefore resulting in an 'unpacked' layout of the PBV) and on AVX, each
bit would control one element/lane on the target vector (therefore resulting in a
'packed' layout with all significant bits at the LSB).
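
Concretely, for the vbool typedef above (vector_size (1), i.e. one byte / 8
bits), the two readings give different shapes:

  bit-per-element: 8 boolean lanes, all packed at the LSB (AVX-like 'packed')
  bit-per-byte:    1 boolean lane covering a byte of the target vector
                   (SVE-like 'unpacked')

Nothing in the attribute itself says which of the two is meant.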

2. __attribute__((vector_size (n))) where n represents num of lanes

  typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));
  typedef bool v4bi __attribute__ ((vector_size (sizeof (v4si) / sizeof 
((v4si){0}[0]))));

Here the 'n' in the vector_size attribute represents the number of bits that
is needed to represent a vector quantity.  In this case, this packed boolean
vector can represent up to 'n' vector lanes. The size of the type is
rounded up to the nearest byte.  For example, the sizeof v4bi in the above
example is 1.

In this approach, because of the nature of the representation, the n bits 
required
to represent the n lanes of the vector are packed at the LSB. This does not 
naturally
align with the SVE approach of each bit representing a byte of the target vector
and PBV therefore having an 'unpacked' layout.

More importantly, another drawback here is that the change in units for 
vector_size
might be confusing to programmers.  The units will have to be interpreted based 
on the
base type of the typedef. It does not offer any flexibility in terms of the 
layout of
the bool vector - it is fixed.

3. Combination of 1 and 2.

Combining the best of 1 and 2, we can introduce extra parameters to vector_size 
that will
unambiguously represent the layout of the PBV. Consider

  typedef bool vbool __attribute__((vector_size (s, n[, w])));

where 's' is size in bytes, 'n' is the number of lanes and an optional 3rd 
parameter 'w'
is the number of bits of the PBV that represents a lane of the target vector. 
'w' would
allow a target to force a certain layout of the PBV.

The 2-parameter form of vector_size allows the target to have an
implementation-defined layout of the PBV. The target is free to choose the 'w'
if it is not specified to mirror the target layout of predicate registers. For
example, AVX would choose 'w' as 1 and SVE would choose s*8/n.

As an example, to represent the result of a comparison on 2 int16x8_t, we'd need
8 lanes of boolean which could be represented by

  typedef bool v8b __attribute__ ((vector_size (2, 8)));

SVE would implement v8b layout to make every 2nd bit significant i.e. w == 2

and AVX would choose a layout where all 8 consecutive bits packed at LSB would
be significant i.e. w == 1.

This scheme would accommodate more than 1 target to effectively represent vector
bools that mirror the target properties.
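
As a worked illustration of the v8b example (s = 2 bytes, n = 8 lanes):

  SVE, w == 2: bits 0, 2, 4, ..., 14 are significant (one bit per 16-bit lane,
               'unpacked' to match the predicate register layout).
  AVX, w == 1: bits 0..7 are significant, packed at the LSB; the upper byte is
               padding.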

4. A new attribute

This is based on a suggestion from Richard S in [3]. The idea is to introduce a 
new
attribute to define the PBV and make it general enough to

* represent all targets flexibly (SVE, AVX etc)
* represent sub-byte length predicates
* have no change in units of vector_size/no new vector_size signature
* not have the number of bytes constrain representation

If we call the new attribute 'bool_vec' (for lack of a better name), consider

  typedef bool vbool __attribute__((bool_vec (n[, w])))

where 'n' represents number of lanes/elements and the optional 'w' is 
bits-per-lane.

If 'w' is not specified, it and bytes-per-predicate are implementation-defined 
based on target.
If 'w' is specified,  sizeof (vbool) will be ceil (n*w/8).
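
For example, working through ceil (n*w/8):

  n = 8, w = 1  ->  sizeof (vbool) == 1   (AVX-like, one bit per lane)
  n = 8, w = 2  ->  sizeof (vbool) == 2   (SVE-like, one bit per byte of a
                                           vector of 16-bit elements)
  n = 4, w = 4  ->  sizeof (vbool) == 2   (same n and w as the v4bi example
                                           in section 5 below)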

5. Behaviour of the packed vector boolean type.

Taking the example of one of the options above, the following is an illustration
of its behavior

* ABI

  New ABI rules will need to be defined for this type - eg alignment, PCS,
  mangling etc

* Initialization:

  Packed Boolean Vectors (PBV) can be initialized like so:

typedef bool v4bi __attribute__ ((vector_size (2, 4, 4)));
v4bi p = {false, true, false, false};

  Each value in the initializer constant is of type bool. The 

Re: [PATCH v2] [PR96339] Optimise svlast[ab]

2023-06-14 Thread Tejas Belagod via Gcc-patches



From: Kyrylo Tkachov 
Date: Wednesday, June 14, 2023 at 10:11 PM
To: Prathamesh Kulkarni , Tejas Belagod 

Cc: Richard Sandiford , gcc-patches@gcc.gnu.org 

Subject: RE: [PATCH v2] [PR96339] Optimise svlast[ab]


> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Prathamesh
> Kulkarni via Gcc-patches
> Sent: Wednesday, June 14, 2023 8:13 AM
> To: Tejas Belagod 
> Cc: Richard Sandiford ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH v2] [PR96339] Optimise svlast[ab]
>
> On Tue, 13 Jun 2023 at 12:38, Tejas Belagod via Gcc-patches
>  wrote:
> >
> >
> >
> > From: Richard Sandiford 
> > Date: Monday, June 12, 2023 at 2:15 PM
> > To: Tejas Belagod 
> > Cc: gcc-patches@gcc.gnu.org , Tejas Belagod
> 
> > Subject: Re: [PATCH v2] [PR96339] Optimise svlast[ab]
> > Tejas Belagod  writes:
> > > From: Tejas Belagod 
> > >
> > >   This PR optimizes an SVE intrinsics sequence where
> > > svlasta (svptrue_pat_b8 (SV_VL1), x)
> > >   a scalar is selected based on a constant predicate and a variable 
> > > vector.
> > >   This sequence is optimized to return the correspoding element of a
> NEON
> > >   vector. For eg.
> > > svlasta (svptrue_pat_b8 (SV_VL1), x)
> > >   returns
> > > umovw0, v0.b[1]
> > >   Likewise,
> > > svlastb (svptrue_pat_b8 (SV_VL1), x)
> > >   returns
> > >  umovw0, v0.b[0]
> > >   This optimization only works provided the constant predicate maps to a
> range
> > >   that is within the bounds of a 128-bit NEON register.
> > >
> > > gcc/ChangeLog:
> > >
> > >PR target/96339
> > >* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): 
> > > Fold
> sve
> > >calls that have a constant input predicate vector.
> > >(svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
> > >(svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
> > >(svlast_impl::vect_all_same): Check if all vector elements are 
> > > equal.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >PR target/96339
> > >* gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
> > >* gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
> > >* gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
> > >* gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
> > >to expect optimized code for function body.
> > >* gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): 
> > > Likewise.
> >
> > OK, thanks.
> >
> > Applied on master, thanks.
> Hi Tejas,
> This seems to break aarch64 bootstrap build with following error due
> to -Wsign-compare diagnostic:
> 00:18:19 /home/tcwg-
> buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/config/
> aarch64/aarch64-sve-builtins-base.cc:1133:35:
> error: comparison of integer expressions of different signedness:
> ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
> 00:18:19  1133 | for (i = npats; i < enelts; i += step_1)
> 00:18:19  | ~~^~~~
> 00:30:46 abe-debug-build: cc1plus: all warnings being treated as errors
> 00:30:46 abe-debug-build: make[3]: ***
> [/home/tcwg-
> buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/config/
> aarch64/t-aarch64:96:
> aarch64-sve-builtins-base.o] Error 1

Fixed thusly in trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold):
Fix signed comparison warning in loop from npats to enelts.
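
For reference, the warning comes from comparing a signed counter with an
unsigned bound, as in the flagged line.  A minimal standalone reproduction and
one common shape of fix (with made-up values, and not necessarily the exact
change that went into trunk):

  unsigned long enelts = 8;   /* plays the role of the unsigned bound */
  int npats = 2, step_1 = 2;

  /* Give the counter the same (unsigned) type as the bound.  */
  for (unsigned long i = npats; i < enelts; i += step_1)
    ;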


Ah, sorry for breaking bootstrap, and thanks Kyrill for the fix.

Tejas.

>
> Thanks,
> Prathamesh
> >
> > Tejas.
> >
> >
> > Richard


Re: [PATCH v2] [PR96339] Optimise svlast[ab]

2023-06-13 Thread Tejas Belagod via Gcc-patches



From: Richard Sandiford 
Date: Monday, June 12, 2023 at 2:15 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org , Tejas Belagod 

Subject: Re: [PATCH v2] [PR96339] Optimise svlast[ab]
Tejas Belagod  writes:
> From: Tejas Belagod 
>
>   This PR optimizes an SVE intrinsics sequence where
> svlasta (svptrue_pat_b8 (SV_VL1), x)
>   a scalar is selected based on a constant predicate and a variable vector.
>   This sequence is optimized to return the correspoding element of a NEON
>   vector. For eg.
> svlasta (svptrue_pat_b8 (SV_VL1), x)
>   returns
> umovw0, v0.b[1]
>   Likewise,
> svlastb (svptrue_pat_b8 (SV_VL1), x)
>   returns
>  umovw0, v0.b[0]
>   This optimization only works provided the constant predicate maps to a range
>   that is within the bounds of a 128-bit NEON register.
>
> gcc/ChangeLog:
>
>PR target/96339
>* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): 
> Fold sve
>calls that have a constant input predicate vector.
>(svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
>(svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
>(svlast_impl::vect_all_same): Check if all vector elements are equal.
>
> gcc/testsuite/ChangeLog:
>
>PR target/96339
>* gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
>* gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
>* gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
>* gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
>to expect optimized code for function body.
>* gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.

OK, thanks.

Applied on master, thanks.

Tejas.


Richard


Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-06-12 Thread Tejas Belagod via Gcc-patches


From: Richard Sandiford 
Date: Friday, May 19, 2023 at 3:20 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]
Tejas Belagod  writes:
> Am I correct to understand that we still need to check for the case when
> there's a repeating non-zero elements in the case of NELTS_PER_PATTERN == 2?
> eg. { 0, 0, 1, 1, 1, 1,} which should be encoded as {0, 0, 1, 1} with
> NPATTERNS = 2 ?

Yeah, that's right.  The current handling for NPATTERNS==2 looked
good to me.  It was the other two cases that I was worried about.

Thanks,
Richard

Thanks for all the reviews. I’ve posted a new version of the patch here - 
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621310.html

Thanks,
Tejas.



[PATCH v2] [PR96339] Optimise svlast[ab]

2023-06-12 Thread Tejas Belagod via Gcc-patches
From: Tejas Belagod 

  This PR optimizes an SVE intrinsics sequence such as
svlasta (svptrue_pat_b8 (SV_VL1), x)
  where a scalar is selected based on a constant predicate and a variable vector.
  This sequence is optimized to return the corresponding element of a NEON
  vector. For example,
svlasta (svptrue_pat_b8 (SV_VL1), x)
  returns
umov    w0, v0.b[1]
  Likewise,
svlastb (svptrue_pat_b8 (SV_VL1), x)
  returns
 umov    w0, v0.b[0]
  This optimization only works provided the constant predicate maps to a range
  that is within the bounds of a 128-bit NEON register.
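
A small usage sketch of the kind of code this fold targets (illustrative only;
compile for SVE, e.g. with -march=armv8.2-a+sve):

  #include <arm_sve.h>

  /* With the VL1 predicate only element 0 is active, so svlasta returns the
     element after the last active one, i.e. lane 1, and the whole call can
     reduce to a single Advanced SIMD lane extract (umov w0, v0.b[1]).  */
  int8_t foo (svint8_t x)
  {
    return svlasta_s8 (svptrue_pat_b8 (SV_VL1), x);
  }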

gcc/ChangeLog:

PR target/96339
* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): Fold 
sve
calls that have a constant input predicate vector.
(svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
(svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
(svlast_impl::vect_all_same): Check if all vector elements are equal.

gcc/testsuite/ChangeLog:

PR target/96339
* gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
* gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
* gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
* gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
to expect optimized code for function body.
* gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.cc  | 133 
 .../aarch64/sve/acle/general-c/svlast.c   |  63 
 .../sve/acle/general-c/svlast128_run.c| 313 +
 .../sve/acle/general-c/svlast256_run.c| 314 ++
 .../gcc.target/aarch64/sve/pcs/return_4.c |   2 -
 .../aarch64/sve/pcs/return_4_1024.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_128.c |   2 -
 .../aarch64/sve/pcs/return_4_2048.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_256.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_512.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5.c |   2 -
 .../aarch64/sve/pcs/return_5_1024.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_128.c |   2 -
 .../aarch64/sve/pcs/return_5_2048.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_256.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_512.c |   2 -
 16 files changed, 823 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast128_run.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast256_run.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index cd9cace3c9b..9b766ffa817 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1056,6 +1056,139 @@ class svlast_impl : public quiet
 public:
   CONSTEXPR svlast_impl (int unspec) : m_unspec (unspec) {}
 
+  bool is_lasta () const { return m_unspec == UNSPEC_LASTA; }
+  bool is_lastb () const { return m_unspec == UNSPEC_LASTB; }
+
+  bool vect_all_same (tree v, int step) const
+  {
+int i;
+int nelts = vector_cst_encoded_nelts (v);
+tree first_el = VECTOR_CST_ENCODED_ELT (v, 0);
+
+for (i = 0; i < nelts; i += step)
+  if (!operand_equal_p (VECTOR_CST_ENCODED_ELT (v, i), first_el, 0))
+   return false;
+
+return true;
+  }
+
+  /* Fold a svlast{a/b} call with constant predicate to a BIT_FIELD_REF.
+ BIT_FIELD_REF lowers to Advanced SIMD element extract, so we have to
+ ensure the index of the element being accessed is in the range of a
+ Advanced SIMD vector width.  */
+  gimple *fold (gimple_folder & f) const override
+  {
+tree pred = gimple_call_arg (f.call, 0);
+tree val = gimple_call_arg (f.call, 1);
+
+if (TREE_CODE (pred) == VECTOR_CST)
+  {
+   HOST_WIDE_INT pos;
+   int i = 0;
+   int step = f.type_suffix (0).element_bytes;
+   int step_1 = gcd (step, VECTOR_CST_NPATTERNS (pred));
+   int npats = VECTOR_CST_NPATTERNS (pred);
+   unsigned HOST_W

Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-05-19 Thread Tejas Belagod via Gcc-patches



From: Richard Sandiford 
Date: Tuesday, May 16, 2023 at 5:36 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]
Tejas Belagod  writes:
>>> +   {
>>> + b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val,
>>> + bitsize_int (step * BITS_PER_UNIT),
>>> + bitsize_int ((16 - step) * BITS_PER_UNIT));
>>> +
>>> + return gimple_build_assign (f.lhs, b);
>>> +   }
>>> +
>>> + /* If VECTOR_CST_NELTS_PER_PATTERN (pred) == 2 and every multiple of
>>> +'step_1' in
>>> +[VECTOR_CST_NPATTERNS .. VECTOR_CST_ENCODED_NELTS - 1]
>>> +is zero, then we can treat the vector as VECTOR_CST_NPATTERNS
>>> +elements followed by all inactive elements.  */
>>> + if (!const_vl && VECTOR_CST_NELTS_PER_PATTERN (pred) == 2)
>>
>> Following on from the above, maybe use:
>>
>>   !VECTOR_CST_NELTS (pred).is_constant ()
>>
>> instead of !const_vl here.
>>
>> I have a horrible suspicion that I'm contradicting our earlier discussion
>> here, sorry, but: I think we have to return null if NELTS_PER_PATTERN != 2.
>>
>>
>>
>> IIUC, the NPATTERNS .. ENCODED_ELTS represent the repeated part of the
> encoded
>> constant. This means the repetition occurs if NELTS_PER_PATTERN == 2, IOW the
>> base1 repeats in the encoding. This loop is checking this condition and looks
>> for a 1 in the repeated part of the NELTS_PER_PATTERN == 2 in a VL vector.
>> Please correct me if I’m misunderstanding here.
>
> NELTS_PER_PATTERN == 1 is also a repeating pattern: it means that the
> entire sequence is repeated to fill a vector.  So if an NELTS_PER_PATTERN
> == 1 constant has elements {0, 1, 0, 0}, the vector is:
>
>{0, 1, 0, 0, 0, 1, 0, 0, ...}
>
>
> Wouldn’t the vect_all_same(pred, step) cover this case for a given value of
> step?
>
>
> and the optimisation can't handle that.  NELTS_PER_PATTERN == 3 isn't
> likely to occur for predicates, but in principle it has the same problem.
>
>
>
> OK, I had misunderstood the encoding to always make base1 the repeating value
> by adjusting the NPATTERNS accordingly – I didn’t know you could also have the
> base2 value and beyond encoding the repeat value. In this case could I just
> remove NELTS_PER_PATTERN == 2 condition and the enclosed loop would check for 
> a
> repeating ‘1’ in the repeated part of the encoded pattern?

But for NELTS_PER_PATTERN==1, the whole encoded sequence repeats.
So you would have to start the check at element 0 rather than
NPATTERNS.  And then (for NELTS_PER_PATTERN==1) the loop would reject
any constant that has a nonzero element.  But all valid zero-vector
cases have been handled by this point, so the effect wouldn't be useful.

It should never be the case that all elements from NPATTERNS
onwards are zero for NELTS_PER_PATTERN==3; that case should be
canonicalised to NELTS_PER_PATTERN==2 instead.

So in practice it's simpler and more obviously correct to punt
when NELTS_PER_PATTERN != 2.
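
To restate the two encodings being contrasted here with a worked example:

  npatterns == 4, nelts_per_pattern == 1, encoded elements {0, 1, 0, 0}
    -> full vector {0, 1, 0, 0, 0, 1, 0, 0, ...}   (the whole sequence repeats)

  npatterns == 2, nelts_per_pattern == 2, encoded elements {0, 0, 1, 1}
    -> full vector {0, 0, 1, 1, 1, 1, ...}         (the trailing npatterns
                                                    elements repeat)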

Thanks for the clarification.
I understand all points about punting when NELTS_PER_PATTERN !=2, but one.

Am I correct to understand that we still need to check for the case when 
there's a repeating non-zero elements in the case of NELTS_PER_PATTERN == 2? 
eg. { 0, 0, 1, 1, 1, 1,} which should be encoded as {0, 0, 1, 1} with 
NPATTERNS = 2 ?

Thanks,
Tejas.


Thanks,
Richard


Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-05-16 Thread Tejas Belagod via Gcc-patches



From: Richard Sandiford 
Date: Tuesday, May 16, 2023 at 2:15 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]
Tejas Belagod  writes:
>> +  {
>> +int i;
>> +int nelts = vector_cst_encoded_nelts (v);
>> +int first_el = 0;
>> +
>> +for (i = first_el; i < nelts; i += step)
>> +  if (VECTOR_CST_ENCODED_ELT (v, i) != VECTOR_CST_ENCODED_ELT (v,
> first_el))
>
> I think this should use !operand_equal_p (..., ..., 0).
>
>
> Oops! I wonder why I thought VECTOR_CST_ENCODED_ELT returned a constant! 
> Thanks
> for spotting that.

It does only return a constant.  But there can be multiple trees with
the same constant value, through things like TREE_OVERFLOW (not sure
where things stand on expunging that from gimple) and the fact that
gimple does not maintain a distinction between different types that
have the same mode and signedness.  (E.g. on ILP32 hosts, gimple does
not maintain a distinction between int and long, even though int 0 and
long 0 are different trees.)

> Also, should the flags here be OEP_ONLY_CONST ?

Nah, just 0 should be fine.

>> + return false;
>> +
>> +return true;
>> +  }
>> +
>> +  /* Fold a svlast{a/b} call with constant predicate to a BIT_FIELD_REF.
>> + BIT_FIELD_REF lowers to a NEON element extract, so we have to make sure
>> + the index of the element being accessed is in the range of a NEON
> vector
>> + width.  */
>
> s/NEON/Advanced SIMD/.  Same in later comments
>
>> +  gimple *fold (gimple_folder & f) const override
>> +  {
>> +tree pred = gimple_call_arg (f.call, 0);
>> +tree val = gimple_call_arg (f.call, 1);
>> +
>> +if (TREE_CODE (pred) == VECTOR_CST)
>> +  {
>> + HOST_WIDE_INT pos;
>> + unsigned int const_vg;
>> + int i = 0;
>> + int step = f.type_suffix (0).element_bytes;
>> + int step_1 = gcd (step, VECTOR_CST_NPATTERNS (pred));
>> + int npats = VECTOR_CST_NPATTERNS (pred);
>> + unsigned HOST_WIDE_INT nelts = vector_cst_encoded_nelts (pred);
>> + tree b = NULL_TREE;
>> + bool const_vl = aarch64_sve_vg.is_constant (&const_vg);
>
> I think this might be left over from previous versions, but:
> const_vg isn't used and const_vl is only used once, so I think it
> would be better to remove them.
>
>> +
>> + /* We can optimize 2 cases common to variable and fixed-length cases
>> +without a linear search of the predicate vector:
>> +1.  LASTA if predicate is all true, return element 0.
>> +2.  LASTA if predicate all false, return element 0.  */
>> + if (is_lasta () && vect_all_same (pred, step_1))
>> +   {
>> + b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val,
>> + bitsize_int (step * BITS_PER_UNIT), bitsize_int (0));
>> + return gimple_build_assign (f.lhs, b);
>> +   }
>> +
>> + /* Handle the all-false case for LASTB where SVE VL == 128b -
>> +return the highest numbered element.  */
>> + if (is_lastb () && known_eq (BYTES_PER_SVE_VECTOR, 16)
>> + && vect_all_same (pred, step_1)
>> + && integer_zerop (VECTOR_CST_ENCODED_ELT (pred, 0)))
>
> Formatting nit: one condition per line once one line isn't enough.
>
>> +   {
>> + b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val,
>> + bitsize_int (step * BITS_PER_UNIT),
>> + bitsize_int ((16 - step) * BITS_PER_UNIT));
>> +
>> + return gimple_build_assign (f.lhs, b);
>> +   }
>> +
>> + /* If VECTOR_CST_NELTS_PER_PATTERN (pred) == 2 and every multiple of
>> +'step_1' in
>> +[VECTOR_CST_NPATTERNS .. VECTOR_CST_ENCODED_NELTS - 1]
>> +is zero, then we can treat the vector as VECTOR_CST_NPATTERNS
>> +elements followed by all inactive elements.  */
>> + if (!const_vl && VECTOR_CST_NELTS_PER_PATTERN (pred) == 2)
>
> Following on from the above, maybe use:
>
>   !VECTOR_CST_NELTS (pred).is_constant ()
>
> instead of !const_vl here.
>
> I have a horrible suspicion that I'm contradicting our earlier discussion
> here, sorry, but: I think we have to return null if NELTS_PER_PATTERN != 2.
>
>
>
> IIUC, the NPATTERNS .. ENCODED_ELTS represent the repeated part of the encoded
> constant. This means the repetition occurs if NELTS_PER_PATTERN == 2, IOW the
> base1 repeats in the encoding. This loop is checking this condition and looks
for a 1 in the repeated part of the NELTS_PER_PATTERN == 2 in a VL vector.

Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-05-16 Thread Tejas Belagod via Gcc-patches
Thanks for your comments, Richard.

From: Richard Sandiford 
Date: Friday, May 12, 2023 at 1:02 AM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org , Tejas Belagod 

Subject: Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]
Tejas Belagod  writes:
> From: Tejas Belagod 
>
>   This PR optimizes an SVE intrinsics sequence where
> svlasta (svptrue_pat_b8 (SV_VL1), x)
>   a scalar is selected based on a constant predicate and a variable vector.
>   This sequence is optimized to return the correspoding element of a NEON
>   vector. For eg.
> svlasta (svptrue_pat_b8 (SV_VL1), x)
>   returns
> umovw0, v0.b[1]
>   Likewise,
> svlastb (svptrue_pat_b8 (SV_VL1), x)
>   returns
>  umovw0, v0.b[0]
>   This optimization only works provided the constant predicate maps to a range
>   that is within the bounds of a 128-bit NEON register.
>
> gcc/ChangeLog:
>
>PR target/96339
>* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): 
> Fold sve
>calls that have a constant input predicate vector.
>(svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
>(svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
>(svlast_impl::vect_all_same): Check if all vector elements are equal.
>
> gcc/testsuite/ChangeLog:
>
>PR target/96339
>* gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
>* gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
>* gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
>* gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
>to expect optimized code for function body.
>* gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise.
>* gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  | 124 +++
>  .../aarch64/sve/acle/general-c/svlast.c   |  63 
>  .../sve/acle/general-c/svlast128_run.c| 313 +
>  .../sve/acle/general-c/svlast256_run.c| 314 ++
>  .../gcc.target/aarch64/sve/pcs/return_4.c |   2 -
>  .../aarch64/sve/pcs/return_4_1024.c   |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_4_128.c |   2 -
>  .../aarch64/sve/pcs/return_4_2048.c   |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_4_256.c |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_4_512.c |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5.c |   2 -
>  .../aarch64/sve/pcs/return_5_1024.c   |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5_128.c |   2 -
>  .../aarch64/sve/pcs/return_5_2048.c   |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5_256.c |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5_512.c |   2 -
>  16 files changed, 814 insertions(+), 24 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast128_run.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast256_run.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index cd9cace3c9b..db2b4dcaac9 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -1056,6 +1056,130 @@ class svlast_impl : public quiet
>  public:
>CONSTEXPR svlast_impl (int unspec) : m_unspec (unspec) {}
>
> +  bool is_lasta () const { return m_unspec == UNSPEC_LASTA; }
> +  bool is_lastb () const { return m_unspec == UNSPEC_LASTB; }
> +
> +  bool vect_all_same (tree v , int step) const

Nit: stray space after "v".

> +  {
> +int i;
> +int nelts = vector_cst_encoded_nelts (v);
> +int first_el = 0;
> +
> +for (i = first_el; i < nelts; i += step)
> +  if (VECTOR_CST_ENCODED_ELT (v, i) != VECTOR_CST_ENCODED_ELT (v, 
> first_el))

I think this should use !operand_equal_p (..., ..., 0).

Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-05-03 Thread Tejas Belagod via Gcc-patches
[Ping]

From: Tejas Belagod 
Date: Thursday, March 16, 2023 at 5:09 PM
To: gcc-patches@gcc.gnu.org 
Cc: Tejas Belagod , Richard Sandiford 

Subject: [PATCH] [PR96339] AArch64: Optimise svlast[ab]
From: Tejas Belagod 

  This PR optimizes an SVE intrinsics sequence such as
svlasta (svptrue_pat_b8 (SV_VL1), x)
  where a scalar is selected based on a constant predicate and a variable vector.
  This sequence is optimized to return the corresponding element of a NEON
  vector. For example,
svlasta (svptrue_pat_b8 (SV_VL1), x)
  returns
umov    w0, v0.b[1]
  Likewise,
svlastb (svptrue_pat_b8 (SV_VL1), x)
  returns
 umov    w0, v0.b[0]
  This optimization only works provided the constant predicate maps to a range
  that is within the bounds of a 128-bit NEON register.

gcc/ChangeLog:

PR target/96339
* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): Fold 
sve
calls that have a constant input predicate vector.
(svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
(svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
(svlast_impl::vect_all_same): Check if all vector elements are equal.

gcc/testsuite/ChangeLog:

PR target/96339
* gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
* gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
* gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
* gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
to expect optimized code for function body.
* gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.cc  | 124 +++
 .../aarch64/sve/acle/general-c/svlast.c   |  63 
 .../sve/acle/general-c/svlast128_run.c| 313 +
 .../sve/acle/general-c/svlast256_run.c| 314 ++
 .../gcc.target/aarch64/sve/pcs/return_4.c |   2 -
 .../aarch64/sve/pcs/return_4_1024.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_128.c |   2 -
 .../aarch64/sve/pcs/return_4_2048.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_256.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_512.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5.c |   2 -
 .../aarch64/sve/pcs/return_5_1024.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_128.c |   2 -
 .../aarch64/sve/pcs/return_5_2048.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_256.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_512.c |   2 -
 16 files changed, 814 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast128_run.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast256_run.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index cd9cace3c9b..db2b4dcaac9 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1056,6 +1056,130 @@ class svlast_impl : public quiet
 public:
   CONSTEXPR svlast_impl (int unspec) : m_unspec (unspec) {}

+  bool is_lasta () const { return m_unspec == UNSPEC_LASTA; }
+  bool is_lastb () const { return m_unspec == UNSPEC_LASTB; }
+
+  bool vect_all_same (tree v , int step) const
+  {
+int i;
+int nelts = vector_cst_encoded_nelts (v);
+int first_el = 0;
+
+for (i = first_el; i < nelts; i += step)
+  if (VECTOR_CST_ENCODED_ELT (v, i) != VECTOR_CST_ENCODED_ELT (v, 
first_el))
+   return false;
+
+return true;
+  }
+
+  /* Fold a svlast{a/b} call with constant predicate to a BIT_FIELD_REF.
+ BIT_FIELD_REF lowers to a NEON element extract, so we have to make sure
+ the index of the element being accessed is in the range of a NEON vector
+ width.  */
+  gimple *fold (gimple_folder & f) const override
+  {
+tree pred = gimple_call_arg (f.call, 0);
+tree val = gimple_call_arg (f.call, 1);
+
+if (TREE_CODE (pred) == VECTOR_CST)
+  {
+   HOST_WIDE_INT pos;
+   unsigned int const_vg;
+ 

[PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-03-16 Thread Tejas Belagod via Gcc-patches
From: Tejas Belagod 

  This PR optimizes an SVE intrinsics sequence such as
svlasta (svptrue_pat_b8 (SV_VL1), x)
  where a scalar is selected based on a constant predicate and a variable vector.
  This sequence is optimized to return the corresponding element of a NEON
  vector. For example,
svlasta (svptrue_pat_b8 (SV_VL1), x)
  returns
umov    w0, v0.b[1]
  Likewise,
svlastb (svptrue_pat_b8 (SV_VL1), x)
  returns
 umov    w0, v0.b[0]
  This optimization only works provided the constant predicate maps to a range
  that is within the bounds of a 128-bit NEON register.

gcc/ChangeLog:

PR target/96339
* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): Fold 
sve
calls that have a constant input predicate vector.
(svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
(svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
(svlast_impl::vect_all_same): Check if all vector elements are equal.

gcc/testsuite/ChangeLog:

PR target/96339
* gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
* gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
* gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
* gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
to expect optimized code for function body.
* gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.cc  | 124 +++
 .../aarch64/sve/acle/general-c/svlast.c   |  63 
 .../sve/acle/general-c/svlast128_run.c| 313 +
 .../sve/acle/general-c/svlast256_run.c| 314 ++
 .../gcc.target/aarch64/sve/pcs/return_4.c |   2 -
 .../aarch64/sve/pcs/return_4_1024.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_128.c |   2 -
 .../aarch64/sve/pcs/return_4_2048.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_256.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_512.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5.c |   2 -
 .../aarch64/sve/pcs/return_5_1024.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_128.c |   2 -
 .../aarch64/sve/pcs/return_5_2048.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_256.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_512.c |   2 -
 16 files changed, 814 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast128_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast256_run.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index cd9cace3c9b..db2b4dcaac9 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1056,6 +1056,130 @@ class svlast_impl : public quiet
 public:
   CONSTEXPR svlast_impl (int unspec) : m_unspec (unspec) {}
 
+  bool is_lasta () const { return m_unspec == UNSPEC_LASTA; }
+  bool is_lastb () const { return m_unspec == UNSPEC_LASTB; }
+
+  bool vect_all_same (tree v , int step) const
+  {
+int i;
+int nelts = vector_cst_encoded_nelts (v);
+int first_el = 0;
+
+for (i = first_el; i < nelts; i += step)
+  if (VECTOR_CST_ENCODED_ELT (v, i) != VECTOR_CST_ENCODED_ELT (v, first_el))
+   return false;
+
+return true;
+  }
+
+  /* Fold a svlast{a/b} call with constant predicate to a BIT_FIELD_REF.
+ BIT_FIELD_REF lowers to a NEON element extract, so we have to make sure
+ the index of the element being accessed is in the range of a NEON vector
+ width.  */
+  gimple *fold (gimple_folder & f) const override
+  {
+tree pred = gimple_call_arg (f.call, 0);
+tree val = gimple_call_arg (f.call, 1);
+
+if (TREE_CODE (pred) == VECTOR_CST)
+  {
+   HOST_WIDE_INT pos;
+   unsigned int const_vg;
+   int i = 0;
+   int step = f.type_suffix (0).element_bytes;
+   int step_1 = gcd (step, VECTOR_CST_NPATTERNS (pred));
+   int npats = VECTOR_CST_NPATTERNS (pred);
+   u

[PATCH 2/2, GCC12] AArch64: Gate various crypto intrinsics availability based on features

2023-02-15 Thread Tejas Belagod via Gcc-patches
The 64-bit variant of PMULL{2} and AES instructions are available if FEAT_AES
is implemented according to the Arm ARM [1].  Similarly FEAT_SHA1 and
FEAT_SHA256 enable the use of SHA1 and SHA256 instruction variants.
This patch fixes arm_neon.h to correctly reflect the feature availability based
on '+aes' and '+sha2' as opposed to the ambiguous catch-all '+crypto'.

[1] Section D17.2.61, C7.2.215
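As a minimal usage sketch (my own example, separate from the pmull64.c test
added below), code that only needs the 64-bit PMULL intrinsic can now be
built with just '+aes' instead of the catch-all '+crypto':

  #include <arm_neon.h>

  /* Needs FEAT_AES only; '+crypto' is no longer required.  */
  poly128_t
  mul_p64 (poly64_t a, poly64_t b)
  {
    return vmull_p64 (a, b);
  }

compiled with e.g. -march=armv8-a+aes.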

2022-01-11  Tejas Belagod  

gcc/ChangeLog:

* config/aarch64/arm_neon.h (vmull_p64, vmull_high_p64, vaeseq_u8,
vaesdq_u8, vaesmcq_u8, vaesimcq_u8): Gate under "nothing+aes".
(vsha1*_u32, vsha256*_u32): Gate under "nothing+sha2".

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/pmull64.c: New.
* gcc.target/aarch64/aes-fuse-1.c: Replace '+crypto' with corresponding
feature flag based on the intrinsic.
* gcc.target/aarch64/aes-fuse-2.c: Likewise.
* gcc.target/aarch64/aes_1.c: Likewise.
* gcc.target/aarch64/aes_2.c: Likewise.
* gcc.target/aarch64/aes_xor_combine.c: Likewise.
* gcc.target/aarch64/sha1_1.c: Likewise.
* gcc.target/aarch64/sha256_1.c: Likewise.
* gcc.target/aarch64/target_attr_crypto_ice_1.c: Likewise.
---
 gcc/config/aarch64/arm_neon.h | 35 ++-
 .../gcc.target/aarch64/acle/pmull64.c | 14 
 gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c |  4 +--
 gcc/testsuite/gcc.target/aarch64/aes-fuse-2.c |  4 +--
 gcc/testsuite/gcc.target/aarch64/aes_1.c  |  2 +-
 gcc/testsuite/gcc.target/aarch64/aes_2.c  |  4 ++-
 .../gcc.target/aarch64/aes_xor_combine.c  |  2 +-
 gcc/testsuite/gcc.target/aarch64/sha1_1.c |  2 +-
 gcc/testsuite/gcc.target/aarch64/sha256_1.c   |  2 +-
 .../aarch64/target_attr_crypto_ice_1.c|  2 +-
 10 files changed, 44 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/pmull64.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 85d03c58d2a..695aafd9a5e 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -10243,7 +10243,7 @@ vqrdmlshs_laneq_s32 (int32_t __a, int32_t __b, 
int32x4_t __c, const int __d)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("+nothing+crypto")
+#pragma GCC target ("+nothing+aes")
 /* vaes  */
 
 __extension__ extern __inline uint8x16_t
@@ -10273,6 +10273,22 @@ vaesimcq_u8 (uint8x16_t data)
 {
   return __builtin_aarch64_crypto_aesimcv16qi_uu (data);
 }
+
+__extension__ extern __inline poly128_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmull_p64 (poly64_t __a, poly64_t __b)
+{
+  return
+__builtin_aarch64_crypto_pmulldi_ppp (__a, __b);
+}
+
+__extension__ extern __inline poly128_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
+{
+  return __builtin_aarch64_crypto_pmullv2di_ppp (__a, __b);
+}
+
 #pragma GCC pop_options
 
 /* vcage  */
@@ -23519,7 +23535,7 @@ vrsrad_n_u64 (uint64_t __a, uint64_t __b, const int __c)
 }
 
 #pragma GCC push_options
-#pragma GCC target ("+nothing+crypto")
+#pragma GCC target ("+nothing+sha2")
 
 /* vsha1  */
 
@@ -23596,21 +23612,6 @@ vsha256su1q_u32 (uint32x4_t __tw0_3, uint32x4_t 
__w8_11, uint32x4_t __w12_15)
   __w12_15);
 }
 
-__extension__ extern __inline poly128_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vmull_p64 (poly64_t __a, poly64_t __b)
-{
-  return
-__builtin_aarch64_crypto_pmulldi_ppp (__a, __b);
-}
-
-__extension__ extern __inline poly128_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
-{
-  return __builtin_aarch64_crypto_pmullv2di_ppp (__a, __b);
-}
-
 #pragma GCC pop_options
 
 /* vshl */
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c 
b/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c
new file mode 100644
index 000..6a1e99e2d0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8.2-a" } */
+
+#pragma push_options
+#pragma GCC target ("+aes")
+
+#include "arm_neon.h"
+
+int foo (poly64_t a, poly64_t b)
+{
+  return vgetq_lane_s32 (vreinterpretq_s32_p128 (vmull_p64 (a, b)), 0);
+}
+
+/* { dg-final { scan-assembler "\tpmull\tv" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c 
b/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c
index d7b4f89919d..1b4e10f78db 100644
--- a/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mcpu=cortex-a72+crypto -dp" } */
-/* { dg-additional-options "-march=armv8-a+crypto" { target { aa

[PATCH 1/2, GCC12] AArch64: Update transitive closures of aes, sha2 and sha3 extensions

2023-02-15 Thread Tejas Belagod via Gcc-patches
Transitive closures of architectural extensions have to be manually maintained
for the AARCH64_OPT_EXTENSION list.  Currently the aes, sha2 and sha3 extensions
add AARCH64_FL_SIMD as their dependency - this does not automatically pull in
the transitive dependence on AARCH64_FL_FP from AARCH64_FL_SIMD's definition.
As described, the transitive closure/dependence has to be maintained manually.
This patch adds AARCH64_FL_FP to each of these crypto extensions' dependence
set.  Automatic transitive closure maintenance is fixed on trunk in commit
11a113d501ff64fa4843e28d0a21b3f4e9d0d3de.
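To make the flag arithmetic concrete, here is a simplified sketch (entirely
my own; the FL_* names and values are made up - the real AARCH64_FL_* bits
live in the aarch64 headers):

  /* Hypothetical feature bits, for illustration only.  */
  #define FL_FP    (1u << 0)
  #define FL_SIMD  (1u << 1)   /* architecturally implies FP */
  #define FL_AES   (1u << 2)

  /* The dependence set listed in AARCH64_OPT_EXTENSION is a plain bitmask,
     so FP is not pulled in automatically from SIMD...  */
  unsigned aes_deps_before = FL_SIMD;
  /* ...and with manual closure maintenance it must be spelled out.  */
  unsigned aes_deps_after  = FL_SIMD | FL_FP;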

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (aes, sha2, sha3):
Update the architectural dependencies in the AARCH64_OPT_EXTENSION
definitions of aes, sha2 and sha3 to include AARCH64_FL_FP.
---
 gcc/config/aarch64/aarch64-option-extensions.def | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index b4d0ac8b600..88cefc20022 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -118,19 +118,19 @@ AARCH64_OPT_EXTENSION("dotprod", AARCH64_FL_DOTPROD, 
AARCH64_FL_SIMD, 0, \
 
 /* Enabling "aes" also enables "simd".
Disabling "aes" disables "aes" and "sve2-aes'.  */
-AARCH64_OPT_EXTENSION("aes", AARCH64_FL_AES, AARCH64_FL_SIMD, \
- AARCH64_FL_SVE2_AES, false, "aes")
+AARCH64_OPT_EXTENSION("aes", AARCH64_FL_AES, AARCH64_FL_SIMD | \
+ AARCH64_FL_FP, AARCH64_FL_SVE2_AES, false, "aes")
 
 /* Enabling "sha2" also enables "simd".
Disabling "sha2" just disables "sha2".  */
-AARCH64_OPT_EXTENSION("sha2", AARCH64_FL_SHA2, AARCH64_FL_SIMD, 0, false, \
- "sha1 sha2")
+AARCH64_OPT_EXTENSION("sha2", AARCH64_FL_SHA2, AARCH64_FL_SIMD | \
+ AARCH64_FL_FP, 0, false, "sha1 sha2")
 
 /* Enabling "sha3" enables "simd" and "sha2".
Disabling "sha3" disables "sha3" and "sve2-sha3".  */
 AARCH64_OPT_EXTENSION("sha3", AARCH64_FL_SHA3, AARCH64_FL_SIMD | \
- AARCH64_FL_SHA2, AARCH64_FL_SVE2_SHA3, false, \
- "sha3 sha512")
+ AARCH64_FL_SHA2 | AARCH64_FL_FP, AARCH64_FL_SVE2_SHA3, \
+ false, "sha3 sha512")
 
 /* Enabling "sm4" also enables "simd".
Disabling "sm4" disables "sm4" and "sve2-sm4".  */
-- 
2.17.1



Re: [PATCH] AArch64: Gate various crypto intrinsics availability based on features

2023-01-20 Thread Tejas Belagod via Gcc-patches



From: Kyrylo Tkachov 
Date: Tuesday, January 17, 2023 at 3:53 PM
To: Tejas Belagod , gcc-patches@gcc.gnu.org 

Cc: Richard Sandiford , Richard Earnshaw 

Subject: RE: [PATCH] AArch64: Gate various crypto intrinsics availability based 
on features
Hi Tejas,

> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Tejas Belagod
> via Gcc-patches
> Sent: Monday, January 16, 2023 7:12 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Tejas Belagod ; Richard Sandiford
> ; Richard Earnshaw
> 
> Subject: [PATCH] AArch64: Gate various crypto intrinsics availability based on
> features
>
> The 64-bit variant of PMULL{2} and AES instructions are available if FEAT_AES
> is implemented according to the Arm ARM [1].  Similarly FEAT_SHA1 and
> FEAT_SHA256 enable the use of SHA1 and SHA256 instruction variants.
> This patch fixes arm_neon.h to correctly reflect the feature availability 
> based
> on '+aes' and '+sha2' as opposed to the ambiguous catch-all '+crypto'.
>
> [1] Section D17.2.61, C7.2.215
>
> 2022-01-11  Tejas Belagod  
>
> gcc/
>* config/aarch64/arm_neon.h: Gate AES and PMULL64 intrinsics
>under target feature +aes as opposed to +crypto. Gate SHA1 and
> SHA2
>intrinsics under +sha2.

The ChangeLog should list the intrinsics affected like
* config/aarch64/arm_neon.h (vmull_p64, vmull_high_p64): Gate under 
"nothing+aes"
For example.
Ok with a fixed ChangeLog.
Thanks,
Kyrill


Thanks for the review Kyrill, now pushed to master. OK to backport to 12?
Thanks,
Tejas.

>
> testsuite/
>
>* gcc.target/aarch64/acle/pmull64.c: New.
>* gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c: Replace '+crypto'
> with
>corresponding feature flag based on the intrinsic.
>* gcc.target/aarch64/aes-fuse-2.c: Likewise.
>* gcc.target/aarch64/aes_1.c: Likewise.
>* gcc.target/aarch64/aes_2.c: Likewise.
>* gcc.target/aarch64/aes_xor_combine.c: Likewise.
>* gcc.target/aarch64/sha1_1.c: Likewise.
>* gcc.target/aarch64/sha256_1.c: Likewise.
>* gcc.target/aarch64/target_attr_crypto_ice_1.c: Likewise.
> ---
>  gcc/config/aarch64/arm_neon.h | 35 ++-
>  .../gcc.target/aarch64/acle/pmull64.c | 14 
>  gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c |  4 +--
>  gcc/testsuite/gcc.target/aarch64/aes-fuse-2.c |  4 +--
>  gcc/testsuite/gcc.target/aarch64/aes_1.c  |  2 +-
>  gcc/testsuite/gcc.target/aarch64/aes_2.c  |  4 ++-
>  .../gcc.target/aarch64/aes_xor_combine.c  |  2 +-
>  gcc/testsuite/gcc.target/aarch64/sha1_1.c |  2 +-
>  gcc/testsuite/gcc.target/aarch64/sha256_1.c   |  2 +-
>  .../aarch64/target_attr_crypto_ice_1.c|  2 +-
>  10 files changed, 44 insertions(+), 27 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/pmull64.c
>
> diff --git a/gcc/config/aarch64/arm_neon.h
> b/gcc/config/aarch64/arm_neon.h
> index cf6af728ca9..a795a387b38 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -7496,7 +7496,7 @@ vqrdmlshs_laneq_s32 (int32_t __a, int32_t __b,
> int32x4_t __c, const int __d)
>  #pragma GCC pop_options
>
>  #pragma GCC push_options
> -#pragma GCC target ("+nothing+crypto")
> +#pragma GCC target ("+nothing+aes")
>  /* vaes  */
>
>  __extension__ extern __inline uint8x16_t
> @@ -7526,6 +7526,22 @@ vaesimcq_u8 (uint8x16_t data)
>  {
>return __builtin_aarch64_crypto_aesimcv16qi_uu (data);
>  }
> +
> +__extension__ extern __inline poly128_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vmull_p64 (poly64_t __a, poly64_t __b)
> +{
> +  return
> +__builtin_aarch64_crypto_pmulldi_ppp (__a, __b);
> +}
> +
> +__extension__ extern __inline poly128_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
> +{
> +  return __builtin_aarch64_crypto_pmullv2di_ppp (__a, __b);
> +}
> +
>  #pragma GCC pop_options
>
>  /* vcage  */
> @@ -20772,7 +20788,7 @@ vrsrad_n_u64 (uint64_t __a, uint64_t __b, const
> int __c)
>  }
>
>  #pragma GCC push_options
> -#pragma GCC target ("+nothing+crypto")
> +#pragma GCC target ("+nothing+sha2")
>
>  /* vsha1  */
>
> @@ -20849,21 +20865,6 @@ vsha256su1q_u32 (uint32x4_t __tw0_3,
> uint32x4_t __w8_11, uint32x4_t __w12_15)
>   __w12_15);
>  }
>
> -__extension__ extern __inline poly128_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -vmull_p64 (poly64_t __a, poly64_t __b)
> -{
> -  retu

[PATCH] AArch64: Gate various crypto intrinsics availability based on features

2023-01-15 Thread Tejas Belagod via Gcc-patches
The 64-bit variant of PMULL{2} and AES instructions are available if FEAT_AES
is implemented according to the Arm ARM [1].  Similarly FEAT_SHA1 and
FEAT_SHA256 enable the use of SHA1 and SHA256 instruction variants.
This patch fixes arm_neon.h to correctly reflect the feature availability based
on '+aes' and '+sha2' as opposed to the ambiguous catch-all '+crypto'.

[1] Section D17.2.61, C7.2.215

2022-01-11  Tejas Belagod  

gcc/
* config/aarch64/arm_neon.h: Gate AES and PMULL64 intrinsics
under target feature +aes as opposed to +crypto. Gate SHA1 and SHA2
intrinsics under +sha2.

testsuite/

* gcc.target/aarch64/acle/pmull64.c: New.
* gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c: Replace '+crypto' with
corresponding feature flag based on the intrinsic.
* gcc.target/aarch64/aes-fuse-2.c: Likewise.
* gcc.target/aarch64/aes_1.c: Likewise.
* gcc.target/aarch64/aes_2.c: Likewise.
* gcc.target/aarch64/aes_xor_combine.c: Likewise.
* gcc.target/aarch64/sha1_1.c: Likewise.
* gcc.target/aarch64/sha256_1.c: Likewise.
* gcc.target/aarch64/target_attr_crypto_ice_1.c: Likewise.
---
 gcc/config/aarch64/arm_neon.h | 35 ++-
 .../gcc.target/aarch64/acle/pmull64.c | 14 
 gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c |  4 +--
 gcc/testsuite/gcc.target/aarch64/aes-fuse-2.c |  4 +--
 gcc/testsuite/gcc.target/aarch64/aes_1.c  |  2 +-
 gcc/testsuite/gcc.target/aarch64/aes_2.c  |  4 ++-
 .../gcc.target/aarch64/aes_xor_combine.c  |  2 +-
 gcc/testsuite/gcc.target/aarch64/sha1_1.c |  2 +-
 gcc/testsuite/gcc.target/aarch64/sha256_1.c   |  2 +-
 .../aarch64/target_attr_crypto_ice_1.c|  2 +-
 10 files changed, 44 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/pmull64.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index cf6af728ca9..a795a387b38 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7496,7 +7496,7 @@ vqrdmlshs_laneq_s32 (int32_t __a, int32_t __b, int32x4_t 
__c, const int __d)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("+nothing+crypto")
+#pragma GCC target ("+nothing+aes")
 /* vaes  */
 
 __extension__ extern __inline uint8x16_t
@@ -7526,6 +7526,22 @@ vaesimcq_u8 (uint8x16_t data)
 {
   return __builtin_aarch64_crypto_aesimcv16qi_uu (data);
 }
+
+__extension__ extern __inline poly128_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmull_p64 (poly64_t __a, poly64_t __b)
+{
+  return
+__builtin_aarch64_crypto_pmulldi_ppp (__a, __b);
+}
+
+__extension__ extern __inline poly128_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
+{
+  return __builtin_aarch64_crypto_pmullv2di_ppp (__a, __b);
+}
+
 #pragma GCC pop_options
 
 /* vcage  */
@@ -20772,7 +20788,7 @@ vrsrad_n_u64 (uint64_t __a, uint64_t __b, const int __c)
 }
 
 #pragma GCC push_options
-#pragma GCC target ("+nothing+crypto")
+#pragma GCC target ("+nothing+sha2")
 
 /* vsha1  */
 
@@ -20849,21 +20865,6 @@ vsha256su1q_u32 (uint32x4_t __tw0_3, uint32x4_t 
__w8_11, uint32x4_t __w12_15)
   __w12_15);
 }
 
-__extension__ extern __inline poly128_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vmull_p64 (poly64_t __a, poly64_t __b)
-{
-  return
-__builtin_aarch64_crypto_pmulldi_ppp (__a, __b);
-}
-
-__extension__ extern __inline poly128_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
-{
-  return __builtin_aarch64_crypto_pmullv2di_ppp (__a, __b);
-}
-
 #pragma GCC pop_options
 
 /* vshl */
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c 
b/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c
new file mode 100644
index 000..6a1e99e2d0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8.2-a" } */
+
+#pragma push_options
+#pragma GCC target ("+aes")
+
+#include "arm_neon.h"
+
+int foo (poly64_t a, poly64_t b)
+{
+  return vgetq_lane_s32 (vreinterpretq_s32_p128 (vmull_p64 (a, b)), 0);
+}
+
+/* { dg-final { scan-assembler "\tpmull\tv" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c 
b/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c
index d7b4f89919d..1b4e10f78db 100644
--- a/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mcpu=cortex-a72+crypto -dp" } */
-/* { dg-additional-options "-march=armv8-a+crypto" { target { aarch64*-*-* } } 
}*/
+/* { dg-options "-O3 -mcpu=cortex-a72+a

RE: [Patch 1/8, Arm, AArch64, GCC] Refactor mbranch-protection option parsing and make it common to AArch32 and AArch64 backends. [Was RE: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.]

2021-11-15 Thread Tejas Belagod via Gcc-patches
Ping for this series.

Thanks,
Tejas.

> -Original Message-
> From: Gcc-patches  bounces+belagod=gcc.gnu@gcc.gnu.org> On Behalf Of Tejas Belagod via
> Gcc-patches
> Sent: Thursday, October 28, 2021 12:41 PM
> To: Richard Earnshaw ; gcc-
> patc...@gcc.gnu.org
> Subject: [Patch 1/8, Arm, AArch64, GCC] Refactor mbranch-protection option
> parsing and make it common to AArch32 and AArch64 backends. [Was RE:
> [Patch 2/7, Arm, GCC] Add option -mbranch-protection.]
> 
> 
> 
> > -Original Message-
> > From: Richard Earnshaw 
> > Sent: Monday, October 11, 2021 1:58 PM
> > To: Tejas Belagod ; gcc-patches@gcc.gnu.org
> > Subject: Re: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.
> >
> > On 08/10/2021 13:17, Tejas Belagod via Gcc-patches wrote:
> > > Hi,
> > >
> > > Add -mbranch-protection option and its associated parsing routines.
> > > This option enables the code-generation of pointer signing and
> > > authentication instructions in function prologues and epilogues.
> > >
> > > Tested on arm-none-eabi. OK for trunk?
> > >
> > > 2021-10-04  Tejas Belagod  
> > >
> > > gcc/ChangeLog:
> > >
> > >   * common/config/arm/arm-common.c
> > >(arm_print_hit_for_pacbti_option): New.
> > >(arm_progress_next_token): New.
> > >(arm_parse_pac_ret_clause): New routine for parsing the
> > >   pac-ret clause for -mbranch-protection.
> > >   (arm_parse_pacbti_option): New routine to parse all the options
> > >   to -mbranch-protection.
> > >   * config/arm/arm-protos.h (arm_parse_pacbti_option): Export.
> > >   * config/arm/arm.c (arm_configure_build_target): Handle option
> > >   to -mbranch-protection.
> > >   * config/arm/arm.opt (mbranch-protection): New.
> > >   (arm_enable_pacbti): New.
> > >
> >
> > You're missing documentation for invoke.texi.
> >
> > Also, how does this differ from the existing option in aarch64?  Can
> > the code from that be adapted to be made common to both targets rather
> > than doing a new implementation?
> >
> > Finally, there are far too many manifest constants in this patch, they
> > need replacing with enums or #defines as appropriate if we cannot
> > share the
> > aarch64 code.
> >
> 
> Thanks for the reviews.
> 
> This change refactors all the mbranch-protection option parsing code and
> types to make it common to both AArch32 and AArch64 backends.  This
> change also pulls in some supporting types from AArch64 to make it common
> (aarch_parse_opt_result).  The significant changes in this patch are the
> movement of all branch protection parsing routines from aarch64.c to aarch-
> common.c and supporting data types and static data structures.  This patch
> also pre-declares variables and types required in the aarch32 back for moved
> variables for function sign scope and key to prepare for the impending series
> of patches that support parsing the feature mbranch-protection in the
> aarch32 back end.
> 
> 2021-10-25  Tejas Belagod  
> 
> gcc/ChangeLog:
> 
>   * common/config/aarch64/aarch64-common.c: Include aarch-
> common.h.
>   (all_architectures): Fix comment.
>   (aarch64_parse_extension): Rename return type, enum value
> names.
>   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
> Rename
>   factored out aarch_ra_sign_scope and aarch_ra_sign_key variables.
>   Also rename corresponding enum values.
>   * config/aarch64/aarch64-opts.h (aarch64_function_type): Factor out
>   aarch64_function_type and move it to common code as
> aarch_function_type
>   in aarch-common.h.
>   * config/aarch64/aarch64-protos.h: Include common types header,
> move out
>   types aarch64_parse_opt_result and aarch64_key_type to aarch-
> common.h
>   * config/aarch64/aarch64.c: Move mbranch-protection parsing types
> and
>   functions out into aarch-common.h and aarch-common.c.  Fix up all
> the name
>   changes resulting from the move.
>   * config/aarch64/aarch64.md: Fix up aarch64_ra_sign_key type name
> change
>   and enum value.
>   * config/aarch64/aarch64.opt: Include aarch-common.h to import
> type move.
>   Fix up name changes from factoring out common code and data.
>   * config/arm/aarch-common-protos.h: Export factored out routines
> to both
>   backends.
>   * config/arm/aarch-common.c: Include newly factored out types.
> Move all
>   mbranch-protection code and data structures from aarch64.c.
>   * config/arm/aarch-common.h:

[Patch 9/9, GCC, Arm] Implement arm Function target attribute 'branch-protection'.

2021-11-12 Thread Tejas Belagod via Gcc-patches
Hi,

This patch is part of the series of PACBTI-M patches posted earlier 
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582773.html

This change adds the target function attribute 'branch-protection'.  The
options that it can take are the same as those of the command-line option
'-mbranch-protection'.  The function attribute options override the
command-line options for the function scope.
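A minimal sketch of the attribute in use (my own example, separate from the
new test below):

  /* Overrides any -mbranch-protection= given on the command line, for
     this function only.  */
  __attribute__ ((target ("branch-protection=pac-ret+leaf")))
  void sensitive (void)
  {
    /* ... */
  }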

Regression tested for arm-none-eabi. OK for trunk?

2021-11-11  Tejas Belagod  

gcc/ChangeLog:
* config/arm/arm.c (arm_valid_target_attribute_rec): Add ARM function
attribute 'branch-protection' and parse its options.
* doc/extend.texi: Document ARM Function attribute 'branch-protection'.

gcc/testsuite/
* gcc.target/arm/acle/pacbti-m-predef-7.c: New test.

Thanks,
Tejas.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
a87bcb298f9e6d7b2f3fd61b4586e291f46b0f81..64253f3814786b302f8fea573fdfc4213da439ce
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -33147,6 +33147,22 @@ arm_valid_target_attribute_rec (tree args, struct 
gcc_options *opts)
 
  opts->x_arm_arch_string = xstrndup (arch, strlen (arch));
}
+  else if (startswith (q, "branch-protection="))
+   {
+ char *bp_str = q + strlen ("branch-protection=");
+
+ opts->x_arm_branch_protection_string
+   = xstrndup (bp_str, strlen (bp_str));
+
+ /* Capture values from target attribute.  */
+ aarch_validate_mbranch_protection
+   (opts->x_arm_branch_protection_string);
+
+ /* Init function target attr values.  */
+ opts->x_aarch_ra_sign_scope = aarch_ra_sign_scope;
+ opts->x_aarch_enable_bti = aarch_enable_bti;
+
+   }
   else if (q[0] == '+')
{
  opts->x_arm_arch_string
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 
eee4c6737bbfa9529fd613a0197d513121d058ec..c605adf665754804735e244006fb39f02705c74e
 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -4767,6 +4767,12 @@ In this example @code{target("+crc+nocrypto")} enables 
the @code{crc}
 extension and disables the @code{crypto} extension for the function @code{foo}
 without modifying an existing @option{-march=} or @option{-mcpu} option.
 
+@item branch-protection=
+@cindex @code{branch-protection=} function attribute, arm
+Select the function scope on which branch protection will be applied.  The
+behavior and permissible arguments are the same as for the command-line option
+@option{-mbranch-protection=}.  The default value is @code{none}.
+
 @end table
 
 @end table
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-7.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-7.c
new file mode 100644
index 
..ccf3e1cb9ae6cbed77844142e94641548b75c918
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-7.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=pac-ret+leaf --save-temps" } 
*/
+
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 74, 0" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 1" } } */
+
+#if defined (__ARM_FEATURE_BTI_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be undefined."
+#endif
+
+#if !defined (__ARM_FEATURE_PAC_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_PAC_DEFAULT should be defined."
+#endif
+
+__attribute__((target("branch-protection=pac-ret+bti"), noinline))
+void foo ()
+{
+  if (__ARM_FEATURE_PAC_DEFAULT != 5)
+__builtin_abort ();
+}
+
+int
+main()
+{
+  foo ();
+  return 0;
+}


[Patch 8/8, Arm, GCC] Introduce multilibs for PACBTI target feature. [Was RE: [Patch 7/7, Arm, GCC] Introduce multilibs for PACBTI target feature.]

2021-10-28 Thread Tejas Belagod via Gcc-patches


> -Original Message-
> From: Gcc-patches  bounces+belagod=gcc.gnu@gcc.gnu.org> On Behalf Of Tejas Belagod via
> Gcc-patches
> Sent: Friday, October 8, 2021 1:19 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [Patch 7/7, Arm, GCC] Introduce multilibs for PACBTI target feature.
> 
> Hi,
> 
> This patch adds a multilib for pacbti target feature.
> 
> Tested on arm-none-eabi. OK for trunk?
> 
> 2021-10-04  Tejas Belagod  
> 
> gcc/ChangeLog:
> 
>   * config/arm/t-rmprofile: Add multilib rules for +pacbti.


This patch adds a multilib for pacbti target feature.

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* config/arm/t-rmprofile: Add multilib rules for +pacbti.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index 
a6036bf0a5191a3cac3bfbe2329783204d5c3ef4..241bf1939e30ae7890ae332556d33759f538ced5
 100644
--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -27,8 +27,8 @@
 
 # Arch and FPU variants to build libraries with
 
-MULTI_ARCH_OPTS_RM = march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve
-MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main v8-m.main+fp v8-m.main+dp v8.1-m.main+mve
+MULTI_ARCH_OPTS_RM = march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve/march=armv8.1-m.main+pacbti
+MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main v8-m.main+fp v8-m.main+dp v8.1-m.main+mve v8.1-m.main+pacbti
 
 # Base M-profile (no fp)
 MULTILIB_REQUIRED  += mthumb/march=armv6s-m/mfloat-abi=soft
@@ -36,6 +36,7 @@ MULTILIB_REQUIRED += mthumb/march=armv7-m/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv7e-m/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv8-m.base/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv8-m.main/mfloat-abi=soft
+MULTILIB_REQUIRED  += mthumb/march=armv8.1-m.main+pacbti/mfloat-abi=soft
 
 # ARMv7e-M with FP (single and double precision variants)
 MULTILIB_REQUIRED  += mthumb/march=armv7e-m+fp/mfloat-abi=hard
@@ -93,3 +94,4 @@ MULTILIB_MATCHES  += march?armv8-m.main=mlibarch?armv8-m.main
 MULTILIB_MATCHES   += march?armv8-m.main+fp=mlibarch?armv8-m.main+fp
 MULTILIB_MATCHES   += march?armv8-m.main+fp.dp=mlibarch?armv8-m.main+fp.dp
 MULTILIB_MATCHES   += march?armv8.1-m.main+mve=mlibarch?armv8.1-m.main+mve
+MULTILIB_MATCHES   += march?armv8.1-m.main+pacbti=mlibarch?armv8.1-m.main+pacbti


[Patch 7/8, Arm, GCC] Emit build attributes for PACBTI target feature. [ Was RE: [Patch 6/7, Arm, GCC] Emit build attributes for PACBTI target feature.]

2021-10-28 Thread Tejas Belagod via Gcc-patches


> -Original Message-
> From: Gcc-patches  bounces+belagod=gcc.gnu@gcc.gnu.org> On Behalf Of Tejas Belagod via
> Gcc-patches
> Sent: Friday, October 8, 2021 1:19 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [Patch 6/7, Arm, GCC] Emit build attributes for PACBTI target
> feature.
> 
> Hi,
> 
> This patch emits assembler directives for PACBTI build attributes as defined
> by the ABI. (https://github.com/ARM-software/abi-
> aa/releases/download/2021Q1/addenda32.pdf)
> 
> Tested on arm-none-eabi.
> 
> 2021-10-04  Tejas Belagod  
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm.c (arm_file_start): Emit EABI attributes for
>   Tag_PAC_extension, Tag_BTI_extension, TAG_BTI_use,
> TAG_PACRET_use.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/acle/pacbti-m-predef-1.c: New test.
>   * gcc.target/arm/acle/pacbti-m-predef-3: New test.
>   * gcc.target/arm/acle/pacbti-m-predef-6.c: New test.


This patch emits assembler directives for PACBTI build attributes
as defined by the ABI.
https://github.com/ARM-software/abi-aa/releases/download/2021Q1/addenda32.pdf
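To make the expected output concrete (my own sketch, mirroring what the new
tests below scan for), compiling a translation unit with
-march=armv8.1-m.main+pacbti and -mbranch-protection=pac-ret+bti should emit
directives along the lines of:

  .eabi_attribute 50, 2
  .eabi_attribute 52, 2
  .eabi_attribute 74, 1
  .eabi_attribute 76, 1

where the last two reflect whether BTI and PAC-RET are actually enabled for
the compilation.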

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm.c (arm_file_start): Emit EABI attributes for
Tag_PAC_extension, Tag_BTI_extension, TAG_BTI_use, TAG_PACRET_use.

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/pacbti-m-predef-1.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-3: New test.
* gcc.target/arm/acle/pacbti-m-predef-6.c: New test.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
946841526ee127105396097d143e755bdfc756f5..a87bcb298f9e6d7b2f3fd61b4586e291f46b0f81
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28200,6 +28200,8 @@ static void
 arm_file_start (void)
 {
   int val;
+  bool pac = (aarch_ra_sign_scope != AARCH_FUNCTION_NONE);
+  bool bti = (aarch_enable_bti == 1);
 
   arm_print_asm_arch_directives
 (asm_out_file, TREE_TARGET_OPTION (target_option_default_node));
@@ -28270,6 +28272,24 @@ arm_file_start (void)
arm_emit_eabi_attribute ("Tag_ABI_FP_16bit_format", 38,
 (int) arm_fp16_format);
 
+  if (TARGET_HAVE_PACBTI)
+   {
+ arm_emit_eabi_attribute ("Tag_PAC_extension", 50, 2);
+ arm_emit_eabi_attribute ("Tag_BTI_extension", 52, 2);
+ arm_emit_eabi_attribute ("TAG_BTI_use", 74, bti);
+ arm_emit_eabi_attribute ("TAG_PACRET_use", 76, pac);
+   }
+  else
+   {
+ if (pac || bti)
+   {
+ arm_emit_eabi_attribute ("Tag_PAC_extension", 50, 1);
+ arm_emit_eabi_attribute ("Tag_BTI_extension", 52, 1);
+ arm_emit_eabi_attribute ("TAG_BTI_use", 74, bti);
+ arm_emit_eabi_attribute ("TAG_PACRET_use", 76, pac);
+   }
+   }
+
   if (arm_lang_output_object_attributes_hook)
arm_lang_output_object_attributes_hook();
 }
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c
new file mode 100644
index 
..cc88380731ae81dd27c0a343518252a172f8f3ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c
@@ -0,0 +1,30 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=pac-ret+bti --save-temps" } */
+
+/* { dg-final { scan-assembler "\.arch_extension pacbti" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 74, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 1" } } */
+
+#if !defined (__ARM_FEATURE_BTI_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be defined."
+#endif
+
+#if !defined (__ARM_FEATURE_PAC_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_PAC_DEFAULT should be defined."
+#endif
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 1)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 1)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c
new file mode 100644
index 
..8bebd995b170df953e13f86d2276576d5ab34e93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c
@@ -0,0 +1,26 @@
+
+/* { dg-do run } 

[Patch 6/8, Arm. GCC] Add pointer authentication for stack-unwinding runtime. [Was RE: [Patch 5/7, Arm. GCC] Add pointer authentication for stack-unwinding runtime.]

2021-10-28 Thread Tejas Belagod via Gcc-patches


> -Original Message-
> From: Gcc-patches  bounces+belagod=gcc.gnu@gcc.gnu.org> On Behalf Of Tejas Belagod via
> Gcc-patches
> Sent: Friday, October 8, 2021 1:18 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [Patch 5/7, Arm. GCC] Add pointer authentication for stack-
> unwinding runtime.
> 
> Hi,
> 
> This patch adds authentication for when the stack is unwound when an
> exception is taken.  All the changes here are done to the runtime code in
> libgcc's unwinder code for Arm target. All the changes are guarded under
> defined (__ARM_FEATURE_PAC_DEFAULT) and activates only if the +pacbti
> feature is switched on for the architecture. This means that switching on the
> target feature via -march or -mcpu is sufficient and -mbranch-protection
> need not be enabled. This ensures that the unwinder is authenticated only if
> the PACBTI instructions are available in the non-NOP space as it uses AUTG.
> Just generating PAC/AUT instructions using -mbranch-protection will not
> enable authentication on the unwinder.
> 
> Tested on arm-none-eabi. OK for trunk?
> 
> 2021-10-04  Tejas Belagod  
> 
> gcc/ChangeLog:
> 
>   * ginclude/unwind-arm-common.h (_Unwind_VRS_RegClass):
> Introduce
>   new pseudo register class _UVRSC_PAC.
>   * libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode
>   exception opcode (0xb4) for saving RA_AUTH_CODE and
> authenticate
>   with AUTG if found.
>   * libgcc/config/arm/unwind-arm.c (struct pseudo_regs): New.
>   (phase1_vrs): Introduce new field to store pseudo-reg state.
>   (phase2_vrs): Likewise.
>   (_Unwind_VRS_Get): Load pseudo register state from virtual reg set.
>   (_Unwind_VRS_Set): Store pseudo register state to virtual reg set.
>   (_Unwind_VRS_Pop): Load pseudo register value from stack into
> VRS.

Rebased and respin based on reviews for previous patches.

This patch adds authentication for when the stack is unwound when
an exception is taken.  All the changes here are done to the runtime
code in libgcc's unwinder code for the Arm target.  All the changes are
guarded under defined (__ARM_FEATURE_PAUTH) and activate only
if the +pacbti feature is switched on for the architecture. This means
that switching on the target feature via -march or -mcpu is sufficient
and -mbranch-protection need not be enabled. This ensures that the
unwinder is authenticated only if the PACBTI instructions are available
in the non-NOP space as it uses AUTG. Just generating PAC/AUT instructions
using -mbranch-protection will not enable authentication on the unwinder.

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* ginclude/unwind-arm-common.h (_Unwind_VRS_RegClass): Introduce
new pseudo register class _UVRSC_PAC.
* libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode
exception opcode (0xb4) for saving RA_AUTH_CODE and authenticate
with AUTG if found.
* libgcc/config/arm/unwind-arm.c (struct pseudo_regs): New.
(phase1_vrs): Introduce new field to store pseudo-reg state.
(phase2_vrs): Likewise.
(_Unwind_VRS_Get): Load pseudo register state from virtual reg set.
(_Unwind_VRS_Set): Store pseudo register state to virtual reg set.
(_Unwind_VRS_Pop): Load pseudo register value from stack into VRS.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.

diff --git a/gcc/ginclude/unwind-arm-common.h b/gcc/ginclude/unwind-arm-common.h
index 
79f107d8abb2dd1e2d4903531db47147da63fee8..b60c07128460f8c5e82bdffac7ec469f3607a271
 100644
--- a/gcc/ginclude/unwind-arm-common.h
+++ b/gcc/ginclude/unwind-arm-common.h
@@ -127,7 +127,10 @@ extern "C" {
   _UVRSC_VFP = 1,   /* vfp */
   _UVRSC_FPA = 2,   /* fpa */
   _UVRSC_WMMXD = 3, /* Intel WMMX data register */
-  _UVRSC_WMMXC = 4  /* Intel WMMX control register */
+  _UVRSC_WMMXC = 4, /* Intel WMMX control register */
+#if defined(__ARM_FEATURE_PAUTH)
+  _UVRSC_PAC = 5/* Armv8.1-M Mainline PAC/AUTH pseudo-register */
+#endif
 }
   _Unwind_VRS_RegClass;
 
diff --git a/libgcc/config/arm/pr-support.c b/libgcc/config/arm/pr-support.c
index 
7525e35b4918d38b4ab3ae73a69b722e31b4b322..da27d742fc7be1cef7704a1ea03204743017a591
 100644
--- a/libgcc/config/arm/pr-support.c
+++ b/libgcc/config/arm/pr-support.c
@@ -106,6 +106,9 @@ __gnu_unwind_execute (_Unwind_Context * context, 
__gnu_unwind_state * uws)
 {
   _uw op;
   int set_pc;
+#if defined(__ARM_FEATURE_PAUTH)
+  int set_pac = 0;
+#endif
   _uw reg;
 
   set_pc = 0;
@@ -114,6 +117,22 @@ __gnu_unwind_execute (_Unwind_Context * context, 
__gnu_unwind_state * uws)
   op = n

[Patch 5/8, Arm, GCC] Implement target feature macros for PACBTI. [Was RE: [Patch 4/7, Arm. GCC] Implement target feature macros for PACBTI.]

2021-10-28 Thread Tejas Belagod via Gcc-patches


> -Original Message-
> From: Richard Earnshaw 
> Sent: Monday, October 11, 2021 2:58 PM
> To: Tejas Belagod ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch 4/7, Arm. GCC] Implement target feature macros for
> PACBTI.
> 
> On 08/10/2021 13:18, Tejas Belagod via Gcc-patches wrote:
> > Hi,
> >
> > This patch implements target feature macros when PACBTI is enabled
> > through the -march option or -mbranch-protection.
> >
> > Tested on arm-none-eabi. OK for trunk?
> >
> > 2021-10-04  Tejas Belagod  
> >
> > gcc/ChangeLog:
> >
> > * config/arm/arm-c.c (arm_cpu_builtins): Define
> > __ARM_FEATURE_BTI_DEFAULT and
> __ARM_FEATURE_PAC_DEFAULT.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/arm/acle/pacbti-m-predef-2.c: New test.
> > * gcc.target/arm/acle/pacbti-m-predef-4.c: New test.
> > * gcc.target/arm/acle/pacbti-m-predef-5.c: New test.
> >
> 
> I presume the specification for this is ACLE - please say so rather than 
> making
> me guess.
> 

Yes, sorry, very poor description on my part. Now fixed - please see patch 
description below for links to specific ACLE sections.

> 
> +  cpp_undef (pfile, "__ARM_FEATURE_BTI_DEFAULT");
> +  cpp_undef (pfile, "__ARM_FEATURE_PAC_DEFAULT");
> +  if (TARGET_HAVE_PACBTI)
> +{
> +  builtin_define_with_int_value ("__ARM_FEATURE_BTI_DEFAULT",
> +  arm_enable_pacbti & 0x1);
> 
> My reading of the ACLE specification would suggest this shouldn't be
> defined if it would have a value of 0, but that's not what this code
> does.  I think it would be better to move this outside the
> TARGET_HAVE_PACBTI and use the def_or_undef approach.
> 
> +  builtin_define_with_int_value ("__ARM_FEATURE_PAC_DEFAULT",
> +  arm_enable_pacbti >> 1);
> 
> This one is less clear, could the value ever be zero?  I guess exactly
> one of a-key and b-key must be defined and each has a separate bit.
> 

Now fixed according to what the arch specifies. For the M-profile, there's only 
one key which means when -mbranch-protection is invoked, bit 0 is always 1.

> +}
> +
> +
> 
> Not more than one blank line at the end of a block.
> 
> 
> diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c
> b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c
> 
> 
> Given what I've said above, I think you need to also test that
> __ARM_FEATURE_BTI_DEFAULT is defined before testing the value (and
> emitting #error if it isn't).
> 

Fixed.

This patch implements target feature macros when PACBTI is
enabled through the -march option or -mbranch-protection.
The target feature macros __ARM_FEATURE_PAC_DEFAULT and
__ARM_FEATURE_BTI_DEFAULT are specified in ARM ACLE
(https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros?lang=en)
__ARM_FEATURE_PAUTH and __ARM_FEATURE_BTI are specified in the pull-request
(https://github.com/ARM-software/acle/pull/55). 
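A small usage sketch (my own; the new tests below check similar properties)
for a translation unit compiled with e.g. -mthumb -march=armv8.1-m.main+pacbti
-mbranch-protection=pac-ret+bti:

  #if !defined (__ARM_FEATURE_PAC_DEFAULT) || !defined (__ARM_FEATURE_BTI_DEFAULT)
  #  error "branch protection not enabled by default"
  #endif

  /* On M-profile only the A-key exists, so bit 0 is always set.  */
  #if (__ARM_FEATURE_PAC_DEFAULT & 1) == 0
  #  error "unexpected __ARM_FEATURE_PAC_DEFAULT value"
  #endif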

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_BTI_DEFAULT, __ARM_FEATURE_PAC_DEFAULT,
__ARM_FEATURE_PAUTH and __ARM_FEATURE_BTI.

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/pacbti-m-predef-2.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-4.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-5.c: New test.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 
cc7901bca8dc9c5c27ed6afc5bc26afd42689e6d..98d47ad4cc6e88aa7401429a809c555c5aadc15f
 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -193,6 +193,24 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_PAUTH", TARGET_HAVE_PACBTI);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BTI", TARGET_HAVE_PACBTI);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BTI_DEFAULT",
+ aarch_enable_bti == 1);
+
+  cpp_undef (pfile, "__ARM_FEATURE_PAC_DEFAULT");
+  if (aarch_ra_sign_scope != AARCH_FUNCTION_NONE)
+  {
+unsigned int pac = 1;
+
+gcc_assert (aarch_ra_sign_key == AARCH_KEY_A);
+
+if (aarch_ra_sign_scope == AARCH_FUNCTION_ALL)
+  pac |= 0x4;
+
+builtin_define_with_int_value ("__ARM_FEATURE_PAC_DEFAULT"

[Patch 4/8, Arm. GCC] Add testsuite library support for PACBTI target. [Was RE: [Patch 3/7, Arm, GCC] Add testsuite library support for PACBTI target.]

2021-10-28 Thread Tejas Belagod via Gcc-patches


> -Original Message-
> From: Richard Earnshaw 
> Sent: Monday, October 11, 2021 2:38 PM
> To: Tejas Belagod ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch 3/7, Arm, GCC] Add testsuite library support for PACBTI
> target.
> 
> On 11/10/2021 14:36, Richard Earnshaw via Gcc-patches wrote:
> > On 08/10/2021 13:17, Tejas Belagod via Gcc-patches wrote:
> >> Hi,
> >>
> >> Add targeting-checking entities for PACBTI in testsuite framework.
> >>
> >> Tested on arm-none-eabi. OK for trunk?
> >>
> >> 2021-10-04  Tejas Belagod  
> >>
> >> gcc/ChangeLog:
> >>
> >> * testsuite/lib/target-supports.exp
> >> (check_effective_target_arm_pacbti_hw): New.
> >>
> >
> > OK.
> >
> > R.
> 
> Oh, wait!  Not OK.  Needs documentation in sourcebuild.texi.
> 

Thanks for the reviews.

Add targeting-checking entities for PACBTI in testsuite
framework.

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* testsuite/lib/target-supports.exp:
(check_effective_target_arm_pacbti_hw): New.
* doc/sourcebuild.texi: Document arm_pacbti_hw.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 
6a16576763006a13e946147ab1ea5b16b5bc219b..3dd1dd8d7f031720e55cf389376f1572991d8071
 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2141,6 +2141,10 @@ ARM target supports options to generate instructions 
from ARMv8.1-M with
 the Custom Datapath Extension (CDE) and M-Profile Vector Extension (MVE).
 Some multilibs may be incompatible with these options.
 
+@item arm_pacbti_hw
+Test system supports executing Pointer Authentication and Branch Target
+Identification instructions.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 
1c8b1ebb86e8769e40fe88af3a4c651990dbb2a1..843397adf437700ca622ce140359b6aaa0172e42
 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5064,6 +5064,22 @@ proc check_effective_target_arm_cmse_clear_ok {} {
 } "-mcmse"];
 }
 
+# Return 1 if the target supports executing PACBTI instructions, 0
+# otherwise.
+
+proc check_effective_target_arm_pacbti_hw {} {
+return [check_runtime arm_pacbti_hw_available {
+   __attribute__ ((naked)) int
+   main (void)
+   {
+ asm ("pac r12, lr, sp");
+ asm ("mov r0, #0");
+ asm ("autg r12, lr, sp");
+ asm ("bx lr");
+   }
+} ""]
+}
+
 # Return 1 if this compilation turns on string_ops_prefer_neon on.
 
 proc check_effective_target_arm_tune_string_ops_prefer_neon { } {


[Patch 3/8, Arm, GCC] Add option -mbranch-protection. [Was RE: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.]

2021-10-28 Thread Tejas Belagod via Gcc-patches


> -Original Message-
> From: Richard Earnshaw 
> Sent: Monday, October 11, 2021 1:58 PM
> To: Tejas Belagod ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.
> 
> On 08/10/2021 13:17, Tejas Belagod via Gcc-patches wrote:
> > Hi,
> >
> > Add -mbranch-protection option and its associated parsing routines.
> > This option enables the code-generation of pointer signing and
> > authentication instructions in function prologues and epilogues.
> >
> > Tested on arm-none-eabi. OK for trunk?
> >
> > 2021-10-04  Tejas Belagod  
> >
> > gcc/ChangeLog:
> >
> > * common/config/arm/arm-common.c
> >  (arm_print_hit_for_pacbti_option): New.
> >  (arm_progress_next_token): New.
> >  (arm_parse_pac_ret_clause): New routine for parsing the
> > pac-ret clause for -mbranch-protection.
> > (arm_parse_pacbti_option): New routine to parse all the options
> > to -mbranch-protection.
> > * config/arm/arm-protos.h (arm_parse_pacbti_option): Export.
> > * config/arm/arm.c (arm_configure_build_target): Handle option
> > to -mbranch-protection.
> > * config/arm/arm.opt (mbranch-protection): New.
> > (arm_enable_pacbti): New.
> >
> 
> You're missing documentation for invoke.texi.
> 
> Also, how does this differ from the existing option in aarch64?  Can the code
> from that be adapted to be made common to both targets rather than doing
> a new implementation?
> 
> Finally, there are far too many manifest constants in this patch, they need
> replacing with enums or #defines as appropriate if we cannot share the
> aarch64 code.

Thanks for the reviews.

Add -mbranch-protection option.  This option enables the code-generation of
pointer signing and authentication instructions in function prologues and
epilogues.
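A hedged illustration (mine, not taken from the patch's tests): a non-leaf
function such as

  int
  call_and_add (int (*fn) (void))
  {
    return fn () + 1;
  }

built with e.g. -march=armv8.1-m.main+pacbti -mbranch-protection=pac-ret is
expected to sign the return address in the prologue and authenticate it in
the epilogue.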

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm.c (arm_configure_build_target): Parse and validate
-mbranch-protection option and initialize appropriate data structures.
* config/arm/arm.opt: New option -mbranch-protection.
* doc/invoke.texi: Document -mbranch-protection.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
a952655db80663f28f5a5d12005f2adb4702894f..946841526ee127105396097d143e755bdfc756f5
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3216,6 +3216,17 @@ arm_configure_build_target (struct arm_build_target 
*target,
   tune_opts = strchr (opts->x_arm_tune_string, '+');
 }
 
+  if (opts->x_arm_branch_protection_string)
+{
+  aarch_validate_mbranch_protection (opts->x_arm_branch_protection_string);
+
+  if (aarch_ra_sign_key != AARCH_KEY_A)
+   {
+ warning (0, "invalid key type for %<-mbranch-protection=%>");
+ aarch_ra_sign_key = AARCH_KEY_A;
+   }
+}
+
   if (arm_selected_arch)
 {
   arm_initialize_isa (target->isa, arm_selected_arch->common.isa_bits);
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 
5c5b4f3ae0699a3a9d78df40a5ab65324dcba7b9..4f2754c3e84c436f7058ea0bd1c9f517b3a63ccd
 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -313,6 +313,10 @@ mbranch-cost=
 Target RejectNegative Joined UInteger Var(arm_branch_cost) Init(-1)
 Cost to assume for a branch insn.
 
+mbranch-protection=
+Target RejectNegative Joined Var(arm_branch_protection_string) Save
+Use branch-protection features.
+
 mgeneral-regs-only
 Target RejectNegative Mask(GENERAL_REGS_ONLY) Save
 Generate code which uses the core registers only (r0-r14).
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
27df8cf5bee79c2abac8b81c1ac54f1c3e50c628..7f886db008a39c44819616eb2799c01822d0aae9
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -810,7 +810,9 @@ Objective-C and Objective-C++ Dialects}.
 -mpure-code @gol
 -mcmse @gol
 -mfix-cmse-cve-2021-35465 @gol
--mfdpic}
+-mfdpic @gol
+-mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}][+@var{bti}]|@var{bti}[+@var{pac-ret}[+@var{leaf}]]}
 
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu}  -mabsdata  -maccumulate-args @gol
@@ -20969,6 +20971,18 @@ The opposite @option{-mno-fdpic} option is useful (and 
required) to
 build the Linux kernel using the same (@code{arm-*-uclinuxfdpiceabi})
 toolchain as the one used to build the userland programs.
 
+@item -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}][+@var{bti}]|@var{bti}[+@var{pac-ret}[+@var{leaf}]]
+@opindex mbranch-protection
+Select the branch protection feature

[Patch 2/8, Arm, GCC] Add Armv8.1-M Mainline target feature +pacbti. [Was RE: [Patch 1/7, Arm, GCC] Add Armv8.1-M Mainline target feature +pacbti.]

2021-10-28 Thread Tejas Belagod via Gcc-patches


> -Original Message-
> From: Richard Earnshaw 
> Sent: Monday, October 11, 2021 1:29 PM
> To: Tejas Belagod ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch 1/7, Arm, GCC] Add Armv8.1-M Mainline target feature
> +pacbti.
> 
> On 08/10/2021 13:17, Tejas Belagod via Gcc-patches wrote:
> > Hi,
> >
> > This patch adds the -march feature +pacbti to Armv8.1-M Mainline.
> > This feature enables pointer signing and authentication instructions
> > on M-class architectures.
> >
> > Tested on arm-none-eabi. OK for trunk?
> >
> > 2021-10-04  Tejas Belagod  
> >
> > gcc/Changelog:
> >
> > * config/arm/arm-cpus.in: Define new feature pacbti.
> > * config/arm/arm.h (TARGET_HAVE_PACBTI): New.
> >
> 
> "+pacbti" needs to be documented in invoke.texi at the appropriate place.
> 

Thanks for the reviews.

This patch adds the -march feature +pacbti to Armv8.1-M Mainline.
This feature enables pointer signing and authentication instructions
on M-class architectures.

2021-10-25  Tejas Belagod  

gcc/Changelog:

* config/arm/arm-cpus.in: Define new feature pacbti.
* config/arm/arm.h (TARGET_HAVE_PACBTI): New.
* doc/invoke.texi: Document new feature pacbti.



Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap


> R.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 
d0d0d0f1c7e4176fc4aa30d82394fe938b083a59..8a0e9c79682766ee2bec3fd7ba6ed67dff69dbad
 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -223,6 +223,10 @@ define feature cdecp5
 define feature cdecp6
 define feature cdecp7
 
+# M-profile control flow integrity extensions (PAC/AUT/BTI).
+# Optional from Armv8.1-M Mainline.
+define feature pacbti
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -741,6 +745,7 @@ begin arch armv8.1-m.main
  option nofp remove ALL_FP
  option mve add MVE
  option mve.fp add MVE_FP
+ option pacbti add pacbti
  option cdecp0 add cdecp0
  option cdecp1 add cdecp1
  option cdecp2 add cdecp2
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
015299c15346f1bea59d70fdcb1d19545473b23b..8e6ef41f6b065217d1af3f4f1cb85b2d8fbd0dc0
 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -335,6 +335,12 @@ emission of floating point pcs attributes.  */
isa_bit_mve_float) \
   && !TARGET_GENERAL_REGS_ONLY)
 
+/* Non-zero if this target supports Armv8.1-M Mainline pointer-signing
+   extension.  */
+#define TARGET_HAVE_PACBTI (arm_arch8_1m_main \
+   && bitmap_bit_p (arm_active_target.isa, \
+isa_bit_pacbti))
+
 /* MVE have few common instructions as VFP, like VLDM alias VPOP, VLDR, VSTM
alia VPUSH, VSTR and VMOV, VMSR and VMRS.  In the same manner it updates few
registers such as FPCAR, FPCCR, FPDSCR, FPSCR, MVFR0, MVFR1 and MVFR2.  All
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
71992b8c59749f5508a3c6a1b1792910652eac57..27df8cf5bee79c2abac8b81c1ac54f1c3e50c628
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20469,6 +20469,9 @@ Disable the floating-point extension.
 @item +cdecp0, +cdecp1, ... , +cdecp7
 Enable the Custom Datapath Extension (CDE) on selected coprocessors according
 to the numbers given in the options in the range 0 to 7.
+
+@item +pacbti
+Enable the Pointer Authentication and Branch Target Identification Extension.
 @end table
 
 @item  armv8-m.main


[Patch 1/8, Arm, AArch64, GCC] Refactor mbranch-protection option parsing and make it common to AArch32 and AArch64 backends. [Was RE: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.]

2021-10-28 Thread Tejas Belagod via Gcc-patches


> -Original Message-
> From: Richard Earnshaw 
> Sent: Monday, October 11, 2021 1:58 PM
> To: Tejas Belagod ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.
> 
> On 08/10/2021 13:17, Tejas Belagod via Gcc-patches wrote:
> > Hi,
> >
> > Add -mbranch-protection option and its associated parsing routines.
> > This option enables the code-generation of pointer signing and
> > authentication instructions in function prologues and epilogues.
> >
> > Tested on arm-none-eabi. OK for trunk?
> >
> > 2021-10-04  Tejas Belagod  
> >
> > gcc/ChangeLog:
> >
> > * common/config/arm/arm-common.c
> >  (arm_print_hit_for_pacbti_option): New.
> >  (arm_progress_next_token): New.
> >  (arm_parse_pac_ret_clause): New routine for parsing the
> > pac-ret clause for -mbranch-protection.
> > (arm_parse_pacbti_option): New routine to parse all the options
> > to -mbranch-protection.
> > * config/arm/arm-protos.h (arm_parse_pacbti_option): Export.
> > * config/arm/arm.c (arm_configure_build_target): Handle option
> > to -mbranch-protection.
> > * config/arm/arm.opt (mbranch-protection): New.
> > (arm_enable_pacbti): New.
> >
> 
> You're missing documentation for invoke.texi.
> 
> Also, how does this differ from the existing option in aarch64?  Can the code
> from that be adapted to be made common to both targets rather than doing
> a new implementation?
> 
> Finally, there are far too many manifest constants in this patch, they need
> replacing with enums or #defines as appropriate if we cannot share the
> aarch64 code.
> 

Thanks for the reviews.

This change refactors all the mbranch-protection option parsing code and types
to make it common to both AArch32 and AArch64 backends.  This change also pulls
in some supporting types from AArch64 to make it common
(aarch_parse_opt_result).  The significant changes in this patch are the
movement of all branch protection parsing routines from aarch64.c to
aarch-common.c and supporting data types and static data structures.  This
patch also pre-declares, in the aarch32 back end, the variables and types for
the moved return-address signing scope and key state, in preparation for the
upcoming series of patches that add parsing of -mbranch-protection to the
aarch32 back end.

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.c: Include aarch-common.h.
(all_architectures): Fix comment.
(aarch64_parse_extension): Rename return type, enum value names.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Rename
factored out aarch_ra_sign_scope and aarch_ra_sign_key variables.
Also rename corresponding enum values.
* config/aarch64/aarch64-opts.h (aarch64_function_type): Factor out
aarch64_function_type and move it to common code as aarch_function_type
in aarch-common.h.
* config/aarch64/aarch64-protos.h: Include common types header, move out
types aarch64_parse_opt_result and aarch64_key_type to aarch-common.h
* config/aarch64/aarch64.c: Move mbranch-protection parsing types and
functions out into aarch-common.h and aarch-common.c.  Fix up all the
name changes resulting from the move.
* config/aarch64/aarch64.md: Fix up aarch64_ra_sign_key type name change
and enum value.
* config/aarch64/aarch64.opt: Include aarch-common.h to import type move.
Fix up name changes from factoring out common code and data.
* config/arm/aarch-common-protos.h: Export factored out routines to both
backends.
* config/arm/aarch-common.c: Include newly factored out types.  Move all
mbranch-protection code and data structures from aarch64.c.
* config/arm/aarch-common.h: New header that declares types shared
between aarch32 and aarch64 backends.
* config/arm/arm-protos.h: Declare types and variables that are made
common to aarch64 and aarch32 backends - aarch_ra_sign_key,
aarch_ra_sign_scope and aarch_enable_bti.


Tested the following configurations. OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.

> R.
diff --git a/gcc/common/config/aarch64/aarch64-common.c 
b/gcc/common/config/aarch64/aarch64-common.c
index 
6d200a186604be2028b19ee9691e7bbf4a7be9c2..92c8f14a17466b9d6c44bdf4ede673a65f1b426f
 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -30,6 +30,7 @@
 #include "opts.h"
 #include "flags.h"
 #include "diagn

[Patch 7/7, Arm, GCC] Introduce multilibs for PACBTI target feature.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

This patch adds a multilib for pacbti target feature.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* config/arm/t-rmprofile: Add multilib rules for +pacbti.
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index 
a6036bf0a5191a3cac3bfbe2329783204d5c3ef4..241bf1939e30ae7890ae332556d33759f538ced5
 100644
--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -27,8 +27,8 @@
 
 # Arch and FPU variants to build libraries with
 
-MULTI_ARCH_OPTS_RM = 
march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve
-MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main 
v8-m.main+fp v8-m.main+dp v8.1-m.main+mve
+MULTI_ARCH_OPTS_RM = 
march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve/march=armv8.1-m.main+pacbti
+MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main 
v8-m.main+fp v8-m.main+dp v8.1-m.main+mve v8.1-m.main+pacbti
 
 # Base M-profile (no fp)
 MULTILIB_REQUIRED  += mthumb/march=armv6s-m/mfloat-abi=soft
@@ -36,6 +36,7 @@ MULTILIB_REQUIRED += mthumb/march=armv7-m/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv7e-m/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv8-m.base/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv8-m.main/mfloat-abi=soft
+MULTILIB_REQUIRED  += mthumb/march=armv8.1-m.main+pacbti/mfloat-abi=soft
 
 # ARMv7e-M with FP (single and double precision variants)
 MULTILIB_REQUIRED  += mthumb/march=armv7e-m+fp/mfloat-abi=hard
@@ -93,3 +94,4 @@ MULTILIB_MATCHES  += 
march?armv8-m.main=mlibarch?armv8-m.main
 MULTILIB_MATCHES   += march?armv8-m.main+fp=mlibarch?armv8-m.main+fp
 MULTILIB_MATCHES   += march?armv8-m.main+fp.dp=mlibarch?armv8-m.main+fp.dp
 MULTILIB_MATCHES   += march?armv8.1-m.main+mve=mlibarch?armv8.1-m.main+mve
+MULTILIB_MATCHES   += 
march?armv8.1-m.main+pacbti=mlibarch?armv8.1-m.main+pacbti


[Patch 6/7, Arm, GCC] Emit build attributes for PACBTI target feature.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

This patch emits assembler directives for PACBTI build attributes
as defined by the ABI. 
(https://github.com/ARM-software/abi-aa/releases/download/2021Q1/addenda32.pdf)

Tested on arm-none-eabi.

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm.c (arm_file_start): Emit EABI attributes for
Tag_PAC_extension, Tag_BTI_extension, TAG_BTI_use, TAG_PACRET_use.

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/pacbti-m-predef-1.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-3.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-6.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
1f939a6b79a90430abf120e0aa075dfc1fab29a8..557aae371e2707cb8db569ce033242a139b64e86
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28305,6 +28305,27 @@ arm_file_start (void)
arm_emit_eabi_attribute ("Tag_ABI_FP_16bit_format", 38,
 (int) arm_fp16_format);
 
+  if (TARGET_HAVE_PACBTI)
+   {
+ arm_emit_eabi_attribute ("Tag_PAC_extension", 50, 2);
+ arm_emit_eabi_attribute ("Tag_BTI_extension", 52, 2);
+ arm_emit_eabi_attribute ("TAG_BTI_use", 74, arm_enable_pacbti & 0x1);
+ arm_emit_eabi_attribute ("TAG_PACRET_use", 76,
+  (arm_enable_pacbti >> 1 != 0));
+   }
+  else
+   {
+ if (arm_enable_pacbti != 0)
+   {
+ arm_emit_eabi_attribute ("Tag_PAC_extension", 50, 1);
+ arm_emit_eabi_attribute ("Tag_BTI_extension", 52, 1);
+ arm_emit_eabi_attribute ("TAG_BTI_use", 74,
+  arm_enable_pacbti & 0x1);
+ arm_emit_eabi_attribute ("TAG_PACRET_use", 76,
+  (arm_enable_pacbti >> 1 != 0));
+   }
+   }
+
   if (arm_lang_output_object_attributes_hook)
arm_lang_output_object_attributes_hook();
 }
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c
new file mode 100644
index 
..de9102be3f293605d0891c45cd247be9cf8bd00b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c
@@ -0,0 +1,22 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=pac-ret+bti --save-temps" } */
+
+/* { dg-final { scan-assembler "\.arch_extension pacbti" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 74, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 1" } } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 1)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 1)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c
new file mode 100644
index 
..6ecdf2f7411e5d44a5304681032d0841d965b49c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c
@@ -0,0 +1,21 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=pac-ret+b-key+leaf 
--save-temps" } */
+
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 74, 0" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 1" } } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 0)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 6)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-6.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-6.c
new file mode 100644
index 
..2340bf0f937b7ea68a02500b66f151f0ce3f39b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-6.c
@@ -0,0 +1,21 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=bti --save-temps" } */
+
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 74, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 0" } } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 1)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 0)
+__builtin_abort ();
+
+  return 0;
+}


[Patch 5/7, Arm. GCC] Add pointer authentication for stack-unwinding runtime.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

This patch adds authentication when the stack is unwound as
an exception is taken.  All the changes here are done to the runtime
code in libgcc's unwinder code for Arm target. All the changes are
guarded under defined (__ARM_FEATURE_PAC_DEFAULT) and activates only
if the +pacbti feature is switched on for the architecture. This means
that switching on the target feature via -march or -mcpu is sufficient
and -mbranch-protection need not be enabled. This ensures that the
unwinder is authenticated only if the PACBTI instructions are available
in the non-NOP space as it uses AUTG. Just generating PAC/AUT instructions
using -mbranch-protection will not enable authentication on the unwinder.
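
For reference, here is a minimal sketch (not part of the patch; the helper
name is made up) of the authentication step described above, mirroring the
pr-support.c hunk below.  It assumes __ARM_FEATURE_PAC_DEFAULT is only defined
when +pacbti is enabled, so AUTG is a real instruction in the non-NOP space:

#if defined (__ARM_FEATURE_PAC_DEFAULT)
/* Authenticate the candidate return address in LR against the PAC popped
   off the stack; AUTG faults if they do not match for this SP.  */
static inline void
authenticate_unwound_frame (unsigned int pac, unsigned int lr, unsigned int sp)
{
  __asm__ __volatile__ ("autg %0, %1, %2" : : "r" (pac), "r" (lr), "r" (sp));
}
#endif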

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* ginclude/unwind-arm-common.h (_Unwind_VRS_RegClass): Introduce
new pseudo register class _UVRSC_PAC.
* libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode
exception opcode (0xb4) for saving RA_AUTH_CODE and authenticate
with AUTG if found.
* libgcc/config/arm/unwind-arm.c (struct pseudo_regs): New.
(phase1_vrs): Introduce new field to store pseudo-reg state.
(phase2_vrs): Likewise.
(_Unwind_VRS_Get): Load pseudo register state from virtual reg set.
(_Unwind_VRS_Set): Store pseudo register state to virtual reg set.
(_Unwind_VRS_Pop): Load pseudo register value from stack into VRS.
diff --git a/gcc/ginclude/unwind-arm-common.h b/gcc/ginclude/unwind-arm-common.h
index 
79f107d8abb2dd1e2d4903531db47147da63fee8..903c0d22e4a7bf41d806842e030a4ad532fb835f
 100644
--- a/gcc/ginclude/unwind-arm-common.h
+++ b/gcc/ginclude/unwind-arm-common.h
@@ -127,7 +127,10 @@ extern "C" {
   _UVRSC_VFP = 1,   /* vfp */
   _UVRSC_FPA = 2,   /* fpa */
   _UVRSC_WMMXD = 3, /* Intel WMMX data register */
-  _UVRSC_WMMXC = 4  /* Intel WMMX control register */
+  _UVRSC_WMMXC = 4, /* Intel WMMX control register */
+#if defined(__ARM_FEATURE_PAC_DEFAULT)
+  _UVRSC_PAC = 5/* Armv8.1-M Mainline PAC/AUTH pseudo-register */
+#endif
 }
   _Unwind_VRS_RegClass;
 
diff --git a/libgcc/config/arm/pr-support.c b/libgcc/config/arm/pr-support.c
index 
7525e35b4918d38b4ab3ae73a69b722e31b4b322..ff45f3c6e08a8df64011c0e3a5f5dd1677b3ed11
 100644
--- a/libgcc/config/arm/pr-support.c
+++ b/libgcc/config/arm/pr-support.c
@@ -106,6 +106,9 @@ __gnu_unwind_execute (_Unwind_Context * context, 
__gnu_unwind_state * uws)
 {
   _uw op;
   int set_pc;
+#if defined(__ARM_FEATURE_PAC_DEFAULT)
+  int set_pac = 0;
+#endif
   _uw reg;
 
   set_pc = 0;
@@ -114,6 +117,22 @@ __gnu_unwind_execute (_Unwind_Context * context, 
__gnu_unwind_state * uws)
   op = next_unwind_byte (uws);
   if (op == CODE_FINISH)
{
+#if defined(__ARM_FEATURE_PAC_DEFAULT)
+ /* When we reach end, we have to authenticate R12 we just popped 
earlier.  */
+ if (set_pac)
+   {
+ _uw sp;
+ _uw lr;
+ _uw pac;
+ _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32, &sp);
+ _Unwind_VRS_Get (context, _UVRSC_CORE, R_LR, _UVRSD_UINT32, &lr);
+ _Unwind_VRS_Get (context, _UVRSC_PAC, R_IP,
+  _UVRSD_UINT32, &pac);
+ __asm__ __volatile__
+   ("autg %0, %1, %2" : : "r"(pac), "r"(lr), "r"(sp) :);
+   }
+#endif
+
  /* If we haven't already set pc then copy it from lr.  */
  if (!set_pc)
{
@@ -227,6 +246,19 @@ __gnu_unwind_execute (_Unwind_Context * context, 
__gnu_unwind_state * uws)
return _URC_FAILURE;
  continue;
}
+#if defined(__ARM_FEATURE_PAC_DEFAULT)
+ /* Pop PAC off the stack into VRS pseudo.pac.  */
+ if (op == 0xb4)
+   {
+ if (_Unwind_VRS_Pop (context, _UVRSC_PAC, 0, _UVRSD_UINT32)
+ != _UVRSR_OK)
+   return _URC_FAILURE;
+ set_pac = 1;
+ continue;
+   }
+
+#endif
+
  if ((op & 0xfc) == 0xb4)  /* Obsolete FPA.  */
return _URC_FAILURE;
 
diff --git a/libgcc/config/arm/unwind-arm.c b/libgcc/config/arm/unwind-arm.c
index 
d0394019c3649f2f6d6a2882389e55b56c21b8ef..6e6eb808d70dd1f6d68ec3c5bf0cd3978cc1166b
 100644
--- a/libgcc/config/arm/unwind-arm.c
+++ b/libgcc/config/arm/unwind-arm.c
@@ -64,6 +64,14 @@ struct wmmxc_regs
   _uw wc[4];
 };
 
+#if defined(__ARM_FEATURE_PAC_DEFAULT)
+/*  Holds value of pseudo registers eg. PAC.  */
+struct pseudo_regs
+{
+  _uw pac;
+};
+#endif
+
 /* The ABI specifies that the unwind routines may only use core registers,
except when actually manipulating coprocessor state.  This allows
us to write one implementation that works on all platforms by
@@ -78,6 +86,11 @@ typedef struct
   /* The first fields must be the same as a phase2_vrs. 

[Patch 4/7, Arm. GCC] Implement target feature macros for PACBTI.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

This patch implements target feature macros when PACBTI is
enabled through the -march option or -mbranch-protection.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_BTI_DEFAULT and __ARM_FEATURE_PAC_DEFAULT.

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/pacbti-m-predef-2.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-4.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-5.c: New test.
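
As a quick illustration (a sketch assuming the ACLE bit assignments exercised
by the new tests; not part of the patch), bit 0 of __ARM_FEATURE_PAC_DEFAULT
selects the A-key, bit 1 the B-key, and bit 2 means leaf functions are signed
too, so user code can decode the macros like this:

#if defined (__ARM_FEATURE_PAC_DEFAULT)
# if __ARM_FEATURE_PAC_DEFAULT & 0x1
   /* Return addresses are signed with the A-key.  */
# elif __ARM_FEATURE_PAC_DEFAULT & 0x2
   /* Return addresses are signed with the B-key.  */
# endif
# if __ARM_FEATURE_PAC_DEFAULT & 0x4
   /* Leaf functions sign their return address too.  */
# endif
#endif
#if defined (__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
/* Branch Target Identification landing pads are generated by default.  */
#endif
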
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 
cc7901bca8dc9c5c27ed6afc5bc26afd42689e6d..00dc1c2f13f2023c2ba8d7b03038a4cdde068ef6
 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -193,6 +193,17 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
+  cpp_undef (pfile, "__ARM_FEATURE_BTI_DEFAULT");
+  cpp_undef (pfile, "__ARM_FEATURE_PAC_DEFAULT");
+  if (TARGET_HAVE_PACBTI)
+{
+  builtin_define_with_int_value ("__ARM_FEATURE_BTI_DEFAULT",
+arm_enable_pacbti & 0x1);
+  builtin_define_with_int_value ("__ARM_FEATURE_PAC_DEFAULT",
+arm_enable_pacbti >> 1);
+}
+
+
   cpp_undef (pfile, "__ARM_FEATURE_MVE");
   if (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT)
 {
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c
new file mode 100644
index 
..7e8cdb2c5fc74dd22085fcac1f692229300a333a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c
@@ -0,0 +1,16 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=bti+pac-ret+b-key+leaf" } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 1)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 6)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-4.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-4.c
new file mode 100644
index 
..41fdcf91a8ab789d055407ae3f8c151984660ee9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-4.c
@@ -0,0 +1,16 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=pac-ret+b-key" } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 0)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 2)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-5.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-5.c
new file mode 100644
index 
..9527c9620a3a5c973b47a5f364ae290d975358c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-5.c
@@ -0,0 +1,16 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=bti+pac-ret+leaf" } */
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 1)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 5)
+__builtin_abort ();
+
+  return 0;
+}


[Patch 3/7, Arm, GCC] Add testsuite library support for PACBTI target.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

Add target-checking entities for PACBTI in the testsuite
framework.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* testsuite/lib/target-supports.exp
(check_effective_target_arm_pacbti_hw): New.
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 
9ebca7ac007363d2a35158bb80092118f629b97b..323541c2da527e3da5dce4d85cadcb2068d9bb5c
 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5064,6 +5064,22 @@ proc check_effective_target_arm_cmse_clear_ok {} {
 } "-mcmse"];
 }
 
+# Return 1 if the target supports executing PACBTI instructions, 0
+# otherwise.
+
+proc check_effective_target_arm_pacbti_hw {} {
+return [check_runtime arm_pacbti_hw_available {
+   __attribute__ ((naked)) int
+   main (void)
+   {
+ asm ("pac r12, lr, sp");
+ asm ("mov r0, #0");
+ asm ("autg r12, lr, sp");
+ asm ("bx lr");
+   }
+} ""]
+}
+
 # Return 1 if this compilation turns on string_ops_prefer_neon on.
 
 proc check_effective_target_arm_tune_string_ops_prefer_neon { } {


[Patch 2/7, Arm, GCC] Add option -mbranch-protection.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

Add -mbranch-protection option and its associated parsing routines.
This option enables the code-generation of pointer signing and
authentication instructions in function prologues and epilogues.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* common/config/arm/arm-common.c
 (arm_print_hint_for_pacbti_option): New.
 (arm_progress_next_token): New.
 (arm_parse_pac_ret_clause): New routine for parsing the
pac-ret clause for -mbranch-protection.
(arm_parse_pacbti_option): New routine to parse all the options
to -mbranch-protection.
* config/arm/arm-protos.h (arm_parse_pacbti_option): Export.
* config/arm/arm.c (arm_configure_build_target): Handle option
to -mbranch-protection.
* config/arm/arm.opt (mbranch-protection): New.
(arm_enable_pacbti): New.
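
To illustrate the intent (a sketch of the expected effect only; the actual
prologue/epilogue code generation comes later in this series, and the function
name and register list here are illustrative), with -mbranch-protection=pac-ret
a non-leaf function signs its return address on entry and authenticates it
before returning:

int
call_through (int (*fn) (int), int x)
{
  /* prologue:  pac  r12, lr, sp     @ compute RA_AUTH_CODE into r12
                push {r4, r12, lr}   @ save it alongside LR            */
  int r = fn (x) + 1;
  /* epilogue:  pop  {r4, r12, lr}
                aut  r12, lr, sp     @ faults if LR was tampered with
                bx   lr                                                */
  return r;
}
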
diff --git a/gcc/common/config/arm/arm-common.c 
b/gcc/common/config/arm/arm-common.c
index 
de898a74165db4d7250aa0097dfab682beb0f99c..188feebb15b52f389d5d0b3ec322be3017efd5a0
 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -475,6 +475,156 @@ arm_parse_arch_option_name (const arch_option *list, 
const char *optname,
   return NULL;
 }
 
+static void
+arm_print_hint_for_pacbti_option ()
+{
+  const char *s = "pac-ret[+leaf][+b-key][+bti]"
+ " | bti[+pac-ret[+leaf][+b-key]]";
+  inform (input_location, "valid arguments are: %s", s);
+}
+
+/* Progress *E to end of next token delimited by DELIMITER.
+   Cache old *E in *OE.  */
+static void
+arm_progress_next_token (const char **oe, const char **e,
+size_t *l, const char delimiter)
+{
+  *oe = *e + 1;
+  *e = strchr (*oe, delimiter);
+  *l = *e ? *e - *oe : strlen (*oe);
+}
+
+/* Parse options to -mbranch-protection.  */
+static const char*
+arm_parse_pac_ret_clause (const char *pacret, const char *optname,
+ unsigned int *pacbti)
+{
+  const char *old_end = NULL;
+  const char *end = strchr (pacret, '+');
+  size_t len = end ? end - pacret : strlen (pacret);
+  if (len == 7 && strncmp (pacret, "pac-ret", len) == 0)
+{
+  *pacbti |= 2;
+  if (end != NULL)
+   {
+ /* pac-ret+...  */
+ arm_progress_next_token (&old_end, &end, &len, '+');
+ if (len == 4 && strncmp (old_end, "leaf", len) == 0)
+   {
+ *pacbti |= 8;
+ if (end != NULL)
+   {
+ /* pac-ret+leaf+...  */
+ arm_progress_next_token (&old_end, &end, &len, '+');
+ if (len == 5 && strncmp (old_end, "b-key", len) == 0)
+   {
+ /* Clear bit for A-key.  */
+ *pacbti &= 0xfffd;
+ *pacbti |= 4;
+ /* A non-NULL end indicates it's pointing to a '+'.
+    Advance it to point to the next option in the string.  */
+ if (end != NULL)
+   end++;
+   }
+ else
+   /* This could be 'bti', leave it to caller to parse.  */
+   end = old_end;
+   }
+   }
+ else if (len == 5 && strncmp (old_end, "b-key", len) == 0)
+   {
+ /* Clear bit for A-key.  */
+ *pacbti &= 0xfffd;
+ *pacbti |= 4;
+ if (end != NULL)
+   {
+ /* pac-ret+b-key+...  */
+ arm_progress_next_token (&old_end, &end, &len, '+');
+ if (len == 4 && strncmp (old_end, "leaf", len) == 0)
+   {
+ *pacbti |= 8;
+ /* A non-NULL end indicates it's pointing to a '+'.
+    Advance it to point to the next option in the string.  */
+ if (end != NULL)
+   end++;
+   }
+ else
+   /* This could be 'bti', leave it to caller to parse.  */
+   end = old_end;
+   }
+   }
+ else
+   {
+ /* This could be a 'bti' option, so leave it to the caller to
+parse.  Fall through to the return.  */
+ end = old_end;
+   }
+   }
+}
+  else
+{
+  error_at (input_location, "unrecognized %s argument: %s", optname, 
pacret);
+  arm_print_hint_for_pacbti_option ();
+  return NULL;
+}
+
+  return end;
+}
+
+unsigned int
+arm_parse_pacbti_option (const char *pacbti, const char *optname, bool 
complain)
+{
+  unsigned int enable_pacbti = 0;
+  const char *end = strchr (pacbti, '+');
+  size_t len = end ? end - pacbti : strlen (pacbti);
+
+  if (strcmp (pacbti, "none") == 0)
+return 0;
+
+  if (strcmp (pacbt

[Patch 1/7, Arm, GCC] Add Armv8.1-M Mainline target feature +pacbti.

2021-10-08 Thread Tejas Belagod via Gcc-patches
Hi,

This patch adds the -march feature +pacbti to Armv8.1-M Mainline.
This feature enables pointer signing and authentication instructions
on M-class architectures.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm-cpus.in: Define new feature pacbti.
* config/arm/arm.h (TARGET_HAVE_PACBTI): New.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 
d0d0d0f1c7e4176fc4aa30d82394fe938b083a59..8a0e9c79682766ee2bec3fd7ba6ed67dff69dbad
 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -223,6 +223,10 @@ define feature cdecp5
 define feature cdecp6
 define feature cdecp7
 
+# M-profile control flow integrity extensions (PAC/AUT/BTI).
+# Optional from Armv8.1-M Mainline.
+define feature pacbti
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -741,6 +745,7 @@ begin arch armv8.1-m.main
  option nofp remove ALL_FP
  option mve add MVE
  option mve.fp add MVE_FP
+ option pacbti add pacbti
  option cdecp0 add cdecp0
  option cdecp1 add cdecp1
  option cdecp2 add cdecp2
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
015299c15346f1bea59d70fdcb1d19545473b23b..8e6ef41f6b065217d1af3f4f1cb85b2d8fbd0dc0
 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -335,6 +335,12 @@ emission of floating point pcs attributes.  */
isa_bit_mve_float) \
   && !TARGET_GENERAL_REGS_ONLY)
 
+/* Non-zero if this target supports Armv8.1-M Mainline pointer-signing
+   extension.  */
+#define TARGET_HAVE_PACBTI (arm_arch8_1m_main \
+   && bitmap_bit_p (arm_active_target.isa, \
+isa_bit_pacbti))
+
 /* MVE have few common instructions as VFP, like VLDM alias VPOP, VLDR, VSTM
alia VPUSH, VSTR and VMOV, VMSR and VMRS.  In the same manner it updates few
registers such as FPCAR, FPCCR, FPDSCR, FPSCR, MVFR0, MVFR1 and MVFR2.  All


[PATCH, AArch64] PR target/101609 - Use the correct iterator for AArch64 vector right shift pattern.

2021-08-05 Thread Tejas Belagod via Gcc-patches
Hi,

Loops containing long long shifts fail to vectorize due to the vectorizer
not being able to recognize long long right shifts. This is due to a bug
in the iterator used for the vashr and vlshr patterns in aarch64-simd.md.

Tested and bootstrapped on aarch64-linux. OK?

2021-08-05  Tejas Belagod  

gcc/ChangeLog:

PR target/101609
* config/aarch64/aarch64-simd.md (vlshr<mode>3, vashr<mode>3): Use
  the right iterator.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-shr-reg.c: New testcase.
* gcc.target/aarch64/vect-shr-reg-run.c: Likewise.


Thanks,
Tejas Belagod.
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
c5638d096fa84a27b4ea397f62cd0d05a28e7c8c..48eddf64e05afe3788abfa05141f6544a9323ea1
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1299,13 +1299,10 @@ (define_expand "vashl<mode>3"
   DONE;
 })
 
-;; Using mode VDQ_BHSI as there is no V2DImode neg!
-;; Negating individual lanes most certainly offsets the
-;; gain from vectorization.
 (define_expand "vashr3"
- [(match_operand:VDQ_BHSI 0 "register_operand")
-  (match_operand:VDQ_BHSI 1 "register_operand")
-  (match_operand:VDQ_BHSI 2 "register_operand")]
+ [(match_operand:VDQ_I 0 "register_operand")
+  (match_operand:VDQ_I 1 "register_operand")
+  (match_operand:VDQ_I 2 "register_operand")]
  "TARGET_SIMD"
 {
   rtx neg = gen_reg_rtx (<MODE>mode);
@@ -1333,9 +1330,9 @@ (define_expand "aarch64_ashr_simddi"
 )
 
 (define_expand "vlshr3"
- [(match_operand:VDQ_BHSI 0 "register_operand")
-  (match_operand:VDQ_BHSI 1 "register_operand")
-  (match_operand:VDQ_BHSI 2 "register_operand")]
+ [(match_operand:VDQ_I 0 "register_operand")
+  (match_operand:VDQ_I 1 "register_operand")
+  (match_operand:VDQ_I 2 "register_operand")]
  "TARGET_SIMD"
 {
   rtx neg = gen_reg_rtx (<MODE>mode);
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-shr-reg-run.c 
b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg-run.c
new file mode 100644
index 
..3190448e0936b9d5265f538304f9d20f13927339
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg-run.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -march=armv8.2-a" } */
+
+#include "vect-shr-reg.c"
+
+int
+main(void)
+{
+  int64_t a[16];
+  int64_t b[16];
+  int64_t c[17];
+
+  uint64_t ua[16];
+  uint64_t ub[16];
+  uint64_t uc[17];
+
+  int64_t res_a[16];
+  uint64_t res_ua[16];
+
+  int i;
+
+  /* Set up inputs.  */
+  for (i = 0; i < 16; i++)
+{
+  b[i] = -2;
+  c[i] = 34;
+  ub[i] = 0x;
+  uc[i] = 52;
+}
+
+  /* Set up reference values.  */
+  for (i = 0; i < 16; i++)
+{
+  res_a[i] = -1LL;
+  res_ua[i] = 0x0fffLL;
+}
+
+  /* Do the shifts.  */
+  f (ua, ub, uc);
+  g (a, b, c);
+
+  /* Compare outputs against reference values.  */
+  for (i = 0; i < 16; i++)
+{
+  if (a[i] != res_a[i])
+   __builtin_abort ();
+
+  if (ua[i] != res_ua[i])
+   __builtin_abort ();
+}
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-shr-reg.c 
b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg.c
new file mode 100644
index 
..5736dafb5a19957032e7b4bc1e90b218f52788fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a" } */
+
+#include <stdint.h>
+#include <stdio.h>
+
+#pragma GCC target "+nosve"
+
+int __attribute__((noinline))
+f(uint64_t *__restrict a, uint64_t *__restrict b, uint64_t *__restrict c)
+{
+  int i;
+
+  for (i = 0; i < 16; i++)
+a[i] = b[i] >> c[i];
+}
+
+
+int __attribute__((noinline))
+g(int64_t *__restrict a, int64_t *__restrict b, int64_t *__restrict c)
+{
+  int i;
+
+  for (i = 0; i < 16; i++)
+a[i] = b[i] >> c[i];
+}
+
+/* { dg-final { scan-assembler "neg\\tv" } } */
+/* { dg-final { scan-assembler "ushl\\tv" } } */
+/* { dg-final { scan-assembler "sshl\\tv" } } */


Re: [ARM] Enable __fp16 as a function parameter and return type.

2016-05-16 Thread Tejas Belagod

On 11/05/16 16:46, Joseph Myers wrote:

On Wed, 11 May 2016, Tejas Belagod wrote:


AFAICS, I don't think it mandates a double-rounding behavior for double to
__fp16 conversions and I don't see a change in stand between the two versions
of ACLE on the behavior of __fp16.


It's not a change between the two versions of ACLE.  It's a change
relative to the early (pre-ACLE) __fp16 specification (or, at least, a
clarification thereto in email on 12 Aug 2008) that was used as a basis
for the original implementation of __fp16 in GCC (and that thus is what's
currently implemented by GCC and tested for in the testsuite).



Hi Joseph,

Sorry for the delay in responding.

I've had a conversation with Al and I now have some context. You're right - the 
2008 mail you are referring to is the pre-ACLE behavior. By the time the first 
draft of the first version of ACLE was reviewed by CodeSourcery (circa 2011), it
already mandated single rounding. No published ACLE has ever allowed double 
rounding.


This meant that when the first draft of ACLE was published in 2011, its pre-ACLE 
implementations in gcc and armcc were already non-conformant, in other words, 
'bug-compatible'.


We do have plans to fix pre-ACLE behavior of fp16 to conform to current ACLE 
spec, but can't say when exactly.


Thanks,
Tejas.


Re: Re: Re: [ARM] Enable __fp16 as a function parameter and return type.

2016-05-13 Thread Tejas Belagod

On 11/05/16 16:46, Joseph Myers wrote:

On Wed, 11 May 2016, Tejas Belagod wrote:


AFAICS, I don't think it mandates a double-rounding behavior for double to
__fp16 conversions and I don't see a change in stand between the two versions
of ACLE on the behavior of __fp16.


It's not a change between the two versions of ACLE.  It's a change
relative to the early (pre-ACLE) __fp16 specification (or, at least, a
clarification thereto in email on 12 Aug 2008) that was used as a basis
for the original implementation of __fp16 in GCC (and that thus is what's
currently implemented by GCC and tested for in the testsuite).



Hi Joseph,

I can't seem to find that email on gcc-patches circa August 2008 - which list 
was it sent to?


Thanks,
Tejas.


Re: Re: [ARM] Enable __fp16 as a function parameter and return type.

2016-05-11 Thread Tejas Belagod

On 28/04/16 16:49, Joseph Myers wrote:

On Thu, 28 Apr 2016, Matthew Wahab wrote:


Hello,

The ARM target supports the half-precision floating point type __fp16
but does not allow its use as a function return or parameter type. This
patch removes that restriction and defines the ACLE macro
__ARM_FP16_ARGS to indicate this. The code generated for passing __fp16
values into and out of functions depends on the level of hardware
support but conforms to the AAPCS (see
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf).


The sole use of the TARGET_INVALID_PARAMETER_TYPE and
TARGET_INVALID_RETURN_TYPE hooks was to disallow __fp16 use as a function
return or parameter type.  Thus, I think this patch should completely
remove those hooks and poison them in system.h.

This patch addresses one incompatibility of the original __fp16
specification with the more recent ACLE specification and the
specification in ISO/IEC TS 18661-3 for how such types should work.
Another such incompatibility is the peculiar rule in the original
specification that conversions from double to __fp16 go via float, with
double rounding.  Do you have plans to eliminate that and move to the
single-rounding semantics that are in current specifications?



http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf

Section 4.1.2 states that double to fp16 should round only once; it only
suggests doing the conversion via a two-step hardware sequence rather than an
emulation library when speed is the priority, since pre-ARMv8 architectures do
not support the direct conversion in hardware.


http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053d/IHI0053D_acle_2_1.pdf

updates this paragraph to reflect ARMv8 hardware feature, but still maintains 
the suggestion of using two-step hardware instruction rather than emulation 
library if speed is priority for pre-ARMv8 architectures.


AFAICS, I don't think it mandates a double-rounding behavior for double to 
__fp16 conversions and I don't see a change in stand between the two versions of 
ACLE on the behavior of __fp16.


We could improve the ACLE spec to include a caveat that a two-step reduction 
could introduce a loss in precision which could result in incompatibility with 
ARMv8 architectures.
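
For concreteness, the two conversion paths under discussion can be written as
below (a sketch; which of these a compiler may emit for a plain cast is exactly
the conformance question):

/* Two roundings: double -> float -> __fp16 (the old pre-ACLE behaviour).  */
__fp16 convert_via_float (double d) { return (__fp16) (float) d; }

/* One rounding: double -> __fp16 directly, as ACLE section 4.1.2 asks for.  */
__fp16 convert_direct (double d) { return (__fp16) d; }

For a double that lands close to an __fp16 rounding boundary, the two paths can
differ in the last bit of the result, which is why the distinction matters.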



I note that that AAPCS revision says for __fp16, in 7.1.1 Arithmetic
Types, "In a variadic function call this will be passed as a
double-precision value.".  I haven't checked what this patch implements,
but that could be problematic, and different from what's said under 7.2,
"For variadic functions, float arguments that match the ellipsis (...) are
converted to type double.".

In TS 18661-3, _Float16 is *not* affected by default argument promotions;
only float is.  This reflects how the default conversion of float to
double is a legacy feature; note for example how in C99 and C11 float
_Imaginary is not promoted to double _Imaginary, and float _Complex is not
promoted to double _Complex.

Thus it would be better for compatibility with TS 18661-3 to pass __fp16
values to variadic functions as themselves, unpromoted.  (Formally of
course the lack of promotion is a language feature not an ABI feature; as
long as va_arg for _Float16 named works correctly, you could promote at
the ABI level and then convert back, and the only effect would be that
sNaNs get quieted, so passing a _Float16 sNaN through variable arguments
would act as a convertFormat operation instead of a copy operation.  It's
not clear that having such an ABI-level promotion is a good idea,
however.)

Now, in the context of the current implementation and current ACLE
arithmetic on __fp16 values produces float results - the operands are
promoted at the C language level.  This is different from TS 18661-3,
where _Float16 arithmetic produces results whose semantics type is
_Float16 but which, if FLT_EVAL_METHOD is 0, are evaluated with excess
range and precision to the range and precision of float.  So if __fp16 and
float are differently passed to variadic functions, you have the issue
that if the argument is an expression resulting from __fp16 arithmetic,
the way it is passed depends on whether current ACLE or TS 18661-3 are
followed.  But if the eventual aim is for __fp16 (when using the IEEE
format rather than the alternative format) to become just a typedef for
_Float16, then these issues will need to be addressed.



__fp16's compatibility with _Float16 is still under discussion internally.

Thanks,
Tejas.



Re: RE: [Ping^3] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m

2015-07-07 Thread Tejas Belagod

Ping!

On 30/04/15 10:40, Hale Wang wrote:

-Original Message-
From: Hale Wang [mailto:hale.w...@arm.com]
Sent: Monday, February 09, 2015 9:54 AM
To: Richard Earnshaw
Cc: Hale Wang; gcc-patches; Matthew Gretton-Dann
Subject: RE: [Ping^2] [PATCH, ARM, libgcc] New aeabi_idiv function for
armv6-m

Ping https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01059.html.



Ping for trunk. Is it ok for trunk now?

Thanks,
Hale

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Hale Wang
Sent: Friday, December 12, 2014 9:36 AM
To: gcc-patches
Subject: RE: [Ping] [PATCH, ARM, libgcc] New aeabi_idiv function for
armv6- m

Ping? Already applied to arm/embedded-4_9-branch, is it OK for trunk?

-Hale


-Original Message-
From: Joey Ye [mailto:joey.ye...@gmail.com]
Sent: Thursday, November 27, 2014 10:01 AM
To: Hale Wang
Cc: gcc-patches
Subject: Re: [PATCH, ARM, libgcc] New aeabi_idiv function for
armv6-m

OK applying to arm/embedded-4_9-branch, though you still need
maintainer approval into trunk.

- Joey

On Wed, Nov 26, 2014 at 11:43 AM, Hale Wang hale.w...@arm.com

wrote:

Hi,

This patch ports the aeabi_idiv routine from Linaro Cortex-Strings
(https://git.linaro.org/toolchain/cortex-strings.git), which was
contributed by ARM under Free BSD license.

The new aeabi_idiv routine is used to replace the one in
libgcc/config/arm/lib1funcs.S. This replacement happens within the
Thumb1 wrapper. The new routine is under LGPLv3 license.

The main advantage of this version is that it can improve the
performance of the aeabi_idiv function for Thumb1. This solution
will also increase the code size. So it will only be used if
__OPTIMIZE_SIZE__ is

not defined.


Make check passed for armv6-m.

OK for trunk?

Thanks,
Hale Wang

libgcc/ChangeLog:

2014-11-26  Hale Wang  hale.w...@arm.com

 * config/arm/lib1funcs.S: Add new wrapper.

===
diff --git a/libgcc/config/arm/lib1funcs.S
b/libgcc/config/arm/lib1funcs.S index b617137..de66c81 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -306,34 +306,12 @@ LSYM(Lend_fde):
  #ifdef __ARM_EABI__
  .macro THUMB_LDIV0 name signed
  #if defined(__ARM_ARCH_6M__)
-   .ifc \signed, unsigned
-   cmp r0, #0
-   beq 1f
-   mov r0, #0
-   mvn r0, r0  @ 0x
-1:
-   .else
-   cmp r0, #0
-   beq 2f
-   blt 3f
+
+   push{r0, lr}
 mov r0, #0
-   mvn r0, r0
-   lsr r0, r0, #1  @ 0x7fff
-   b   2f
-3: mov r0, #0x80
-   lsl r0, r0, #24 @ 0x8000
-2:
-   .endif
-   push{r0, r1, r2}
-   ldr r0, 4f
-   adr r1, 4f
-   add r0, r1
-   str r0, [sp, #8]
-   @ We know we are not on armv4t, so pop pc is safe.
-   pop {r0, r1, pc}
-   .align  2
-4:
-   .word   __aeabi_idiv0 - 4b
+   bl  SYM(__aeabi_idiv0)
+   pop {r1, pc}
+
  #elif defined(__thumb2__)
 .syntax unified
 .ifc \signed, unsigned
@@ -927,7 +905,158 @@ LSYM(Lover7):
 add dividend, work
.endif
  LSYM(Lgot_result):
-.endm
+.endm
+
+#if defined(__prefer_thumb__) && !defined(__OPTIMIZE_SIZE__)
+.macro BranchToDiv n, label
+   lsr curbit, dividend, \n
+   cmp curbit, divisor
+   blo \label
+.endm
+
+.macro DoDiv n
+   lsr curbit, dividend, \n
+   cmp curbit, divisor
+   bcc 1f
+   lsl curbit, divisor, \n
+   sub dividend, dividend, curbit
+
+1: adc result, result
+.endm
+
+.macro THUMB1_Div_Positive
+   mov result, #0
+   BranchToDiv #1, LSYM(Lthumb1_div1)
+   BranchToDiv #4, LSYM(Lthumb1_div4)
+   BranchToDiv #8, LSYM(Lthumb1_div8)
+   BranchToDiv #12, LSYM(Lthumb1_div12)
+   BranchToDiv #16, LSYM(Lthumb1_div16)
+LSYM(Lthumb1_div_large_positive):
+   mov result, #0xff
+   lsl divisor, divisor, #8
+   rev result, result
+   lsr curbit, dividend, #16
+   cmp curbit, divisor
+   blo 1f
+   asr result, #8
+   lsl divisor, divisor, #8
+   beq LSYM(Ldivbyzero_waypoint)
+
+1: lsr curbit, dividend, #12
+   cmp curbit, divisor
+   blo LSYM(Lthumb1_div12)
+   b   LSYM(Lthumb1_div16)
+LSYM(Lthumb1_div_loop):
+   lsr divisor, divisor, #8
+LSYM(Lthumb1_div16):
+   Dodiv   #15
+   Dodiv   #14
+   Dodiv   #13
+   Dodiv   #12
+LSYM(Lthumb1_div12):
+   Dodiv   #11
+   Dodiv   #10
+   Dodiv   #9
+   Dodiv   #8
+   bcs LSYM(Lthumb1_div_loop)
+LSYM(Lthumb1_div8):
+   Dodiv   #7
+   Dodiv   #6
+   Dodiv   #5
+LSYM(Lthumb1_div5):
+   Dodiv   #4
+LSYM(Lthumb1_div4):
+   Dodiv   #3
+LSYM(Lthumb1_div3):
+   Dodiv   #2
+LSYM(Lthumb1_div2):
+   Dodiv   #1
+LSYM(Lthumb1_div1):
+   sub 

Re: [Patch, AArch64, Obvious] Fix PR64231.

2015-02-02 Thread Tejas Belagod

On 30/01/15 13:25, Jakub Jelinek wrote:

On Fri, Jan 23, 2015 at 11:03:13AM +, Tejas Belagod wrote:


Hi,

This is an almost obvious patch to fix PR64231 as discovered by A. Pinski
and as proposed by Jakub.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64231

Regressions happy. OK to commit?


This is ok for trunk.  We have a real bug that we need to fix, if we have
some more useful macro in the future, this can be rewritten to use that
macro together with the many other spots that would be changed for it as
well.  But blocking the fix for it doesn't sound right to me.



Thanks. Committed as r220348.

Thanks,
Tejas.


2015-01-23  Tejas Belagod  tejas.bela...@arm.com
Andrew Pinski  pins...@gcc.gnu.org
Jakub Jelinek  ja...@gcc.gnu.org

PR target/64231
* config/aarch64/aarch64.c (aarch64_classify_symbol): Fix large
integer typing for small model. Use IN_RANGE.


Jakub






Re: [[ARM/AArch64][testsuite] 03/36] Add vmax, vmin, vhadd, vhsub and vrhadd tests.

2015-01-26 Thread Tejas Belagod

On 25/01/15 21:05, Christophe Lyon wrote:

On 23 January 2015 at 14:44, Christophe Lyon christophe.l...@linaro.org wrote:

On 23 January 2015 at 12:42, Christophe Lyon christophe.l...@linaro.org wrote:

On 23 January 2015 at 11:18, Tejas Belagod tejas.bela...@arm.com wrote:

On 22/01/15 21:31, Christophe Lyon wrote:


On 22 January 2015 at 16:22, Tejas Belagod tejas.bela...@arm.com wrote:


On 22/01/15 14:28, Christophe Lyon wrote:



On 22 January 2015 at 12:19, Tejas Belagod tejas.bela...@arm.com
wrote:



On 21/01/15 15:07, Christophe Lyon wrote:




On 19 January 2015 at 17:54, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:




On 19 January 2015 at 15:43, Christophe Lyon
christophe.l...@linaro.org
wrote:




On 19 January 2015 at 14:29, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:




On 16 January 2015 at 17:52, Christophe Lyon
christophe.l...@linaro.org wrote:


OK provided, as per the previous couple, that we don't regress
or
introduce new fails on aarch64[_be] or aarch32.





This patch shows failures on aarch64 and aarch64_be for vmax and
vmin
when the input is -NaN.
It's a corner case, and my reading of the ARM ARM is that the
result
should be the same as on aarch32.
I haven't had time to look at it in more details though.
So, not OK?





They should have the same behaviour in aarch32 and aarch64. Did you
test on HW or a model?


I ran the tests on qemu for aarch32 and aarch64-linux, and on the
foundation model for aarch64*-elf.





Leave this one out until we understand why it fails. /Marcus





I've looked at this a bit more.
We have
fmax v0.4s, v0.4s, v1.4s
where v0 is a vector of -NaN (0xffc0) and v1 is a vector of 1.

The output is still -NaN (0xffc0), while the test expects
defaultNaN (0x7fc0).



In the AArch32 execution state, Advanced SIMD FP arithmetic always uses
the
DefaultNaN setting regardless of the DN-bit value in the FPSCR. In
AArch64
execution state, result of Advanced SIMD FP arithmetic operations
depend
on
the value of the DN-bit i.e. either propagate the input NaN or generate
DefaultNaN depending on the value of DN.




Maybe I'm using an outdated doc. On page 2282 of ARMv8 ARM rev C, I
can see only the latter (no diff between aarch32 and aarch64 in
FPProcessNan pseudo-code)



If you see pg. 4005 in the same doc(rev C), you'll see the FPSCR spec -
under DN:

The value of this bit only controls scalar floating-point arithmetic.
Advanced SIMD arithmetic always uses the Default NaN setting, regardless
of
the value of the DN bit.

Also on page 3180 for the description of VMAX(vector FP), it says:

*  max(+0.0, -0.0) = +0.0
* If any input is a NaN, the corresponding result element is the default
NaN.



Oops I was looking at FMAX (vector) pg 936.


The pseudocode for FPMax () on pg. 3180 passes StandardFPSCRValue() to
FPMax() which is on pg. 2285

// StandardFPSCRValue()
// 
FPCRType StandardFPSCRValue()
return ‘0’ : FPSCR.AHP : ‘11’

Here bit-25(FPSCR.DN) is set to 1.



So, we should get defaultNaN too on aarch64, and no need to try to
force DN to 1 in gdb?

What can be wrong?



On pg 3180, I see VMAX(FPSIMD) for A32/T32, not A64. I hope we're reading
the same document.

Regardless of the page number, if you see the pseudocode for VMAX(FPSIMD)
for AArch32, StandardFPSCRValue() (i.e. DN = 1) is passed to FPMax() which
means generate DefaultNaN() regardless.

OTOH, on pg 936, you have FMAX(vector) for A64 where FPMax() in the
pseudocode gets just FPCR.



Ok, that was my initial understanding but our discussion confused me.

And that's why I tried to force DN = 1 in gdb before single-stepping over
fmax v0.4s, v0.4s, v1.4s

but it changed nothing :-(
Hence my question about a gdb possible bug or misuse.


Hmm... user error, I missed one bit
set $fpcr=0x200
works under gdb.


I'll try modifying the test to have it force DN=1.


Forcing DN=1 in the test makes it pass.

I am going to look at adding that cleanly to my test, and resubmit it.

Thanks, and sorry for the noise.


Here is the updated version:
- Now I set DN=1 on AArch64 in clean_results, as it is the main
initialization function.
- I removed the double negative :-)
- I removed the useless [u]int64 and poly variants

Christophe.

2015-01-25  Christophe Lyon  christophe.l...@linaro.org

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
(_ARM_FPSRC): Add DN and AHP fields.
(clean_results): Force DN=1 on AArch64.
* gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vhadd.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vhsub.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmax.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmin.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: New file.



I guess you don't need the fake dependency fix for this as this is 
mostly called only once?


+  _ARM_FPSCR _afpscr_for_dn;
+  asm volatile (mrs %0,fpcr : =r

Re: [Patch, AArch64, Obvious] Fix PR64231.

2015-01-26 Thread Tejas Belagod

On 23/01/15 17:15, Jakub Jelinek wrote:

On Fri, Jan 23, 2015 at 08:48:43AM -0800, Mike Stump wrote:

On Jan 23, 2015, at 3:03 AM, Tejas Belagod tejas.bela...@arm.com wrote:

This is an almost obvious patch to fix PR64231 as discovered by A. Pinski and
as proposed by Jakub.


Kinda crappy code.  The macro to use here should take the number of bits as an 
int, and whether the constant is signed or not.

   FITS (x, 32, UNSIGNED)


Except that the test is testing something different.
First of all, it is closer to FITS (x, 33, SIGNED), but
the details are different, the test as is (except for the bug on 32-bit
hosts) INTVAL (x) is not in the [ -4GB + 33, 4GB - 32 ] interval (inclusive).
Why it isn't exactly not in [ -4GB, 4GB - 1 ] or similar.



The value of the offset itself is a heuristic as we cannot accurately 
say what the distance of a symbol from the ADRP will be at compile time. 
So it doesn't really matter what the exact value of range is in this case.


Thanks,
Tejas.



Re: [[ARM/AArch64][testsuite] 03/36] Add vmax, vmin, vhadd, vhsub and vrhadd tests.

2015-01-23 Thread Tejas Belagod

On 22/01/15 21:31, Christophe Lyon wrote:

On 22 January 2015 at 16:22, Tejas Belagod tejas.bela...@arm.com wrote:

On 22/01/15 14:28, Christophe Lyon wrote:


On 22 January 2015 at 12:19, Tejas Belagod tejas.bela...@arm.com wrote:


On 21/01/15 15:07, Christophe Lyon wrote:



On 19 January 2015 at 17:54, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:



On 19 January 2015 at 15:43, Christophe Lyon
christophe.l...@linaro.org
wrote:



On 19 January 2015 at 14:29, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:



On 16 January 2015 at 17:52, Christophe Lyon
christophe.l...@linaro.org wrote:


OK provided, as per the previous couple, that we don't regress
or
introduce new fails on aarch64[_be] or aarch32.




This patch shows failures on aarch64 and aarch64_be for vmax and
vmin
when the input is -NaN.
It's a corner case, and my reading of the ARM ARM is that the result
should be the same as on aarch32.
I haven't had time to look at it in more details though.
So, not OK?




They should have the same behaviour in aarch32 and aarch64. Did you
test on HW or a model?


I ran the tests on qemu for aarch32 and aarch64-linux, and on the
foundation model for aarch64*-elf.




Leave this one out until we understand why it fails. /Marcus




I've looked at this a bit more.
We have
fmax v0.4s, v0.4s, v1.4s
where v0 is a vector of -NaN (0xffc0) and v1 is a vector of 1.

The output is still -NaN (0xffc0), while the test expects
defaultNaN (0x7fc0).



In the AArch32 execution state, Advanced SIMD FP arithmetic always uses
the
DefaultNaN setting regardless of the DN-bit value in the FPSCR. In
AArch64
execution state, result of Advanced SIMD FP arithmetic operations depend
on
the value of the DN-bit i.e. either propagate the input NaN or generate
DefaultNaN depending on the value of DN.



Maybe I'm using an outdated doc. On page 2282 of ARMv8 ARM rev C, I
can see only the latter (no diff between aarch32 and aarch64 in
FPProcessNan pseudo-code)



If you see pg. 4005 in the same doc(rev C), you'll see the FPSCR spec -
under DN:

The value of this bit only controls scalar floating-point arithmetic.
Advanced SIMD arithmetic always uses the Default NaN setting, regardless of
the value of the DN bit.

Also on page 3180 for the description of VMAX(vector FP), it says:

*  max(+0.0, -0.0) = +0.0
* If any input is a NaN, the corresponding result element is the default
NaN.



Oops I was looking at FMAX (vector) pg 936.


The pseudocode for FPMax () on pg. 3180 passes StandardFPSCRValue() to
FPMax() which is on pg. 2285

// StandardFPSCRValue()
// 
FPCRType StandardFPSCRValue()
return ‘0’ : FPSCR.AHP : ‘11’

Here bit-25(FPSCR.DN) is set to 1.



So, we should get defaultNaN too on aarch64, and no need to try to
force DN to 1 in gdb?

What can be wrong?



On pg 3180, I see VMAX(FPSIMD) for A32/T32, not A64. I hope we're 
reading the same document.


Regardless of the page number, if you see the pseudocode for 
VMAX(FPSIMD) for AArch32, StandardFPSCRValue() (i.e. DN = 1) is passed 
to FPMax() which means generate DefaultNaN() regardless.


OTOH, on pg 936, you have FMAX(vector) for A64 where FPMax() in the 
pseudocode gets just FPCR.



Thanks,
Tejas.


Thanks,
Tejas.



If you're running your test in the AArch64 execution state, you'd want to
define the DN bit and modify the expected results accordingly or have the
test poll at runtime what the DN-bit is set to and check expected results
dynamically.


Makes sense, I hadn't noticed the different aarch64 spec here.


I think the test already has expected behaviour for AArch32 execution
state
by expecting DefaultNaN regardless.


Yes.


I have executed the test under GDB on AArch64 HW, and noticed that fpcr
was 0.
I forced it to have DN==1:
set $fpcr=0x100
but this didn't change the result.

Does setting fpcr.dn under gdb actually work?



It should. Possibly a bug, patches welcome :-).


:-)











[Patch, AArch64, Obvious] Fix PR64231.

2015-01-23 Thread Tejas Belagod


Hi,

This is an almost obvious patch to fix PR64231 as discovered by A. 
Pinski and as proposed by Jakub.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64231

Regressions happy. OK to commit?

Thanks,
Tejas.

ChangeLog:

gcc/

2015-01-23  Tejas Belagod  tejas.bela...@arm.com
Andrew Pinski  pins...@gcc.gnu.org
Jakub Jelinek  ja...@gcc.gnu.org

PR target/64231
* config/aarch64/aarch64.c (aarch64_classify_symbol): Fix large
integer typing for small model. Use IN_RANGE.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index dd49fcd..b790bc6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7083,8 +7083,8 @@ aarch64_classify_symbol (rtx x, rtx offset,
  /* Same reasoning as the tiny code model, but the offset cap here is
 4G.  */
  if (SYMBOL_REF_WEAK (x)
- || INTVAL (offset) < (HOST_WIDE_INT) -4294967263
- || INTVAL (offset) > (HOST_WIDE_INT) 4294967264
+ || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
+   HOST_WIDE_INT_C (4294967264)))
return SYMBOL_FORCE_TO_MEM;
  return SYMBOL_SMALL_ABSOLUTE;
 


Re: [[ARM/AArch64][testsuite] 03/36] Add vmax, vmin, vhadd, vhsub and vrhadd tests.

2015-01-22 Thread Tejas Belagod

On 21/01/15 15:07, Christophe Lyon wrote:

On 19 January 2015 at 17:54, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:

On 19 January 2015 at 15:43, Christophe Lyon christophe.l...@linaro.org wrote:

On 19 January 2015 at 14:29, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:

On 16 January 2015 at 17:52, Christophe Lyon christophe.l...@linaro.org wrote:


OK provided, as per the previous couple, that we don't regress or
introduce new fails on aarch64[_be] or aarch32.


This patch shows failures on aarch64 and aarch64_be for vmax and vmin
when the input is -NaN.
It's a corner case, and my reading of the ARM ARM is that the result
should be the same as on aarch32.
I haven't had time to look at it in more details though.
So, not OK?


They should have the same behaviour in aarch32 and aarch64. Did you
test on HW or a model?


I ran the tests on qemu for aarch32 and aarch64-linux, and on the
foundation model for aarch64*-elf.


Leave this one out until we understand why it fails. /Marcus


I've looked at this a bit more.
We have
fmax v0.4s, v0.4s, v1.4s
where v0 is a vector of -NaN (0xffc0) and v1 is a vector of 1.

The output is still -NaN (0xffc0), while the test expects
defaultNaN (0x7fc0).



In the AArch32 execution state, Advanced SIMD FP arithmetic always uses 
the DefaultNaN setting regardless of the DN-bit value in the FPSCR. In 
AArch64 execution state, result of Advanced SIMD FP arithmetic 
operations depend on the value of the DN-bit i.e. either propagate the 
input NaN or generate DefaultNaN depending on the value of DN.


If you're running your test in the AArch64 execution state, you'd want 
to define the DN bit and modify the expected results accordingly or have 
the test poll at runtime what the DN-bit is set to and check expected 
results dynamically.
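
In case it is useful, here is a sketch of forcing that from within the test
itself on AArch64 rather than from the debugger (DN is bit 25 of FPCR; the
helper name is illustrative):

static void
force_default_nan (void)
{
#ifdef __aarch64__
  unsigned long long fpcr;
  /* Read-modify-write FPCR so Advanced SIMD FP arithmetic returns the
     default NaN instead of propagating input NaNs.  */
  __asm__ __volatile__ ("mrs %0, fpcr" : "=r" (fpcr));
  fpcr |= 1ULL << 25;   /* FPCR.DN  */
  __asm__ __volatile__ ("msr fpcr, %0" : : "r" (fpcr));
#endif
}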


I think the test already has expected behaviour for AArch32 execution 
state by expecting DefaultNaN regardless.



I have executed the test under GDB on AArch64 HW, and noticed that fpcr was 0.
I forced it to have DN==1:
set $fpcr=0x100
but this didn't change the result.

Does setting fpcr.dn under gdb actually work?



It should. Possibly a bug, patches welcome :-).

Tejas.




Re: [[ARM/AArch64][testsuite] 05/36] Add vldX_dup test.

2015-01-22 Thread Tejas Belagod






LGTM.


Thanks, I should mention that this test fails on aarch64_be, because
of Alan's pending patches.


A few big-endian fixes were checked in last night. Do you mean this?

r219957 | rsandifo | 2015-01-21 17:53:04 + (Wed, 21 Jan 2015) | 6 lines

gcc/
2015-01-25  Alan Hayward  alan.hayw...@arm.com

* rtlanal.c (subreg_get_info): Exit early for simple and common
cases.


Thanks,
Tejas.



Re: [[ARM/AArch64][testsuite] 03/36] Add vmax, vmin, vhadd, vhsub and vrhadd tests.

2015-01-22 Thread Tejas Belagod

On 22/01/15 14:28, Christophe Lyon wrote:

On 22 January 2015 at 12:19, Tejas Belagod tejas.bela...@arm.com wrote:

On 21/01/15 15:07, Christophe Lyon wrote:


On 19 January 2015 at 17:54, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:


On 19 January 2015 at 15:43, Christophe Lyon christophe.l...@linaro.org
wrote:


On 19 January 2015 at 14:29, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:


On 16 January 2015 at 17:52, Christophe Lyon
christophe.l...@linaro.org wrote:


OK provided, as per the previous couple, that we don't regress or
introduce new fails on aarch64[_be] or aarch32.



This patch shows failures on aarch64 and aarch64_be for vmax and vmin
when the input is -NaN.
It's a corner case, and my reading of the ARM ARM is that the result
should be the same as on aarch32.
I haven't had time to look at it in more details though.
So, not OK?



They should have the same behaviour in aarch32 and aarch64. Did you
test on HW or a model?


I ran the tests on qemu for aarch32 and aarch64-linux, and on the
foundation model for aarch64*-elf.



Leave this one out until we understand why it fails. /Marcus



I've looked at this a bit more.
We have
fmax v0.4s, v0.4s, v1.4s
where v0 is a vector of -NaN (0xffc0) and v1 is a vector of 1.

The output is still -NaN (0xffc0), while the test expects
defaultNaN (0x7fc0).



In the AArch32 execution state, Advanced SIMD FP arithmetic always uses the
DefaultNaN setting regardless of the DN-bit value in the FPSCR. In the AArch64
execution state, the result of Advanced SIMD FP arithmetic operations depends on
the value of the DN bit, i.e. it either propagates the input NaN or generates
the DefaultNaN.


Maybe I'm using an outdated doc. On page 2282 of ARMv8 ARM rev C, I
can see only the latter (no diff between aarch32 and aarch64 in
FPProcessNan pseudo-code)



If you see pg. 4005 in the same doc (rev C), you'll see the FPSCR spec, 
under DN:


The value of this bit only controls scalar floating-point arithmetic. 
Advanced SIMD arithmetic always uses the Default NaN setting, regardless 
of the value of the DN bit.


Also on page 3180 for the description of VMAX(vector FP), it says:

*  max(+0.0, -0.0) = +0.0
* If any input is a NaN, the corresponding result element is the default 
NaN.



The pseudocode for VMAX (vector FP) on pg. 3180 passes StandardFPSCRValue() to 
FPMax(), which is on pg. 2285:


// StandardFPSCRValue()
// 
FPCRType StandardFPSCRValue()
return ‘0’ : FPSCR.AHP : ‘11’

Here bit 25 (FPSCR.DN) is set to 1.

Thanks,
Tejas.


If you're running your test in the AArch64 execution state, you'd want to
define the DN bit and modify the expected results accordingly, or have the
test poll the DN bit at runtime and check the expected results
dynamically.

Makes sense, I hadn't noticed the different aarch64 spec here.


I think the test already has expected behaviour for AArch32 execution state
by expecting DefaultNaN regardless.

Yes.


I have executed the test under GDB on AArch64 HW, and noticed that fpcr
was 0.
I forced it to have DN==1:
set $fpcr=0x100
but this didn't change the result.

Does setting fpcr.dn under gdb actually work?



It should. Possibly a bug, patches welcome :-).


:-)






Re: [PING] [PATCH] [AArch64, NEON] Add vfms_n_f32, vfmsq_n_f32 and vfmsq_n_f64 specified by the ACLE

2015-01-21 Thread Tejas Belagod

On 21/01/15 09:22, Yangfei (Felix) wrote:

This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01008.html
I updated the testcase, adding a test for the vfmsq_n_f64 intrinsic.
Test OK for both aarch64-linux-gnu and aarch64_be-linux-gnu-gcc.
OK for the trunk?  Thanks.


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 219845)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2015-01-21  Felix Yang  felix.y...@huawei.com
+
+   * config/aarch64/arm_neon.h (vfms_n_f32, vfmsq_n_f32, vfmsq_n_f64): New
+   intrinsics.
+


Hi Felix,

Thanks for the patch. It LGTM apart from one point: you seem to 
have missed out vfms_n_f64?
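
Something along these lines is the shape I'd expect (just a sketch of mine 
using the generic __builtin_fma; the real patch would presumably use the 
matching __builtin_aarch64_* builtin like the other variants do):

__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
vfms_n_f64 (float64x1_t __a, float64x1_t __b, float64_t __c)
{
  /* a - b * c, fused.  */
  return (float64x1_t) { __builtin_fma (-__b[0], __c, __a[0]) };
}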


Thanks,
Tejas.


  2015-01-19  Jiong Wang  jiong.w...@arm.com
Andrew Pinski  apin...@cavium.com

Index: gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_n.c
===
--- gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_n.c
(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_n.c
(revision 0)
@@ -0,0 +1,74 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#if defined(__aarch64__) && defined(__ARM_FEATURE_FMA)
+/* Expected results.  */
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x4438ca3d, 0x44390a3d };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x44869eb8, 0x4486beb8, 0x4486deb8, 
0x4486feb8 };
+VECT_VAR_DECL(expected,hfloat,64,2) [] = { 0x408906e1532b8520, 
0x40890ee1532b8520 };
+
+#define VECT_VAR_ASSIGN(S,Q,T1,W) S##Q##_##T1##W
+#define ASSIGN(S, Q, T, W, V) T##W##_t S##Q##_##T##W = V
+#define TEST_MSG VFMS_N/VFMSQ_N
+
+void exec_vfms_n (void)
+{
+  /* Basic test: v4=vfms_n(v1,v2), then store the result.  */
+#define TEST_VFMS_N(Q, T1, T2, W, N)   \
+  VECT_VAR(vector_res, T1, W, N) = \
+vfms##Q##_n_##T2##W(VECT_VAR(vector1, T1, W, N),   \
+   VECT_VAR(vector2, T1, W, N),\
+   VECT_VAR_ASSIGN(scalar, Q, T1, W)); \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+#define CHECK_VFMS_N_RESULTS(test_name,comment)
\
+  {\
+CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment);  \
+CHECK_FP(test_name, float, 32, 4, PRIx32, expected, comment);  \
+CHECK_FP(test_name, float, 64, 2, PRIx64, expected, comment);  \
+  }
+
+#define DECL_VFMS_N_VAR(VAR)   \
+  DECL_VARIABLE(VAR, float, 32, 2);\
+  DECL_VARIABLE(VAR, float, 32, 4);\
+  DECL_VARIABLE(VAR, float, 64, 2);
+
+  DECL_VFMS_N_VAR(vector1);
+  DECL_VFMS_N_VAR(vector2);
+  DECL_VFMS_N_VAR(vector3);
+  DECL_VFMS_N_VAR(vector_res);
+
+  clean_results ();
+
+  /* Initialize input vector1 from buffer.  */
+  VLOAD(vector1, buffer, , float, f, 32, 2);
+  VLOAD(vector1, buffer, q, float, f, 32, 4);
+  VLOAD(vector1, buffer, q, float, f, 64, 2);
+
+  /* Choose init value arbitrarily.  */
+  VDUP(vector2, , float, f, 32, 2, -9.3f);
+  VDUP(vector2, q, float, f, 32, 4, -29.7f);
+  VDUP(vector2, q, float, f, 64, 2, -15.8f);
+
+  /* Choose init value arbitrarily.  */
+  ASSIGN(scalar, , float, 32, 81.2f);
+  ASSIGN(scalar, q, float, 32, 36.8f);
+  ASSIGN(scalar, q, float, 64, 51.7f);
+
+  /* Execute the tests.  */
+  TEST_VFMS_N(, float, f, 32, 2);
+  TEST_VFMS_N(q, float, f, 32, 4);
+  TEST_VFMS_N(q, float, f, 64, 2);
+
+  CHECK_VFMS_N_RESULTS (TEST_MSG, );
+}
+#endif
+
+int main (void)
+{
+#if defined(__aarch64__) && defined(__ARM_FEATURE_FMA)
+  exec_vfms_n ();
+#endif
+  return 0;
+}
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 219845)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,7 @@
+2015-01-21  Felix Yang  felix.y...@huawei.com
+
+   * gcc.target/aarch64/advsimd-intrinsics/vfms_n.c: New test.
+
  2015-01-19  Felix Yang  felix.y...@huawei.com
Haijian Zhang  z.zhanghaij...@huawei.com

Index: gcc/config/aarch64/arm_neon.h
===
--- gcc/config/aarch64/arm_neon.h   (revision 219845)
+++ gcc/config/aarch64/arm_neon.h   (working copy)
@@ -14774,7 +14774,24 @@ vfmsq_f64 (float64x2_t __a, float64x2_t __b, float
return __builtin_aarch64_fmav2df (-__b, __c, __a);
  }

+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vfms_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
+{
+  return __builtin_aarch64_fmav2sf (-__b, vdup_n_f32 (__c), __a);
+}

+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vfmsq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c)
+{
+  return __builtin_aarch64_fmav4sf 

Re: [Patch, AArch64, testsuite] PR63971: Revert test_frame_* patch.

2015-01-19 Thread Tejas Belagod

On 19/01/15 08:53, Andrew Pinski wrote:

On Thu, Jan 15, 2015 at 8:18 AM, Mike Stump mikest...@comcast.net wrote:

On Jan 14, 2015, at 3:50 AM, Tejas Belagod tejas.bela...@arm.com wrote:

As agreed here (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63971), please can 
I reverse Andrew's patch 
out(https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02916.html)?


Ok.

Unless someone objects to a reversion like this, when the author of a patch 
says it should be reverted…  that’s all the approval it needs, though, people 
can always ask for a review for any reason they want.


And now this reversal needs to be reverted.  Because the conditional
compare optimization went back in.  I figured the optimization would
go back in and that is why I did not act on reverting my patch that
fast.  The conditional compare patch went in a day after this reversal
went in ;).



Yes, now committed r219838 as obvious.

Thanks,
Tejas.
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_1.c 
b/gcc/testsuite/gcc.target/aarch64/test_frame_1.c
index 5b3c0ab..b270bae 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_1.c
@@ -14,6 +14,6 @@ t_frame_pattern (test1, 200, )
 t_frame_run (test1)
 
 /* { dg-final { scan-assembler-times str\tx30, \\\[sp, -\[0-9\]+\\\]! 2 } } 
*/
-/* { dg-final { scan-assembler-times ldr\tx30, \\\[sp\\\], \[0-9\]+ 3 } } */
+/* { dg-final { scan-assembler-times ldr\tx30, \\\[sp\\\], \[0-9\]+ 2 } } */
 
 /* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_2.c 
b/gcc/testsuite/gcc.target/aarch64/test_frame_2.c
index 6ec4088..59a089c 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_2.c
@@ -15,6 +15,6 @@ t_frame_run (test2)
 
 
 /* { dg-final { scan-assembler-times stp\tx19, x30, \\\[sp, -\[0-9\]+\\\]! 1 
} } */
-/* { dg-final { scan-assembler-times ldp\tx19, x30, \\\[sp\\\], \[0-9\]+ 2 } 
} */
+/* { dg-final { scan-assembler-times ldp\tx19, x30, \\\[sp\\\], \[0-9\]+ 1 } 
} */
 
 /* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_4.c 
b/gcc/testsuite/gcc.target/aarch64/test_frame_4.c
index ebfb290..d717862 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_4.c
@@ -14,6 +14,6 @@ t_frame_pattern (test4, 400, x19)
 t_frame_run (test4)
 
 /* { dg-final { scan-assembler-times stp\tx19, x30, \\\[sp, -\[0-9\]+\\\]! 1 
} } */
-/* { dg-final { scan-assembler-times ldp\tx19, x30, \\\[sp\\\], \[0-9\]+ 2 } 
} */
+/* { dg-final { scan-assembler-times ldp\tx19, x30, \\\[sp\\\], \[0-9\]+ 1 } 
} */
 
 /* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_6.c 
b/gcc/testsuite/gcc.target/aarch64/test_frame_6.c
index b5ea7ee..b66ce09 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_6.c
@@ -15,6 +15,6 @@ t_frame_pattern (test6, 700, )
 t_frame_run (test6)
 
 /* { dg-final { scan-assembler-times str\tx30, \\\[sp, -\[0-9\]+\\\]! 2 } } 
*/
-/* { dg-final { scan-assembler-times ldr\tx30, \\\[sp\\\], \[0-9\]+ 3 } } */
+/* { dg-final { scan-assembler-times ldr\tx30, \\\[sp\\\], \[0-9\]+ 2 } } */
 
 /* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_7.c 
b/gcc/testsuite/gcc.target/aarch64/test_frame_7.c
index daa1f42..22576c4 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_7.c
@@ -15,6 +15,6 @@ t_frame_pattern (test7, 700, x19)
 t_frame_run (test7)
 
 /* { dg-final { scan-assembler-times stp\tx19, x30, \\\[sp, -\[0-9\]+\\\]! 1 
} } */
-/* { dg-final { scan-assembler-times ldp\tx19, x30, \\\[sp\\\], \[0-9\]+ 2 } 
} */
+/* { dg-final { scan-assembler-times ldp\tx19, x30, \\\[sp\\\], \[0-9\]+ 1 } 
} */
 
 /* { dg-final { cleanup-saved-temps } } */


Re: [PATCH] [AArch64, NEON] Improve vpmaxX vpminX intrinsics

2015-01-16 Thread Tejas Belagod

On 14/01/15 07:09, Yangfei (Felix) wrote:

On 09/12/14 08:17, Yangfei (Felix) wrote:

On 28 November 2014 at 09:23, Yangfei (Felix) felix.y...@huawei.com

wrote:

Hi,
This patch converts vpmaxX & vpminX intrinsics to use builtin
functions

instead of the previous inline assembly syntax.

Regtested with aarch64-linux-gnu on QEMU.  Also passed the
glorious

testsuite of Christophe Lyon.

OK for the trunk?


Hi Felix,   We know from experience that the advsimd intrinsics tend
to be fragile for big endian and in general it is fairly easy to
break the big endian case.  For these advsimd improvements that you
are working on (that we very much appreciate) it is important to run
both little endian and big endian regressions.

Thanks
/Marcus



Okay.  Any plan for the advsimd big-endian improvement?
I rebased this patch over Alan Lawrance's patch:
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html
No regressions for aarch64_be-linux-gnu target too.  OK for the trunk?


Index: gcc/ChangeLog


=
==

--- gcc/ChangeLog   (revision 218464)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,18 @@
+2014-12-09  Felix Yang  felix.y...@huawei.com
+
+   * config/aarch64/aarch64-simd.md

(aarch64_maxmin_unspmode): New

+   pattern.
+   * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
+   uminp, smax_nanp, smin_nanp): New builtins.
+   * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
+   vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16,

vpmaxq_s32,

+   vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32,

vpmaxq_f64,

+   vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32,

vpmaxnmq_f64,

+   vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32,

vpmin_u8,

+   vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32,

vpminq_u8,

+   vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64,

vpminqd_f64,

+   vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64,
+ vpminnmqd_f64,
+




   __extension__ static __inline float32x2_t __attribute__
((__always_inline__))
Index: gcc/config/aarch64/aarch64-simd.md


=
==

--- gcc/config/aarch64/aarch64-simd.md  (revision 218464)
+++ gcc/config/aarch64/aarch64-simd.md  (working copy)
@@ -1017,6 +1017,28 @@
 DONE;
   })

+;; Pairwise Integer Max/Min operations.
+(define_insn aarch64_maxmin_unspmode
+ [(set (match_operand:VDQ_BHSI 0 register_operand =w)
+   (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1

register_operand w)

+(match_operand:VDQ_BHSI 2 register_operand

w)]

+   MAXMINV))]
+ TARGET_SIMD
+ maxmin_uns_opp\t%0.Vtype, %1.Vtype, %2.Vtype
+  [(set_attr type neon_minmaxq)]
+)
+


Hi Felix,

Sorry for the delay in getting back to you on this.

If you've rolled aarch64_reduc_maxmin_uns_internalv2si into the above
pattern, do you still need it? For all its call points, just point them to
aarch64_maxmin_unspmode?

Thanks,
Tejas.




Hello Tejas,

   I didn't do this yet.
   Currently the aarch64_reduc_maxmin_uns_internalv2si is only called by 
reduc_maxmin_uns_scal_mode.
   I find it kind of troublesome to handle this due to the use of iterators in the 
caller pattern.
   Are you going to rework this part?



You're right. Never mind. That restructuring, if we choose to do it, is 
another patch. This patch looks good (but I can't approve it).


Thanks,
Tejas.



Re: [[ARM/AArch64][testsuite] 01/36] Add explicit dependency on Neon Cumulative Saturation flag (QC).

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:17, Christophe Lyon wrote:

__set_neon_cumulative_sat() modifies the contents of the QC flag, and
some intrinsics do so too: this patch adds the explicit dependency on
the asm statement, to avoid code reordering or removal.

When writing QC, the asm statement now has a fake input dependency,
which is the output of the intrinsic being tested. Modifying the
__set_neon_cumulative_sat macro is necessary, to be able to accept all
the possible input types.

Update the generic code in unary_sat_op.inc and binary_sat_op.inc
accordingly.
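
In plain C, the idiom boils down to something like this (an illustrative
sketch, not the testsuite macros themselves; AArch64 variant, names are
made up):

#include <arm_neon.h>

int32x2_t qadd_with_qc_preset (int32x2_t a, int32x2_t b)
{
  int32x2_t res;
  unsigned long fpsr_val = 0;               /* QC (bit 27) cleared.  */
  /* The volatile asm claims to write RES, which ties it to the intrinsic
     below, so the FPSR preset is neither reordered past it nor removed.  */
  __asm__ volatile ("msr fpsr, %1" : "=X" (res) : "r" (fpsr_val));
  res = vqadd_s32 (a, b);                   /* may set QC on overflow.  */
  return res;
}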

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
(Set_Neon_Cumulative_Sat): Add parameter.
(__set_neon_cumulative_sat): Support new parameter.
* gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc
(TEST_BINARY_SAT_OP1): Call Set_Neon_Cumulative_Sat with new
argument.
* gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc
(TEST_UNARY_SAT_OP1): Call Set_Neon_Cumulative_Sat with new
argument.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 8ea1f26..6464c66 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -255,7 +255,11 @@ typedef union {
  #endif /* __ORDER_BIG_ENDIAN__ */

  #define Neon_Cumulative_Sat  __read_neon_cumulative_sat()
-#define Set_Neon_Cumulative_Sat(x)  __set_neon_cumulative_sat((x))
+/* We need a fake dependency to ensure correct ordering of asm
+   statements to preset the QC flag value, and Neon operators writing
+   to QC. */
+#define Set_Neon_Cumulative_Sat(x, depend) \
+  __set_neon_cumulative_sat((x), (depend))

  #if defined(__aarch64__)
  static volatile int __read_neon_cumulative_sat (void) {
@@ -263,13 +267,12 @@ static volatile int __read_neon_cumulative_sat (void) {
  asm volatile (mrs %0,fpsr : =r (_afpscr_for_qc));
  return _afpscr_for_qc.b.QC;
  }
-static void __set_neon_cumulative_sat (int x) {
-_ARM_FPSCR _afpscr_for_qc;
-asm volatile (mrs %0,fpsr : =r (_afpscr_for_qc));
-_afpscr_for_qc.b.QC = x;
-asm volatile (msr fpsr,%0 : : r (_afpscr_for_qc));
-return;
-}
+#define __set_neon_cumulative_sat(x, depend) { \
+_ARM_FPSCR _afpscr_for_qc; \
+asm volatile (mrs %0,fpsr : =r (_afpscr_for_qc));  \
+_afpscr_for_qc.b.QC = x;   \
+asm volatile (msr fpsr,%1 : =X (depend) : r (_afpscr_for_qc)); \
+  }
  #else
  static volatile int __read_neon_cumulative_sat (void) {
  _ARM_FPSCR _afpscr_for_qc;
@@ -277,13 +280,12 @@ static volatile int __read_neon_cumulative_sat (void) {
  return _afpscr_for_qc.b.QC;
  }

-static void __set_neon_cumulative_sat (int x) {
-_ARM_FPSCR _afpscr_for_qc;
-asm volatile (vmrs %0,fpscr : =r (_afpscr_for_qc));
-_afpscr_for_qc.b.QC = x;
-asm volatile (vmsr fpscr,%0 : : r (_afpscr_for_qc));
-return;
-}
+#define __set_neon_cumulative_sat(x, depend) { \
+_ARM_FPSCR _afpscr_for_qc; \
+asm volatile (vmrs %0,fpscr : =r (_afpscr_for_qc));\
+_afpscr_for_qc.b.QC = x;   \
+asm volatile (vmsr fpscr,%1 : =X (depend) : r (_afpscr_for_qc)); \
+  }
  #endif

  /* Declare expected cumulative saturation results, one for each
diff --git 
a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc
index 35d7701..c09a468 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc
@@ -18,7 +18,7 @@ void FNNAME (INSN_NAME) (void)
/* vector_res = OP(vector1,vector2), then store the result.  */

  #define TEST_BINARY_SAT_OP1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, 
CMT) \
-  Set_Neon_Cumulative_Sat(0);  \
+  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T1, W, N));  \
VECT_VAR(vector_res, T1, W, N) =\
  INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N),\
  VECT_VAR(vector2, T1, W, N)); \
diff --git 
a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc
index 3f6d984..0da1426 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc
@@ -17,7 +17,7 @@ void FNNAME (INSN_NAME) (void)
  {
/* y=OP(x), then store the result.  */
  #define TEST_UNARY_SAT_OP1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, 
CMT) \
-  

Re: [[ARM/AArch64][testsuite] 02/36] Be more verbose, and actually confirm that a test was checked.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (CHECK):
Add trace.
(CHECK_FP): Likewise.
(CHECK_CUMULATIVE_SAT): Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 6464c66..2730a66 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -79,6 +79,7 @@ extern size_t strlen(const char *);
  abort();  \
}   \
}   
\
+fprintf(stderr, CHECKED %s\n, MSG);\
}

  /* Floating-point variant.  */
@@ -107,6 +108,7 @@ extern size_t strlen(const char *);
  abort();  \
}   \
}   
\
+fprintf(stderr, CHECKED %s\n, MSG);\
}

  /* Clean buffer with a non-zero pattern to help diagnose buffer
@@ -323,6 +325,7 @@ extern int VECT_VAR(expected_cumulative_sat, uint, 64, 2);
  strlen(COMMENT)  0 ?   COMMENT : );   \
abort();
\
  } \
+fprintf(stderr, CHECKED CUMULATIVE SAT %s\n, MSG); \
}

  #define CHECK_CUMULATIVE_SAT_NAMED(test_name,EXPECTED,comment)
\



Looks OK to me(but I can't approve it).

Tejas.



Re: [[ARM/AArch64][testsuite] 03/36] Add vmax, vmin, vhadd, vhsub and vrhadd tests.

2015-01-16 Thread Tejas Belagod

+#ifndef NO_FLOAT_VARIANT
+  VLOAD(vector, buffer, , float, f, 32, 2);
+  VLOAD(vector, buffer, q, float, f, 32, 4);
+#endif




+#ifndef NO_FLOAT_VARIANT
+  VDUP(vector2, , float, f, 32, 2, -15.5f);
+  VDUP(vector2, q, float, f, 32, 4, -14.5f);
+#endif
+
+#ifndef NO_FLOAT_VARIANT
+#define FLOAT_VARIANT(MACRO, VAR)  \
+  MACRO(VAR, , float, f, 32, 2);   \
+  MACRO(VAR, q, float, f, 32, 4)
+#else
+#define FLOAT_VARIANT(MACRO, VAR)
+#endif


Double negative! :-) Probably easier on the reader to avoid it, but your 
call.
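
For instance, flipping the sense (a sketch; HAS_FLOAT_VARIANT is a name I'm 
making up, to be defined by the tests that do have an FP form):

#ifdef HAS_FLOAT_VARIANT
#define FLOAT_VARIANT(MACRO, VAR)  \
  MACRO(VAR, , float, f, 32, 2);   \
  MACRO(VAR, q, float, f, 32, 4)
#else
#define FLOAT_VARIANT(MACRO, VAR)
#endif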



diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c
new file mode 100644
index 000..2591b16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c
@@ -0,0 +1,64 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vmax
+#define TEST_MSG VMAX/VMAXQ
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
+  0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
+   0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff1, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc178, 0xc170 };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0xf4, 0xf4, 0xf4, 0xf4,
+   0xf4, 0xf5, 0xf6, 0xf7,
+   0xf8, 0xf9, 0xfa, 0xfb,
+   0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff3, 0xfff3, 0xfff3, 0xfff3,
+   0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xfff1, 0xfff1,
+   0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x,
+   0x };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf9, 0xf9, 0xf9, 0xf9,
+0xf9, 0xf9, 0xf9, 0xf9,
+0xf9, 0xf9, 0xfa, 0xfb,
+0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff3,
+0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfff1, 0xfff1,
+0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x,
+0x };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc168, 0xc168,
+  0xc160, 0xc150 };
+
+/* Expected results with special FP values.  */
+VECT_VAR_DECL(expected_nan,hfloat,32,4) [] = { 0x7fc0, 0x7fc0,
+  0x7fc0, 0x7fc0 };
+VECT_VAR_DECL(expected_mnan,hfloat,32,4) [] = { 0x7fc0, 0x7fc0,
+   0x7fc0, 0x7fc0 };
+VECT_VAR_DECL(expected_inf,hfloat,32,4) [] = { 0x7f80, 0x7f80,
+  0x7f80, 0x7f80 };
+VECT_VAR_DECL(expected_minf,hfloat,32,4) [] = { 0x3f80, 0x3f80,
+   0x3f80, 0x3f80 };
+VECT_VAR_DECL(expected_zero1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_zero2,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+#include binary_op_no64.inc
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c
new file mode 100644
index 000..2b5e87c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c
@@ -0,0 +1,66 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+

Re: [[ARM/AArch64][testsuite] 04/36] Add vld1_lane tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c
new file mode 100644
index 000..168cf5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c
@@ -0,0 +1,129 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
+  0xaa, 0xaa, 0xf0, 0xaa };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x, 0x, 0x, 0xfff0 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x, 0xfff0 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0xfff0 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
+   0xaa, 0xaa, 0xaa, 0xf0 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0xfff0 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0xfff0 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfff0 };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
+   0xaa, 0xaa, 0xaa, 0xf0 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0xfff0 };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0xc180 };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
+   0xaa, 0xaa, 0xaa, 0xaa,
+   0xaa, 0xaa, 0xaa, 0xaa,
+   0xaa, 0xaa, 0xaa, 0xf0 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0x, 0x, 0x, 0x,
+   0x, 0xfff0, 0x, 0x };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x, 0x,
+   0xfff0, 0x };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x,
+   0xfff0 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa,
+0xf0, 0xaa, 0xaa, 0xaa };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0xfff0, 0x };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x, 0x,
+0xfff0, 0x };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfff0,
+0x };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa,
+0xaa, 0xaa, 0xaa, 0xaa,
+0xf0, 0xaa, 0xaa, 0xaa };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0xfff0, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
+  0xc180, 0x };
+
+#define TEST_MSG VLD1_LANE/VLD1_LANEQ
+void exec_vld1_lane (void)
+{
+  /* Fill vector_src with 0xAA, then load 1 lane.  */
+#define TEST_VLD1_LANE(Q, T1, T2, W, N, L) \
+  memset (VECT_VAR(buffer_src, T1, W, N), 0xAA, W/8*N);
\
+  VECT_VAR(vector_src, T1, W, N) = \
+vld1##Q##_##T2##W(VECT_VAR(buffer_src, T1, W, N)); \
+  VECT_VAR(vector, T1, W, N) = \
+vld1##Q##_lane_##T2##W(VECT_VAR(buffer, T1, W, N), \
+  VECT_VAR(vector_src, T1, W, N), L);  \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector, T1, W, N))
+
+  DECL_VARIABLE_ALL_VARIANTS(vector);
+  DECL_VARIABLE_ALL_VARIANTS(vector_src);
+
+  ARRAY(buffer_src, int, 8, 8);
+  ARRAY(buffer_src, int, 16, 4);
+  ARRAY(buffer_src, int, 32, 2);
+  ARRAY(buffer_src, int, 64, 1);
+  ARRAY(buffer_src, uint, 8, 8);
+  ARRAY(buffer_src, uint, 16, 4);
+  ARRAY(buffer_src, uint, 32, 2);
+  ARRAY(buffer_src, uint, 64, 1);
+  ARRAY(buffer_src, poly, 8, 8);
+  ARRAY(buffer_src, poly, 16, 4);
+  ARRAY(buffer_src, float, 32, 2);
+
+  ARRAY(buffer_src, int, 8, 16);
+  ARRAY(buffer_src, int, 16, 8);
+  ARRAY(buffer_src, int, 32, 4);
+  ARRAY(buffer_src, int, 64, 2);
+  ARRAY(buffer_src, uint, 8, 16);
+  ARRAY(buffer_src, uint, 16, 8);
+  ARRAY(buffer_src, uint, 32, 4);
+  ARRAY(buffer_src, uint, 64, 2);
+  ARRAY(buffer_src, poly, 8, 16);
+  ARRAY(buffer_src, poly, 16, 8);
+  ARRAY(buffer_src, float, 32, 4);
+
+  clean_results ();
+
+  /* Choose lane arbitrarily.  */
+  TEST_VLD1_LANE(, 

Re: [[ARM/AArch64][testsuite] 05/36] Add vldX_dup test.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:


 * gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c
new file mode 100644
index 000..53cd8f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c
@@ -0,0 +1,671 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+
+/* vld2_dup/chunk 0.  */
+VECT_VAR_DECL(expected_vld2_0,int,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
+  0xf0, 0xf1, 0xf0, 0xf1 };
+VECT_VAR_DECL(expected_vld2_0,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff0, 0xfff1 
};
+VECT_VAR_DECL(expected_vld2_0,int,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_0,int,64,1) [] = { 0xfff0 };
+VECT_VAR_DECL(expected_vld2_0,uint,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
+   0xf0, 0xf1, 0xf0, 0xf1 };
+VECT_VAR_DECL(expected_vld2_0,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff0, 0xfff1 
};
+VECT_VAR_DECL(expected_vld2_0,uint,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_0,uint,64,1) [] = { 0xfff0 };
+VECT_VAR_DECL(expected_vld2_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
+   0xf0, 0xf1, 0xf0, 0xf1 };
+VECT_VAR_DECL(expected_vld2_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff0, 0xfff1 
};
+VECT_VAR_DECL(expected_vld2_0,hfloat,32,2) [] = { 0xc180, 0xc170 };
+VECT_VAR_DECL(expected_vld2_0,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_vld2_0,int,16,8) [] = { 0x, 0x, 0x, 0x,
+   0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected_vld2_0,int,32,4) [] = { 0x, 0x,
+   0x, 0x };
+VECT_VAR_DECL(expected_vld2_0,int,64,2) [] = { 0x,
+   0x };
+VECT_VAR_DECL(expected_vld2_0,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_vld2_0,uint,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected_vld2_0,uint,32,4) [] = { 0x, 0x,
+0x, 0x };
+VECT_VAR_DECL(expected_vld2_0,uint,64,2) [] = { 0x,
+0x };
+VECT_VAR_DECL(expected_vld2_0,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_vld2_0,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected_vld2_0,hfloat,32,4) [] = { 0x, 0x,
+  0x, 0x };
+
+/* vld2_dup/chunk 1.  */
+VECT_VAR_DECL(expected_vld2_1,int,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
+ 0xf0, 0xf1, 0xf0, 0xf1 };
+VECT_VAR_DECL(expected_vld2_1,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff0, 0xfff1 
};
+VECT_VAR_DECL(expected_vld2_1,int,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_1,int,64,1) [] = { 0xfff1 };
+VECT_VAR_DECL(expected_vld2_1,uint,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
+  0xf0, 0xf1, 0xf0, 0xf1 };
+VECT_VAR_DECL(expected_vld2_1,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff0, 0xfff1 
};
+VECT_VAR_DECL(expected_vld2_1,uint,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_1,uint,64,1) [] = { 0xfff1 };
+VECT_VAR_DECL(expected_vld2_1,poly,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
+  0xf0, 0xf1, 0xf0, 0xf1 };
+VECT_VAR_DECL(expected_vld2_1,poly,16,4) [] = { 0xfff0, 0xfff1,
+   0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_1,hfloat,32,2) [] = { 0xc180, 0xc170 };
+VECT_VAR_DECL(expected_vld2_1,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_vld2_1,int,16,8) [] = { 0x, 0x, 

Re: [[ARM/AArch64][testsuite] 06/36] Add vmla and vmls tests.

2015-01-16 Thread Tejas Belagod

+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };


No poly ops for vmlx.


+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x45f0ae15, 0x45f0b615,
+  0x45f0be15, 0x45f0c615 };
+


These expected results are calculated using chained (as opposed to fused) 
float MACs, right?
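
For reference (nothing to do with the testsuite macros): a chained MAC 
rounds the product before the add, a fused MAC rounds only once, so the two 
can differ in the last ulp. Roughly:

#include <math.h>

float chained_mac (float a, float b, float c)
{
  return a + b * c;       /* two roundings, assuming no FP contraction.  */
}

float fused_mac (float a, float b, float c)
{
  return fmaf (b, c, a);  /* single rounding.  */
}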


Otherwise, LGTM.

Tejas.



Re: [[ARM/AArch64][testsuite] 07/36] Add vmla_lane and vmls_lane tests.

2015-01-16 Thread Tejas Belagod

+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };


No poly ops for vmlx_lane.

Otherwise, LGTM.

Tejas.



Re: [[ARM/AArch64][testsuite] 08/36] Add vtrn tests. Refactor vzup and vzip tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:


 * gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc: New file.
 * gcc.target/aarch64/advsimd-intrinsics/vtrn.c: New file.
 * gcc.target/aarch64/advsimd-intrinsics/vuzp.c: Use code from
 vshuffle.inc.
 * gcc.target/aarch64/advsimd-intrinsics/vzip.c: Use code from
 vshuffle.inc.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
new file mode 100644
index 000..928f338
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
@@ -0,0 +1,139 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* In this case, output variables are arrays of vectors.  */
+#define DECL_VSHUFFLE(T1, W, N)
\
+  VECT_ARRAY_TYPE(T1, W, N, 2) VECT_ARRAY_VAR(result_vec, T1, W, N, 2);
\
+  VECT_VAR_DECL(result_bis, T1, W, N)[2 * N]
+
+  /* We need to use a temporary result buffer (result_bis), because
+ the one used for other tests is not large enough. A subset of the
+ result data is moved from result_bis to result, and it is this
+ subset which is used to check the actual behaviour. The next
+ macro enables to move another chunk of data from result_bis to
+ result.  */
+#define TEST_VSHUFFLE(INSN, Q, T1, T2, W, N)   \
+  VECT_ARRAY_VAR(result_vec, T1, W, N, 2) =\
+INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \
+ VECT_VAR(vector2, T1, W, N)); \
+  vst2##Q##_##T2##W(VECT_VAR(result_bis, T1, W, N),\
+   VECT_ARRAY_VAR(result_vec, T1, W, N, 2));   \
+  memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(result_bis, T1, W, N),   \
+sizeof(VECT_VAR(result, T1, W, N)));
+
+  /* Overwrite result with the contents of result_bis[X].  */
+#define TEST_EXTRA_CHUNK(T1, W, N, X)  \
+  memcpy(VECT_VAR(result, T1, W, N), (VECT_VAR(result_bis, T1, W, N)[X*N]), \
+sizeof(VECT_VAR(result, T1, W, N)));
+
+  DECL_VARIABLE_ALL_VARIANTS(vector1);
+  DECL_VARIABLE_ALL_VARIANTS(vector2);
+
+  /* We don't need 64 bits variants.  */
+#define DECL_ALL_VSHUFFLE()\
+  DECL_VSHUFFLE(int, 8, 8);\
+  DECL_VSHUFFLE(int, 16, 4);   \
+  DECL_VSHUFFLE(int, 32, 2);   \
+  DECL_VSHUFFLE(uint, 8, 8);   \
+  DECL_VSHUFFLE(uint, 16, 4);  \
+  DECL_VSHUFFLE(uint, 32, 2);  \
+  DECL_VSHUFFLE(poly, 8, 8);   \
+  DECL_VSHUFFLE(poly, 16, 4);  \
+  DECL_VSHUFFLE(float, 32, 2); \
+  DECL_VSHUFFLE(int, 8, 16);   \
+  DECL_VSHUFFLE(int, 16, 8);   \
+  DECL_VSHUFFLE(int, 32, 4);   \
+  DECL_VSHUFFLE(uint, 8, 16);  \
+  DECL_VSHUFFLE(uint, 16, 8);  \
+  DECL_VSHUFFLE(uint, 32, 4);  \
+  DECL_VSHUFFLE(poly, 8, 16);  \
+  DECL_VSHUFFLE(poly, 16, 8);  \
+  DECL_VSHUFFLE(float, 32, 4)
+
+  DECL_ALL_VSHUFFLE();
+
+  /* Initialize input vector from buffer.  */
+  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector1, buffer);
+  VLOAD(vector1, buffer, , float, f, 32, 2);
+  VLOAD(vector1, buffer, q, float, f, 32, 4);
+
+  /* Choose arbitrary initialization values.  */
+  VDUP(vector2, , int, s, 8, 8, 0x11);
+  VDUP(vector2, , int, s, 16, 4, 0x22);
+  VDUP(vector2, , int, s, 32, 2, 0x33);
+  VDUP(vector2, , uint, u, 8, 8, 0x55);
+  VDUP(vector2, , uint, u, 16, 4, 0x66);
+  VDUP(vector2, , uint, u, 32, 2, 0x77);
+  VDUP(vector2, , poly, p, 8, 8, 0x55);
+  VDUP(vector2, , poly, p, 16, 4, 0x66);
+  VDUP(vector2, , float, f, 32, 2, 33.6f);
+
+  VDUP(vector2, q, int, s, 8, 16, 0x11);
+  VDUP(vector2, q, int, s, 16, 8, 0x22);
+  VDUP(vector2, q, int, s, 32, 4, 0x33);
+  VDUP(vector2, q, uint, u, 8, 16, 0x55);
+  VDUP(vector2, q, uint, u, 16, 8, 0x66);
+  VDUP(vector2, q, uint, u, 32, 4, 0x77);
+  VDUP(vector2, q, poly, p, 8, 16, 0x55);
+  VDUP(vector2, q, poly, p, 16, 8, 0x66);
+  VDUP(vector2, q, float, f, 32, 4, 33.8f);
+
+#define TEST_ALL_VSHUFFLE(INSN)\
+  TEST_VSHUFFLE(INSN, , int, s, 8, 8); \
+  TEST_VSHUFFLE(INSN, , int, s, 16, 4);\
+  TEST_VSHUFFLE(INSN, , int, s, 32, 2);\
+  TEST_VSHUFFLE(INSN, , uint, u, 8, 8);\
+  TEST_VSHUFFLE(INSN, , uint, u, 16, 4);   \
+  TEST_VSHUFFLE(INSN, , uint, u, 32, 2);   \
+  TEST_VSHUFFLE(INSN, , poly, p, 8, 8);\
+  

Re: [[ARM/AArch64][testsuite] 09/36] Add vsubhn, vraddhn and vrsubhn tests. Split vaddhn.c into vXXXhn.inc and vaddhn.c to share code with other new tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:


 * gcc.target/aarch64/advsimd-intrinsics/vXXXhn.inc: New file.
 * gcc.target/aarch64/advsimd-intrinsics/vraddhn.c: New file.
 * gcc.target/aarch64/advsimd-intrinsics/vrsubhn.c: New file.
 * gcc.target/aarch64/advsimd-intrinsics/vsubhn.c: New file.
 * gcc.target/aarch64/advsimd-intrinsics/vaddhn.c: Use code from
 vXXXhn.inc.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vXXXhn.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vXXXhn.inc
new file mode 100644
index 000..0dbcc92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vXXXhn.inc
@@ -0,0 +1,50 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: vec64=vaddhn(vec128_a, vec128_b), then store the result.  */
+#define TEST_VADDHN1(INSN, T1, T2, W, W2, N)   \
+  VECT_VAR(vector64, T1, W2, N) = INSN##_##T2##W(VECT_VAR(vector1, T1, W, N), \
+VECT_VAR(vector2, T1, W, N)); \
+  vst1_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector64, T1, W2, N))
+
+#define TEST_VADDHN(INSN, T1, T2, W, W2, N)\
+  TEST_VADDHN1(INSN, T1, T2, W, W2, N)
+


Minor nit: if this is a template file, maybe you should rename the macro 
TEST_VADDHN to TEST_XXHN? Just that a template having an INSN-specific 
name is confusing.



+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };


Though never used, poly seems to have sneaked in here too.

Otherwise, LGTM.

Thanks,
Tejas.



Re: [[ARM/AArch64][testsuite] 10/36] Add vmlal and vmlsl tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vmlXl.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmlal.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmlsl.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlXl.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlXl.inc
new file mode 100644
index 000..1e6bab3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlXl.inc
@@ -0,0 +1,89 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* vector_res = OP(vector, vector3, vector4),
+ then store the result.  */
+#define TEST_VMLXL1(INSN, T1, T2, W, W2, N)\
+  VECT_VAR(vector_res, T1, W, N) =  \
+INSN##_##T2##W2(VECT_VAR(vector, T1, W, N), \
+VECT_VAR(vector3, T1, W2, N),   \
+VECT_VAR(vector4, T1, W2, N));  \
+  vst1q_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+#define TEST_VMLXL(INSN, T1, T2, W, W2, N) \
+  TEST_VMLXL1(INSN, T1, T2, W, W2, N)
+
+  DECL_VARIABLE(vector, int, 16, 8);
+  DECL_VARIABLE(vector3, int, 8, 8);
+  DECL_VARIABLE(vector4, int, 8, 8);
+  DECL_VARIABLE(vector_res, int, 16, 8);
+
+  DECL_VARIABLE(vector, int, 32, 4);
+  DECL_VARIABLE(vector3, int, 16, 4);
+  DECL_VARIABLE(vector4, int, 16, 4);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+
+  DECL_VARIABLE(vector, int, 64, 2);
+  DECL_VARIABLE(vector3, int, 32, 2);
+  DECL_VARIABLE(vector4, int, 32, 2);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+
+  DECL_VARIABLE(vector, uint, 16, 8);
+  DECL_VARIABLE(vector3, uint, 8, 8);
+  DECL_VARIABLE(vector4, uint, 8, 8);
+  DECL_VARIABLE(vector_res, uint, 16, 8);
+
+  DECL_VARIABLE(vector, uint, 32, 4);
+  DECL_VARIABLE(vector3, uint, 16, 4);
+  DECL_VARIABLE(vector4, uint, 16, 4);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+
+  DECL_VARIABLE(vector, uint, 64, 2);
+  DECL_VARIABLE(vector3, uint, 32, 2);
+  DECL_VARIABLE(vector4, uint, 32, 2);
+  DECL_VARIABLE(vector_res, uint, 64, 2);
+
+  clean_results ();
+
+  VLOAD(vector, buffer, q, int, s, 16, 8);
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+  VLOAD(vector, buffer, q, int, s, 64, 2);
+  VLOAD(vector, buffer, q, uint, u, 16, 8);
+  VLOAD(vector, buffer, q, uint, u, 32, 4);
+  VLOAD(vector, buffer, q, uint, u, 64, 2);
+
+  VDUP(vector3, , int, s, 8, 8, 0x55);
+  VDUP(vector4, , int, s, 8, 8, 0xBB);
+  VDUP(vector3, , int, s, 16, 4, 0x55);
+  VDUP(vector4, , int, s, 16, 4, 0xBB);
+  VDUP(vector3, , int, s, 32, 2, 0x55);
+  VDUP(vector4, , int, s, 32, 2, 0xBB);
+  VDUP(vector3, , uint, u, 8, 8, 0x55);
+  VDUP(vector4, , uint, u, 8, 8, 0xBB);
+  VDUP(vector3, , uint, u, 16, 4, 0x55);
+  VDUP(vector4, , uint, u, 16, 4, 0xBB);
+  VDUP(vector3, , uint, u, 32, 2, 0x55);
+  VDUP(vector4, , uint, u, 32, 2, 0xBB);
+
+  TEST_VMLXL(INSN_NAME, int, s, 16, 8, 8);
+  TEST_VMLXL(INSN_NAME, int, s, 32, 16, 4);
+  TEST_VMLXL(INSN_NAME, int, s, 64, 32, 2);
+  TEST_VMLXL(INSN_NAME, uint, u, 16, 8, 8);
+  TEST_VMLXL(INSN_NAME, uint, u, 32, 16, 4);
+  TEST_VMLXL(INSN_NAME, uint, u, 64, 32, 2);
+
+  CHECK(TEST_MSG, int, 16, 8, PRIx16, expected, );
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, );
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, );
+  CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected, );
+  CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, );
+  CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected, );
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlal.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlal.c
new file mode 100644
index 000..c147f31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlal.c
@@ -0,0 +1,18 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vmlal
+#define TEST_MSG VMLAL
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xe907, 0xe908, 0xe909, 0xe90a,
+   0xe90b, 0xe90c, 0xe90d, 0xe90e };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x3e07, 0x3e08, 0x3e09, 0x3e0a };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x3e07, 0x3e08 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0x3e07, 0x3e08, 0x3e09, 0x3e0a,
+0x3e0b, 0x3e0c, 0x3e0d, 0x3e0e };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x3e07, 0x3e08, 0x3e09, 0x3e0a };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x3e07, 0x3e08 };
+
+#include vmlXl.inc
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlsl.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlsl.c
new file mode 100644
index 000..6c984ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlsl.c
@@ -0,0 +1,22 @@
+#include <arm_neon.h>

Re: [[ARM/AArch64][testsuite] 11/36] Add vmlal_lane and vmlsl_lane tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vmlXl_lane.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmlal_lane.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmlsl_lane.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlXl_lane.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlXl_lane.inc
new file mode 100644
index 000..ca45134
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlXl_lane.inc
@@ -0,0 +1,70 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* vector_res = vmlxl_lane(vector, vector3, vector4, lane),
+ then store the result.  */
+#define TEST_VMLXL_LANE1(INSN, T1, T2, W, W2, N, V)\
+  VECT_VAR(vector_res, T1, W, N) =  \
+INSN##_##T2##W2(VECT_VAR(vector, T1, W, N), \
+VECT_VAR(vector3, T1, W2, N),   \
+VECT_VAR(vector4, T1, W2, N),   \
+V); \
+  vst1q_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+#define TEST_VMLXL_LANE(INSN, T1, T2, W, W2, N, V) \
+  TEST_VMLXL_LANE1(INSN, T1, T2, W, W2, N, V)
+
+  DECL_VARIABLE(vector, int, 32, 4);
+  DECL_VARIABLE(vector3, int, 16, 4);
+  DECL_VARIABLE(vector4, int, 16, 4);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+
+  DECL_VARIABLE(vector, int, 64, 2);
+  DECL_VARIABLE(vector3, int, 32, 2);
+  DECL_VARIABLE(vector4, int, 32, 2);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+
+  DECL_VARIABLE(vector, uint, 32, 4);
+  DECL_VARIABLE(vector3, uint, 16, 4);
+  DECL_VARIABLE(vector4, uint, 16, 4);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+
+  DECL_VARIABLE(vector, uint, 64, 2);
+  DECL_VARIABLE(vector3, uint, 32, 2);
+  DECL_VARIABLE(vector4, uint, 32, 2);
+  DECL_VARIABLE(vector_res, uint, 64, 2);
+
+  clean_results ();
+
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+  VLOAD(vector, buffer, q, int, s, 64, 2);
+  VLOAD(vector, buffer, q, uint, u, 32, 4);
+  VLOAD(vector, buffer, q, uint, u, 64, 2);
+
+  VDUP(vector3, , int, s, 16, 4, 0x55);
+  VDUP(vector4, , int, s, 16, 4, 0xBB);
+  VDUP(vector3, , int, s, 32, 2, 0x55);
+  VDUP(vector4, , int, s, 32, 2, 0xBB);
+  VDUP(vector3, , uint, u, 16, 4, 0x55);
+  VDUP(vector4, , uint, u, 16, 4, 0xBB);
+  VDUP(vector3, , uint, u, 32, 2, 0x55);
+  VDUP(vector4, , uint, u, 32, 2, 0xBB);
+
+  TEST_VMLXL_LANE(INSN_NAME, int, s, 32, 16, 4, 2);
+  TEST_VMLXL_LANE(INSN_NAME, int, s, 64, 32, 2, 1);
+  TEST_VMLXL_LANE(INSN_NAME, uint, u, 32, 16, 4, 2);
+  TEST_VMLXL_LANE(INSN_NAME, uint, u, 64, 32, 2, 1);
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, );
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, );
+  CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, );
+  CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected, );
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlal_lane.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlal_lane.c
new file mode 100644
index 000..0a384a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlal_lane.c
@@ -0,0 +1,14 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vmlal_lane
+#define TEST_MSG VMLAL_LANE
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x3e07, 0x3e08, 0x3e09, 0x3e0a };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x3e07, 0x3e08 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x3e07, 0x3e08, 0x3e09, 0x3e0a };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x3e07, 0x3e08 };
+
+#include vmlXl_lane.inc
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlsl_lane.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlsl_lane.c
new file mode 100644
index 000..8b944a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlsl_lane.c
@@ -0,0 +1,18 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vmlsl_lane
+#define TEST_MSG VMLSL_LANE
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xc1d9, 0xc1da,
+   0xc1db, 0xc1dc };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xc1d9,
+   0xc1da };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xc1d9, 0xc1da,
+0xc1db, 0xc1dc };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xc1d9,
+0xc1da };
+
+#include vmlXl_lane.inc



LGTM.

Tejas.



Re: [[ARM/AArch64][testsuite] 12/36] Add vmlal_n and vmlsl_n tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vmlXl_n.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmlal_n.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmlsl_n.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlXl_n.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlXl_n.inc
new file mode 100644
index 000..a968584
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlXl_n.inc
@@ -0,0 +1,61 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* vector_res = vmlxl_n(vector, vector2, val),
+ then store the result.  */
+#define TEST_VMLXL_N1(INSN, T1, T2, W, W2, N, V)   \
+  VECT_VAR(vector_res, T1, W, N) = INSN##_##T2##W2(VECT_VAR(vector, T1, W, N), 
\
+  VECT_VAR(vector2, T1, W2, 
N), \
+  V);  \
+  vst1q_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+#define TEST_VMLXL_N(INSN, T1, T2, W, W2, N, V)\
+  TEST_VMLXL_N1(INSN, T1, T2, W, W2, N, V)
+
+  DECL_VARIABLE(vector, int, 32, 4);
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+
+  DECL_VARIABLE(vector, int, 64, 2);
+  DECL_VARIABLE(vector2, int, 32, 2);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+
+  DECL_VARIABLE(vector, uint, 32, 4);
+  DECL_VARIABLE(vector2, uint, 16, 4);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+
+  DECL_VARIABLE(vector, uint, 64, 2);
+  DECL_VARIABLE(vector2, uint, 32, 2);
+  DECL_VARIABLE(vector_res, uint, 64, 2);
+
+  clean_results ();
+
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+  VLOAD(vector, buffer, q, int, s, 64, 2);
+  VLOAD(vector, buffer, q, uint, u, 32, 4);
+  VLOAD(vector, buffer, q, uint, u, 64, 2);
+
+  VDUP(vector2, , int, s, 16, 4, 0x55);
+  VDUP(vector2, , int, s, 32, 2, 0x55);
+  VDUP(vector2, , uint, u, 16, 4, 0x55);
+  VDUP(vector2, , uint, u, 32, 2, 0x55);
+
+  /* Choose multiplier arbitrarily.  */
+  TEST_VMLXL_N(INSN_NAME, int, s, 32, 16, 4, 0x11);
+  TEST_VMLXL_N(INSN_NAME, int, s, 64, 32, 2, 0x22);
+  TEST_VMLXL_N(INSN_NAME, uint, u, 32, 16, 4, 0x33);
+  TEST_VMLXL_N(INSN_NAME, uint, u, 64, 32, 2, 0x33);
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, );
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, );
+  CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, );
+  CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected, );
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlal_n.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlal_n.c
new file mode 100644
index 000..118068c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlal_n.c
@@ -0,0 +1,14 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vmlal_n
+#define TEST_MSG VMLAL_N
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x595, 0x596, 0x597, 0x598 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xb3a, 0xb3b };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x10df, 0x10e0, 0x10e1, 0x10e2 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x10df, 0x10e0 };
+
+#include vmlXl_n.inc
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlsl_n.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlsl_n.c
new file mode 100644
index 000..a26c69f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlsl_n.c
@@ -0,0 +1,18 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vmlsl_n
+#define TEST_MSG VMLSL_N
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xfa4b, 0xfa4c,
+   0xfa4d, 0xfa4e };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xf4a6,
+   0xf4a7 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xef01, 0xef02,
+0xef03, 0xef04 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xef01,
+0xef02 };
+
+#include vmlXl_n.inc



LGTM.

Tejas.



Re: [[ARM/AArch64][testsuite] 13/36] Add vmla_n and vmls_n tests.

2015-01-16 Thread Tejas Belagod

+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };


No poly vmlx_n, otherwise LGTM.

Tejas.




Re: [[ARM/AArch64][testsuite] 14/36] Add vqdmlal and vqdmlsl tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vqdmlXl.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vqdmlal.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vqdmlsl.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlXl.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlXl.inc
new file mode 100644
index 000..cd61fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlXl.inc
@@ -0,0 +1,63 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* vector_res = OP(vector, vector3, vector4),
+ then store the result.  */
+#define TEST_VQDMLXL1(INSN, T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT) \
+  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T1, W, N));  \
+  VECT_VAR(vector_res, T1, W, N) = \
+INSN##_##T2##W2(VECT_VAR(vector, T1, W, N),
\
+   VECT_VAR(vector3, T1, W2, N),   \
+   VECT_VAR(vector4, T1, W2, N));  \
+vst1q_##T2##W(VECT_VAR(result, T1, W, N),  \
+ VECT_VAR(vector_res, T1, W, N));  \
+CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_VQDMLXL(INSN, T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT) \
+  TEST_VQDMLXL1(INSN, T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  DECL_VARIABLE(vector, int, 32, 4);
+  DECL_VARIABLE(vector3, int, 16, 4);
+  DECL_VARIABLE(vector4, int, 16, 4);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector, int, 64, 2);
+  DECL_VARIABLE(vector3, int, 32, 2);
+  DECL_VARIABLE(vector4, int, 32, 2);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+
+  clean_results ();
+
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+  VLOAD(vector, buffer, q, int, s, 64, 2);
+
+  VDUP(vector3, , int, s, 16, 4, 0x55);
+  VDUP(vector4, , int, s, 16, 4, 0xBB);
+  VDUP(vector3, , int, s, 32, 2, 0x55);
+  VDUP(vector4, , int, s, 32, 2, 0xBB);
+
+  TEST_VQDMLXL(INSN_NAME, int, s, 32, 16, 4, expected_cumulative_sat, );
+  TEST_VQDMLXL(INSN_NAME, int, s, 64, 32, 2, expected_cumulative_sat, );
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, );
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, );
+
+  VDUP(vector3, , int, s, 16, 4, 0x8000);
+  VDUP(vector4, , int, s, 16, 4, 0x8000);
+  VDUP(vector3, , int, s, 32, 2, 0x8000);
+  VDUP(vector4, , int, s, 32, 2, 0x8000);
+
+#define TEST_MSG2 with saturation
+  TEST_VQDMLXL(INSN_NAME, int, s, 32, 16, 4, expected_cumulative_sat2, 
TEST_MSG2);
+  TEST_VQDMLXL(INSN_NAME, int, s, 64, 32, 2, expected_cumulative_sat2, 
TEST_MSG2);
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected2, TEST_MSG2);
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected2, TEST_MSG2);
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlal.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlal.c
new file mode 100644
index 000..c53a90a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlal.c
@@ -0,0 +1,27 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vqdmlal
+#define TEST_MSG VQDMLAL
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR(expected_cumulative_sat,int,32,4) = 0;
+int VECT_VAR(expected_cumulative_sat,int,64,2) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x7c1e, 0x7c1f, 0x7c20, 0x7c21 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x7c1e, 0x7c1f };
+
+/* Expected values of cumulative_saturation flag when saturation
+   occurs.  */
+int VECT_VAR(expected_cumulative_sat2,int,32,4) = 1;
+int VECT_VAR(expected_cumulative_sat2,int,64,2) = 1;
+
+/* Expected results when saturation occurs.  */
+VECT_VAR_DECL(expected2,int,32,4) [] = { 0x7fef, 0x7ff0,
+0x7ff1, 0x7ff2 };
+VECT_VAR_DECL(expected2,int,64,2) [] = { 0x7fef,
+0x7ff0 };
+
+#include vqdmlXl.inc
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlsl.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlsl.c
new file mode 100644
index 000..56e0b61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlsl.c
@@ -0,0 +1,29 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vqdmlsl
+#define TEST_MSG VQDMLSL
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR(expected_cumulative_sat,int,32,4) = 0;
+int VECT_VAR(expected_cumulative_sat,int,64,2) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x83c2, 0x83c3,
+   0x83c4, 0x83c5 

Re: [[ARM/AArch64][testsuite] 15/36] Add vqdmlal_lane and vqdmlsl_lane tests.

2015-01-16 Thread Tejas Belagod

+
+/* Expected values of cumulative_saturation flag when multiplication
+   saturates.  */


Note: Saturation (and hence QC bit-setting) can also occur with the 
accumulation.
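
For instance (a sketch with values of my own choosing, using plain 
vqdmlal_s16 for brevity), the doubling multiply below does not saturate but 
the accumulate does, so QC gets set by the accumulation alone:

#include <arm_neon.h>

int32x4_t acc_saturation_example (void)
{
  int32x4_t acc = vdupq_n_s32 (0x7fffffff);  /* already at INT32_MAX.  */
  int16x4_t a = vdup_n_s16 (1);
  int16x4_t b = vdup_n_s16 (1);
  return vqdmlal_s16 (acc, a, b);            /* 2*1*1 = 2; INT32_MAX + 2 saturates.  */
}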



LGTM.

Tejas.



Re: [[ARM/AArch64][testsuite] 16/36] Add vqdmlal_n and vqdmlsl_n tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vqdmlXl_n.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vqdmlal_n.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vqdmlsl_n.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlXl_n.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlXl_n.inc
new file mode 100644
index 000..fd885dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlXl_n.inc
@@ -0,0 +1,59 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* vector_res = vqdmlxl_n(vector, vector3, val),
+ then store the result.  */
+#define TEST_VQDMLXL_N1(INSN, T1, T2, W, W2, N, V, EXPECTED_CUMULATIVE_SAT, CMT) \
+  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T1, W, N));  \
+  VECT_VAR(vector_res, T1, W, N) = \
+    INSN##_##T2##W2(VECT_VAR(vector, T1, W, N),			\
+   VECT_VAR(vector3, T1, W2, N),   \
+   V); \
+  vst1q_##T2##W(VECT_VAR(result, T1, W, N),\
+   VECT_VAR(vector_res, T1, W, N));\
+  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_VQDMLXL_N(INSN, T1, T2, W, W2, N, V, EXPECTED_CUMULATIVE_SAT, CMT) \
+  TEST_VQDMLXL_N1(INSN, T1, T2, W, W2, N, V, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  DECL_VARIABLE(vector, int, 32, 4);
+  DECL_VARIABLE(vector3, int, 16, 4);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+
+  DECL_VARIABLE(vector, int, 64, 2);
+  DECL_VARIABLE(vector3, int, 32, 2);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+
+  clean_results ();
+
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+  VLOAD(vector, buffer, q, int, s, 64, 2);
+
+  VDUP(vector3, , int, s, 16, 4, 0x55);
+  VDUP(vector3, , int, s, 32, 2, 0x55);
+
+  /* Choose val arbitrarily.  */
+  TEST_VQDMLXL_N(INSN_NAME, int, s, 32, 16, 4, 0x22, expected_cumulative_sat, "");
+  TEST_VQDMLXL_N(INSN_NAME, int, s, 64, 32, 2, 0x33, expected_cumulative_sat, "");
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, "");
+
+#define TEST_MSG2 " (check mul cumulative saturation)"
+  VDUP(vector3, , int, s, 16, 4, 0x8000);
+  VDUP(vector3, , int, s, 32, 2, 0x80000000);
+
+  TEST_VQDMLXL_N(INSN_NAME, int, s, 32, 16, 4, 0x8000, expected_cumulative_sat2, TEST_MSG2);
+  TEST_VQDMLXL_N(INSN_NAME, int, s, 64, 32, 2, 0x80000000, expected_cumulative_sat2, TEST_MSG2);
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected2, TEST_MSG2);
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected2, TEST_MSG2);
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlal_n.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlal_n.c
new file mode 100644
index 000..b84bca3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlal_n.c
@@ -0,0 +1,27 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vqdmlal_n
+#define TEST_MSG "VQDMLAL_N"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR(expected_cumulative_sat,int,32,4) = 0;
+int VECT_VAR(expected_cumulative_sat,int,64,2) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x1684, 0x1685, 0x1686, 0x1687 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x21ce, 0x21cf };
+
+/* Expected values of cumulative_saturation flag when saturation
+   occurs.  */
+int VECT_VAR(expected_cumulative_sat2,int,32,4) = 1;
+int VECT_VAR(expected_cumulative_sat2,int,64,2) = 1;
+
+/* Expected results when saturation occurs.  */
+VECT_VAR_DECL(expected2,int,32,4) [] = { 0x7fffffef, 0x7ffffff0,
+                                         0x7ffffff1, 0x7ffffff2 };
+VECT_VAR_DECL(expected2,int,64,2) [] = { 0x7fffffffffffffef,
+                                         0x7ffffffffffffff0 };
+
+#include "vqdmlXl_n.inc"
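
For readers skimming the expected tables above, a rough scalar model of what the _n forms compute per lane (my sketch, not part of the patch; the helper names are made up), with both the doubling multiply and the accumulation saturating to the wide element type:

#include <stdint.h>

static int32_t sat32 (int64_t x)
{
  if (x > INT32_MAX)
    return INT32_MAX;
  if (x < INT32_MIN)
    return INT32_MIN;
  return (int32_t) x;
}

/* vqdmlal_n_s16: acc + 2*b*n per lane; vqdmlsl_n_s16 subtracts the
   product instead of adding it.  */
static int32_t ref_vqdmlal_n_s16_lane (int32_t acc, int16_t b, int16_t n)
{
  return sat32 ((int64_t) acc + sat32 (2 * (int64_t) b * n));
}

With the inputs used above (accumulator lane 0 = -16 from the reference buffer, b = 0x55, n = 0x22) this gives -16 + 5780 = 5764 = 0x1684, matching expected[0]; with b = n = 0x8000 the inner sat32 clamps to 0x7fffffff and the lane becomes 0x7fffffef, matching expected2[0].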
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlsl_n.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlsl_n.c
new file mode 100644
index 000..ff8d9d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlsl_n.c
@@ -0,0 +1,29 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vqdmlsl_n
+#define TEST_MSG "VQDMLSL_N"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR(expected_cumulative_sat,int,32,4) = 0;
+int VECT_VAR(expected_cumulative_sat,int,64,2) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xffffe95c, 0xffffe95d,
+                                        0xffffe95e, 0xffffe95f };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xffffffffffffde12,
+  
