Re: [PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-07-12 Thread Victor Do Nascimento

On 7/12/24 03:23, Jiang, Haochen wrote:

-Original Message-
From: Hongtao Liu 
Sent: Thursday, July 11, 2024 9:45 AM
To: Victor Do Nascimento 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com;
richard.earns...@arm.com
Subject: Re: [PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and
sse targets

On Wed, Jul 10, 2024 at 10:10 PM Victor Do Nascimento
 wrote:


Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

The patch LGTM. BTW you can use existing  instead of
new  and  instead of 


gcc/ChangeLog:

 * config/i386/mmx.md (usdot_prodv8qi): Deleted.
 (usdot_prodv2siv8qi): New.


Hi Victor,

I suppose all the patterns are renamed, not deleted and new, right?
If that is the case, the log might be better and easier to understand
if changed to something like:

(old pattern): Renamed to ...
(new pattern): this.

Thx,
Haochen


You're right, it's a straightforward renaming.  I will amend the
changelogs as per your suggestion.


Thanks for the tip!
Victor



[PATCH 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns.

2024-07-10 Thread Victor Do Nascimento
Given recent changes to the dot_prod standard pattern name, this patch
fixes the aarch64 back-end by implementing the following changes:

1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
2. Rewrite initialization and function expansion mechanism for simd
builtins.
3. Fix all direct calls to back-end `dot_prod' patterns in SVE
builtins.

Finally, given that it is now possible for the compiler to
differentiate between the two- and four-way dot product, we add a test
to ensure that autovectorization picks up on dot-product patterns
where the result is twice the width of the operands.
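
For reference, a two-way variant of such a loop might look like the
following (a hand-written sketch, not the literal testcase; the actual
new test lives in gcc.target/aarch64/sme/vect-dotprod-twoway.c):

```c
#include <stdint.h>

/* With dot_prod as a conversion optab, the vectorizer can match a
   two-way dot product here: 16-bit products accumulated into a 32-bit
   sum, i.e. a result twice the width of the operands.  */
uint32_t udot2(int n, uint16_t *data)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i++)
    sum += (uint32_t) data[i] * data[i];
  return sum;
}
```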

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
New AARCH64_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI,
UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI.
(aarch64_init_builtin_dotprod_functions): New.
(aarch64_init_simd_builtins): Add call to
`aarch64_init_builtin_dotprod_functions'.
(aarch64_general_gimple_fold_builtin): Add DOT_PROD_EXPR
handling.
* config/aarch64/aarch64-simd-builtins.def: Remove macro
expansion-based initialization and expansion
of (u|s|us)dot_prod builtins.
* config/aarch64/aarch64-simd.md
(dot_prod): Deleted.
(dot_prod): New.
(usdot_prod): Deleted.
(usdot_prod): New.
(sadv16qi): Adjust call to gen_udot_prod to take second mode.
(popcount): Fix use of `udot_prod_optab'.
* config/aarch64/aarch64-sve-builtins-base.cc
(svdot_impl::expand): s/direct/convert/ in
`convert_optab_handler_for_sign' function call.
(svusdot_impl::expand): Add second mode argument in call to
`code_for_dot_prod'.
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::convert_optab_handler_for_sign): New class
method.
* config/aarch64/aarch64-sve-builtins.h
(class function_expander): Add prototype for new
`convert_optab_handler_for_sign' method.
* gcc/config/aarch64/aarch64-sve.md
(dot_prod): Deleted.
(dot_prod): New.
(@dot_prod): Deleted.
(@dot_prod): New.
(sad): Adjust call to gen_udot_prod to take second mode.
* gcc/config/aarch64/aarch64-sve2.md
(@aarch64_sve_dotvnx4sivnx8hi): Deleted.
(dot_prodvnx4sivnx8hi): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
---
 gcc/config/aarch64/aarch64-builtins.cc| 71 +++
 gcc/config/aarch64/aarch64-simd-builtins.def  |  4 --
 gcc/config/aarch64/aarch64-simd.md|  9 +--
 .../aarch64/aarch64-sve-builtins-base.cc  | 13 ++--
 gcc/config/aarch64/aarch64-sve-builtins.cc| 17 +
 gcc/config/aarch64/aarch64-sve-builtins.h |  3 +
 gcc/config/aarch64/aarch64-sve.md |  6 +-
 gcc/config/aarch64/aarch64-sve2.md|  2 +-
 gcc/config/aarch64/iterators.md   |  1 +
 .../aarch64/sme/vect-dotprod-twoway.c | 25 +++
 10 files changed, 133 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 30669f8aa18..6c7c86d0e6e 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -783,6 +783,12 @@ enum aarch64_builtins
   AARCH64_SIMD_PATTERN_START = AARCH64_SIMD_BUILTIN_LANE_CHECK + 1,
   AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_PATTERN_START
  + ARRAY_SIZE (aarch64_simd_builtin_data) - 1,
+  AARCH64_BUILTIN_SDOTV8QI,
+  AARCH64_BUILTIN_SDOTV16QI,
+  AARCH64_BUILTIN_UDOTV8QI,
+  AARCH64_BUILTIN_UDOTV16QI,
+  AARCH64_BUILTIN_USDOTV8QI,
+  AARCH64_BUILTIN_USDOTV16QI,
   AARCH64_CRC32_BUILTIN_BASE,
   AARCH64_CRC32_BUILTINS
   AARCH64_CRC32_BUILTIN_MAX,
@@ -1642,6 +1648,60 @@ handle_arm_neon_h (void)
   aarch64_init_simd_intrinsics ();
 }
 
+void
+aarch64_init_builtin_dotprod_functions (void)
+{
+  tree fndecl = NULL;
+  tree ftype = NULL;
+
+  tree uv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_unsigned);
+  tree sv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_none);
+  tree uv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_unsigned);
+  tree sv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_none);
+  tree uv2si = aarch64_simd_builtin_type (V2SImode, qualifier_unsigned);
+  tree sv2si = aarch64_simd_builtin_type (V2SImode, qualifier_none);
+  tree uv4si = aarch64_simd_builtin_type (V4SImode, qualifier_unsigned);
+  tree sv4si = aarch64_simd_builtin_type (V4SImode, qualifier_none);
+
+  struct builtin_decls_data
+  {
+tree out_type_node;
+tree in_type1_node;
+tree in_type2_node;
+const char *builtin_name;
+int function_code;
+  };
+
+#define NAME(A) "__builtin_aarch64_" #A
+#define ENUM(B) AARCH64_BUILTIN_##B
+
+  builtin_decls_data bdda[] =
+  {
+{ sv2si, sv8qi,  sv8qi,  

[PATCH 00/10] Make `dot_prod' a convert-type optab

2024-07-10 Thread Victor Do Nascimento
Given that the specification in the GCC internals manual defines the
{u|s}dot_prod standard name as taking "two signed elements of the
same mode, adding them to a third operand of wider mode", there is
currently ambiguity in the relationship between the mode of the first
two arguments and that of the third.

This vagueness means that, in theory, different modes may be
supportable in the third argument.  This flexibility would allow for a
given backend to add to the accumulator a different number of
vectorized products, e.g. a backend may provide instructions for both:

  accum += a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

and

  accum += a[0] * b[0] + a[1] * b[1],

as is now seen in the SVE2.1 extension to AArch64.  In spite of the
aforementioned flexibility, modeling the dot-product operation as a
direct optab means that we have no way to encode both the input and the
accumulator data modes into the backend pattern name, which prevents
us from harnessing this flexibility.

The purpose of this patch-series is therefore to remedy this current
shortcoming, moving the `dot_prod' from its current implementation as
a direct optab to an implementation where, as a conversion optab, we
are able to differentiate between dot products taking the same input
mode but resulting in a different output mode.

Regression-tested on x86_64, aarch64 and armhf.  I'd appreciate help
running relevant tests on the remaining architectures, i.e. arc, mips,
altivec and c6x to ensure I've not inadvertently broken anything for
those backends.

Victor Do Nascimento (10):
  optabs: Make all `*dot_prod_optab's modeled as conversions
  autovectorizer: Add basic support for convert optabs
  aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns.
  arm: Fix arm backend-use of (u|s|us)dot_prod patterns.
  i386: Fix dot_prod backend patterns for mmx and sse targets
  arc: Adjust dot-product backend patterns
  mips:  Adjust dot-product backend patterns
  altivec: Adjust dot-product backend patterns
  c6x:  Adjust dot-product backend patterns
  autovectorizer: Test autovectorization of different dot-prod modes.

 gcc/config/aarch64/aarch64-builtins.cc| 71 ++
 gcc/config/aarch64/aarch64-simd-builtins.def  |  4 -
 gcc/config/aarch64/aarch64-simd.md|  9 +-
 .../aarch64/aarch64-sve-builtins-base.cc  | 13 +--
 gcc/config/aarch64/aarch64-sve-builtins.cc| 17 
 gcc/config/aarch64/aarch64-sve-builtins.h |  3 +
 gcc/config/aarch64/aarch64-sve.md |  6 +-
 gcc/config/aarch64/aarch64-sve2.md|  2 +-
 gcc/config/aarch64/iterators.md   |  1 +
 gcc/config/arc/simdext.md |  8 +-
 gcc/config/arm/arm-builtins.cc| 95 +++
 gcc/config/arm/arm-protos.h   |  3 +
 gcc/config/arm/arm.cc |  1 +
 gcc/config/arm/arm_neon_builtins.def  |  3 -
 gcc/config/arm/neon.md|  4 +-
 gcc/config/c6x/c6x.md |  2 +-
 gcc/config/i386/mmx.md| 30 +++---
 gcc/config/i386/sse.md| 47 +
 gcc/config/mips/loongson-mmi.md   |  2 +-
 gcc/config/rs6000/altivec.md  |  4 +-
 gcc/doc/md.texi   | 18 ++--
 gcc/gimple-match-exports.cc   | 18 
 gcc/gimple-match.h|  2 +
 gcc/optabs.cc |  3 +-
 gcc/optabs.def|  6 +-
 .../gcc.dg/vect/vect-dotprod-twoway.c | 38 
 .../aarch64/sme/vect-dotprod-twoway.c | 25 +
 gcc/tree-vect-loop.cc |  1 +
 gcc/tree-vect-patterns.cc | 43 -
 29 files changed, 399 insertions(+), 80 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c

-- 
2.34.1



[PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

* config/i386/mmx.md (usdot_prodv8qi): Deleted.
(usdot_prodv2siv8qi): New.
(sdot_prodv8qi): Deleted.
(sdot_prodv2siv8qi): New.
(udot_prodv8qi): Deleted.
(udot_prodv2siv8qi): New.
(usdot_prodv4hi): Deleted.
(usdot_prodv2siv4hi): New.
(udot_prodv4hi): Deleted.
(udot_prodv2siv4hi): New.
(sdot_prodv4hi): Deleted.
(sdot_prodv2siv4hi): New.
* config/i386/sse.md (fourwayacc): New.
(twowayacc): New.
(sdot_prod): Deleted.
(sdot_prod): New.
(sdot_prodv4si): Deleted.
(sdot_prodv2div4si): New.
(usdot_prod): Deleted.
(usdot_prod): New.
(sdot_prod): Deleted.
(sdot_prod): New.
(sdot_prodv64qi): Deleted.
(sdot_prodv16siv64qi): New.
(udot_prod): Deleted.
(udot_prod): New.
(udot_prodv64qi): Deleted.
(udot_prodv16qiv64qi): New.
(usdot_prod): Deleted.
(usdot_prod): New.
(udot_prod): Deleted.
(udot_prod): New.
---
 gcc/config/i386/mmx.md | 30 +--
 gcc/config/i386/sse.md | 47 +-
 2 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 94d3a6e5692..d78739b033d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -6344,7 +6344,7 @@ (define_expand "usadv8qi"
   DONE;
 })
 
-(define_expand "usdot_prodv8qi"
+(define_expand "usdot_prodv2siv8qi"
   [(match_operand:V2SI 0 "register_operand")
(match_operand:V8QI 1 "register_operand")
(match_operand:V8QI 2 "register_operand")
@@ -6363,7 +6363,7 @@ (define_expand "usdot_prodv8qi"
   rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
   rtx op0 = gen_reg_rtx (V4SImode);
 
-  emit_insn (gen_usdot_prodv16qi (op0, op1, op2, op3));
+  emit_insn (gen_usdot_prodv4siv16qi (op0, op1, op2, op3));
   emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
  }
else
@@ -6377,7 +6377,7 @@ (define_expand "usdot_prodv8qi"
   emit_move_insn (op3, CONST0_RTX (V4SImode));
   emit_insn (gen_zero_extendv8qiv8hi2 (op1, operands[1]));
   emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
-  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
 
   /* vec_perm (op0, 2, 3, 0, 1);  */
   emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78)));
@@ -6388,7 +6388,7 @@ (define_expand "usdot_prodv8qi"
 DONE;
 })
 
-(define_expand "sdot_prodv8qi"
+(define_expand "sdot_prodv2siv8qi"
   [(match_operand:V2SI 0 "register_operand")
(match_operand:V8QI 1 "register_operand")
(match_operand:V8QI 2 "register_operand")
@@ -6406,7 +6406,7 @@ (define_expand "sdot_prodv8qi"
   rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
   rtx op0 = gen_reg_rtx (V4SImode);
 
-  emit_insn (gen_sdot_prodv16qi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv16qi (op0, op1, op2, op3));
   emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
 }
   else
@@ -6420,7 +6420,7 @@ (define_expand "sdot_prodv8qi"
   emit_move_insn (op3, CONST0_RTX (V4SImode));
   emit_insn (gen_extendv8qiv8hi2 (op1, operands[1]));
   emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
-  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
 
   /* vec_perm (op0, 2, 3, 0, 1);  */
   emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78)));
@@ -6432,7 +6432,7 @@ (define_expand "sdot_prodv8qi"
 
 })
 
-(define_expand "udot_prodv8qi"
+(define_expand "udot_prodv2siv8qi"
   [(match_operand:V2SI 0 "register_operand")
(match_operand:V8QI 1 "register_operand")
(match_operand:V8QI 2 "register_operand")
@@ -6450,7 +6450,7 @@ (define_expand "udot_prodv8qi"
   rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
   rtx op0 = gen_reg_rtx (V4SImode);
 
-  emit_insn (gen_udot_prodv16qi (op0, op1, op2, op3));
+  emit_insn (gen_udot_prodv4siv16qi (op0, op1, op2, op3));
   emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
 }
   else
@@ -6464,7 +6464,7 @@ (define_expand "udot_prodv8qi"
   emit_move_insn (op3, CONST0_RTX (V4SImode));
   emit_insn (gen_zero_extendv8qiv8hi2 (op1, operands[1]));
   emit_insn (gen_zero_extendv8qiv8hi2 (op2, operands[2]));
-  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));
 
   /* vec_perm (op0, 2, 3, 0, 1);  */
   emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78)));
@@ -6476,7 +6476,7 @@ (define_expand "udot_prodv8qi"
 
 })
 

[PATCH 06/10] arc: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

* config/arc/simdext.md (sdot_prodv2hi): Deleted.
(sdot_prodsiv2hi): New.
(udot_prodv2hi): Deleted.
(udot_prodsiv2hi): New.
(sdot_prodv4hi): Deleted.
(sdot_prodv2siv4hi): New.
(udot_prodv4hi): Deleted.
(udot_prodv2siv4hi): New.
---
 gcc/config/arc/simdext.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arc/simdext.md b/gcc/config/arc/simdext.md
index 4e51a237c3a..0696f0abb70 100644
--- a/gcc/config/arc/simdext.md
+++ b/gcc/config/arc/simdext.md
@@ -1643,7 +1643,7 @@ (define_insn "dmpyh"
 
 ;; We can use dmac as well here.  To be investigated which version
 ;; brings more.
-(define_expand "sdot_prodv2hi"
+(define_expand "sdot_prodsiv2hi"
   [(match_operand:SI 0 "register_operand" "")
(match_operand:V2HI 1 "register_operand" "")
(match_operand:V2HI 2 "register_operand" "")
@@ -1656,7 +1656,7 @@ (define_expand "sdot_prodv2hi"
  DONE;
 })
 
-(define_expand "udot_prodv2hi"
+(define_expand "udot_prodsiv2hi"
   [(match_operand:SI 0 "register_operand" "")
(match_operand:V2HI 1 "register_operand" "")
(match_operand:V2HI 2 "register_operand" "")
@@ -1669,7 +1669,7 @@ (define_expand "udot_prodv2hi"
  DONE;
 })
 
-(define_expand "sdot_prodv4hi"
+(define_expand "sdot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand" "")
(match_operand:V4HI 1 "register_operand" "")
(match_operand:V4HI 2 "register_operand" "")
@@ -1688,7 +1688,7 @@ (define_expand "sdot_prodv4hi"
  DONE;
 })
 
-(define_expand "udot_prodv4hi"
+(define_expand "udot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand" "")
(match_operand:V4HI 1 "register_operand" "")
(match_operand:V4HI 2 "register_operand" "")
-- 
2.34.1



[PATCH 02/10] autovectorizer: Add basic support for convert optabs

2024-07-10 Thread Victor Do Nascimento
Given the shift from modeling dot products as direct optabs to
treating them as conversion optabs, we make the necessary changes to the
autovectorizer code to ensure that given the relevant tree code,
together with the input and output data modes, we can retrieve the
relevant optab and subsequently the insn_code for it.

gcc/ChangeLog:

* gimple-match-exports.cc (directly_supported_p): Add overload
for conversion-type optabs.
* gimple-match.h (directly_supported_p): Add new function
prototype.
* optabs.cc (expand_widen_pattern_expr): Make the
DOT_PROD_EXPR tree code use `find_widening_optab_handler' to
retrieve icode.
* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Make it
call the conversion-type overload of `directly_supported_p'.
* tree-vect-patterns.cc (vect_supportable_conv_optab_p): New.
(vect_recog_dot_prod_pattern): s/direct/conv/ in call to
`vect_supportable_direct_optab_p'.
---
 gcc/gimple-match-exports.cc | 18 
 gcc/gimple-match.h  |  2 ++
 gcc/optabs.cc   |  3 ++-
 gcc/tree-vect-loop.cc   |  1 +
 gcc/tree-vect-patterns.cc   | 43 +++--
 5 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index aacf3ff0414..c079fa1fb19 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -1381,6 +1381,24 @@ directly_supported_p (code_helper code, tree type, 
optab_subtype query_type)
  && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED));
 }
 
+/* As above, overloading the function for conversion-type optabs.  */
+bool
+directly_supported_p (code_helper code, tree type_out, tree type_in,
+ optab_subtype query_type)
+{
+
+  if (code.is_tree_code ())
+{
+  convert_optab optab = optab_for_tree_code (tree_code (code), type_in,
+   query_type);
+  return (optab != unknown_optab
+ && convert_optab_handler (optab, TYPE_MODE (type_out),
+   TYPE_MODE (type_in)) != CODE_FOR_nothing);
+}
+  gcc_unreachable ();
+}
+
+
 /* A wrapper around the internal-fn.cc versions of get_conditional_internal_fn
for a code_helper CODE operating on type TYPE.  */
 
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index d710fcbace2..0333a5db00a 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -419,6 +419,8 @@ code_helper canonicalize_code (code_helper, tree);
 
 #ifdef GCC_OPTABS_TREE_H
 bool directly_supported_p (code_helper, tree, optab_subtype = optab_default);
+bool directly_supported_p (code_helper, tree, tree,
+  optab_subtype = optab_default);
 #endif
 
 internal_fn get_conditional_internal_fn (code_helper, tree);
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 185c5b1a705..32737fb80e8 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -317,7 +317,8 @@ expand_widen_pattern_expr (const_sepops ops, rtx op0, rtx 
op1, rtx wide_op,
 widen_pattern_optab
   = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
   if (ops->code == WIDEN_MULT_PLUS_EXPR
-  || ops->code == WIDEN_MULT_MINUS_EXPR)
+  || ops->code == WIDEN_MULT_MINUS_EXPR
+  || ops->code == DOT_PROD_EXPR)
 icode = find_widening_optab_handler (widen_pattern_optab,
 TYPE_MODE (TREE_TYPE (ops->op2)),
 tmode0);
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a64b5082bd1..7e4c1e0f52e 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5289,6 +5289,7 @@ vect_is_emulated_mixed_dot_prod (stmt_vec_info stmt_info)
 
   gcc_assert (STMT_VINFO_REDUC_VECTYPE_IN (stmt_info));
   return !directly_supported_p (DOT_PROD_EXPR,
+   STMT_VINFO_VECTYPE (stmt_info),
STMT_VINFO_REDUC_VECTYPE_IN (stmt_info),
optab_vector_mixed_sign);
 }
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 86e893a1c43..c4dd627aa90 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -248,6 +248,45 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree 
otype, tree_code code,
   return true;
 }
 
+/* Return true if the target supports a vector version of CODE,
+   where CODE is known to map to a conversion optab with the given SUBTYPE.
+   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
+   specifies the type of the scalar result.
+
+   When returning true, set *VECOTYPE_OUT to the vector version of OTYPE.
+   Also set *VECITYPE_OUT to the vector version of ITYPE if VECITYPE_OUT
+   is nonnull.  */
+
+static bool
+vect_supportable_conv_optab_p (vec_info *vinfo, tree otype, tree_code code,
+tree itype, tree *vecotype_out,
+   

[PATCH 08/10] altivec: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

* config/rs6000/altivec.md (udot_prod): Deleted.
(udot_prodv4si): New.
(sdot_prodv8hi): Deleted.
(sdot_prodv4siv8hi): New.
---
 gcc/config/rs6000/altivec.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..0682c8eb184 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -3699,7 +3699,7 @@ (define_expand "neg2"
 }
 })
 
-(define_expand "udot_prod"
+(define_expand "udot_prodv4si"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (plus:V4SI (match_operand:V4SI 3 "register_operand" "v")
(unspec:V4SI [(match_operand:VIshort 1 "register_operand" "v")
@@ -3711,7 +3711,7 @@ (define_expand "udot_prod"
   DONE;
 })
 
-(define_expand "sdot_prodv8hi"
+(define_expand "sdot_prodv4siv8hi"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (plus:V4SI (match_operand:V4SI 3 "register_operand" "v")
(unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
-- 
2.34.1



[PATCH 09/10] c6x: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

* config/c6x/c6x.md (sdot_prodv2hi): Deleted.
(sdot_prodsiv2hi): New.
---
 gcc/config/c6x/c6x.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/c6x/c6x.md b/gcc/config/c6x/c6x.md
index 5964dd69d0d..ea9ffe8b4e1 100644
--- a/gcc/config/c6x/c6x.md
+++ b/gcc/config/c6x/c6x.md
@@ -3082,7 +3082,7 @@ (define_insn "v2hi3"
 ;; Widening vector multiply and dot product.
 ;; See c6x-mult.md.in for the define_insn patterns
 
-(define_expand "sdot_prodv2hi"
+(define_expand "sdot_prodsiv2hi"
   [(match_operand:SI 0 "register_operand" "")
(match_operand:V2HI 1 "register_operand" "")
(match_operand:V2HI 2 "register_operand" "")
-- 
2.34.1



[PATCH 07/10] mips: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

gcc/ChangeLog:

* config/mips/loongson-mmi.md (sdot_prodv4hi): Deleted.
(sdot_prodv2siv4hi): New.
---
 gcc/config/mips/loongson-mmi.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/mips/loongson-mmi.md b/gcc/config/mips/loongson-mmi.md
index dd166bfa4c9..4d958730139 100644
--- a/gcc/config/mips/loongson-mmi.md
+++ b/gcc/config/mips/loongson-mmi.md
@@ -394,7 +394,7 @@ (define_insn "loongson_pmaddhw"
   "pmaddhw\t%0,%1,%2"
   [(set_attr "type" "fmul")])
 
-(define_expand "sdot_prodv4hi"
+(define_expand "sdot_prodv2siv4hi"
   [(match_operand:V2SI 0 "register_operand" "")
(match_operand:V4HI 1 "register_operand" "")
(match_operand:V4HI 2 "register_operand" "")
-- 
2.34.1



[PATCH 10/10] autovectorizer: Test autovectorization of different dot-prod modes.

2024-07-10 Thread Victor Do Nascimento
From: Victor Do Nascimento 

Given the novel treatment of the dot product optab as a conversion we
are now able to target, for a given architecture, different
relationships between output modes and input modes.

This is made clearer by way of example. Previously, on AArch64, the
following loop was vectorizable:

uint32_t udot4(int n, uint8_t* data) {
  uint32_t sum = 0;
  for (int i=0; i
+
+uint32_t udot4(int n, uint8_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i

[PATCH 01/10] optabs: Make all `*dot_prod_optab's modeled as conversions

2024-07-10 Thread Victor Do Nascimento
Given that the specification in the GCC internals manual defines the
{u|s}dot_prod standard name as taking "two signed elements of the
same mode, adding them to a third operand of wider mode", there is
currently ambiguity in the relationship between the mode of the first
two arguments and that of the third.

This vagueness means that, in theory, different modes may be
supportable in the third argument.  This flexibility would allow for a
given backend to add to the accumulator a different number of
vectorized products, e.g. a backend may provide instructions for both:

  accum += a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

and

  accum += a[0] * b[0] + a[1] * b[1],

as is now seen in the SVE2.1 extension to AArch64.  In spite of the
aforementioned flexibility, modeling the dot-product operation as a
direct optab means that we have no way to encode both the input and the
accumulator data modes into the backend pattern name, which prevents
us from harnessing this flexibility.

We therefore make all dot_prod optabs conversions, allowing, for
example, for the encoding of both 2-way and 4-way dot product backend
patterns.

gcc/ChangeLog:

* optabs.def (sdot_prod_optab): Convert from OPTAB_D to
OPTAB_CD.
(udot_prod_optab): Likewise.
(usdot_prod_optab): Likewise.
* doc/md.texi (Standard Names): Update entries for the u, s
and us dot_prod names.
---
 gcc/doc/md.texi | 18 +-
 gcc/optabs.def  |  6 +++---
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 7f4335e0aac..2a74e473f05 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5748,15 +5748,15 @@ for (i = 0; i < LEN + BIAS; i++)
 operand0 += operand2[i];
 @end smallexample
 
-@cindex @code{sdot_prod@var{m}} instruction pattern
-@item @samp{sdot_prod@var{m}}
+@cindex @code{sdot_prod@var{m}@var{n}} instruction pattern
+@item @samp{sdot_prod@var{m}@var{n}}
 
 Compute the sum of the products of two signed elements.
 Operand 1 and operand 2 are of the same mode. Their
 product, which is of a wider mode, is computed and added to operand 3.
 Operand 3 is of a mode equal or wider than the mode of the product. The
 result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2.
 
 Semantically the expressions perform the multiplication in the following signs
 
@@ -5766,15 +5766,15 @@ sdot ==
 @dots{}
 @end smallexample
 
-@cindex @code{udot_prod@var{m}} instruction pattern
-@item @samp{udot_prod@var{m}}
+@cindex @code{udot_prod@var{m}@var{n}} instruction pattern
+@item @samp{udot_prod@var{m}@var{n}}
 
 Compute the sum of the products of two unsigned elements.
 Operand 1 and operand 2 are of the same mode. Their
 product, which is of a wider mode, is computed and added to operand 3.
 Operand 3 is of a mode equal or wider than the mode of the product. The
 result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2.
 
 Semantically the expressions perform the multiplication in the following signs
 
@@ -5784,14 +5784,14 @@ udot ==
 @dots{}
 @end smallexample
 
-@cindex @code{usdot_prod@var{m}} instruction pattern
-@item @samp{usdot_prod@var{m}}
+@cindex @code{usdot_prod@var{m}@var{n}} instruction pattern
+@item @samp{usdot_prod@var{m}@var{n}}
 Compute the sum of the products of elements of different signs.
 Operand 1 must be unsigned and operand 2 signed. Their
 product, which is of a wider mode, is computed and added to operand 3.
 Operand 3 is of a mode equal or wider than the mode of the product. The
 result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2.
 
 Semantically the expressions perform the multiplication in the following signs
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 45e117a7f50..fce4b2d5b08 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -106,6 +106,9 @@ OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
 OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
 OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
 OPTAB_CD(vec_init_optab, "vec_init$a$b")
+OPTAB_CD (sdot_prod_optab, "sdot_prod$I$a$b")
+OPTAB_CD (udot_prod_optab, "udot_prod$I$a$b")
+OPTAB_CD (usdot_prod_optab, "usdot_prod$I$a$b")
 
 OPTAB_CD (while_ult_optab, "while_ult$a$b")
 
@@ -409,10 +412,7 @@ OPTAB_D (savg_floor_optab, "avg$a3_floor")
 OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
 OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
 OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
-OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
-OPTAB_D (udot_prod_optab, "udot_prod$I$a")
-OPTAB_D 

[PATCH 04/10] arm: Fix arm backend-use of (u|s|us)dot_prod patterns.

2024-07-10 Thread Victor Do Nascimento
gcc/ChangeLog:

* config/arm/arm-builtins.cc (enum arm_builtins): Add new
ARM_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI, UDOTV8QI,
UDOTV16QI, USDOTV8QI, USDOTV16QI.
(arm_init_dotprod_builtins): New.
(arm_init_builtins): Add call to `arm_init_dotprod_builtins'.
(arm_general_gimple_fold_builtin): New.
* config/arm/arm-protos.h (arm_general_gimple_fold_builtin):
New prototype.
* config/arm/arm.cc (arm_gimple_fold_builtin): Add call to
`arm_general_gimple_fold_builtin'.
* config/arm/neon.md (dot_prod): Deleted.
(dot_prod): New.
(neon_usdot): Deleted.
(neon_usdot): New.
---
 gcc/config/arm/arm-builtins.cc   | 95 
 gcc/config/arm/arm-protos.h  |  3 +
 gcc/config/arm/arm.cc|  1 +
 gcc/config/arm/arm_neon_builtins.def |  3 -
 gcc/config/arm/neon.md   |  4 +-
 5 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index c9d50bf8fbb..b23b6caa063 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -45,6 +45,8 @@
 #include "arm-builtins.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "basic-block.h"
+#include "gimple.h"
 
 #define SIMD_MAX_BUILTIN_ARGS 7
 
@@ -1298,6 +1300,13 @@ enum arm_builtins
 #define VAR1(T, N, X) \
   ARM_BUILTIN_##N,
 
+  ARM_BUILTIN_NEON_SDOTV8QI,
+  ARM_BUILTIN_NEON_SDOTV16QI,
+  ARM_BUILTIN_NEON_UDOTV8QI,
+  ARM_BUILTIN_NEON_UDOTV16QI,
+  ARM_BUILTIN_NEON_USDOTV8QI,
+  ARM_BUILTIN_NEON_USDOTV16QI,
+
   ARM_BUILTIN_ACLE_BASE,
   ARM_BUILTIN_SAT_IMM_CHECK = ARM_BUILTIN_ACLE_BASE,
 
@@ -2648,6 +2657,60 @@ arm_init_fp16_builtins (void)
   "__fp16");
 }
 
+static void
+arm_init_dotprod_builtins (void)
+{
+  tree fndecl = NULL;
+  tree ftype = NULL;
+
+  tree uv8qi = arm_simd_builtin_type (V8QImode, qualifier_unsigned);
+  tree sv8qi = arm_simd_builtin_type (V8QImode, qualifier_none);
+  tree uv16qi = arm_simd_builtin_type (V16QImode, qualifier_unsigned);
+  tree sv16qi = arm_simd_builtin_type (V16QImode, qualifier_none);
+  tree uv2si = arm_simd_builtin_type (V2SImode, qualifier_unsigned);
+  tree sv2si = arm_simd_builtin_type (V2SImode, qualifier_none);
+  tree uv4si = arm_simd_builtin_type (V4SImode, qualifier_unsigned);
+  tree sv4si = arm_simd_builtin_type (V4SImode, qualifier_none);
+
+  struct builtin_decls_data
+  {
+    tree out_type_node;
+    tree in_type1_node;
+    tree in_type2_node;
+    const char *builtin_name;
+    int function_code;
+  };
+
+#define NAME(A) "__builtin_neon_" #A
+#define ENUM(B) ARM_BUILTIN_NEON_##B
+
+  builtin_decls_data bdda[] =
+  {
+{ sv2si, sv8qi,  sv8qi,  NAME (sdotv8qi),  ENUM (SDOTV8QI)   },
+{ uv2si, uv8qi,  uv8qi,  NAME (udotv8qi_),  ENUM (UDOTV8QI)   },
+{ sv2si, uv8qi,  sv8qi,  NAME (usdotv8qi_ssus), ENUM (USDOTV8QI)  },
+{ sv4si, sv16qi, sv16qi, NAME (sdotv16qi), ENUM (SDOTV16QI)  },
+{ uv4si, uv16qi, uv16qi, NAME (udotv16qi_),  ENUM (UDOTV16QI)  },
+{ sv4si, uv16qi, sv16qi, NAME (usdotv16qi_ssus), ENUM (USDOTV16QI) },
+  };
+
+#undef NAME
+#undef ENUM
+
+  builtin_decls_data *bdd = bdda;
+  builtin_decls_data *bdd_end = bdd + (ARRAY_SIZE (bdda));
+
+  for (; bdd < bdd_end; bdd++)
+    {
+      ftype = build_function_type_list (bdd->out_type_node, bdd->out_type_node,
+					bdd->in_type1_node, bdd->in_type2_node,
+					NULL_TREE);
+      fndecl = arm_general_add_builtin_function (bdd->builtin_name,
+						 ftype, bdd->function_code);
+      arm_builtin_decls[bdd->function_code] = fndecl;
+    }
+}
+
 void
 arm_init_builtins (void)
 {
@@ -2676,6 +2739,7 @@ arm_init_builtins (void)
arm_init_neon_builtins ();
   arm_init_vfp_builtins ();
   arm_init_crypto_builtins ();
+  arm_init_dotprod_builtins ();
 }
 
   if (TARGET_CDE)
@@ -2738,6 +2802,37 @@ arm_builtin_decl (unsigned code, bool initialize_p 
ATTRIBUTE_UNUSED)
 }
 }
 
+/* Try to fold STMT, given that it's a call to the built-in function with
+   subcode FCODE.  Return the new statement on success and null on
+   failure.  */
+gimple *
+arm_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt,
+gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED)
+{
+  gimple *new_stmt = NULL;
+  unsigned nargs = gimple_call_num_args (stmt);
+  tree *args = (nargs > 0
+   ? gimple_call_arg_ptr (stmt, 0)
+   : &error_mark_node);
+
+  switch (fcode)
+{
+case ARM_BUILTIN_NEON_SDOTV8QI:
+case ARM_BUILTIN_NEON_SDOTV16QI:
+case ARM_BUILTIN_NEON_UDOTV8QI:
+case ARM_BUILTIN_NEON_UDOTV16QI:
+case ARM_BUILTIN_NEON_USDOTV8QI:
+case ARM_BUILTIN_NEON_USDOTV16QI:
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+ 

[PATCH v2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-06-12 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing the Load-Acquire RCpc Pair Ordered, and
Store-Release Pair Ordered operations in the form of LDIAPP and STILP.

These operations are single-copy atomic on cores which also implement
LSE2 and, as such, support for these operations is added to Libatomic
and employed accordingly when the LSE2 and RCPC3 features are detected
in a given core at runtime.

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S (libat_load_16): Add LRCPC3
variant.
(libat_store_16): Likewise.
* config/linux/aarch64/host-config.h (HWCAP2_LRCPC3): New.
(LSE2_LRCPC3_ATOP): Previously LSE2_ATOP.  New ifuncs guarded
under it.
(has_rcpc3): New.
---
 libatomic/config/linux/aarch64/atomic_16.S   | 46 +++-
 libatomic/config/linux/aarch64/host-config.h | 34 +--
 2 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index c44c31c6418..5767fba5c03 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -35,16 +35,21 @@
writes, this will be true when using atomics in actual code.
 
The libat__16 entry points are ARMv8.0.
-   The libat__16_i1 entry points are used when LSE128 is available.
+   The libat__16_i1 entry points are used when LSE128 or LRCPC3 is available.
The libat__16_i2 entry points are used when LSE2 is available.  */
 
 #include "auto-config.h"
 
.arch   armv8-a+lse
 
+/* There is overlap in atomic instructions implemented in RCPC3 and LSE2.
+   Consequently, both _i1 and _i2 suffixes are needed for functions using these.
+   Elsewhere, all extension-specific implementations are mapped to _i1.  */
+
+#define LRCPC3(NAME)   libat_##NAME##_i1
 #define LSE128(NAME)   libat_##NAME##_i1
 #define LSE(NAME)  libat_##NAME##_i1
-#define LSE2(NAME) libat_##NAME##_i1
+#define LSE2(NAME) libat_##NAME##_i2
 #define CORE(NAME) libat_##NAME
 #define ATOMIC(NAME)   __atomic_##NAME
 
@@ -513,6 +518,43 @@ END (test_and_set_16)
 /* ifunc implementations: Carries run-time dependence on the presence of 
further
architectural extensions.  */
 
+ENTRY_FEAT (load_16, LRCPC3)
+	cbnz	w1, 1f
+
+	/* RELAXED.  */
+	ldp	res0, res1, [x0]
+	ret
+1:
+	cmp	w1, SEQ_CST
+	b.eq	2f
+
+	/* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
+	/* ldiapp res0, res1, [x0]  */
+	.inst	0xd9411800
+	ret
+
+	/* SEQ_CST.  */
+2:	ldar	tmp0, [x0]	/* Block reordering with Store-Release instr.  */
+	/* ldiapp res0, res1, [x0]  */
+	.inst	0xd9411800
+	ret
+END_FEAT (load_16, LRCPC3)
+
+
+ENTRY_FEAT (store_16, LRCPC3)
+	cbnz	w4, 1f
+
+	/* RELAXED.  */
+	stp	in0, in1, [x0]
+	ret
+
+	/* RELEASE/SEQ_CST.  */
+1:	/* stilp in0, in1, [x0]  */
+	.inst	0xd9031802
+	ret
+END_FEAT (store_16, LRCPC3)
+
+
 ENTRY_FEAT (exchange_16, LSE128)
mov tmp0, x0
mov res0, in0
diff --git a/libatomic/config/linux/aarch64/host-config.h 
b/libatomic/config/linux/aarch64/host-config.h
index d05e9eb628f..8adf0563001 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -33,6 +33,9 @@
 #ifndef HWCAP_USCAT
 # define HWCAP_USCAT   (1 << 25)
 #endif
+#ifndef HWCAP2_LRCPC3
+# define HWCAP2_LRCPC3 (1UL << 46)
+#endif
 #ifndef HWCAP2_LSE128
 # define HWCAP2_LSE128 (1UL << 47)
 #endif
@@ -54,7 +57,7 @@ typedef struct __ifunc_arg_t {
 #if defined (LAT_CAS_N)
 # define LSE_ATOP
 #elif defined (LAT_LOAD_N) || defined (LAT_STORE_N)
-# define LSE2_ATOP
+# define LSE2_LRCPC3_ATOP
 #elif defined (LAT_EXCH_N) || defined (LAT_FIOR_N) || defined (LAT_FAND_N)
 # define LSE128_ATOP
 #endif
@@ -63,9 +66,10 @@ typedef struct __ifunc_arg_t {
 #  if defined (LSE_ATOP)
 #   define IFUNC_NCOND(N)	1
 #   define IFUNC_COND_1	(hwcap & HWCAP_ATOMICS)
-#  elif defined (LSE2_ATOP)
-#   define IFUNC_NCOND(N)	1
-#   define IFUNC_COND_1	(has_lse2 (hwcap, features))
+#  elif defined (LSE2_LRCPC3_ATOP)
+#   define IFUNC_NCOND(N)	2
+#   define IFUNC_COND_1	(has_rcpc3 (hwcap, features))
+#   define IFUNC_COND_2	(has_lse2 (hwcap, features))
 #  elif defined (LSE128_ATOP)
 #   define IFUNC_NCOND(N)	1
 #   define IFUNC_COND_1	(has_lse128 (hwcap, features))
@@ -131,6 +135,28 @@ has_lse128 (unsigned long hwcap, const __ifunc_arg_t 
*features)
   return false;
 }
 
+/* LRCPC atomic support encoded in ID_AA64ISAR1_EL1.Atomic, bits[23:20].  The
+   expected value is 0b0011.  Check that.  */
+
+static inline bool
+has_rcpc3 (unsigned long hwcap, const __ifunc_arg_t *features)
+{
+  if (hwcap & _IFUNC_ARG_HWCAP
+  && 

[PATCH v2 2/4] Libatomic: Define per-file identifier macros

2024-06-11 Thread Victor Do Nascimento
In order to facilitate the fine-tuning of how `libatomic_i.h' and
`host-config.h' headers are used by different atomic functions, we
define distinct identifier macros for each file which, in implementing
atomic operations, imports these headers.

The idea is that different parts of these headers could then be
conditionally defined depending on the macros set by the file that
`#include'd them.

Given how it is possible that some file names are generic enough that
using them as-is for macro names (e.g. flag.c -> FLAG) may potentially
lead to name clashes with other macros, all file names first have LAT_
prepended to them such that, for example, flag.c is assigned the
LAT_FLAG macro.
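The pattern described above can be sketched in a self-contained way.  The "header" is inlined here for illustration; in libatomic the real headers are `libatomic_i.h' and `host-config.h', and the macro names below are taken from the ChangeLog:

```c
#include <stdio.h>

/* The importing file announces itself before pulling in the shared
   header, so the header can tailor its definitions per importer.  */
#define LAT_CAS_N

/* --- stands in for a host-config.h-style header --- */
#if defined (LAT_CAS_N)
# define IFUNC_NCOND 1		/* e.g. only one ifunc alternative here.  */
#else
# define IFUNC_NCOND 2
#endif
/* --- end of inlined header --- */

int
ifunc_alternatives (void)
{
  return IFUNC_NCOND;
}

#undef LAT_CAS_N		/* mirrored at the end of each file.  */
```

Because each implementation file performs exactly one such define/include/undef sequence, the shared headers see at most one `LAT_*' identifier at a time.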

libatomic/ChangeLog:

* cas_n.c (LAT_CAS_N): New.
* exch_n.c (LAT_EXCH_N): Likewise.
* fadd_n.c (LAT_FADD_N): Likewise.
* fand_n.c (LAT_FAND_N): Likewise.
* fence.c (LAT_FENCE): Likewise.
* fenv.c (LAT_FENV): Likewise.
* fior_n.c (LAT_FIOR_N): Likewise.
* flag.c (LAT_FLAG): Likewise.
* fnand_n.c (LAT_FNAND_N): Likewise.
* fop_n.c (LAT_FOP_N): Likewise
* fsub_n.c (LAT_FSUB_N): Likewise.
* fxor_n.c (LAT_FXOR_N): Likewise.
* gcas.c (LAT_GCAS): Likewise.
* gexch.c (LAT_GEXCH): Likewise.
* glfree.c (LAT_GLFREE): Likewise.
* gload.c (LAT_GLOAD): Likewise.
* gstore.c (LAT_GSTORE): Likewise.
* load_n.c (LAT_LOAD_N): Likewise.
* store_n.c (LAT_STORE_N): Likewise.
* tas_n.c (LAT_TAS_N): Likewise.
---
 libatomic/cas_n.c   | 2 ++
 libatomic/exch_n.c  | 2 ++
 libatomic/fadd_n.c  | 2 ++
 libatomic/fand_n.c  | 2 ++
 libatomic/fence.c   | 2 ++
 libatomic/fenv.c| 2 ++
 libatomic/fior_n.c  | 2 ++
 libatomic/flag.c| 2 ++
 libatomic/fnand_n.c | 2 ++
 libatomic/fop_n.c   | 2 ++
 libatomic/fsub_n.c  | 2 ++
 libatomic/fxor_n.c  | 2 ++
 libatomic/gcas.c| 2 ++
 libatomic/gexch.c   | 2 ++
 libatomic/glfree.c  | 2 ++
 libatomic/gload.c   | 2 ++
 libatomic/gstore.c  | 2 ++
 libatomic/load_n.c  | 2 ++
 libatomic/store_n.c | 2 ++
 libatomic/tas_n.c   | 2 ++
 20 files changed, 40 insertions(+)

diff --git a/libatomic/cas_n.c b/libatomic/cas_n.c
index a080b990371..2a6357e48db 100644
--- a/libatomic/cas_n.c
+++ b/libatomic/cas_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_CAS_N
 #include "libatomic_i.h"
 
 
@@ -122,3 +123,4 @@ SIZE(libat_compare_exchange) (UTYPE *mptr, UTYPE *eptr, 
UTYPE newval,
 #endif
 
 EXPORT_ALIAS (SIZE(compare_exchange));
+#undef LAT_CAS_N
diff --git a/libatomic/exch_n.c b/libatomic/exch_n.c
index e5ff80769b9..184d3de1009 100644
--- a/libatomic/exch_n.c
+++ b/libatomic/exch_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_EXCH_N
 #include "libatomic_i.h"
 
 
@@ -126,3 +127,4 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel 
UNUSED)
 #endif
 
 EXPORT_ALIAS (SIZE(exchange));
+#undef LAT_EXCH_N
diff --git a/libatomic/fadd_n.c b/libatomic/fadd_n.c
index bc15b8bc0e6..32b75cec654 100644
--- a/libatomic/fadd_n.c
+++ b/libatomic/fadd_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FADD_N
 #include 
 
 #define NAME   add
@@ -43,3 +44,4 @@
 #endif
 
 #include "fop_n.c"
+#undef LAT_FADD_N
diff --git a/libatomic/fand_n.c b/libatomic/fand_n.c
index ffe9ed8700f..9eab55bcd72 100644
--- a/libatomic/fand_n.c
+++ b/libatomic/fand_n.c
@@ -1,3 +1,5 @@
+#define LAT_FAND_N
 #define NAME   and
 #define OP(X,Y)((X) & (Y))
 #include "fop_n.c"
+#undef LAT_FAND_N
diff --git a/libatomic/fence.c b/libatomic/fence.c
index a9b1e280c5a..4022194a57a 100644
--- a/libatomic/fence.c
+++ b/libatomic/fence.c
@@ -21,6 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FENCE
 #include "libatomic_i.h"
 
 #include 
@@ -43,3 +44,4 @@ void
 {
   atomic_signal_fence (order);
 }
+#undef LAT_FENCE
diff --git a/libatomic/fenv.c b/libatomic/fenv.c
index 41f187c1f85..dccad356a31 100644
--- a/libatomic/fenv.c
+++ b/libatomic/fenv.c
@@ -21,6 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FENV
 #include "libatomic_i.h"
 
 #ifdef HAVE_FENV_H
@@ -70,3 +71,4 @@ __atomic_feraiseexcept (int excepts __attribute__ ((unused)))
 }
 #endif
 }
+#undef LAT_FENV
diff --git a/libatomic/fior_n.c b/libatomic/fior_n.c
index 55d0d66b469..2b58d4805d6 100644
--- a/libatomic/fior_n.c
+++ b/libatomic/fior_n.c
@@ -1,3 +1,5 @@
+#define LAT_FIOR_N
 #define NAME   or
 #define OP(X,Y)((X) | (Y))
 #include "fop_n.c"
+#undef LAT_FIOR_N
diff --git a/libatomic/flag.c b/libatomic/flag.c
index e4a5a27819a..8afd80c9130 100644
--- 

[PATCH v2 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file

2024-06-11 Thread Victor Do Nascimento
At present, `atomic_16.S' groups different implementations of the
same functions together in the file.  Therefore, as an example,
the LSE2 implementation of `load_16' follows on immediately from its
core implementation, as does the `store_16' LSE2 implementation.

Such architectural extension-dependent implementations are dependent
on ifunc support, such that they are guarded by the relevant
preprocessor macro, i.e.  `#if HAVE_IFUNC'.

Having to apply these guards on a per-function basis adds unnecessary
clutter to the file and makes its maintenance more error-prone.

We therefore reorganize the layout of the file in such a way that all
core implementations needing no `#ifdef's are placed first, followed
by all ifunc-dependent implementations, which can all be guarded by a
single `#if HAVE_IFUNC', greatly reducing the overall number of
required `#ifdef' macros.

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S: Reorganize functions in
file.
(HAVE_FEAT_LSE2): Delete.
---
 libatomic/config/linux/aarch64/atomic_16.S | 445 +++--
 1 file changed, 223 insertions(+), 222 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index 11a296dacc3..c44c31c6418 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,8 +40,6 @@
 
 #include "auto-config.h"
 
-#define HAVE_FEAT_LSE2 HAVE_IFUNC
-
.arch   armv8-a+lse
 
 #define LSE128(NAME)   libat_##NAME##_i1
@@ -116,6 +114,9 @@ NAME:   \
 #define SEQ_CST 5
 
 
+/* Core implementations: Not dependent on the presence of further architectural
+   extensions.  */
+
 ENTRY (load_16)
mov x5, x0
cbnzw1, 2f
@@ -134,31 +135,6 @@ ENTRY (load_16)
 END (load_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (load_16, LSE2)
-   cbnzw1, 1f
-
-   /* RELAXED.  */
-   ldp res0, res1, [x0]
-   ret
-1:
-   cmp w1, SEQ_CST
-   b.eq2f
-
-   /* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
-   ldp res0, res1, [x0]
-   dmb ishld
-   ret
-
-   /* SEQ_CST.  */
-2: ldartmp0, [x0]  /* Block reordering with Store-Release instr.  
*/
-   ldp res0, res1, [x0]
-   dmb ishld
-   ret
-END_FEAT (load_16, LSE2)
-#endif
-
-
 ENTRY (store_16)
cbnzw4, 2f
 
@@ -176,23 +152,6 @@ ENTRY (store_16)
 END (store_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (store_16, LSE2)
-   cbnzw4, 1f
-
-   /* RELAXED.  */
-   stp in0, in1, [x0]
-   ret
-
-   /* RELEASE/SEQ_CST.  */
-1: ldxpxzr, tmp0, [x0]
-   stlxp   w4, in0, in1, [x0]
-   cbnzw4, 1b
-   ret
-END_FEAT (store_16, LSE2)
-#endif
-
-
 ENTRY (exchange_16)
mov x5, x0
cbnzw4, 2f
@@ -220,32 +179,6 @@ ENTRY (exchange_16)
 END (exchange_16)
 
 
-ENTRY_FEAT (exchange_16, LSE128)
-   mov tmp0, x0
-   mov res0, in0
-   mov res1, in1
-   cbnzw4, 1f
-
-   /* RELAXED.  */
-   /* swpp res0, res1, [tmp0]  */
-   .inst   0x192180c0
-   ret
-1:
-   cmp w4, ACQUIRE
-   b.hi2f
-
-   /* ACQUIRE/CONSUME.  */
-   /* swppa res0, res1, [tmp0]  */
-   .inst   0x19a180c0
-   ret
-
-   /* RELEASE/ACQ_REL/SEQ_CST.  */
-2: /* swppal res0, res1, [tmp0]  */
-   .inst   0x19e180c0
-   ret
-END_FEAT (exchange_16, LSE128)
-
-
 ENTRY (compare_exchange_16)
ldp exp0, exp1, [x1]
cbz w4, 3f
@@ -293,42 +226,6 @@ ENTRY (compare_exchange_16)
 END (compare_exchange_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE)
-   ldp exp0, exp1, [x1]
-   mov tmp0, exp0
-   mov tmp1, exp1
-   cbz w4, 2f
-   cmp w4, RELEASE
-   b.hs3f
-
-   /* ACQUIRE/CONSUME.  */
-   caspa   exp0, exp1, in0, in1, [x0]
-0:
-   cmp exp0, tmp0
-   ccmpexp1, tmp1, 0, eq
-   bne 1f
-   mov x0, 1
-   ret
-1:
-   stp exp0, exp1, [x1]
-   mov x0, 0
-   ret
-
-   /* RELAXED.  */
-2: caspexp0, exp1, in0, in1, [x0]
-   b   0b
-
-   /* RELEASE.  */
-3: b.hi4f
-   caspl   exp0, exp1, in0, in1, [x0]
-   b   0b
-
-   /* ACQ_REL/SEQ_CST.  */
-4: caspal  exp0, exp1, in0, in1, [x0]
-   b   0b
-END_FEAT (compare_exchange_16, LSE)
-#endif
 
 
 ENTRY_ALIASED (fetch_add_16)
@@ -441,32 +338,6 @@ ENTRY (fetch_or_16)
 END (fetch_or_16)
 
 
-ENTRY_FEAT (fetch_or_16, LSE128)
-   mov tmp0, x0
-   mov res0, in0
-   mov res1, in1
-   cbnzw4, 1f
-
-   /* RELAXED.  */
-   /* ldsetp res0, res1, [tmp0]  */
-   .inst   0x192130c0
-   ret
-1:
-   cmp w4, ACQUIRE
-   b.hi2f
-
-   /* ACQUIRE/CONSUME.  */
-   /* ldsetpa res0, res1, [tmp0]  */
-   .inst   0x19a130c0
-   ret
-
-   /* 

[PATCH v2 3/4] Libatomic: Make ifunc selector behavior contingent on importing file

2024-06-11 Thread Victor Do Nascimento
By querying previously-defined file-identifier macros, `host-config.h'
is able to get information about its environment and, based on this
information, select more appropriate function-specific ifunc
selectors.  This reduces the number of unnecessary feature tests that
need to be carried out in order to find the best atomic implementation
for a function at run-time.

An immediate benefit of this is that we can further fine-tune the
architectural requirements for each atomic function without risk of
incurring the maintenance and runtime-performance penalties of having
to maintain an ifunc selector with a huge number of alternatives, most
of which are irrelevant for any particular function.  Consequently,
for AArch64 targets, we relax the architectural requirements of
`compare_exchange_16', which now requires only LSE as opposed to the
newer LSE2.

The new flexibility provided by this approach also means that certain
functions can now be called directly, doing away with ifunc selectors
altogether when only a single implementation is available for it on a
given target.  As per the macro expansion framework laid out in
`libatomic_i.h', such functions should have their names prefixed with
`__atomic_' as opposed to `libat_'.  This is the same prefix applied
to function names when Libatomic is configured with
`--disable-gnu-indirect-function'.

To achieve this, these functions unconditionally apply the aliasing
rule that at present is conditionally applied only when libatomic is
built without ifunc support, which ensures that the default
`libat_##NAME' is accessible via the equivalent `__atomic_##NAME' too.
This is ensured by using the new `ENTRY_ALIASED' macro.

Finally, this means we are able to do away with a whole set of
function aliases that were needed until now, thus considerably
cleaning up the implementation.
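The run-time selection discussed above can be sketched in plain C.  A real STT_GNU_IFUNC resolver needs toolchain and loader support, so this emulates it with a function pointer; the hwcap bit value is an assumption for illustration (on Linux/AArch64, HWCAP_ATOMICS happens to be bit 8):

```c
/* Emulated ifunc-style selection (illustrative names throughout).  */
#define DEMO_HWCAP_ATOMICS (1UL << 8)	/* assumed hwcap bit for LSE */

static int cas_16_lse (void)  { return 1; }	/* stands in for the LSE variant  */
static int cas_16_core (void) { return 0; }	/* baseline ARMv8.0 variant       */

/* Resolver: run once at load time, picks the best implementation
   from the hardware capability bits.  */
static int (*select_cas_16 (unsigned long hwcap)) (void)
{
  return (hwcap & DEMO_HWCAP_ATOMICS) ? cas_16_lse : cas_16_core;
}
```

With per-file `IFUNC_NCOND'/`IFUNC_COND_*' definitions, each atomic function gets a resolver listing only the alternatives it actually has, instead of one selector shape shared by all functions.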

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S: Remove unnecessary
aliasing.
(LSE): New.
(ENTRY_ALIASED): Likewise.
* config/linux/aarch64/host-config.h (LSE_ATOP): New.
(LSE2_ATOP): Likewise.
(LSE128_ATOP): Likewise.
(IFUNC_COND_1): Make its definition conditional on above 3
macros.
(IFUNC_NCOND): Likewise.
---
 libatomic/config/linux/aarch64/atomic_16.S   | 64 ++--
 libatomic/config/linux/aarch64/host-config.h | 35 ---
 2 files changed, 45 insertions(+), 54 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index d6e71ba6e16..11a296dacc3 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -45,17 +45,20 @@
.arch   armv8-a+lse
 
 #define LSE128(NAME)   libat_##NAME##_i1
-#define LSE2(NAME) libat_##NAME##_i2
+#define LSE(NAME)  libat_##NAME##_i1
+#define LSE2(NAME) libat_##NAME##_i1
 #define CORE(NAME) libat_##NAME
 #define ATOMIC(NAME)   __atomic_##NAME
 
+/* Emit __atomic_* entrypoints if no ifuncs.  */
+#define ENTRY_ALIASED(NAME)	ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+
 #if HAVE_IFUNC
 # define ENTRY(NAME)   ENTRY2 (CORE (NAME), )
 # define ENTRY_FEAT(NAME, FEAT) ENTRY2 (FEAT (NAME), )
 # define END_FEAT(NAME, FEAT)  END2 (FEAT (NAME))
 #else
-/* Emit __atomic_* entrypoints if no ifuncs.  */
-# define ENTRY(NAME)   ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+# define ENTRY(NAME)   ENTRY_ALIASED (NAME)
 #endif
 
 #define END(NAME)  END2 (CORE (NAME))
@@ -291,7 +294,7 @@ END (compare_exchange_16)
 
 
 #if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE2)
+ENTRY_FEAT (compare_exchange_16, LSE)
ldp exp0, exp1, [x1]
mov tmp0, exp0
mov tmp1, exp1
@@ -324,11 +327,11 @@ ENTRY_FEAT (compare_exchange_16, LSE2)
/* ACQ_REL/SEQ_CST.  */
 4: caspal  exp0, exp1, in0, in1, [x0]
b   0b
-END_FEAT (compare_exchange_16, LSE2)
+END_FEAT (compare_exchange_16, LSE)
 #endif
 
 
-ENTRY (fetch_add_16)
+ENTRY_ALIASED (fetch_add_16)
mov x5, x0
cbnzw4, 2f
 
@@ -350,7 +353,7 @@ ENTRY (fetch_add_16)
 END (fetch_add_16)
 
 
-ENTRY (add_fetch_16)
+ENTRY_ALIASED (add_fetch_16)
mov x5, x0
cbnzw4, 2f
 
@@ -372,7 +375,7 @@ ENTRY (add_fetch_16)
 END (add_fetch_16)
 
 
-ENTRY (fetch_sub_16)
+ENTRY_ALIASED (fetch_sub_16)
mov x5, x0
cbnzw4, 2f
 
@@ -394,7 +397,7 @@ ENTRY (fetch_sub_16)
 END (fetch_sub_16)
 
 
-ENTRY (sub_fetch_16)
+ENTRY_ALIASED (sub_fetch_16)
mov x5, x0
cbnzw4, 2f
 
@@ -620,7 +623,7 @@ ENTRY_FEAT (and_fetch_16, LSE128)
 END_FEAT (and_fetch_16, LSE128)
 
 
-ENTRY (fetch_xor_16)
+ENTRY_ALIASED (fetch_xor_16)
mov x5, x0
cbnzw4, 2f
 
@@ -642,7 +645,7 @@ ENTRY (fetch_xor_16)
 END (fetch_xor_16)
 
 
-ENTRY (xor_fetch_16)
+ENTRY_ALIASED (xor_fetch_16)
mov x5, x0
cbnzw4, 2f
 
@@ -664,7 +667,7 @@ ENTRY (xor_fetch_16)
 END 

[PATCH v2 1/4] Libatomic: AArch64: Convert all lse128 assembly to .insn directives

2024-06-11 Thread Victor Do Nascimento
Given the lack of support for the LSE128 instructions in all but the
the most up-to-date version of Binutils (2.42), having the build-time
test for assembler support for these instructions often leads to the
building of Libatomic without support for LSE128-dependent atomic
function implementations.  This ultimately leads to different people
having different versions of Libatomic on their machines, depending on
which assembler was available at compilation time.

Furthermore, the conditional inclusion of these atomic function
implementations predicated on assembler support leads to a series of
`#if HAVE_FEAT_LSE128' guards scattered throughout the codebase and
the need for a series of aliases when the feature flag evaluates
to false.  The preprocessor macro guards, together with the
conditional aliasing leads to code that is cumbersome to understand
and maintain.

Both of the issues highlighted above will only get worse with the
coming support for LRCPC3 atomics which under the current scheme will
also require build-time checks.

Consequently, a better option for both consistency across builds and
code cleanness is to make recourse to the `.inst' directive.  By
replacing all novel assembly instructions for their hexadecimal
representation within `.inst's, we ensure that the Libatomic code is
both considerably cleaner and all machines build the same binary,
irrespective of binutils version available at compile time.

This patch therefore removes all configure checks for LSE128 support
in the assembler, and all the guards and aliases that were associated
with `HAVE_FEAT_LSE128'.

libatomic/ChangeLog:

* acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LSE128): Delete.
* auto-config.h.in (HAVE_FEAT_LSE128): Likewise
* config/linux/aarch64/atomic_16.S: Replace all LSE128
instructions with equivalent `.inst' directives.
(HAVE_FEAT_LSE128): Remove all references.
* configure: Regenerate.
* configure.ac: Remove call to LIBAT_TEST_FEAT_AARCH64_LSE128.
---
 libatomic/acinclude.m4 | 18 -
 libatomic/auto-config.h.in |  3 -
 libatomic/config/linux/aarch64/atomic_16.S | 76 +-
 libatomic/configure| 43 
 libatomic/configure.ac |  3 -
 5 files changed, 32 insertions(+), 111 deletions(-)

diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index 6d2e0b1c355..f35ab5b60a5 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -83,24 +83,6 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[
   ])
 ])
 
-dnl
-dnl Test if the host assembler supports armv9.4-a LSE128 isns.
-dnl
-AC_DEFUN([LIBAT_TEST_FEAT_AARCH64_LSE128],[
-  AC_CACHE_CHECK([for armv9.4-a LSE128 insn support],
-[libat_cv_have_feat_lse128],[
-AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])])
-if AC_TRY_EVAL(ac_compile); then
-  eval libat_cv_have_feat_lse128=yes
-else
-  eval libat_cv_have_feat_lse128=no
-fi
-rm -f conftest*
-  ])
-  LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128],
-   [Have LSE128 support for 16 byte integers.])
-])
-
 dnl
 dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
 dnl
diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in
index 7c78933b07d..ab3424a759e 100644
--- a/libatomic/auto-config.h.in
+++ b/libatomic/auto-config.h.in
@@ -105,9 +105,6 @@
 /* Define to 1 if you have the  header file. */
 #undef HAVE_DLFCN_H
 
-/* Have LSE128 support for 16 byte integers. */
-#undef HAVE_FEAT_LSE128
-
 /* Define to 1 if you have the  header file. */
 #undef HAVE_FENV_H
 
diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index b63e97ac5a2..d6e71ba6e16 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,18 +40,9 @@
 
 #include "auto-config.h"
 
-#if !HAVE_IFUNC
-# undef HAVE_FEAT_LSE128
-# define HAVE_FEAT_LSE128 0
-#endif
-
 #define HAVE_FEAT_LSE2 HAVE_IFUNC
 
-#if HAVE_FEAT_LSE128
-   .arch   armv9-a+lse128
-#else
.arch   armv8-a+lse
-#endif
 
 #define LSE128(NAME)   libat_##NAME##_i1
 #define LSE2(NAME) libat_##NAME##_i2
@@ -226,7 +217,6 @@ ENTRY (exchange_16)
 END (exchange_16)
 
 
-#if HAVE_FEAT_LSE128
 ENTRY_FEAT (exchange_16, LSE128)
mov tmp0, x0
mov res0, in0
@@ -234,21 +224,23 @@ ENTRY_FEAT (exchange_16, LSE128)
cbnzw4, 1f
 
/* RELAXED.  */
-   swppres0, res1, [tmp0]
+   /* swpp res0, res1, [tmp0]  */
+   .inst   0x192180c0
ret
 1:
cmp w4, ACQUIRE
b.hi2f
 
/* ACQUIRE/CONSUME.  */
-   swppa   res0, res1, [tmp0]
+   /* swppa res0, res1, [tmp0]  */
+   .inst   0x19a180c0
ret
 
/* RELEASE/ACQ_REL/SEQ_CST.  */
-2: swppal  res0, res1, [tmp0]
+2: /* swppal res0, res1, [tmp0]  */
+   .inst   0x19e180c0
   

[PATCH v2 0/4] Libatomic: Cleanup ifunc selector and aliasing

2024-06-11 Thread Victor Do Nascimento
Changes in V2:

As explained in patch v2 1/4, it has become clear that the current
approach of querying assembler support for newer architectural
extensions at compile time is undesirable both from a maintainability
as well as a consistency standpoint - Different compiled versions of
Libatomic may have different features depending on the machine on
which they were built.

These issues make for difficult testing as the explosion in number of
`#ifdef' guards makes maintenance error-prone and the dependence on
binutils version means that, as well as deploying changes for testing
in a variety of target configurations, testing must also involve
compiling the library on an increasing number of host configurations,
meaning that the chance of bugs going undetected increases (as was
proved in the pre-commit CI which, due to the use of an older version
of Binutils, picked up on a runtime-error that had hitherto gone
unnoticed).

We therefore do away with the use of all assembly instructions
dependent on Binutils 2.42, choosing to replace them with `.inst's
instead.  This eliminates the latent bug picked up by CI and will
ensure consistent builds of Libatomic across all versions of Binutils.

---

The recent introduction of the optional LSE128 and RCPC3 architectural
extensions to AArch64 has further led to the increased flexibility of
atomic support in the architecture, with many extensions providing
support for distinct atomic operations, each with different potential
applications in mind.

This has led to maintenance difficulties in Libatomic, in particular
regarding the way the ifunc selector is generated via a series of
macro expansions at compile-time.

Until now, irrespective of the atomic operation in question, all atomic
functions for a particular operand size were expected to have the same
number of ifunc alternatives, meaning that a one-size-fits-all
approach could reasonably be taken for the selector.

This meant that if, hypothetically, for a particular architecture and
operand size one particular atomic operation was to have 3 different
implementations associated with different extensions, libatomic would
likewise be required to present three ifunc alternatives for all other
atomic functions.

The consequence in the design choice was the unnecessary use of
function aliasing and the unwieldy code which resulted from this.

This patch series attempts to remediate this issue by making the
preprocessor macros defining the number of ifunc alternatives and
their respective selection functions dependent on the file importing
the ifunc selector-generating framework.

All files are given `LAT_'-prefixed macros, defined at the beginning
and undef'd at the end of the file.  It is these macros that are
subsequently used to fine-tune the behaviors of `libatomic_i.h' and
`host-config.h'.

In particular, the definition of the `IFUNC_NCOND(N)' and
`IFUNC_COND_' macros in host-config.h can now be guarded behind
these new file-specific macros, which ultimately control what the
`GEN_SELECTOR(X)' macro in `libatomic_i.h' expands to.  As both of
these headers are imported once per file implementing some atomic
operation, fine-tuned control is now possible.

Regtested with both `--enable-gnu-indirect-function' and
`--disable-gnu-indirect-function' configurations on armv9.4-a target
with LRCPC3 and LSE128 support and without.

Victor Do Nascimento (4):
  Libatomic: AArch64: Convert all lse128 assembly to .insn directives
  Libatomic: Define per-file identifier macros
  Libatomic: Make ifunc selector behavior contingent on importing file
  Libatomic: Clean up AArch64 `atomic_16.S' implementation file

 libatomic/acinclude.m4   |  18 -
 libatomic/auto-config.h.in   |   3 -
 libatomic/cas_n.c|   2 +
 libatomic/config/linux/aarch64/atomic_16.S   | 511 +--
 libatomic/config/linux/aarch64/host-config.h |  35 +-
 libatomic/configure  |  43 --
 libatomic/configure.ac   |   3 -
 libatomic/exch_n.c   |   2 +
 libatomic/fadd_n.c   |   2 +
 libatomic/fand_n.c   |   2 +
 libatomic/fence.c|   2 +
 libatomic/fenv.c |   2 +
 libatomic/fior_n.c   |   2 +
 libatomic/flag.c |   2 +
 libatomic/fnand_n.c  |   2 +
 libatomic/fop_n.c|   2 +
 libatomic/fsub_n.c   |   2 +
 libatomic/fxor_n.c   |   2 +
 libatomic/gcas.c |   2 +
 libatomic/gexch.c|   2 +
 libatomic/glfree.c   |   2 +
 libatomic/gload.c|   2 +
 libatomic/gstore.c   |   2 +
 libatomic/load_n.c   |   2 +
 libatomic/store_n.c

[PATCH v2] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-06-11 Thread Victor Do Nascimento
At present the autovectorizer fails to vectorize simple loops
involving calls to `__builtin_prefetch'.  A simple example of such
loop is given below:

void foo(double * restrict a, double * restrict b, int n){
  int i;
  for(i=0; i<n; i++){
    ...
  }
}

[...]

diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ get_references_in_stmt (gimple *stmt, vec<data_ref_loc, va_heap> *references)
	    clobbers_memory = true;
	    break;
	  }
+  else if (gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH))
+   clobbers_memory = false;
   else
clobbers_memory = true;
 }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c471f1564a7..89cc6e64589 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12177,8 +12177,10 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
   !gsi_end_p (si);)
{
  stmt = gsi_stmt (si);
- /* During vectorization remove existing clobber stmts.  */
- if (gimple_clobber_p (stmt))
+ /* During vectorization remove existing clobber stmts and
+prefetches.  */
+ if (gimple_clobber_p (stmt)
+ || gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH))
{
  unlink_stmt_vdef (stmt);
  gsi_remove (&si, true);
-- 
2.34.1



Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Victor Do Nascimento

Dear Richard and Tamar,

Thanks to the both of you for the various bits of feedback.
I've implemented all the more straightforward bits of feedback given, 
leaving "only" the merging of the two- and four-way dot product optabs 
into one, together with the necessary changes to the various backends 
which, though a little time-consuming, should be rather mechanical.


I had originally implemented the new two-way dotprod optab as a convert 
optab anyway, so going back to the work on that local branch will give 
me a good starting point from which to do this.


And Tamar, thanks very much for the feedback regarding the unit-tests. 
I knew my testing, as it currently stands, was rather anaemic and was eager to 
get the relevant feedback on it.  Rest assured it's all been taken on board.


Cheers,
Victor


On 5/17/24 11:13, Richard Biener wrote:

On Fri, May 17, 2024 at 11:56 AM Tamar Christina
 wrote:



-Original Message-
From: Richard Biener 
Sent: Friday, May 17, 2024 10:46 AM
To: Tamar Christina 
Cc: Victor Do Nascimento ; gcc-
patc...@gcc.gnu.org; Richard Sandiford ; Richard
Earnshaw ; Victor Do Nascimento

Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
autovectorizer

On Fri, May 17, 2024 at 11:05 AM Tamar Christina
 wrote:



-Original Message-
From: Richard Biener 
Sent: Friday, May 17, 2024 6:51 AM
To: Victor Do Nascimento 
Cc: gcc-patches@gcc.gnu.org; Richard Sandiford

;

Richard Earnshaw ; Victor Do Nascimento

Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
autovectorizer

On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
 wrote:


From: Victor Do Nascimento 

At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
optabs for dealing with vectorizable dot product code sequences.  The
consequence of using a direct optab for this is that backend-pattern
selection is only ever able to match against one datatype - Either
that of the operands or of the accumulated value, never both.

With the introduction of the 2-way (un)signed dot-product insn [1][2]
in AArch64 SVE2, the existing direct opcode approach is no longer
sufficient for full specification of all the possible dot product
machine instructions to be matched to the code sequence; a dot product
resulting in VNx4SI may result from either dot products on VNx16QI or
VNx8HI values for the 4- and 2-way dot product operations, respectively.

This means that the following example fails autovectorization:

uint32_t foo(int n, uint16_t* data) {
   uint32_t sum = 0;
   for (int i=0; i<n; i++)
     ...
}

I don't like this too much.  I'll note we document dot_prod as

@cindex @code{sdot_prod@var{m}} instruction pattern
@item @samp{sdot_prod@var{m}}

Compute the sum of the products of two signed elements.
Operand 1 and operand 2 are of the same mode. Their
product, which is of a wider mode, is computed and added to operand 3.
Operand 3 is of a mode equal or wider than the mode of the product. The
result is placed in operand 0, which is of the same mode as operand 3.
@var{m} is the mode of operand 1 and operand 2.

with no restriction on the wider mode but we don't specify it which is
bad design.  This should have been a convert optab with two modes
from the start - adding a _twoway variant is just a hack.


We did discuss this at the time we started implementing it.  There were two
options, one was indeed to change it to a convert dot_prod optab, but doing
this means we have to update every target that uses it.

Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
altivec.


Which sure could be possible, but there's also every use in the backends that
need to be updated, and tested, which for some targets we don't even know how
to begin.


So it seems very hard to correct dotprod to a convert optab now.


It's still the correct way to go.  At _least_ your new pattern should
have been this,
otherwise what do you do when you have two-way, four-way and eight-way
variants?
Add yet another optab?


I guess that's fair, but having the new optab only be convert resulted in messy
code as everywhere you must check for both variants.

Additionally that optab would then overlap with the existing optabs as, as you
Say, the documentation only says it's of a wider type and doesn't indicate
precision.

So to avoid issues down the line, if the new optab isn't acceptable then
we'll have to do a wholesale conversion.


Yep.  It shouldn't be difficult though.



Another thing is that when you do it your way you should fix the existing optab
to be two-way by documenting how the second mode derives from the first.

And sure, it's not the only optab suffering from this issue.


Sure, all the zero and sign extending optabs for instance 


But for example the scalar ones are correct:

OPTAB_CL(sext_optab, "extend$b$a2", SIGN_EXTEND, "extend",
gen_extend_conv_libfunc)

Richard.


Tamar



Richard.


Tamar



Richard.


In order to minimize changes to the existing 

[PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Victor Do Nascimento
From: Victor Do Nascimento 

At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
optabs for dealing with vectorizable dot product code sequences.  The
consequence of using a direct optab for this is that backend-pattern
selection is only ever able to match against one datatype - Either
that of the operands or of the accumulated value, never both.

With the introduction of the 2-way (un)signed dot-product insn [1][2]
in AArch64 SVE2, the existing direct opcode approach is no longer
sufficient for full specification of all the possible dot product
machine instructions to be matched to the code sequence; a dot product
resulting in VNx4SI may result from either dot products on VNx16QI or
VNx8HI values for the 4- and 2-way dot product operations, respectively.

This means that the following example fails autovectorization:

uint32_t foo(int n, uint16_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i++)
    ...
}

[1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
[2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-

gcc/ChangeLog:

* config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4sivnx8hi):
Renamed to `dot_prod_twoway_vnx8hi'.
* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
Update icodes used in line with above rename.
* optabs-tree.cc (optab_for_tree_code_1): Renamed
`optab_for_tree_code' and added new argument.
(optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
* optabs-tree.h (optab_for_tree_code_1): New.
* optabs.cc (expand_widen_pattern_expr): Expand support for
DOT_PROD_EXPR patterns.
* optabs.def (udot_prod_twoway_optab): New.
(sdot_prod_twoway_optab): Likewise.
* tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
support for misc optabs that use two modes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-dotprod-twoway.c: New.
---
 .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
 gcc/config/aarch64/aarch64-sve2.md|  2 +-
 gcc/optabs-tree.cc| 23 --
 gcc/optabs-tree.h |  2 ++
 gcc/optabs.cc |  2 +-
 gcc/optabs.def|  2 ++
 .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
 gcc/tree-vect-patterns.cc |  2 +-
 8 files changed, 54 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 0d2edf3f19e..e457db09f66 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -764,8 +764,8 @@ public:
   icode = (e.type_suffix (0).float_p
   ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
   : e.type_suffix (0).unsigned_p
-  ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
-  : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
+  ? CODE_FOR_udot_prod_twoway_vnx8hi
+  : CODE_FOR_sdot_prod_twoway_vnx8hi);
 return e.use_unpred_insn (icode);
   }
 };
diff --git a/gcc/config/aarch64/aarch64-sve2.md 
b/gcc/config/aarch64/aarch64-sve2.md
index 934e57055d3..5677de7108d 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -2021,7 +2021,7 @@ (define_insn "@aarch64_sve_qsub__lane_"
 )
 
 ;; Two-way dot-product.
-(define_insn "@aarch64_sve_dotvnx4sivnx8hi"
+(define_insn "dot_prod_twoway_vnx8hi"
   [(set (match_operand:VNx4SI 0 "register_operand")
(plus:VNx4SI
  (unspec:VNx4SI
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index b69a5bc3676..e3c5a618ea2 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
cannot give complete results for multiplication or division) but probably
ought to be relied on more widely throughout the expander.  */
 optab
-optab_for_tree_code (enum tree_code code, const_tree type,
-enum optab_subtype subtype)
+optab_for_tree_code_1 (enum tree_code code, const_tree type,
+  const_tree otype, enum optab_subtype subtype)
 {
   bool trapv;
   switch (code)
@@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree type,
 
 case DOT_PROD_EXPR:
   {
+   if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
+ == TYPE_PRECISION (TREE_TYPE (otype))))
+ {
+   if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
+ return udot_prod_twoway_optab;
+   if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
+   

Re: [PATCH] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-05-16 Thread Victor Do Nascimento

On 5/16/24 15:16, Andrew Pinski wrote:



On Thu, May 16, 2024, 3:58 PM Victor Do Nascimento
<victor.donascime...@arm.com> wrote:


At present the autovectorizer fails to vectorize simple loops
involving calls to `__builtin_prefetch'.  A simple example of such
loop is given below:

void foo(double * restrict a, double * restrict b, int n){
   int i;
   for(i=0; i<n; i++){
     ...
   }
}

This is most likely tree-optimization/114061 since it is a generic
vectorizer issue.  Oh, maybe reference the bug # in the summary next time
just for easier reference.


Thanks,
Andrew


My bad.

You're right, it's tree-optimization/114061.  Thanks for catching this.

Cheers,
Victor



gcc/ChangeLog:

         * tree-data-ref.cc (get_references_in_stmt): Set
         `clobbers_memory' to false for __builtin_prefetch.
         * tree-vect-loop.cc (vect_transform_loop): Drop all
         __builtin_prefetch calls from loops.

gcc/testsuite/ChangeLog:

         * gcc.dg/vect/vect-prefetch-drop.c: New test.
---
  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 14 ++
  gcc/tree-data-ref.cc                           |  9 +
  gcc/tree-vect-loop.cc                          |  7 ++-
  3 files changed, 29 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
new file mode 100644
index 000..57723a8c972
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-O3 -march=armv9.2-a+sve
-fdump-tree-vect-details" { target { aarch64*-*-* } } } */
+
+void foo(double * restrict a, double * restrict b, int n){
+  int i;
+  for(i=0; i<n; i++){
+    ...
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */

diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index f37734b5340..47bfec0f922 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -5843,6 +5843,15 @@ get_references_in_stmt (gimple *stmt,
vec *references)
             clobbers_memory = true;
             break;
           }
+
+      else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+       {
+         enum built_in_function fn_type = DECL_FUNCTION_CODE
(TREE_OPERAND (gimple_call_fn (stmt), 0));
+         if (fn_type == BUILT_IN_PREFETCH)
+           clobbers_memory = false;
+         else
+           clobbers_memory = true;
+       }
        else
         clobbers_memory = true;
      }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 361aec06488..65e8b421d80 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12069,13 +12069,18 @@ vect_transform_loop (loop_vec_info
loop_vinfo, gimple *loop_vectorized_call)
            !gsi_end_p (si);)
         {
           stmt = gsi_stmt (si);
-         /* During vectorization remove existing clobber stmts.  */
+         /* During vectorization remove existing clobber stmts and
+            prefetches.  */
           if (gimple_clobber_p (stmt))
             {
               unlink_stmt_vdef (stmt);
               gsi_remove (&si, true);
               release_defs (stmt);
             }
+         else if (gimple_call_builtin_p (stmt) &&
+                  DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn
(stmt),
+                                                    0)) ==
BUILT_IN_PREFETCH)
+               gsi_remove (&si, true);
           else
             {
               /* Ignore vector stmts created in the outer loop.  */
-- 
2.34.1




[PATCH] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-05-16 Thread Victor Do Nascimento
At present the autovectorizer fails to vectorize simple loops
involving calls to `__builtin_prefetch'.  A simple example of such
loop is given below:

void foo(double * restrict a, double * restrict b, int n){
  int i;
  for(i=0; i<n; i++){
    ...
  }
}

[...]

diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ get_references_in_stmt (gimple *stmt, vec<data_ref_loc, va_heap> *references)
	clobbers_memory = true;
	break;
  }
+
+  else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+   {
+ enum built_in_function fn_type = DECL_FUNCTION_CODE (TREE_OPERAND 
(gimple_call_fn (stmt), 0));
+ if (fn_type == BUILT_IN_PREFETCH)
+   clobbers_memory = false;
+ else
+   clobbers_memory = true;
+   }
   else
clobbers_memory = true;
 }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 361aec06488..65e8b421d80 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12069,13 +12069,18 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
   !gsi_end_p (si);)
{
  stmt = gsi_stmt (si);
- /* During vectorization remove existing clobber stmts.  */
+ /* During vectorization remove existing clobber stmts and
+prefetches.  */
  if (gimple_clobber_p (stmt))
{
  unlink_stmt_vdef (stmt);
  gsi_remove (&si, true);
  release_defs (stmt);
}
+ else if (gimple_call_builtin_p (stmt) &&
+  DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn (stmt),
+0)) == BUILT_IN_PREFETCH)
+   gsi_remove (&si, true);
  else
{
  /* Ignore vector stmts created in the outer loop.  */
-- 
2.34.1



[PATCH] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-05-16 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing the Load-Acquire RCpc Pair Ordered, and
Store-Release Pair Ordered operations in the form of LDIAPP and STILP.

These operations are single-copy atomic on cores which also implement
LSE2 and, as such, support for these operations is added to Libatomic
and employed accordingly when the LSE2 and RCPC3 features are detected
in a given core at runtime.

libatomic/ChangeLog:

* configure.ac: Add call to LIBAT_TEST_FEAT_AARCH64_LRCPC3() test.
* configure: Regenerate.
* config/linux/aarch64/host-config.h (has_rcpc3): New.
(HWCAP2_LRCPC3): Likewise.
(LSE2_LRCPC3_ATOP): Likewise.
* config/linux/aarch64/atomic_16.S: New +rcpc3 .arch
directives.
* config/linux/aarch64/atomic_16.S (libat_load_16): Add LRCPC3
variant.
(libat_store_16): Likewise.
* acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LRCPC3): New.
(HAVE_FEAT_LRCPC3): Likewise.
(ARCH_AARCH64_HAVE_LRCPC3): Likewise.
* auto-config.h.in (HAVE_FEAT_LRCPC3): New.
---
 libatomic/acinclude.m4   | 18 +++
 libatomic/auto-config.h.in   |  3 ++
 libatomic/config/linux/aarch64/atomic_16.S   | 55 +++-
 libatomic/config/linux/aarch64/host-config.h | 39 --
 libatomic/configure  | 41 +++
 libatomic/configure.ac   |  1 +
 6 files changed, 152 insertions(+), 5 deletions(-)

diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index 6d2e0b1c355..628275b9945 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -101,6 +101,24 @@ AC_DEFUN([LIBAT_TEST_FEAT_AARCH64_LSE128],[
[Have LSE128 support for 16 byte integers.])
 ])
 
+dnl
+dnl Test if the host assembler supports armv8.2-a RCPC3 insns.
+dnl
+AC_DEFUN([LIBAT_TEST_FEAT_AARCH64_LRCPC3],[
+  AC_CACHE_CHECK([for armv8.2-a LRCPC3 insn support],
+[libat_cv_have_feat_lrcpc3],[
+AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv8.2-a+rcpc3")])])
+if AC_TRY_EVAL(ac_link); then
+  eval libat_cv_have_feat_lrcpc3=yes
+else
+  eval libat_cv_have_feat_lrcpc3=no
+fi
+rm -f conftest*
+  ])
+  LIBAT_DEFINE_YESNO([HAVE_FEAT_LRCPC3], [$libat_cv_have_feat_lrcpc3],
+   [Have LRCPC3 support for 16 byte integers.])
+])
+
 dnl
 dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
 dnl
diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in
index 7c78933b07d..a925686effa 100644
--- a/libatomic/auto-config.h.in
+++ b/libatomic/auto-config.h.in
@@ -108,6 +108,9 @@
 /* Have LSE128 support for 16 byte integers. */
 #undef HAVE_FEAT_LSE128
 
+/* Have LRCPC3 support for 16 byte integers. */
+#undef HAVE_FEAT_LRCPC3
+
 /* Define to 1 if you have the  header file. */
 #undef HAVE_FENV_H
 
diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index 27363f82b75..47ceb7301c9 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -42,7 +42,13 @@
 
 #if HAVE_IFUNC
 # if HAVE_FEAT_LSE128
+#  if HAVE_FEAT_LRCPC3
+   .arch   armv9-a+lse128+rcpc3
+#  else
.arch   armv9-a+lse128
+#  endif
+# elif HAVE_FEAT_LRCPC3
+   .arch   armv8-a+lse+rcpc3
 # else
.arch   armv8-a+lse
 # endif
@@ -50,9 +56,20 @@
.arch   armv8-a+lse
 #endif
 
+/* There is overlap in some atomic instructions being implemented in both RCPC3
+   and LSE2 extensions, so both _i1 and _i2 suffixes are needed in such
+   situations.  Otherwise, all extension-specific implementations are mapped
+   to _i1.  */
+
+#if HAVE_FEAT_LRCPC3
+# define LRCPC3(NAME)  libat_##NAME##_i1
+# define LSE2(NAME)libat_##NAME##_i2
+#else
+# define LSE2(NAME)libat_##NAME##_i1
+#endif
+
 #define LSE128(NAME)   libat_##NAME##_i1
 #define LSE(NAME)  libat_##NAME##_i1
-#define LSE2(NAME) libat_##NAME##_i1
 #define CORE(NAME) libat_##NAME
 #define ATOMIC(NAME)   __atomic_##NAME
 
@@ -722,6 +739,42 @@ ENTRY_FEAT (and_fetch_16, LSE128)
ret
 END_FEAT (and_fetch_16, LSE128)
 #endif /* HAVE_FEAT_LSE128 */
+
+
+#if HAVE_FEAT_LRCPC3
+ENTRY_FEAT (load_16, LRCPC3)
+   cbnzw1, 1f
+
+   /* RELAXED.  */
+   ldp res0, res1, [x0]
+   ret
+1:
+   cmp w1, SEQ_CST
+   b.eq2f
+
+   /* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
+   ldiapp  res0, res1, [x0]
+   ret
+
+   /* SEQ_CST.  */
+2: ldartmp0, [x0]  /* Block reordering with Store-Release instr.  
*/
+   ldiapp  res0, res1, [x0]
+   ret
+END_FEAT (load_16, LRCPC3)
+
+
+ENTRY_FEAT (store_16, LRCPC3)
+   cbnzw4, 1f
+
+   /* RELAXED.  */
+   stp in0, in1, [x0]
+   ret
+
+   /* RELEASE/SEQ_CST.  */
+1: stilp   in0, in1, [x0]
+   ret
+END_FEAT 

[PATCH 1/4] Libatomic: Define per-file identifier macros

2024-05-16 Thread Victor Do Nascimento
In order to facilitate the fine-tuning of how `libatomic_i.h' and
`host-config.h' headers are used by different atomic functions, we
define distinct identifier macros for each file which, in implementing
atomic operations, imports these headers.

The idea is that different parts of these headers could then be
conditionally defined depending on the macros set by the file that
`#include'd them.

Given how it is possible that some file names are generic enough that
using them as-is for macro names (e.g. flag.c -> FLAG) may potentially
lead to name clashes with other macros, all file names first have LAT_
prepended to them such that, for example, flag.c is assigned the
LAT_FLAG macro.

libatomic/ChangeLog:

* cas_n.c (LAT_CAS_N): New.
* exch_n.c (LAT_EXCH_N): Likewise.
* fadd_n.c (LAT_FADD_N): Likewise.
* fand_n.c (LAT_FAND_N): Likewise.
* fence.c (LAT_FENCE): Likewise.
* fenv.c (LAT_FENV): Likewise.
* fior_n.c (LAT_FIOR_N): Likewise.
* flag.c (LAT_FLAG): Likewise.
* fnand_n.c (LAT_FNAND_N): Likewise.
* fop_n.c (LAT_FOP_N): Likewise
* fsub_n.c (LAT_FSUB_N): Likewise.
* fxor_n.c (LAT_FXOR_N): Likewise.
* gcas.c (LAT_GCAS): Likewise.
* gexch.c (LAT_GEXCH): Likewise.
* glfree.c (LAT_GLFREE): Likewise.
* gload.c (LAT_GLOAD): Likewise.
* gstore.c (LAT_GSTORE): Likewise.
* load_n.c (LAT_LOAD_N): Likewise.
* store_n.c (LAT_STORE_N): Likewise.
* tas_n.c (LAT_TAS_N): Likewise.
---
 libatomic/cas_n.c   | 2 ++
 libatomic/exch_n.c  | 2 ++
 libatomic/fadd_n.c  | 2 ++
 libatomic/fand_n.c  | 2 ++
 libatomic/fence.c   | 2 ++
 libatomic/fenv.c| 2 ++
 libatomic/fior_n.c  | 2 ++
 libatomic/flag.c| 2 ++
 libatomic/fnand_n.c | 2 ++
 libatomic/fop_n.c   | 2 ++
 libatomic/fsub_n.c  | 2 ++
 libatomic/fxor_n.c  | 2 ++
 libatomic/gcas.c| 2 ++
 libatomic/gexch.c   | 2 ++
 libatomic/glfree.c  | 2 ++
 libatomic/gload.c   | 2 ++
 libatomic/gstore.c  | 2 ++
 libatomic/load_n.c  | 2 ++
 libatomic/store_n.c | 2 ++
 libatomic/tas_n.c   | 2 ++
 20 files changed, 40 insertions(+)

diff --git a/libatomic/cas_n.c b/libatomic/cas_n.c
index a080b990371..2a6357e48db 100644
--- a/libatomic/cas_n.c
+++ b/libatomic/cas_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_CAS_N
 #include "libatomic_i.h"
 
 
@@ -122,3 +123,4 @@ SIZE(libat_compare_exchange) (UTYPE *mptr, UTYPE *eptr, 
UTYPE newval,
 #endif
 
 EXPORT_ALIAS (SIZE(compare_exchange));
+#undef LAT_CAS_N
diff --git a/libatomic/exch_n.c b/libatomic/exch_n.c
index e5ff80769b9..184d3de1009 100644
--- a/libatomic/exch_n.c
+++ b/libatomic/exch_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_EXCH_N
 #include "libatomic_i.h"
 
 
@@ -126,3 +127,4 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel 
UNUSED)
 #endif
 
 EXPORT_ALIAS (SIZE(exchange));
+#undef LAT_EXCH_N
diff --git a/libatomic/fadd_n.c b/libatomic/fadd_n.c
index bc15b8bc0e6..32b75cec654 100644
--- a/libatomic/fadd_n.c
+++ b/libatomic/fadd_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FADD_N
 #include 
 
 #define NAME   add
@@ -43,3 +44,4 @@
 #endif
 
 #include "fop_n.c"
+#undef LAT_FADD_N
diff --git a/libatomic/fand_n.c b/libatomic/fand_n.c
index ffe9ed8700f..9eab55bcd72 100644
--- a/libatomic/fand_n.c
+++ b/libatomic/fand_n.c
@@ -1,3 +1,5 @@
+#define LAT_FAND_N
 #define NAME   and
 #define OP(X,Y)((X) & (Y))
 #include "fop_n.c"
+#undef LAT_FAND_N
diff --git a/libatomic/fence.c b/libatomic/fence.c
index a9b1e280c5a..4022194a57a 100644
--- a/libatomic/fence.c
+++ b/libatomic/fence.c
@@ -21,6 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FENCE
 #include "libatomic_i.h"
 
 #include 
@@ -43,3 +44,4 @@ void
 {
   atomic_signal_fence (order);
 }
+#undef LAT_FENCE
diff --git a/libatomic/fenv.c b/libatomic/fenv.c
index 41f187c1f85..dccad356a31 100644
--- a/libatomic/fenv.c
+++ b/libatomic/fenv.c
@@ -21,6 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FENV
 #include "libatomic_i.h"
 
 #ifdef HAVE_FENV_H
@@ -70,3 +71,4 @@ __atomic_feraiseexcept (int excepts __attribute__ ((unused)))
 }
 #endif
 }
+#undef LAT_FENV
diff --git a/libatomic/fior_n.c b/libatomic/fior_n.c
index 55d0d66b469..2b58d4805d6 100644
--- a/libatomic/fior_n.c
+++ b/libatomic/fior_n.c
@@ -1,3 +1,5 @@
+#define LAT_FIOR_N
 #define NAME   or
 #define OP(X,Y)((X) | (Y))
 #include "fop_n.c"
+#undef LAT_FIOR_N
diff --git a/libatomic/flag.c b/libatomic/flag.c
index e4a5a27819a..8afd80c9130 100644
--- 

[PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file

2024-05-16 Thread Victor Do Nascimento
At present, `atomic_16.S' groups different implementations of the
same functions together in the file.  Therefore, as an example,
the LSE128 implementation of `exchange_16' follows on immediately
from its core implementation, as does the `fetch_or_16' LSE128
implementation.

Such architectural extension-dependent implementations are dependent
both on ifunc and assembler support.  They may therefore conceivably
be guarded by 2 preprocessor macros, e.g. `#if HAVE_IFUNC' and `#if
HAVE_FEAT_LSE128'.

Having to apply these guards on a per-function basis adds unnecessary
clutter to the file and makes its maintenance more error-prone.

We therefore reorganize the layout of the file in such a way that all
core implementations needing no `#ifdef's are placed first, followed
by all ifunc-dependent implementations, which can all be guarded by a
single `#if HAVE_IFUNC'.  Within the guard, these are then subdivided
and organized according to architectural extension requirements such
that in the case of LSE128-specific functions, for example, they can
all be guarded by a single `#if HAVE_FEAT_LSE128', greatly reducing
the overall number of required `#ifdef' macros.

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S: Reshuffle functions.
---
 libatomic/config/linux/aarch64/atomic_16.S | 583 ++---
 1 file changed, 288 insertions(+), 295 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index 16ff03057ab..27363f82b75 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,15 +40,12 @@
 
 #include "auto-config.h"
 
-#if !HAVE_IFUNC
-# undef HAVE_FEAT_LSE128
-# define HAVE_FEAT_LSE128 0
-#endif
-
-#define HAVE_FEAT_LSE2 HAVE_IFUNC
-
-#if HAVE_FEAT_LSE128
+#if HAVE_IFUNC
+# if HAVE_FEAT_LSE128
.arch   armv9-a+lse128
+# else
+   .arch   armv8-a+lse
+# endif
 #else
.arch   armv8-a+lse
 #endif
@@ -124,6 +121,8 @@ NAME:   \
 #define ACQ_REL 4
 #define SEQ_CST 5
 
+/* Core atomic operation implementations.  These are available irrespective of
+   ifunc support or the presence of additional architectural extensions.  */
 
 ENTRY (load_16)
mov x5, x0
@@ -143,31 +142,6 @@ ENTRY (load_16)
 END (load_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (load_16, LSE2)
-   cbnzw1, 1f
-
-   /* RELAXED.  */
-   ldp res0, res1, [x0]
-   ret
-1:
-   cmp w1, SEQ_CST
-   b.eq2f
-
-   /* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
-   ldp res0, res1, [x0]
-   dmb ishld
-   ret
-
-   /* SEQ_CST.  */
-2: ldartmp0, [x0]  /* Block reordering with Store-Release instr.  
*/
-   ldp res0, res1, [x0]
-   dmb ishld
-   ret
-END_FEAT (load_16, LSE2)
-#endif
-
-
 ENTRY (store_16)
cbnzw4, 2f
 
@@ -185,23 +159,6 @@ ENTRY (store_16)
 END (store_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (store_16, LSE2)
-   cbnzw4, 1f
-
-   /* RELAXED.  */
-   stp in0, in1, [x0]
-   ret
-
-   /* RELEASE/SEQ_CST.  */
-1: ldxpxzr, tmp0, [x0]
-   stlxp   w4, in0, in1, [x0]
-   cbnzw4, 1b
-   ret
-END_FEAT (store_16, LSE2)
-#endif
-
-
 ENTRY (exchange_16)
mov x5, x0
cbnzw4, 2f
@@ -229,31 +186,6 @@ ENTRY (exchange_16)
 END (exchange_16)
 
 
-#if HAVE_FEAT_LSE128
-ENTRY_FEAT (exchange_16, LSE128)
-   mov tmp0, x0
-   mov res0, in0
-   mov res1, in1
-   cbnzw4, 1f
-
-   /* RELAXED.  */
-   swppres0, res1, [tmp0]
-   ret
-1:
-   cmp w4, ACQUIRE
-   b.hi2f
-
-   /* ACQUIRE/CONSUME.  */
-   swppa   res0, res1, [tmp0]
-   ret
-
-   /* RELEASE/ACQ_REL/SEQ_CST.  */
-2: swppal  res0, res1, [tmp0]
-   ret
-END_FEAT (exchange_16, LSE128)
-#endif
-
-
 ENTRY (compare_exchange_16)
ldp exp0, exp1, [x1]
cbz w4, 3f
@@ -301,43 +233,97 @@ ENTRY (compare_exchange_16)
 END (compare_exchange_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE)
-   ldp exp0, exp1, [x1]
-   mov tmp0, exp0
-   mov tmp1, exp1
-   cbz w4, 2f
-   cmp w4, RELEASE
-   b.hs3f
+ENTRY (fetch_or_16)
+   mov x5, x0
+   cbnzw4, 2f
 
-   /* ACQUIRE/CONSUME.  */
-   caspa   exp0, exp1, in0, in1, [x0]
-0:
-   cmp exp0, tmp0
-   ccmpexp1, tmp1, 0, eq
-   bne 1f
-   mov x0, 1
+   /* RELAXED.  */
+1: ldxpres0, res1, [x5]
+   orr tmp0, res0, in0
+   orr tmp1, res1, in1
+   stxpw4, tmp0, tmp1, [x5]
+   cbnzw4, 1b
ret
-1:
-   stp exp0, exp1, [x1]
-   mov x0, 0
+
+   /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+2: ldaxp   res0, res1, [x5]
+   orr tmp0, res0, in0
+   orr tmp1, res1, in1
+   stlxp   w4, tmp0, tmp1, [x5]
+   cbnz

[PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file

2024-05-16 Thread Victor Do Nascimento
By querying previously-defined file-identifier macros, `host-config.h'
is able to get information about its environment and, based on this
information, select more appropriate function-specific ifunc
selectors.  This reduces the number of unnecessary feature tests that
need to be carried out in order to find the best atomic implementation
for a function at run-time.

An immediate benefit of this is that we can further fine-tune the
architectural requirements for each atomic function without risk of
incurring the maintenance and runtime-performance penalties of having
to maintain an ifunc selector with a huge number of alternatives, most
of which are irrelevant for any particular function.  Consequently,
for AArch64 targets, we relax the architectural requirements of
`compare_exchange_16', which now requires only LSE as opposed to the
newer LSE2.

The new flexibility provided by this approach also means that certain
functions can now be called directly, doing away with ifunc selectors
altogether when only a single implementation is available for it on a
given target.  As per the macro expansion framework laid out in
`libatomic_i.h', such functions should have their names prefixed with
`__atomic_' as opposed to `libat_'.  This is the same prefix applied
to function names when Libatomic is configured with
`--disable-gnu-indirect-function'.

To achieve this, these functions unconditionally apply the aliasing
rule that was previously applied only when libatomic was built without
ifunc support, making the default `libat_##NAME' implementation
accessible via the equivalent `__atomic_##NAME' symbol as well.  This
is done via the new `ENTRY_ALIASED' macro.

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S (LSE): New.
(ENTRY_ALIASED): Likewise.
* config/linux/aarch64/host-config.h (LSE_ATOP): New.
(LSE2_ATOP): Likewise.
(LSE128_ATOP): Likewise.
(IFUNC_COND_1): Make its definition conditional on above 3
macros.
(IFUNC_NCOND): Likewise.
---
 libatomic/config/linux/aarch64/atomic_16.S   | 31 +
 libatomic/config/linux/aarch64/host-config.h | 35 
 2 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index b63e97ac5a2..1517e9e78df 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -54,17 +54,20 @@
 #endif
 
 #define LSE128(NAME)   libat_##NAME##_i1
-#define LSE2(NAME) libat_##NAME##_i2
+#define LSE(NAME)  libat_##NAME##_i1
+#define LSE2(NAME) libat_##NAME##_i1
 #define CORE(NAME) libat_##NAME
 #define ATOMIC(NAME)   __atomic_##NAME
 
+/* Emit __atomic_* entrypoints if no ifuncs.  */
+#define ENTRY_ALIASED(NAME)    ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+
 #if HAVE_IFUNC
 # define ENTRY(NAME)   ENTRY2 (CORE (NAME), )
 # define ENTRY_FEAT(NAME, FEAT) ENTRY2 (FEAT (NAME), )
 # define END_FEAT(NAME, FEAT)  END2 (FEAT (NAME))
 #else
-/* Emit __atomic_* entrypoints if no ifuncs.  */
-# define ENTRY(NAME)   ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+# define ENTRY(NAME)   ENTRY_ALIASED (NAME)
 #endif
 
 #define END(NAME)  END2 (CORE (NAME))
@@ -299,7 +302,7 @@ END (compare_exchange_16)
 
 
 #if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE2)
+ENTRY_FEAT (compare_exchange_16, LSE)
ldp exp0, exp1, [x1]
mov tmp0, exp0
mov tmp1, exp1
@@ -332,11 +335,11 @@ ENTRY_FEAT (compare_exchange_16, LSE2)
/* ACQ_REL/SEQ_CST.  */
 4: caspal  exp0, exp1, in0, in1, [x0]
b   0b
-END_FEAT (compare_exchange_16, LSE2)
+END_FEAT (compare_exchange_16, LSE)
 #endif
 
 
-ENTRY (fetch_add_16)
+ENTRY_ALIASED (fetch_add_16)
mov x5, x0
        cbnz    w4, 2f
 
@@ -358,7 +361,7 @@ ENTRY (fetch_add_16)
 END (fetch_add_16)
 
 
-ENTRY (add_fetch_16)
+ENTRY_ALIASED (add_fetch_16)
mov x5, x0
        cbnz    w4, 2f
 
@@ -380,7 +383,7 @@ ENTRY (add_fetch_16)
 END (add_fetch_16)
 
 
-ENTRY (fetch_sub_16)
+ENTRY_ALIASED (fetch_sub_16)
mov x5, x0
        cbnz    w4, 2f
 
@@ -402,7 +405,7 @@ ENTRY (fetch_sub_16)
 END (fetch_sub_16)
 
 
-ENTRY (sub_fetch_16)
+ENTRY_ALIASED (sub_fetch_16)
mov x5, x0
        cbnz    w4, 2f
 
@@ -624,7 +627,7 @@ END_FEAT (and_fetch_16, LSE128)
 #endif
 
 
-ENTRY (fetch_xor_16)
+ENTRY_ALIASED (fetch_xor_16)
mov x5, x0
        cbnz    w4, 2f
 
@@ -646,7 +649,7 @@ ENTRY (fetch_xor_16)
 END (fetch_xor_16)
 
 
-ENTRY (xor_fetch_16)
+ENTRY_ALIASED (xor_fetch_16)
mov x5, x0
        cbnz    w4, 2f
 
@@ -668,7 +671,7 @@ ENTRY (xor_fetch_16)
 END (xor_fetch_16)
 
 
-ENTRY (fetch_nand_16)
+ENTRY_ALIASED (fetch_nand_16)
mov x5, x0
mvn in0, in0
mvn in1, in1
@@ -692,7 +695,7 @@ ENTRY (fetch_nand_16)
 END (fetch_nand_16)
 
 
-ENTRY (nand_fetch_16)
+ENTRY_ALIASED 

[PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing

2024-05-16 Thread Victor Do Nascimento
Following improvements to the way ifuncs are selected based on
detected architectural features, we are able to do away with many of
the aliases that were previously needed for subsets of atomic
functions that were not implemented in a given extension.

An example may clarify this: previously, LSE128 functions carried the
suffix _i1 and LSE2 functions the suffix _i2.

Using a single ifunc selector for all atomic functions meant that if
LSE128 was detected, the _i1 function variant would be used
indiscriminately, irrespective of whether or not a function had an
LSE128-specific implementation.  Aliasing was thus needed to redirect
calls to these missing functions to their _i2 LSE2 alternatives.

The more architectural extensions for which support was added, the
more complex the aliasing chain.

With the per-file configuration of ifuncs, we do away with the need
for such aliasing.

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S: Remove unnecessary
aliasing.
---
 libatomic/config/linux/aarch64/atomic_16.S | 41 --
 1 file changed, 41 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index 1517e9e78df..16ff03057ab 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -732,47 +732,6 @@ ENTRY_ALIASED (test_and_set_16)
 END (test_and_set_16)
 
 
-/* Alias entry points which are the same in LSE2 and LSE128.  */
-
-#if HAVE_IFUNC
-# if !HAVE_FEAT_LSE128
-ALIAS (exchange_16, LSE128, LSE2)
-ALIAS (fetch_or_16, LSE128, LSE2)
-ALIAS (fetch_and_16, LSE128, LSE2)
-ALIAS (or_fetch_16, LSE128, LSE2)
-ALIAS (and_fetch_16, LSE128, LSE2)
-# endif
-ALIAS (load_16, LSE128, LSE2)
-ALIAS (store_16, LSE128, LSE2)
-ALIAS (compare_exchange_16, LSE128, LSE2)
-ALIAS (fetch_add_16, LSE128, LSE2)
-ALIAS (add_fetch_16, LSE128, LSE2)
-ALIAS (fetch_sub_16, LSE128, LSE2)
-ALIAS (sub_fetch_16, LSE128, LSE2)
-ALIAS (fetch_xor_16, LSE128, LSE2)
-ALIAS (xor_fetch_16, LSE128, LSE2)
-ALIAS (fetch_nand_16, LSE128, LSE2)
-ALIAS (nand_fetch_16, LSE128, LSE2)
-ALIAS (test_and_set_16, LSE128, LSE2)
-
-/* Alias entry points which are the same in baseline and LSE2.  */
-
-ALIAS (exchange_16, LSE2, CORE)
-ALIAS (fetch_add_16, LSE2, CORE)
-ALIAS (add_fetch_16, LSE2, CORE)
-ALIAS (fetch_sub_16, LSE2, CORE)
-ALIAS (sub_fetch_16, LSE2, CORE)
-ALIAS (fetch_or_16, LSE2, CORE)
-ALIAS (or_fetch_16, LSE2, CORE)
-ALIAS (fetch_and_16, LSE2, CORE)
-ALIAS (and_fetch_16, LSE2, CORE)
-ALIAS (fetch_xor_16, LSE2, CORE)
-ALIAS (xor_fetch_16, LSE2, CORE)
-ALIAS (fetch_nand_16, LSE2, CORE)
-ALIAS (nand_fetch_16, LSE2, CORE)
-ALIAS (test_and_set_16, LSE2, CORE)
-#endif
-
 /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code.  */
#define FEATURE_1_AND 0xc0000000
 #define FEATURE_1_BTI 1
-- 
2.34.1



[PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing

2024-05-16 Thread Victor Do Nascimento
The recent introduction of the optional LSE128 and RCPC3 architectural
extensions to AArch64 has further increased the flexibility of atomic
support in the architecture, with many extensions providing support
for distinct atomic operations, each with different potential
applications in mind.

This has led to maintenance difficulties in Libatomic, in particular
regarding the way the ifunc selector is generated via a series of
macro expansions at compile-time.

Until now, irrespective of the atomic operation in question, all atomic
functions for a particular operand size were expected to have the same
number of ifunc alternatives, meaning that a one-size-fits-all
approach could reasonably be taken for the selector.

This meant that if, hypothetically, for a particular architecture and
operand size one particular atomic operation was to have 3 different
implementations associated with different extensions, libatomic would
likewise be required to present three ifunc alternatives for all other
atomic functions.

The consequence of this design choice was the unnecessary use of
function aliasing and the unwieldy code which resulted from it.

This patch series attempts to remediate this issue by making the
preprocessor macros defining the number of ifunc alternatives and
their respective selection functions dependent on the file importing
the ifunc selector-generating framework.

All files are given `LAT_' macros, defined at the beginning
and undef'd at the end of the file.  It is these macros that are
subsequently used to fine-tune the behaviors of `libatomic_i.h' and
`host-config.h'.

In particular, the definition of the `IFUNC_NCOND(N)' and
`IFUNC_COND_' macros in host-config.h can now be guarded behind
these new file-specific macros, which ultimately control what the
`GEN_SELECTOR(X)' macro in `libatomic_i.h' expands to.  As both of
these headers are imported once per file implementing some atomic
operation, fine-tuned control is now possible.

Regtested with both `--enable-gnu-indirect-function' and
`--disable-gnu-indirect-function' configurations on armv9.4-a target
with LRCPC3 and LSE128 support and without.

Victor Do Nascimento (4):
  Libatomic: Define per-file identifier macros
  Libatomic: Make ifunc selector behavior contingent on importing file
  Libatomic: Clean up AArch64 ifunc aliasing
  Libatomic: Clean up AArch64 `atomic_16.S' implementation file

 libatomic/cas_n.c|   2 +
 libatomic/config/linux/aarch64/atomic_16.S   | 623 +--
 libatomic/config/linux/aarch64/host-config.h |  35 +-
 libatomic/exch_n.c   |   2 +
 libatomic/fadd_n.c   |   2 +
 libatomic/fand_n.c   |   2 +
 libatomic/fence.c|   2 +
 libatomic/fenv.c |   2 +
 libatomic/fior_n.c   |   2 +
 libatomic/flag.c |   2 +
 libatomic/fnand_n.c  |   2 +
 libatomic/fop_n.c|   2 +
 libatomic/fsub_n.c   |   2 +
 libatomic/fxor_n.c   |   2 +
 libatomic/gcas.c |   2 +
 libatomic/gexch.c|   2 +
 libatomic/glfree.c   |   2 +
 libatomic/gload.c|   2 +
 libatomic/gstore.c   |   2 +
 libatomic/load_n.c   |   2 +
 libatomic/store_n.c  |   2 +
 libatomic/tas_n.c|   2 +
 22 files changed, 357 insertions(+), 341 deletions(-)

-- 
2.34.1



Re: [PATCH] aarch64: Add +lse128 architectural extension command-line flag

2024-03-27 Thread Victor Do Nascimento

On 3/26/24 12:26, Richard Sandiford wrote:

Victor Do Nascimento  writes:

Given how, at present, the choice of using LSE128 atomic instructions
by the toolchain is delegated to run-time selection in the form of
Libatomic ifuncs, responsible for querying target support, the
`+lse128' target architecture compile-time flag is absent from GCC.

This, however, contrasts with the Binutils implementation, which gates
LSE128 instructions behind the `+lse128' flag.  This can lead to
problems in GCC for certain use-cases.  One such example is in the use
of inline assembly, whereby the inability to enable the feature on the
command line prevents the compiler from automatically issuing the
necessary LSE128 `.arch' directive.

This patch therefore brings GCC into alignment with LLVM and Binutils
in adding support for the `+lse128' architectural extension flag.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def: Add LSE128
AARCH64_OPT_EXTENSION, adding it as a dependency for the D128
feature.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/lse128-flag.c: New.
* gcc.target/aarch64/cpunative/info_23: Likewise.
* gcc.target/aarch64/cpunative/native_cpu_23.c: Likewise.


The new extension should be documented in doc/invoke.texi.


---
  gcc/config/aarch64/aarch64-option-extensions.def  |  4 +++-
  gcc/testsuite/gcc.target/aarch64/cpunative/info_23|  8 
  .../gcc.target/aarch64/cpunative/native_cpu_23.c  | 11 +++
  gcc/testsuite/gcc.target/aarch64/lse128-flag.c| 10 ++
  4 files changed, 32 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_23
  create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/lse128-flag.c

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 1a3b91c68cf..ac54b899a06 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -275,7 +275,9 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "")
  
  AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
  
-AARCH64_OPT_EXTENSION("d128", D128, (), (), (), "d128")

+AARCH64_OPT_EXTENSION("lse128", LSE128, (LSE), (), (), "lse128")
+
+AARCH64_OPT_EXTENSION("d128", D128, (LSE128), (), (), "d128")
  
  AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
  
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_23 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_23

new file mode 100644
index 000..d77c25d2f61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_23
@@ -0,0 +1,8 @@
+processor  : 0
+BogoMIPS   : 100.00
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp atomics lse128
+CPU implementer: 0xfe
+CPU architecture: 8
+CPU variant: 0x0
+CPU part   : 0xd08
+CPU revision   : 2
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c
new file mode 100644
index 000..8a1e235d8ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
+/* { dg-set-compiler-env-var GCC_CPUINFO "$srcdir/gcc.target/aarch64/cpunative/info_23" } */
+/* { dg-additional-options "-mcpu=native" } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+lse128} } } */
+/* Test one where lse128 is available and so should be emitted.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/lse128-flag.c 
b/gcc/testsuite/gcc.target/aarch64/lse128-flag.c
new file mode 100644
index 000..71339c3af6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/lse128-flag.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { aarch64*-*-*} } } */
+/* { dg-additional-options "-march=armv9.4-a+lse128" } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.arch armv9\.4-a\+crc\+lse128} } } */
+/* Test a normal looking procinfo.  */


Not sure I understand the comment.  Is procinfo part of this test?


Thank you for catching this, Richard.  This was originally a
procinfo-reliant test which ended up being redesigned without
reference to procinfo, and this comment remained.  I've now dropped it
from the patch.


Cheers!


Looks good otherwise.

Thanks,
Richard


[PATCH] aarch64: Align lrcpc3 FEAT_STRING with /proc/cpuinfo 'Features' entry

2024-03-25 Thread Victor Do Nascimento
Due to the Linux kernel exposing the lrcpc3 architectural feature as
"lrcpc3", this patch corrects the relevant FEATURE_STRING entry in the
"rcpc3" AARCH64_OPT_FMV_EXTENSION macro, such that the feature can be
correctly detected when doing native compilation on rcpc3-enabled
targets.

Regtested on aarch64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def: Fix 'lrcpc3'
entry.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_24: New.
* gcc.target/aarch64/cpunative/native_cpu_24.c:
Likewise.
---
 gcc/config/aarch64/aarch64-option-extensions.def  |  2 +-
 gcc/testsuite/gcc.target/aarch64/cpunative/info_24|  8 
 .../gcc.target/aarch64/cpunative/native_cpu_24.c  | 11 +++
 3 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_24
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 1a3b91c68cf..975e7b84cec 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -174,7 +174,7 @@ AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
 
 AARCH64_FMV_FEATURE("rcpc2", RCPC2, (RCPC))
 
-AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "rcpc3")
+AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
 
 AARCH64_FMV_FEATURE("frintts", FRINTTS, ())
 
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24 
b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
new file mode 100644
index 000..8d3c16a1091
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
@@ -0,0 +1,8 @@
+processor  : 0
+BogoMIPS   : 100.00
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc3
+CPU implementer: 0xfe
+CPU architecture: 8
+CPU variant: 0x0
+CPU part   : 0xd08
+CPU revision   : 2
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c
new file mode 100644
index 000..05dc870885f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
+/* { dg-set-compiler-env-var GCC_CPUINFO "$srcdir/gcc.target/aarch64/cpunative/info_24" } */
+/* { dg-additional-options "-mcpu=native --save-temps " } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+rcpc3} } } */
+/* Test one where rcpc3 is available and so should be emitted.  */
-- 
2.34.1



[PATCH] aarch64: Add +lse128 architectural extension command-line flag

2024-03-15 Thread Victor Do Nascimento
Given how, at present, the choice of using LSE128 atomic instructions
by the toolchain is delegated to run-time selection in the form of
Libatomic ifuncs, responsible for querying target support, the
`+lse128' target architecture compile-time flag is absent from GCC.

This, however, contrasts with the Binutils implementation, which gates
LSE128 instructions behind the `+lse128' flag.  This can lead to
problems in GCC for certain use-cases.  One such example is in the use
of inline assembly, whereby the inability to enable the feature on the
command line prevents the compiler from automatically issuing the
necessary LSE128 `.arch' directive.

This patch therefore brings GCC into alignment with LLVM and Binutils
in adding support for the `+lse128' architectural extension flag.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def: Add LSE128
AARCH64_OPT_EXTENSION, adding it as a dependency for the D128
feature.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/lse128-flag.c: New.
* gcc.target/aarch64/cpunative/info_23: Likewise.
* gcc.target/aarch64/cpunative/native_cpu_23.c: Likewise.
---
 gcc/config/aarch64/aarch64-option-extensions.def  |  4 +++-
 gcc/testsuite/gcc.target/aarch64/cpunative/info_23|  8 
 .../gcc.target/aarch64/cpunative/native_cpu_23.c  | 11 +++
 gcc/testsuite/gcc.target/aarch64/lse128-flag.c| 10 ++
 4 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_23
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/lse128-flag.c

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 1a3b91c68cf..ac54b899a06 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -275,7 +275,9 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
 
-AARCH64_OPT_EXTENSION("d128", D128, (), (), (), "d128")
+AARCH64_OPT_EXTENSION("lse128", LSE128, (LSE), (), (), "lse128")
+
+AARCH64_OPT_EXTENSION("d128", D128, (LSE128), (), (), "d128")
 
 AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
 
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_23 
b/gcc/testsuite/gcc.target/aarch64/cpunative/info_23
new file mode 100644
index 000..d77c25d2f61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_23
@@ -0,0 +1,8 @@
+processor  : 0
+BogoMIPS   : 100.00
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp atomics lse128
+CPU implementer: 0xfe
+CPU architecture: 8
+CPU variant: 0x0
+CPU part   : 0xd08
+CPU revision   : 2
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c
new file mode 100644
index 000..8a1e235d8ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_23.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
+/* { dg-set-compiler-env-var GCC_CPUINFO "$srcdir/gcc.target/aarch64/cpunative/info_23" } */
+/* { dg-additional-options "-mcpu=native" } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+lse128} } } */
+/* Test one where lse128 is available and so should be emitted.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/lse128-flag.c 
b/gcc/testsuite/gcc.target/aarch64/lse128-flag.c
new file mode 100644
index 000..71339c3af6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/lse128-flag.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { aarch64*-*-*} } } */
+/* { dg-additional-options "-march=armv9.4-a+lse128" } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.arch armv9\.4-a\+crc\+lse128} } } */
+/* Test a normal looking procinfo.  */
-- 
2.34.1



Re: [libatomic PATCH] PR other/113336: Fix libatomic testsuite regressions on ARM.

2024-02-14 Thread Victor Do Nascimento
Though I'm not in a position to approve the patch, I'm happy to confirm 
the proposed changes look good to me.


Thanks for the updated version,
Victor


On 1/28/24 16:24, Roger Sayle wrote:


This patch is a revised version of the fix for PR other/113336.

This patch has been tested on arm-linux-gnueabihf with --with-arch=armv6
with make bootstrap and make -k check where it fixes all of the FAILs in
libatomic.  Ok for mainline?


2024-01-28  Roger Sayle  
 Victor Do Nascimento  

libatomic/ChangeLog
 PR other/113336
 * Makefile.am: Build tas_1_2_.o on ARCH_ARM_LINUX
 * Makefile.in: Regenerate.

Thanks in advance.
Roger
--



[PATCH] AArch64: Update system register database.

2024-02-06 Thread Victor Do Nascimento
With the release of Binutils 2.42, this patch brings the level of
system-register support in GCC in line with the current state of the
art in Binutils, ensuring everything available in Binutils is plainly
accessible from GCC.

Where Binutils uses a more detailed description of which features are
responsible for enabling a given system register, GCC aliases the
binutils-equivalent feature flag macro constant to that of the base
architecture implementing the feature, resulting in entries such as

  #define AARCH64_FL_S2PIE AARCH64_FL_V8_9A

in `aarch64.h', thus ensuring that the Binutils `aarch64-sys-regs.def'
file can be understood by GCC without the need for modification.

To accompany the addition of the new system registers, a new test is
added confirming they were successfully added to the list of
recognized registers.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-sys-regs.def: Copy from Binutils.
* config/aarch64/aarch64.h (AARCH64_FL_AIE): New.
(AARCH64_FL_DEBUGv8p9): Likewise.
(AARCH64_FL_FGT2): Likewise.
(AARCH64_FL_ITE): Likewise.
(AARCH64_FL_PFAR): Likewise.
(AARCH64_FL_PMUv3_ICNTR): Likewise.
(AARCH64_FL_PMUv3_SS): Likewise.
(AARCH64_FL_PMUv3p9): Likewise.
(AARCH64_FL_RASv2): Likewise.
(AARCH64_FL_S1PIE): Likewise.
(AARCH64_FL_S1POE): Likewise.
(AARCH64_FL_S2PIE): Likewise.
(AARCH64_FL_S2POE): Likewise.
(AARCH64_FL_SCTLR2): Likewise.
(AARCH64_FL_SEBEP): Likewise.
(AARCH64_FL_SPE_FDS): Likewise.
(AARCH64_FL_TCR2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/rwsr-armv8p9.c: New.
---
 gcc/config/aarch64/aarch64-sys-regs.def   | 85 
 gcc/config/aarch64/aarch64.h  | 20 
 .../gcc.target/aarch64/acle/rwsr-armv8p9.c| 99 +++
 3 files changed, 204 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-armv8p9.c

diff --git a/gcc/config/aarch64/aarch64-sys-regs.def 
b/gcc/config/aarch64/aarch64-sys-regs.def
index fffc35f72c8..6a948171d6e 100644
--- a/gcc/config/aarch64/aarch64-sys-regs.def
+++ b/gcc/config/aarch64/aarch64-sys-regs.def
@@ -54,6 +54,10 @@
  SYSREG ("amair_el12",   CPENC (3,5,10,3,0),  F_ARCHEXT,             AARCH64_FEATURE (V8_1A))
  SYSREG ("amair_el2",    CPENC (3,4,10,3,0),  0,                     AARCH64_NO_FEATURES)
  SYSREG ("amair_el3",    CPENC (3,6,10,3,0),  0,                     AARCH64_NO_FEATURES)
+  SYSREG ("amair2_el1",   CPENC (3,0,10,3,1),  F_ARCHEXT,             AARCH64_FEATURE (AIE))
+  SYSREG ("amair2_el12",  CPENC (3,5,10,3,1),  F_ARCHEXT,             AARCH64_FEATURE (AIE))
+  SYSREG ("amair2_el2",   CPENC (3,4,10,3,1),  F_ARCHEXT,             AARCH64_FEATURE (AIE))
+  SYSREG ("amair2_el3",   CPENC (3,6,10,3,1),  F_ARCHEXT,             AARCH64_FEATURE (AIE))
  SYSREG ("amcfgr_el0",   CPENC (3,3,13,2,1),  F_REG_READ|F_ARCHEXT,  AARCH64_FEATURE (V8_4A))
  SYSREG ("amcg1idr_el0", CPENC (3,3,13,2,6),  F_REG_READ|F_ARCHEXT,  AARCH64_FEATURE (V8_6A))
  SYSREG ("amcgcr_el0",   CPENC (3,3,13,2,2),  F_REG_READ|F_ARCHEXT,  AARCH64_FEATURE (V8_4A))
@@ -400,6 +404,7 @@
  SYSREG ("erxaddr_el1",  CPENC (3,0,5,4,3),   F_ARCHEXT,             AARCH64_FEATURE (RAS))
  SYSREG ("erxctlr_el1",  CPENC (3,0,5,4,1),   F_ARCHEXT,             AARCH64_FEATURE (RAS))
  SYSREG ("erxfr_el1",    CPENC (3,0,5,4,0),   F_REG_READ|F_ARCHEXT,  AARCH64_FEATURE (RAS))
+  SYSREG ("erxgsr_el1",   CPENC (3,0,5,3,2),   F_REG_READ|F_ARCHEXT,  AARCH64_FEATURE (RASv2))
  SYSREG ("erxmisc0_el1", CPENC (3,0,5,5,0),   F_ARCHEXT,             AARCH64_FEATURE (RAS))
  SYSREG ("erxmisc1_el1", CPENC (3,0,5,5,1),   F_ARCHEXT,             AARCH64_FEATURE (RAS))
  SYSREG ("erxmisc2_el1", CPENC (3,0,5,5,2),   F_ARCHEXT,             AARCH64_FEATURE (RAS))
@@ -438,10 +443,14 @@
  SYSREG ("hcr_el2",      CPENC (3,4,1,1,0),   0,                     AARCH64_NO_FEATURES)
  SYSREG ("hcrx_el2",     CPENC (3,4,1,2,2),   F_ARCHEXT,             AARCH64_FEATURE (V8_7A))
  SYSREG ("hdfgrtr_el2",  CPENC (3,4,3,1,4),   F_ARCHEXT,             AARCH64_FEATURE (V8_6A))
+  SYSREG ("hdfgrtr2_el2", CPENC (3,4,3,1,0),   F_ARCHEXT,             AARCH64_FEATURE (FGT2))
  SYSREG ("hdfgwtr_el2",  CPENC (3,4,3,1,5),   F_ARCHEXT,             AARCH64_FEATURE (V8_6A))
+  SYSREG ("hdfgwtr2_el2", CPENC (3,4,3,1,1),   F_ARCHEXT,             AARCH64_FEATURE (FGT2))
  SYSREG ("hfgitr_el2",   CPENC (3,4,1,1,6),   F_ARCHEXT,             AARCH64_FEATURE (V8_6A))
  SYSREG ("hfgrtr_el2",   CPENC (3,4,1,1,4),   F_ARCHEXT,             AARCH64_FEATURE (V8_6A))
+  SYSREG 

Re: [PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-26 Thread Victor Do Nascimento
On 1/26/24 10:53, Richard Sandiford wrote:
> Victor Do Nascimento  writes:
>> @@ -712,6 +760,27 @@ ENTRY (libat_test_and_set_16)
>>  END (libat_test_and_set_16)
>>  
>>  
>> +/* Alias all LSE128_LRCPC3 ifuncs to their specific implementations,
>> +   that is, map it to LSE128, LRCPC or CORE as appropriate.  */
>> +
>> +ALIAS (libat_exchange_16, LSE128_LRCPC3, LSE128)
>> +ALIAS (libat_fetch_or_16, LSE128_LRCPC3, LSE128)
>> +ALIAS (libat_fetch_and_16, LSE128_LRCPC3, LSE128)
>> +ALIAS (libat_or_fetch_16, LSE128_LRCPC3, LSE128)
>> +ALIAS (libat_and_fetch_16, LSE128_LRCPC3, LSE128)
>> +ALIAS (libat_load_16, LSE128_LRCPC3, LRCPC3)
>> +ALIAS (libat_store_16, LSE128_LRCPC3, LRCPC3)
>> +ALIAS (libat_compare_exchange_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_fetch_add_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_add_fetch_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_fetch_sub_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_sub_fetch_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_fetch_xor_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_xor_fetch_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_fetch_nand_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_nand_fetch_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_test_and_set_16, LSE128_LRCPC3, LSE2)
>> +
>>  /* Alias entry points which are the same in LSE2 and LSE128.  */
>>  
>>  #if !HAVE_FEAT_LSE128
>> @@ -734,6 +803,29 @@ ALIAS (libat_fetch_nand_16, LSE128, LSE2)
>>  ALIAS (libat_nand_fetch_16, LSE128, LSE2)
>>  ALIAS (libat_test_and_set_16, LSE128, LSE2)
>>  
>> +
>> +/* Alias entry points which are the same in LRCPC3 and LSE2.  */
>> +
>> +#if !HAVE_FEAT_LRCPC3
>> +ALIAS (libat_load_16, LRCPC3, LSE2)
>> +ALIAS (libat_store_16, LRCPC3, LSE2)
>> +#endif
>> +ALIAS (libat_exchange_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_or_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_and_16, LRCPC3, LSE2)
>> +ALIAS (libat_or_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_and_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_compare_exchange_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_add_16, LRCPC3, LSE2)
>> +ALIAS (libat_add_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_sub_16, LRCPC3, LSE2)
>> +ALIAS (libat_sub_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_xor_16, LRCPC3, LSE2)
>> +ALIAS (libat_xor_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_nand_16, LRCPC3, LSE2)
>> +ALIAS (libat_nand_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_test_and_set_16, LRCPC3, LSE2)
>> +
>>  /* Alias entry points which are the same in baseline and LSE2.  */
>>  
>>  ALIAS (libat_exchange_16, LSE2, CORE)
> 
> Sorry to be awkward, but I think this is becoming a bit unwieldly.
> It wasn't too bad using aliases for LSE128->LSE2 fallbacks since LSE128
> could optimise a decent number of routines.  But here we're using two
> sets of aliases for every function in order to express "LRCPC3 loads and
> stores should be used where possible, independently of the choices for
> other routines".
> 
> This also complicates the ifuncs, since we need to detect LRCPC3+LSE128
> as a distinct combination even though none of the underlying routines
> depend on both LRCPC3 and LSE128 together.
> 
> I think instead we should make the ifunc mechanism more granular,
> so that we can have default/lse2/rcpc3 for loads and stores,
> default/lse128 for things that LSE128 can do, etc.  I realise that
> would require some rework of the generic framework code, but I think
> it's still worth doing.
> 
> I won't object if another maintainer is happy with the patch in its
> current form though.
> 
> Thanks,
> Richard

In all fairness, I have to agree with you.

While I had wanted to be as noninvasive as possible with existing
generic framework code, I do agree that with the addition of RCPC3
following LSE128 we're rapidly approaching a kind of maintainability
"critical mass" regarding combinatorial explosion.

In this patch series I'd already had to expand the number of supported
ifunc alternatives to 4, and with n atomic architectural extensions
we'll end up with 2^n possible cases to cater for in our selector
alone.

Happy to go back to the drawing board and see how we can add the
flexibility to Libatomic.

As always, thanks for the feedback!

V.


>> diff --git a/libatomic/config/linux/aarch64/host-config.h 
>> b/libatomic/config/linux/aarch64/host-config.h
>> index 4e354124063..d03fcfe4a64 100644
>> --- a/libatomic/config/linux/aarch64/host-config.h
>> +++ b/libatomic/config/linux/aarch64/host-config.h
>> @@ -37,9 +37,12 @@ typedef struct __ifunc_arg_t {
>>  
>>  #ifdef HWCAP_USCAT
>>  # if N == 16
>&

Re: [libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].

2024-01-25 Thread Victor Do Nascimento




On 1/11/24 15:55, Roger Sayle wrote:


Hi Richard,
As you've recommended, this issue has now been filed in bugzilla
as PR other/113336.  As explained in the new PR, libatomic's testsuite
used to pass on armv6 (raspberry pi) in previous GCC releases, but
the code was incorrect/non-synchronous; this was reported as
PR target/107567 and PR target/109166.  Now that those issues
have been fixed, we now see that there's a missing dependency in
libatomic that's required to implement this functionality correctly.

I'm more convinced that my fix is correct, but it's perhaps a little
disappointing that libatomic doesn't have a (multi-threaded) run-time
test to search for race conditions, and confirm its implementations
are correctly serializing.

Please let me know what you think.
Best regards,
Roger
--


I do think that if the regression is caused by HAVE_ATOMIC_TAS now being 
detected as false due to a bugfix elsewhere as you kindly pointed out, 
then the fix perhaps ought to change the compile-time behavior for TAS 
alone.


As I point out in Bugzilla, we can get away with replacing the proposed

  libatomic_la_LIBADD += $(addsuffix _1_2_.lo,$(SIZEOBJS))

with

  libatomic_la_LIBADD += tas_1_2_.lo

so that we generate the missing `libat_test_and_set_1_i2' specifically.
I've not managed to detect the need for any other *_1_i2 thus far and 
this alone appears sufficient to fix all observed regressions.


Happy to investigate further, but my initial findings seem to be that 
this may be a better fix.


Let me know if you disagree ;).

Regards,
Victor


-Original Message-
From: Richard Earnshaw 
Sent: 10 January 2024 15:34
To: Roger Sayle ; gcc-patches@gcc.gnu.org
Subject: Re: [libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].



On 08/01/2024 16:07, Roger Sayle wrote:


Bootstrapping GCC on arm-linux-gnueabihf with --with-arch=armv6
currently has a large number of FAILs in libatomic (regressions since
last time I attempted this).  The failure mode is related to IFUNC
handling with the file tas_8_2_.o containing an unresolved reference
to the function libat_test_and_set_1_i2.

Bearing in mind I've no idea what's going on, the following one line
change, to build tas_1_2_.o when building tas_8_2_.o, resolves the
problem for me and restores the libatomic testsuite to 44 expected
passes and 5 unsupported tests [from 22 unexpected failures and 22 unresolved

testcases].


If this looks like the correct fix, I'm not confident with rebuilding
Makefile.in with correct version of automake, so I'd very much
appreciate it if someone/the reviewer/maintainer could please check this in for

me.

Thanks in advance.


2024-01-08  Roger Sayle  

libatomic/ChangeLog
  * Makefile.am: Build tas_1_2_.o on ARCH_ARM_LINUX
  * Makefile.in: Regenerate.


Roger
--



Hi Roger,

I don't really understand all this make foo :( so I'm not sure if this is the
right fix either.  If this is, as you say, a regression, have you been able to
track down when it first started to occur?  That might also help me to
understand what changed to cause this.

Perhaps we should have a PR for this, to make tracking the fixes easier.

R.




[PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-24 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing the Load-Acquire RCpc Pair Ordered, and
Store-Release Pair Ordered operations in the form of LDIAPP and STILP.

These operations are single-copy atomic on cores which also implement
LSE2 and, as such, support for these operations is added to Libatomic
and employed accordingly when the LSE2 and RCPC3 features are detected
in a given core at runtime.

libatomic/ChangeLog:

  * configure.ac: Add call to LIBAT_TEST_FEAT_LRCPC3() test.
  * configure: Regenerate.
  * config/linux/aarch64/host-config.h (HAS_LRCPC3): New.
  (has_rcpc3): Likewise.
  (HWCAP2_LRCPC3): Likewise.
  * config/linux/aarch64/atomic_16.S (libat_load_16): Add
  LRCPC3 variant.
  (libat_store_16): Likewise.
  * acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LRCPC3): New.
	(HAVE_FEAT_LRCPC3): Likewise.
  (ARCH_AARCH64_HAVE_LRCPC3): Likewise.
  * Makefile.am (AM_CPPFLAGS): Conditionally append
  -DHAVE_FEAT_LRCPC3 flag.
---
 libatomic/Makefile.am|   6 +-
 libatomic/Makefile.in|  22 ++--
 libatomic/acinclude.m4   |  19 
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 102 ++-
 libatomic/config/linux/aarch64/host-config.h |  33 +-
 libatomic/configure  |  59 ++-
 libatomic/configure.ac   |   1 +
 8 files changed, 225 insertions(+), 20 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index 0623a0bf2d1..1e5481fa580 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -130,8 +130,12 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix 
_$(s)_.lo,$(SIZEOBJS)))
 ## On a target-specific basis, include alternates to be selected by IFUNC.
 if HAVE_IFUNC
 if ARCH_AARCH64_LINUX
+AM_CPPFLAGS  =
 if ARCH_AARCH64_HAVE_LSE128
-AM_CPPFLAGS = -DHAVE_FEAT_LSE128
+AM_CPPFLAGS += -DHAVE_FEAT_LSE128
+endif
+if ARCH_AARCH64_HAVE_LRCPC3
+AM_CPPFLAGS+= -DHAVE_FEAT_LRCPC3
 endif
 IFUNC_OPTIONS   = -march=armv8-a+lse
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix 
_$(s)_1_.lo,$(SIZEOBJS)))
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index cd48fa21334..8e87d12907a 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -89,15 +89,17 @@ POST_UNINSTALL = :
 build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
-@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_1 = $(foreach 
s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
-@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2 = atomic_16.S
-@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(foreach \
+@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_1
 = -DHAVE_FEAT_LSE128
+@ARCH_AARCH64_HAVE_LRCPC3_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2
 = -DHAVE_FEAT_LRCPC3
+@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(foreach 
s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
+@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = atomic_16.S
+@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_5 = $(foreach \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ s,$(SIZES),$(addsuffix \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ _$(s)_1_.lo,$(SIZEOBJS))) \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ $(addsuffix \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ _8_2_.lo,$(SIZEOBJS))
-@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = $(addsuffix 
_8_1_.lo,$(SIZEOBJS))
-@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_5 = $(addsuffix 
_16_1_.lo,$(SIZEOBJS)) \
+@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_6 = $(addsuffix 
_8_1_.lo,$(SIZEOBJS))
+@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_7 = $(addsuffix 
_16_1_.lo,$(SIZEOBJS)) \
 @ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@   $(addsuffix 
_16_2_.lo,$(SIZEOBJS))
 
 subdir = .
@@ -424,7 +426,7 @@ libatomic_la_LDFLAGS = $(libatomic_version_info) 
$(libatomic_version_script) \
$(lt_host_flags) $(libatomic_darwin_rpath)
 
 libatomic_la_SOURCES = gload.c gstore.c gcas.c gexch.c glfree.c lock.c \
-   init.c fenv.c fence.c flag.c $(am__append_2)
+   init.c fenv.c fence.c flag.c $(am__append_4)
 SIZEOBJS = load store cas exch fadd fsub fand fior fxor fnand tas
 EXTRA_libatomic_la_SOURCES = $(addsuffix _n.c,$(SIZEOBJS))
 libatomic_la_DEPENDENCIES = $(libatomic_la_LIBADD) $(libatomic_version_dep)
@@ -450,9 +452,11 @@ all_c_files := $(foreach dir,$(search_path),$(wildcard 
$(dir)/*.c))
 # Then sort through them to find the one we want, and select the first.
 M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files)))
 libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \
-   _$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \
-   $(am__append_4) 

[PATCH v2 1/2] libatomic: Increase max IFUNC_NCOND(N) from 3 to 4.

2024-01-24 Thread Victor Do Nascimento
libatomic/ChangeLog:
* libatomic_i.h: Add GEN_SELECTOR implementation for
IFUNC_NCOND(N) == 4.
---
 libatomic/libatomic_i.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/libatomic/libatomic_i.h b/libatomic/libatomic_i.h
index 861a22da152..0a854fd908c 100644
--- a/libatomic/libatomic_i.h
+++ b/libatomic/libatomic_i.h
@@ -275,6 +275,24 @@ bool libat_is_lock_free (size_t, void *) MAN(is_lock_free);
return C3(libat_,X,_i3);\
  return C2(libat_,X);  \
}
+# elif IFUNC_NCOND(N) == 4
+#  define GEN_SELECTOR(X)  \
+   extern typeof(C2(libat_,X)) C3(libat_,X,_i1) HIDDEN;\
+   extern typeof(C2(libat_,X)) C3(libat_,X,_i2) HIDDEN;\
+   extern typeof(C2(libat_,X)) C3(libat_,X,_i3) HIDDEN;\
+   extern typeof(C2(libat_,X)) C3(libat_,X,_i4) HIDDEN;\
+   static typeof(C2(libat_,X)) * C2(select_,X) (IFUNC_RESOLVER_ARGS) \
+   {   \
+ if (IFUNC_COND_1) \
+   return C3(libat_,X,_i1);\
+ if (IFUNC_COND_2) \
+   return C3(libat_,X,_i2);\
+ if (IFUNC_COND_3) \
+   return C3(libat_,X,_i3);\
+ if (IFUNC_COND_4) \
+   return C3(libat_,X,_i4);\
+ return C2(libat_,X);  \
+   }
 # else
 #  error "Unsupported number of ifunc alternatives."
 # endif
-- 
2.42.0



[PATCH v2 0/2] libatomic: AArch64 rcpc3 128-bit atomic operation enablement

2024-01-24 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing both the Load-Acquire RCpc Pair
Ordered, and Store-Release Pair Ordered operations in the form of
LDIAPP and STILP.

In light of this, continuing on from previously-proposed Libatomic
enablement work [1], this patch series therefore makes the following
changes to Libatomic:

  1. Extend the number of allowed ifunc alternatives to 4.
  2. Add LDIAPP and STILP instructions to 16-byte atomic operations.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643841.html

Victor Do Nascimento (2):
  libatomic: Increase max IFUNC_NCOND(N) from 3 to 4.
  libatomic: Add rcpc3 128-bit atomic operations for AArch64

 libatomic/Makefile.am|   6 +-
 libatomic/Makefile.in|  22 ++--
 libatomic/acinclude.m4   |  19 
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 102 ++-
 libatomic/config/linux/aarch64/host-config.h |  33 +-
 libatomic/configure  |  59 ++-
 libatomic/configure.ac   |   1 +
 libatomic/libatomic_i.h  |  18 
 9 files changed, 243 insertions(+), 20 deletions(-)

-- 
2.42.0



[PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-24 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with.  It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their suffixes possibly altered.

This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:

  #define LSE(NAME) NAME##_i1

Where we wish to generate ifunc names with the pre-processor's token
concatenation feature, we add a level of indirection to previous macro
calls.  If before we would have had `MACRO(_i)', we now have
`MACRO_FEAT(name, feature)'.  Where we wish to refer to base
functionality (i.e., functions where ifunc suffixes are absent), the
original `MACRO()' may be used to bypass suffixing.

Consequently, for base functionality, where the ifunc suffix is
absent, the macro interface remains the same.  For example, the entry
and endpoints of `libat_store_16' remain defined by:

  ENTRY (libat_store_16)

and

  END (libat_store_16)

For the LSE2 implementation of the same 16-byte atomic store, we now
have:

  ENTRY_FEAT (libat_store_16, LSE2)

and

  END_FEAT (libat_store_16, LSE2)

For the aliasing of function names, we define the following new
implementation of the ALIAS macro:

  ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX)

Defining the `CORE(NAME)' macro to be the identity operator, it
returns the base function name unaltered and allows us to alias
target-specific ifuncs to the corresponding base implementation.
For example, we'd alias the LSE2 `libat_exchange_16' to its base
implementation with:

  ALIAS (libat_exchange_16, LSE2, CORE)

libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (CORE): New macro.
(LSE2): Likewise.
(ENTRY_FEAT): Likewise.
(ENTRY_FEAT1): Likewise.
(END_FEAT): Likewise.
(END_FEAT1): Likewise.
(ALIAS): Modify macro to take in `arch' arguments.
(ALIAS1): New.
---
 libatomic/config/linux/aarch64/atomic_16.S | 79 +-
 1 file changed, 47 insertions(+), 32 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index ad14f8f2e6e..16a42925903 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,22 +40,38 @@
 
.arch   armv8-a+lse
 
-#define ENTRY(name)\
-   .global name;   \
-   .hidden name;   \
-   .type name,%function;   \
+#define LSE2(NAME) NAME##_i1
+#define CORE(NAME) NAME
+
+#define ENTRY(NAME) ENTRY_FEAT1 (NAME)
+
+#define ENTRY_FEAT(NAME, FEAT)  \
+   ENTRY_FEAT1 (FEAT (NAME))
+
+#define ENTRY_FEAT1(NAME)  \
+   .global NAME;   \
+   .hidden NAME;   \
+   .type NAME,%function;   \
.p2align 4; \
-name:  \
-   .cfi_startproc; \
+NAME:  \
+   .cfi_startproc; \
hint34  // bti c
 
-#define END(name)  \
+#define END(NAME) END_FEAT1 (NAME)
+
+#define END_FEAT(NAME, FEAT)   \
+   END_FEAT1 (FEAT (NAME))
+
+#define END_FEAT1(NAME)\
.cfi_endproc;   \
-   .size name, .-name;
+   .size NAME, .-NAME;
+
+#define ALIAS(NAME, FROM, TO)  \
+   ALIAS1 (FROM (NAME),TO (NAME))
 
-#define ALIAS(alias,name)  \
-   .global alias;  \
-   .set alias, name;
+#define ALIAS1(ALIAS, NAME)\
+   .global ALIAS;  \
+   .set ALIAS, NAME;
 
 #define res0 x0
 #define res1 x1
@@ -108,7 +124,7 @@ ENTRY (libat_load_16)
 END (libat_load_16)
 
 
-ENTRY (libat_load_16_i1)
+ENTRY_FEAT (libat_load_16, LSE2)
cbnzw1, 1f
 
/* RELAXED.  */
@@ -128,7 +144,7 @@ ENTRY (libat_load_16_i1)
ldp res0, res1, [x0]
dmb ishld
ret
-END (libat_load_16_i1)
+END_FEAT (libat_load_16, LSE2)
 
 
 ENTRY (libat_store_16)
@@ -148,7 +164,7 @@ ENTRY (libat_store_16)
 END (libat_store_16)
 
 
-ENTRY (libat_store_16_i1)
+ENTRY_FEAT (libat_store_16, LSE2)
cbnzw4, 1f
 
/* RELAXED.  */
@@ -160,7 +176,7 @@ ENTRY (libat_store_16_i1)
stlxp   w4, in0, in1, [x0]
cbnzw4, 1b
ret
-END (libat_store_16_i1)
+END_FEAT (libat_store_16, LSE2)
 
 
 ENTRY (libat_exchange_16)
@@ -237,7 +253,7 @@ ENTRY (libat_compare_exchange_16)
 END (libat_compare_exchange_16)
 
 
-ENTRY (libat_compare_exchange_16_i1)
+ENTRY_FEAT (libat_compare_exchange_16, LSE2)
ldp exp0, exp1, [x1]
mov tmp0, exp0
mov tmp1, exp1
@@ -270,7 +286,7 @@ ENTRY (libat_compare_exchange_16_i1)
/* ACQ_REL/SEQ_CST.  */
 4: caspal  exp0, exp1, in0, in1, [x0]
b   0b
-END (libat_compare_exchange_16_i1)
+END_FEAT (libat_compare_exchange_16, LSE2)
 
 
 ENTRY (libat_fetch_add_16)
@@ -556,21 +572,20 @@ END 

[PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

2024-01-24 Thread Victor Do Nascimento
At present, evaluation of both `has_lse2(hwcap)' and
`has_lse128(hwcap)' may require issuing an `mrs' instruction to query
a system register.  When issued from user-space, this instruction
traps into the kernel, which then returns the value read from the
system register.  Given the undesirable computational expense of the
resulting context switch, it is important to implement mechanisms
which, wherever possible, forgo the operation.

In light of this, given how other architectural requirements serving
as prerequisites have long been assigned HWCAP bits by the kernel, we
can inexpensively query for their availability before attempting to
read any system registers.  Where one of these early tests fails, we
can assert that the main feature of interest (be it LSE2 or LSE128)
cannot be present, allowing us to return from the function early and
skip the unnecessary expensive kernel-mediated access to system
registers.

libatomic/ChangeLog:

* config/linux/aarch64/host-config.h (has_lse2): Add test for LSE.
(has_lse128): Add test for LSE2.
---
 libatomic/config/linux/aarch64/host-config.h | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/libatomic/config/linux/aarch64/host-config.h 
b/libatomic/config/linux/aarch64/host-config.h
index 1bc7d839232..4e354124063 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -64,8 +64,13 @@ typedef struct __ifunc_arg_t {
 static inline bool
 has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
+  /* Check for LSE2.  */
   if (hwcap & HWCAP_USCAT)
 return true;
+  /* No point checking further for atomic 128-bit load/store if LSE
+ prerequisite not met.  */
+  if (!(hwcap & HWCAP_ATOMICS))
+return false;
   if (!(hwcap & HWCAP_CPUID))
 return false;
 
@@ -99,9 +104,11 @@ has_lse128 (unsigned long hwcap, const __ifunc_arg_t 
*features)
  support in older kernels as it is of CPU feature absence.  Try fallback
  method to guarantee LSE128 is not implemented.
 
- In the absence of HWCAP_CPUID, we are unable to check for LSE128.  */
-  if (!(hwcap & HWCAP_CPUID))
-return false;
+ In the absence of HWCAP_CPUID, we are unable to check for LSE128.
+ If feature check available, check LSE2 prerequisite before proceeding.  */
+  if (!(hwcap & HWCAP_CPUID) || !(hwcap & HWCAP_USCAT))
+ return false;
+
   unsigned long isar0;
   asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
   if (AT_FEAT_FIELD (isar0) >= 3)
-- 
2.42.0



[PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-24 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:

  * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
  value held in a pair of registers, with original data loaded into
  the same 2 registers.
  * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
  in a pair of registers, with original data loaded into the same 2
  registers.
  * SWPP - Atomic swap of one 128-bit value with 128-bit value held
  in a pair of registers.

It is worth noting that in keeping with existing 128-bit atomic
operations in `atomic_16.S', we have chosen to merge certain
less-restrictive orderings into more restrictive ones.  This is done
to minimize the number of branches in the atomic functions, reducing
both the likelihood of branch mispredictions and, by keeping the code
small, the need for extra fetch cycles.

Past benchmarking has revealed that acquire is typically slightly
faster than release (5-10%), such that for the most frequently used
atomics (CAS and SWP) it makes sense to add support for acquire, as
well as release.

Likewise, it was identified that combining acquire and release typically
results in little to no penalty, such that it is of negligible benefit
to distinguish between release and acquire-release, making the
combining of release/acq_rel/seq_cst orderings a worthwhile design choice.

This patch adds the logic required to make use of these when the
architectural feature is present and a suitable assembler available.

In order to do this, the following changes are made:

  1. Add a configure-time check to check for LSE128 support in the
  assembler.
  2. Edit host-config.h so that when N == 16, nifunc = 2.
  3. Where available due to LSE128, implement the second ifunc, making
  use of the novel instructions.
  4. For atomic functions unable to make use of these new
  instructions, define a new alias which causes the _i1 function
  variant to point ahead to the corresponding _i2 implementation.

libatomic/ChangeLog:

* Makefile.am (AM_CPPFLAGS): add conditional setting of
-DHAVE_FEAT_LSE128.
* acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LSE128): New.
* config/linux/aarch64/atomic_16.S (LSE128): New macro
definition.
(libat_exchange_16): New LSE128 variant.
(libat_fetch_or_16): Likewise.
(libat_or_fetch_16): Likewise.
(libat_fetch_and_16): Likewise.
(libat_and_fetch_16): Likewise.
* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
(IFUNC_NCOND): Add operand size checking.
(has_lse2): Renamed from `ifunc1`.
(has_lse128): New.
(HWCAP2_LSE128): Likewise.
* libatomic/configure.ac: Add call to
LIBAT_TEST_FEAT_AARCH64_LSE128.
* configure (ac_subst_vars): Regenerated via autoreconf.
* libatomic/Makefile.in: Likewise.
* libatomic/auto-config.h.in: Likewise.
---
 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 +++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 170 ++-
 libatomic/config/linux/aarch64/host-config.h |  42 -
 libatomic/configure  |  61 ++-
 libatomic/configure.ac   |   3 +
 8 files changed, 293 insertions(+), 9 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index cfad90124f9..0623a0bf2d1 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -130,6 +130,9 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix 
_$(s)_.lo,$(SIZEOBJS)))
 ## On a target-specific basis, include alternates to be selected by IFUNC.
 if HAVE_IFUNC
 if ARCH_AARCH64_LINUX
+if ARCH_AARCH64_HAVE_LSE128
+AM_CPPFLAGS = -DHAVE_FEAT_LSE128
+endif
 IFUNC_OPTIONS   = -march=armv8-a+lse
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix 
_$(s)_1_.lo,$(SIZEOBJS)))
 libatomic_la_SOURCES += atomic_16.S
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index dc2330b91fd..cd48fa21334 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -452,6 +452,7 @@ M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files)))
 libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \
_$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \
$(am__append_4) $(am__append_5)
+@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@AM_CPPFLAGS
 = -DHAVE_FEAT_LSE128
 @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv8-a+lse
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv7-a+fp 
-DHAVE_KERNEL64
 @ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=i586
diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index f35ab5b60a5..d4f13174e2c 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -83,6 +83,25 @@ 

[PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver

2024-01-24 Thread Victor Do Nascimento
With support for new atomic features in Armv9.4-a being indicated by
HWCAP2 bits, Libatomic's ifunc resolver must now query its second
argument, of type __ifunc_arg_t*.

We therefore make this argument known to libatomic, allowing us to
query hwcap2 bits in the following manner:

  bool
  resolver (unsigned long hwcap, const __ifunc_arg_t *features);
  {
return (features->hwcap2 & HWCAP2_);
  }

libatomic/ChangeLog:

* config/linux/aarch64/host-config.h (__ifunc_arg_t):
Conditionally-defined if `sys/ifunc.h' not found.
(_IFUNC_ARG_HWCAP): Likewise.
(IFUNC_COND_1): Pass __ifunc_arg_t argument to ifunc.
(ifunc1): Modify function signature to accept __ifunc_arg_t
argument.
* configure.tgt: Add second `const __ifunc_arg_t *features'
argument to IFUNC_RESOLVER_ARGS.
---
 libatomic/config/linux/aarch64/host-config.h | 15 +--
 libatomic/configure.tgt  |  2 +-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/libatomic/config/linux/aarch64/host-config.h 
b/libatomic/config/linux/aarch64/host-config.h
index 4200293c4e3..8fd4fe3321a 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -24,9 +24,20 @@
 #if HAVE_IFUNC
 #include 
 
+#if __has_include()
+# include 
+#else
+typedef struct __ifunc_arg_t {
+  unsigned long _size;
+  unsigned long _hwcap;
+  unsigned long _hwcap2;
+} __ifunc_arg_t;
+# define _IFUNC_ARG_HWCAP (1ULL << 62)
+#endif
+
 #ifdef HWCAP_USCAT
 # if N == 16
-#  define IFUNC_COND_1 ifunc1 (hwcap)
+#  define IFUNC_COND_1 ifunc1 (hwcap, features)
 # else
 #  define IFUNC_COND_1 (hwcap & HWCAP_ATOMICS)
 # endif
@@ -48,7 +59,7 @@
 #define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff)
 
 static inline bool
-ifunc1 (unsigned long hwcap)
+ifunc1 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
   if (hwcap & HWCAP_USCAT)
 return true;
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index b7609132c58..67a5f2dff80 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -194,7 +194,7 @@ esac
 # The type may be different on different architectures.
 case "${target}" in
   aarch64*-*-*)
-   IFUNC_RESOLVER_ARGS="uint64_t hwcap"
+   IFUNC_RESOLVER_ARGS="uint64_t hwcap, const __ifunc_arg_t *features"
;;
   *)
IFUNC_RESOLVER_ARGS="void"
-- 
2.42.0



[PATCH v4 0/4] Libatomic: Add LSE128 atomics support for AArch64

2024-01-24 Thread Victor Do Nascimento
v4 updates

  1. Make use of HWCAP2_LSE128, as defined in the Linux kernel v6.7,
  for the feature check.  This has required adding a new patch to the
  series, enabling ifunc resolvers to read a second arg of type
  `__ifunc_arg_t *', from which the `_hwcap2' member can be queried
  for LSE128 support.  HWCAP2_LSE128, HWCAP_ATOMICS and __ifunc_arg_t
  are conditionally defined in the `host-config.h' file to allow
  backwards compatibility with older versions of glibc which lack
  definitions for these.

  2. Run configure test LIBAT_TEST_FEAT_LSE128 unconditionally,
  renaming it to LIBAT_TEST_FEAT_AARCH64_LSE128.  While it may seem
  counter-intuitive to run an aarch64 test on non-aarch64 targets, the
  Automake manual makes it clear:

"Note that you must arrange for every AM_CONDITIONAL to be
 invoked every time configure is run. If AM_CONDITIONAL is
 run conditionally (e.g., in a shell if statement), then
 the result will confuse automake."

  Failure to do so has been found to result in Libatomic build
  failures on arm and x86_64 targets.

  3. Minor changes in the implementations of {ENTRY|END}_FEAT and
  ALIAS macros used in `config/linux/aarch64/atomic_16.S'

  4. Improve commit message in PATCH 2/3 documenting design choice
  around merging REL and ACQ_REL memory orderings in LSE128 atomic
  functions.

Regression-tested on aarch64-none-linux-gnu on Cortex-A72 and
LSE128-enabled Armv-A Base RevC AEM FVP.
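Item 2 above can be sketched as follows.  The macro and conditional
names match the quoted patches, and the cache variable
`libat_cv_have_feat_lse128' appears in the v3 diff; the exact guard
expression is an assumption:

```m4
dnl configure.ac: invoke the probe unconditionally, even when building
dnl for non-aarch64 targets, so automake always sees the conditional.
LIBAT_TEST_FEAT_AARCH64_LSE128

dnl acinclude.m4: the probe may guard its compile test internally, but
dnl it must reach AM_CONDITIONAL on every path through configure.
AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128],
	       [test "x$libat_cv_have_feat_lse128" = xyes])
```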

---

Building upon Wilco Dijkstra's work on AArch64 128-bit atomics for
Libatomic, namely the patches from [1] and [2], this patch series
extends the library's capabilities to dynamically select and emit
Armv9.4-a LSE128 implementations of atomic operations via ifuncs at
run-time whenever architectural support is present.

Regression tested on aarch64-linux-gnu target with LSE128-support.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html

Victor Do Nascimento (4):
  libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
  libatomic: Add support for __ifunc_arg_t arg in ifunc resolver
  libatomic: Enable LSE128 128-bit atomics for armv9.4-a
  aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 ++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 247 ---
 libatomic/config/linux/aarch64/host-config.h |  60 -
 libatomic/configure  |  61 -
 libatomic/configure.ac   |   3 +
 libatomic/configure.tgt  |   2 +-
 9 files changed, 358 insertions(+), 41 deletions(-)

-- 
2.42.0



Re: [PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-08 Thread Victor Do Nascimento




On 1/5/24 11:10, Richard Sandiford wrote:

Victor Do Nascimento  writes:

The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with.  It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their suffixes possibly altered.

This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:

   #define LSE2 _i1

and reconstructs function names with the pre-processor's token
concatenation feature, such that for `MACRO(_i)', we would
now have `MACRO_FEAT(name, feature)' and in the macro definition body
we replace `name` with `name##feature`.


FWIW, another way of doing this would be to have:

#define CORE(NAME) NAME
#define LSE2(NAME) NAME##_i1

and use feature(name) instead of name##feature.  This has the slight
advantage of not using ## on empty tokens, and the maybe slightly
better advantage of not needing the extra forwarding step in:

#define ENTRY_FEAT(name, feat)  \
ENTRY_FEAT1(name, feat)

#define ENTRY_FEAT1(name, feat) \

WDYT?

Richard



While from a strictly stylistic point of view, I'm not so keen on the 
resulting interface and its 'function call within a function call' look, 
e.g.


  ENTRY (LSE2 (libat_compare_exchange_16))

and

  ALIAS (LSE128 (libat_compare_exchange_16), \
 LSE2 (libat_compare_exchange_16))

on the implementation side of things, I like the benefits this brings 
about, namely allowing the use of the unaltered original 
implementations of the ENTRY, END and ALIAS macros, with the 
aforementioned advantages of not having to use ## on empty tokens and 
of abolishing the need for the extra forwarding step.


I'm happy enough to go with this approach.

Cheers


Consequently, for base functionality, where the ifunc suffix is
absent, the macro interface remains the same.  For example, the entry
and endpoints of `libat_store_16' remain defined by:

   - ENTRY (libat_store_16)
and
   - END (libat_store_16)

For the LSE2 implementation of the same 16-byte atomic store, we now
have:

   - ENTRY_FEAT (libat_store_16, LSE2)
and
   - END_FEAT (libat_store_16, LSE2)

For the aliasing of ifunc names, we define the following new
implementation of the ALIAS macro:

   - ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX)

Defining the base feature name macro to map `CORE' to the empty string,
mapping LSE2 to the base implementation, we'd alias the LSE2
`libat_exchange_16' to its base implementation with:

   - ALIAS (libat_exchange_16, LSE2, CORE)

libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (CORE): New macro.
(LSE2): Likewise.
(ENTRY_FEAT): Likewise.
(END_FEAT): Likewise.
(ENTRY_FEAT1): Likewise.
(END_FEAT1): Likewise.
(ALIAS): Modify macro to take in `arch' arguments.
---
  libatomic/config/linux/aarch64/atomic_16.S | 83 +-
  1 file changed, 49 insertions(+), 34 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index a099037179b..eb8e749b8a2 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,22 +40,38 @@
  
  	.arch	armv8-a+lse
  
-#define ENTRY(name)		\

-   .global name;   \
-   .hidden name;   \
-   .type name,%function;   \
-   .p2align 4; \
-name:  \
-   .cfi_startproc; \
+#define ENTRY(name) ENTRY_FEAT (name, CORE)
+
+#define ENTRY_FEAT(name, feat) \
+   ENTRY_FEAT1(name, feat)
+
+#define ENTRY_FEAT1(name, feat)\
+   .global name##feat; \
+   .hidden name##feat; \
+   .type name##feat,%function; \
+   .p2align 4; \
+name##feat:\
+   .cfi_startproc; \
hint34  // bti c
  
-#define END(name)		\

-   .cfi_endproc;   \
-   .size name, .-name;
+#define END(name) END_FEAT (name, CORE)
  
-#define ALIAS(alias,name)	\

-   .global alias;  \
-   .set alias, name;
+#define END_FEAT(name, feat)   \
+   END_FEAT1(name, feat)
+
+#define END_FEAT1(name, feat)  \
+   .cfi_endproc;   \
+   .size name##feat, .-name##feat;
+
+#define ALIAS(alias, from, to) \
+   ALIAS1(alias,from,to)
+
+#define ALIAS1(alias, from, to)\
+   .global alias##from;\
+   .set alias##from, alias##to;
+
+#define CORE
+#define LSE2   _i1
  
  #define res0 x0

  #define res1 x1
@@ -108,7 +124,7 @@ ENTRY (libat_load_16)
  END (libat_load_16)
  
  
-ENTRY (libat_load_16_i1)

+ENTRY_FEAT (libat_load_16, LSE2)
cbnzw1, 1f
  
  	/* RELAXED.  */

@@ -128,7 +144,7 @@ ENTRY (libat_load_16_i1)
ldp res0, res1

Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Victor Do Nascimento




On 1/5/24 11:47, Richard Sandiford wrote:

Victor Do Nascimento  writes:

The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:

   * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
   value held in a pair of registers, with original data loaded into
   the same 2 registers.
   * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
   in a pair of registers, with original data loaded into the same 2
   registers.
   * SWPP - Atomic swap of one 128-bit value with 128-bit value held
   in a pair of registers.

This patch adds the logic required to make use of these when the
architectural feature is present and a suitable assembler available.

In order to do this, the following changes are made:

   1. Add a configure-time check to check for LSE128 support in the
   assembler.
   2. Edit host-config.h so that when N == 16, nifunc = 2.
   3. Where available due to LSE128, implement the second ifunc, making
   use of the novel instructions.
   4. For atomic functions unable to make use of these new
   instructions, define a new alias which causes the _i1 function
   variant to point ahead to the corresponding _i2 implementation.

libatomic/ChangeLog:

* Makefile.am (AM_CPPFLAGS): add conditional setting of
-DHAVE_FEAT_LSE128.
* acinclude.m4 (LIBAT_TEST_FEAT_LSE128): New.
* config/linux/aarch64/atomic_16.S (LSE128): New macro
definition.
(libat_exchange_16): New LSE128 variant.
(libat_fetch_or_16): Likewise.
(libat_or_fetch_16): Likewise.
(libat_fetch_and_16): Likewise.
(libat_and_fetch_16): Likewise.
* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
(IFUNC_NCOND): Add operand size checking.
(has_lse2): Renamed from `ifunc1`.
(has_lse128): New.
(HAS_LSE128): Likewise.
* libatomic/configure.ac: Add call to LIBAT_TEST_FEAT_LSE128.
* configure (ac_subst_vars): Regenerated via autoreconf.
* libatomic/Makefile.in: Likewise.
* libatomic/auto-config.h.in: Likewise.
---
  libatomic/Makefile.am|   3 +
  libatomic/Makefile.in|   1 +
  libatomic/acinclude.m4   |  19 +++
  libatomic/auto-config.h.in   |   3 +
  libatomic/config/linux/aarch64/atomic_16.S   | 170 ++-
  libatomic/config/linux/aarch64/host-config.h |  29 +++-
  libatomic/configure  |  59 ++-
  libatomic/configure.ac   |   1 +
  8 files changed, 276 insertions(+), 9 deletions(-)

[...]
diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index f35ab5b60a5..4197db8f404 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[
])
  ])
  
+dnl
+dnl Test if the host assembler supports armv9.4-a LSE128 insns.
+dnl
+AC_DEFUN([LIBAT_TEST_FEAT_LSE128],[
+  AC_CACHE_CHECK([for armv9.4-a LSE128 insn support],
+[libat_cv_have_feat_lse128],[
+AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])])
+if AC_TRY_EVAL(ac_link); then


ac_compile should be enough for this.  The link step isn't really
adding anything.


+  eval libat_cv_have_feat_lse128=yes
+else
+  eval libat_cv_have_feat_lse128=no
+fi
+rm -f conftest*
+  ])
+  LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128],
+   [Have LSE128 support for 16 byte integers.])
+  AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128], [test x$libat_cv_have_feat_lse128 = xyes])
+])
+
  dnl
  dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
  dnl
[...]
@@ -206,6 +211,31 @@ ENTRY (libat_exchange_16)
  END (libat_exchange_16)
  
  
+#if HAVE_FEAT_LSE128
+ENTRY_FEAT (libat_exchange_16, LSE128)
+   mov tmp0, x0
+   mov res0, in0
+   mov res1, in1
+	cbnz	w4, 1f
+
+	/* RELAXED.  */
+	swpp	res0, res1, [tmp0]
+	ret
+1:
+	cmp	w4, ACQUIRE
+	b.hi	2f
+
+   /* ACQUIRE/CONSUME.  */
+   swppa   res0, res1, [tmp0]
+   ret
+
+   /* RELEASE/ACQ_REL/SEQ_CST.  */
+2: swppal  res0, res1, [tmp0]
+   ret
+END_FEAT (libat_exchange_16, LSE128)
+#endif


Is there no benefit to using SWPPL for RELEASE here?  Similarly for the
others.


We started off implementing all the memory orderings available. 
Wilco saw value in merging less restricted orderings into more 
restricted ones, mainly to reduce code size in less frequently used atomics.


This saw us combine RELEASE and ACQ_REL/SEQ_CST cases to make functions 
a little smaller.




Looks good otherwise.

Thanks,
Richard


[PATCH v3 3/3] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

2024-01-02 Thread Victor Do Nascimento
At present, evaluation of both `has_lse2(hwcap)' and
`has_lse128(hwcap)' may require issuing an `mrs' instruction to query
a system register.  When issued from user-space, this instruction
traps into the kernel, which emulates the read and returns the value
of the system register.  Given the computational expense of the
resulting context switch, it is important to implement mechanisms to,
wherever possible, forgo the operation.

In light of this, given how other architectural requirements serving
as prerequisites have long been assigned HWCAP bits by the kernel, we
can inexpensively query for their availability before attempting to
read any system registers.  Where one of these early tests fails, we
can assert that the main feature of interest (be it LSE2 or LSE128)
cannot be present, allowing us to return from the function early and
skip the unnecessary, expensive kernel-mediated access to system
registers.

libatomic/ChangeLog:

* config/linux/aarch64/host-config.h (has_lse2): Add test for LSE.
(has_lse128): Add test for LSE2.
---
 libatomic/config/linux/aarch64/host-config.h | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index c5485d63855..3be4db6e5f8 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -53,8 +53,13 @@
 static inline bool
 has_lse2 (unsigned long hwcap)
 {
+  /* Check for LSE2.  */
   if (hwcap & HWCAP_USCAT)
 return true;
+  /* No point checking further for atomic 128-bit load/store if LSE
+ prerequisite not met.  */
+  if (!(hwcap & HWCAP_ATOMICS))
+return false;
   if (!(hwcap & HWCAP_CPUID))
 return false;
 
@@ -76,12 +81,14 @@ has_lse2 (unsigned long hwcap)
 static inline bool
 has_lse128 (unsigned long hwcap)
 {
-  if (!(hwcap & HWCAP_CPUID))
-return false;
+  /* In the absence of HWCAP_CPUID, we are unable to check for LSE128, return.
+ If feature check available, check LSE2 prerequisite before proceeding.  */
+  if (!(hwcap & HWCAP_CPUID) || !(hwcap & HWCAP_USCAT))
+ return false;
   unsigned long isar0;
   asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
   if (AT_FEAT_FIELD (isar0) >= 3)
-return true;
+  return true;
   return false;
 }
 
-- 
2.42.0



[PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-02 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with.  It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their suffixes possibly altered.

This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:

  #define LSE2 _i1

and reconstructs function names with the pre-processor's token
concatenation feature, such that for `MACRO(_i)', we would
now have `MACRO_FEAT(name, feature)' and in the macro definition body
we replace `name` with `name##feature`.

Consequently, for base functionality, where the ifunc suffix is
absent, the macro interface remains the same.  For example, the entry
and endpoints of `libat_store_16' remain defined by:

  - ENTRY (libat_store_16)
and
  - END (libat_store_16)

For the LSE2 implementation of the same 16-byte atomic store, we now
have:

  - ENTRY_FEAT (libat_store_16, LSE2)
and
  - END_FEAT (libat_store_16, LSE2)

For the aliasing of ifunc names, we define the following new
implementation of the ALIAS macro:

  - ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX)

With the base feature name macro `CORE' defined to map to the empty
string, we'd alias the LSE2 `libat_exchange_16' to its base
implementation with:

  - ALIAS (libat_exchange_16, LSE2, CORE)

libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (CORE): New macro.
(LSE2): Likewise.
(ENTRY_FEAT): Likewise.
(END_FEAT): Likewise.
(ENTRY_FEAT1): Likewise.
(END_FEAT1): Likewise.
(ALIAS): Modify macro to take in `arch' arguments.
---
 libatomic/config/linux/aarch64/atomic_16.S | 83 +-
 1 file changed, 49 insertions(+), 34 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index a099037179b..eb8e749b8a2 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,22 +40,38 @@
 
.arch   armv8-a+lse
 
-#define ENTRY(name)\
-   .global name;   \
-   .hidden name;   \
-   .type name,%function;   \
-   .p2align 4; \
-name:  \
-   .cfi_startproc; \
+#define ENTRY(name) ENTRY_FEAT (name, CORE)
+
+#define ENTRY_FEAT(name, feat) \
+   ENTRY_FEAT1(name, feat)
+
+#define ENTRY_FEAT1(name, feat)\
+   .global name##feat; \
+   .hidden name##feat; \
+   .type name##feat,%function; \
+   .p2align 4; \
+name##feat:\
+   .cfi_startproc; \
	hint	34  // bti c
 
-#define END(name)  \
-   .cfi_endproc;   \
-   .size name, .-name;
+#define END(name) END_FEAT (name, CORE)
 
-#define ALIAS(alias,name)  \
-   .global alias;  \
-   .set alias, name;
+#define END_FEAT(name, feat)   \
+   END_FEAT1(name, feat)
+
+#define END_FEAT1(name, feat)  \
+   .cfi_endproc;   \
+   .size name##feat, .-name##feat;
+
+#define ALIAS(alias, from, to) \
+   ALIAS1(alias,from,to)
+
+#define ALIAS1(alias, from, to)\
+   .global alias##from;\
+   .set alias##from, alias##to;
+
+#define CORE
+#define LSE2   _i1
 
 #define res0 x0
 #define res1 x1
@@ -108,7 +124,7 @@ ENTRY (libat_load_16)
 END (libat_load_16)
 
 
-ENTRY (libat_load_16_i1)
+ENTRY_FEAT (libat_load_16, LSE2)
	cbnz	w1, 1f
 
/* RELAXED.  */
@@ -128,7 +144,7 @@ ENTRY (libat_load_16_i1)
ldp res0, res1, [x0]
dmb ishld
ret
-END (libat_load_16_i1)
+END_FEAT (libat_load_16, LSE2)
 
 
 ENTRY (libat_store_16)
@@ -148,7 +164,7 @@ ENTRY (libat_store_16)
 END (libat_store_16)
 
 
-ENTRY (libat_store_16_i1)
+ENTRY_FEAT (libat_store_16, LSE2)
	cbnz	w4, 1f
 
/* RELAXED.  */
@@ -160,7 +176,7 @@ ENTRY (libat_store_16_i1)
stlxp   w4, in0, in1, [x0]
cbnzw4, 1b
ret
-END (libat_store_16_i1)
+END_FEAT (libat_store_16, LSE2)
 
 
 ENTRY (libat_exchange_16)
@@ -237,7 +253,7 @@ ENTRY (libat_compare_exchange_16)
 END (libat_compare_exchange_16)
 
 
-ENTRY (libat_compare_exchange_16_i1)
+ENTRY_FEAT (libat_compare_exchange_16, LSE2)
ldp exp0, exp1, [x1]
mov tmp0, exp0
mov tmp1, exp1
@@ -270,7 +286,7 @@ ENTRY (libat_compare_exchange_16_i1)
/* ACQ_REL/SEQ_CST.  */
 4: caspal  exp0, exp1, in0, in1, [x0]
b   0b
-END (libat_compare_exchange_16_i1)
+END_FEAT (libat_compare_exchange_16, LSE2)
 
 
 ENTRY (libat_fetch_add_16)
@@ -556,21 +572,20 @@ END (libat_test_and_set_16)
 
 /* Alias entry points which are the same in baseline and LSE2.  */
 

[PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-02 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:

  * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
  value held in a pair of registers, with original data loaded into
  the same 2 registers.
  * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
  in a pair of registers, with original data loaded into the same 2
  registers.
  * SWPP - Atomic swap of one 128-bit value with 128-bit value held
  in a pair of registers.

This patch adds the logic required to make use of these when the
architectural feature is present and a suitable assembler available.

In order to do this, the following changes are made:

  1. Add a configure-time check to check for LSE128 support in the
  assembler.
  2. Edit host-config.h so that when N == 16, nifunc = 2.
  3. Where available due to LSE128, implement the second ifunc, making
  use of the novel instructions.
  4. For atomic functions unable to make use of these new
  instructions, define a new alias which causes the _i1 function
  variant to point ahead to the corresponding _i2 implementation.

libatomic/ChangeLog:

* Makefile.am (AM_CPPFLAGS): add conditional setting of
-DHAVE_FEAT_LSE128.
* acinclude.m4 (LIBAT_TEST_FEAT_LSE128): New.
* config/linux/aarch64/atomic_16.S (LSE128): New macro
definition.
(libat_exchange_16): New LSE128 variant.
(libat_fetch_or_16): Likewise.
(libat_or_fetch_16): Likewise.
(libat_fetch_and_16): Likewise.
(libat_and_fetch_16): Likewise.
* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
(IFUNC_NCOND): Add operand size checking.
(has_lse2): Renamed from `ifunc1`.
(has_lse128): New.
(HAS_LSE128): Likewise.
* libatomic/configure.ac: Add call to LIBAT_TEST_FEAT_LSE128.
* configure (ac_subst_vars): Regenerated via autoreconf.
* libatomic/Makefile.in: Likewise.
* libatomic/auto-config.h.in: Likewise.
---
 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 +++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 170 ++-
 libatomic/config/linux/aarch64/host-config.h |  29 +++-
 libatomic/configure  |  59 ++-
 libatomic/configure.ac   |   1 +
 8 files changed, 276 insertions(+), 9 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index c0b8dea5037..24e843db67d 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -130,6 +130,9 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix _$(s)_.lo,$(SIZEOBJS)))
 ## On a target-specific basis, include alternates to be selected by IFUNC.
 if HAVE_IFUNC
 if ARCH_AARCH64_LINUX
+if ARCH_AARCH64_HAVE_LSE128
+AM_CPPFLAGS = -DHAVE_FEAT_LSE128
+endif
 IFUNC_OPTIONS   = -march=armv8-a+lse
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
 libatomic_la_SOURCES += atomic_16.S
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index dc2330b91fd..cd48fa21334 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -452,6 +452,7 @@ M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files)))
 libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \
_$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \
$(am__append_4) $(am__append_5)
+@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@AM_CPPFLAGS = -DHAVE_FEAT_LSE128
 @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv8-a+lse
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv7-a+fp -DHAVE_KERNEL64
 @ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=i586
diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index f35ab5b60a5..4197db8f404 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[
   ])
 ])
 
+dnl
+dnl Test if the host assembler supports armv9.4-a LSE128 insns.
+dnl
+AC_DEFUN([LIBAT_TEST_FEAT_LSE128],[
+  AC_CACHE_CHECK([for armv9.4-a LSE128 insn support],
+[libat_cv_have_feat_lse128],[
+AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])])
+if AC_TRY_EVAL(ac_link); then
+  eval libat_cv_have_feat_lse128=yes
+else
+  eval libat_cv_have_feat_lse128=no
+fi
+rm -f conftest*
+  ])
+  LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128],
+   [Have LSE128 support for 16 byte integers.])
+  AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128], [test x$libat_cv_have_feat_lse128 = xyes])
+])
+
 dnl
 dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
 dnl
diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in
index ab3424a759e..7c78933b07d 100644
--- 

[PATCH v3 0/3] Libatomic: Add LSE128 atomics support for AArch64

2024-01-02 Thread Victor Do Nascimento
v3 updates:

   1. In the absence of the `HWCAP_LSE128' feature bit in the current
   Linux Kernel release, the feature check continues to rely on a user
   space-issued `mrs' instruction.  Since the ABI for exporting
   the AArch64 CPU ID/feature registers to userspace relies on
   FEAT_IDST [1], we make the ID_AA64ISAR0_EL1-mediated feature check
   contingent on having the HWCAP_CPUID bit set, ensuring FEAT_IDST
   support, avoiding potential runtime errors.

   2. It is established that, given LSE2 is mandatory from Armv8.4
   onward, LSE128 as introduced from Armv9.4 necessarily implies LSE2,
   such that a separate check for LSE2 is not necessary.

   3. Given that the infrastructure for exposing `mrs' to userspace
   hooks into the exception handler, the feature-register read is
   relatively expensive and ought to be avoided where possible.
   Consequently, where we can ascertain whether prerequisites are met
   via HWCAPS, we query these as a way of returning early where we
   know unequivocally that a given feature cannot be implemented due
   to unmet dependencies.  Such checks are added to both `has_lse2'
   and `has_lse128'.

Regression-tested on aarch64-none-linux-gnu on Cortex-A72 and
LSE128-enabled Armv-A Base RevC AEM FVP.

[1] https://www.kernel.org/doc/html/v6.6/arch/arm64/cpu-feature-registers.html

---

Building upon Wilco Dijkstra's work on AArch64 128-bit atomics for
Libatomic, namely the patches from [1] and [2], this patch series
extends the library's capabilities to dynamically select and emit
Armv9.4-a LSE128 implementations of atomic operations via ifuncs at
run-time whenever architectural support is present.

Regression tested on aarch64-linux-gnu target with LSE128-support.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html

Victor Do Nascimento (3):
  libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
  libatomic: Enable LSE128 128-bit atomics for armv9.4-a
  aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 ++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 251 ---
 libatomic/config/linux/aarch64/host-config.h |  36 ++-
 libatomic/configure  |  59 -
 libatomic/configure.ac   |   1 +
 8 files changed, 331 insertions(+), 42 deletions(-)

-- 
2.42.0



[PATCH] aarch64: arm_neon.h - Fix -Wincompatible-pointer-types errors

2023-12-09 Thread Victor Do Nascimento
In the Linux kernel, u64/s64 are [un]signed long long, not [un]signed
long.  This means that when the `arm_neon.h' header is used by the
kernel, any use of the `uint64_t' / `int64_t' types needs to be
cast to the corresponding `__builtin_aarch64_simd_di' /
`__builtin_aarch64_simd_df' types when calling the relevant ACLE
builtins.

This patch adds the necessary fixes to ensure that `vstl1_*' and
`vldap1_*' intrinsics are correctly defined for use by the kernel.

gcc/ChangeLog:

* config/aarch64/arm_neon.h (vldap1_lane_u64): Add
`const' to `__builtin_aarch64_simd_di *' cast.
(vldap1q_lane_u64): Likewise.
(vldap1_lane_s64): Cast __src to `const __builtin_aarch64_simd_di *'.
(vldap1q_lane_s64): Likewise.
(vldap1_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'.
(vldap1q_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'.
(vldap1_lane_p64): Add `const' to `__builtin_aarch64_simd_di *' cast.
(vldap1q_lane_p64): Add `const' to `__builtin_aarch64_simd_di *' cast.
(vstl1_lane_u64): Remove stray `const'.
(vstl1_lane_s64): Cast __src to `__builtin_aarch64_simd_di *'.
(vstl1q_lane_s64): Likewise.
(vstl1_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'.
(vstl1q_lane_f64): Likewise.
---
 gcc/config/aarch64/arm_neon.h | 34 +-
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index ef0d75e07ce..f394de595f7 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -13456,7 +13456,7 @@ __attribute__ ((__always_inline__, __gnu_inline__, 
__artificial__))
 vldap1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane)
 {
   return __builtin_aarch64_vec_ldap1_lanev1di_usus (
- (__builtin_aarch64_simd_di *) __src, __vec, __lane);
+ (const __builtin_aarch64_simd_di *) __src, __vec, __lane);
 }
 
 __extension__ extern __inline uint64x2_t
@@ -13464,35 +13464,39 @@ __attribute__ ((__always_inline__, __gnu_inline__, 
__artificial__))
 vldap1q_lane_u64 (const uint64_t *__src, uint64x2_t __vec, const int __lane)
 {
   return __builtin_aarch64_vec_ldap1_lanev2di_usus (
- (__builtin_aarch64_simd_di *) __src, __vec, __lane);
+ (const __builtin_aarch64_simd_di *) __src, __vec, __lane);
 }
 
 __extension__ extern __inline int64x1_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vldap1_lane_s64 (const int64_t *__src, int64x1_t __vec, const int __lane)
 {
-  return __builtin_aarch64_vec_ldap1_lanev1di (__src, __vec, __lane);
+  return __builtin_aarch64_vec_ldap1_lanev1di (
+ (const __builtin_aarch64_simd_di *) __src, __vec, __lane);
 }
 
 __extension__ extern __inline int64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vldap1q_lane_s64 (const int64_t *__src, int64x2_t __vec, const int __lane)
 {
-  return __builtin_aarch64_vec_ldap1_lanev2di (__src, __vec, __lane);
+  return __builtin_aarch64_vec_ldap1_lanev2di (
+ (const __builtin_aarch64_simd_di *) __src, __vec, __lane);
 }
 
 __extension__ extern __inline float64x1_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vldap1_lane_f64 (const float64_t *__src, float64x1_t __vec, const int __lane)
 {
-  return __builtin_aarch64_vec_ldap1_lanev1df (__src, __vec, __lane);
+  return __builtin_aarch64_vec_ldap1_lanev1df (
+ (const __builtin_aarch64_simd_df *) __src, __vec, __lane);
 }
 
 __extension__ extern __inline float64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vldap1q_lane_f64 (const float64_t *__src, float64x2_t __vec, const int __lane)
 {
-  return __builtin_aarch64_vec_ldap1_lanev2df (__src, __vec, __lane);
+  return __builtin_aarch64_vec_ldap1_lanev2df (
+ (const __builtin_aarch64_simd_df *) __src, __vec, __lane);
 }
 
 __extension__ extern __inline poly64x1_t
@@ -13500,7 +13504,7 @@ __attribute__ ((__always_inline__, __gnu_inline__, 
__artificial__))
 vldap1_lane_p64 (const poly64_t *__src, poly64x1_t __vec, const int __lane)
 {
   return __builtin_aarch64_vec_ldap1_lanev1di_psps (
- (__builtin_aarch64_simd_di *) __src, __vec, __lane);
+ (const __builtin_aarch64_simd_di *) __src, __vec, __lane);
 }
 
 __extension__ extern __inline poly64x2_t
@@ -13508,14 +13512,14 @@ __attribute__ ((__always_inline__, __gnu_inline__, 
__artificial__))
 vldap1q_lane_p64 (const poly64_t *__src, poly64x2_t __vec, const int __lane)
 {
   return __builtin_aarch64_vec_ldap1_lanev2di_psps (
- (__builtin_aarch64_simd_di *) __src, __vec, __lane);
+ (const __builtin_aarch64_simd_di *) __src, __vec, __lane);
 }
 
 /* vstl1_lane.  */
 
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vstl1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane)
+vstl1_lane_u64 

[PATCH v3] aarch64: Implement the ACLE instruction/data prefetch functions.

2023-12-05 Thread Victor Do Nascimento
Key changes in v3:
  * Implement the `require_const_argument' function to ensure the nth
  argument in EXP represents a const-type argument in the valid range
  given by [minval, maxval), forgoing expansion altogether when an
  invalid argument is detected early on.
  * Whereas in the previous iteration out-of-bound function
  parameters led to warnings and sensible defaults being set (akin to
  the `__builtin_prefetch' behavior), parameters outside valid ranges
  now result in an error, more faithfully reflecting ACLE
  specifications.

 ---

Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:

  1. Data prefetch intrinsics:
  
  void __pldx (/*constant*/ unsigned int /*access_kind*/,
   /*constant*/ unsigned int /*cache_level*/,
   /*constant*/ unsigned int /*retention_policy*/,
   void const volatile *addr);

  void __pld (void const volatile *addr);

  2. Instruction prefetch intrinsics:
  ---
  void __plix (/*constant*/ unsigned int /*cache_level*/,
   /*constant*/ unsigned int /*retention_policy*/,
   void const volatile *addr);

  void __pli (void const volatile *addr);

`__pldx' affords the programmer more fine-grained control over the
data prefetch behavior than the analogous GCC builtin
`__builtin_prefetch', and allows access to the "SLC" cache level.

While `__builtin_prefetch' chooses both cache-level and retention
policy automatically via the optional `locality' parameter, `__pldx'
expects 2 (mandatory) arguments to explicitly define the desired
cache-level and retention policies.

`__plix', on the other hand, generates a code prefetch instruction and
so extends functionality on aarch64 targets beyond that which is
exposed by `__builtin_prefetch'.

`__pld' and `__pli' do prefetch of data and instructions,
respectively, using default values for both cache-level and retention
policies.

Bootstrapped and tested on aarch64-none-linux-gnu.

[1] 
https://arm-software.github.io/acle/main/acle.html#memory-prefetch-intrinsics

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc:
(AARCH64_PLD): New enum aarch64_builtins entry.
(AARCH64_PLDX): Likewise.
(AARCH64_PLI): Likewise.
(AARCH64_PLIX): Likewise.
(aarch64_init_prefetch_builtin): New.
(aarch64_general_init_builtins): Call prefetch init function.
(aarch64_expand_prefetch_builtin): New.
(aarch64_general_expand_builtin):  Add prefetch expansion.
(require_const_argument): New.
* config/aarch64/aarch64.md (UNSPEC_PLDX): New.
(aarch64_pldx): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/builtin_pld_pli.c: New.
* gcc.target/aarch64/builtin_pld_pli_illegal.c: New.
---
 gcc/config/aarch64/aarch64-builtins.cc| 136 ++
 gcc/config/aarch64/aarch64.md |  12 ++
 gcc/config/aarch64/arm_acle.h |  30 
 .../gcc.target/aarch64/builtin_pld_pli.c  |  90 
 .../aarch64/builtin_pld_pli_illegal.c |  33 +
 5 files changed, 301 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..d092654b6fb 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,10 @@ enum aarch64_builtins
   AARCH64_RBIT,
   AARCH64_RBITL,
   AARCH64_RBITLL,
+  AARCH64_PLD,
+  AARCH64_PLDX,
+  AARCH64_PLI,
+  AARCH64_PLIX,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1798,6 +1802,34 @@ aarch64_init_rng_builtins (void)
   AARCH64_BUILTIN_RNG_RNDRRS);
 }
 
+/* Add builtins for data and instruction prefetch.  */
+static void
+aarch64_init_prefetch_builtin (void)
+{
+#define AARCH64_INIT_PREFETCH_BUILTIN(INDEX, N)	\
+  aarch64_builtin_decls[INDEX] =   \
+aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX)
+
+  tree ftype;
+  tree cv_argtype;
+  cv_argtype = build_qualified_type (void_type_node, TYPE_QUAL_CONST
+| TYPE_QUAL_VOLATILE);
+  cv_argtype = build_pointer_type (cv_argtype);
+
+  ftype = build_function_type_list (void_type_node, cv_argtype, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLD, "pld");
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLI, "pli");
+
+  ftype = build_function_type_list (void_type_node, unsigned_type_node,
+   unsigned_type_node, unsigned_type_node,
+   cv_argtype, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLDX, "pldx");
+
+  ftype = build_function_type_list (void_type_node, unsigned_type_node,
+   

[PATCH v2 3/5] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2023-11-28 Thread Victor Do Nascimento
This patch updates `aarch64-sys-regs.def', bringing it into sync with
the Binutils source.

gcc/ChangeLog:

* config/aarch64/aarch64-sys-regs.def (par_el1): New.
(rcwmask_el1): Likewise.
(rcwsmask_el1): Likewise.
(ttbr0_el1): Likewise.
(ttbr0_el12): Likewise.
(ttbr0_el2): Likewise.
(ttbr1_el1): Likewise.
(ttbr1_el12): Likewise.
(ttbr1_el2): Likewise.
(vttbr_el2): Likewise.
(gcspr_el0): Likewise.
(gcspr_el1): Likewise.
(gcspr_el12): Likewise.
(gcspr_el2): Likewise.
(gcspr_el3): Likewise.
(gcscre0_el1): Likewise.
(gcscr_el1): Likewise.
(gcscr_el12): Likewise.
(gcscr_el2): Likewise.
(gcscr_el3): Likewise.
---
 gcc/config/aarch64/aarch64-sys-regs.def | 30 +
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sys-regs.def b/gcc/config/aarch64/aarch64-sys-regs.def
index d24a2455503..96bdadb0b0f 100644
--- a/gcc/config/aarch64/aarch64-sys-regs.def
+++ b/gcc/config/aarch64/aarch64-sys-regs.def
@@ -419,6 +419,16 @@
   SYSREG ("fpcr",  CPENC (3,3,4,4,0),  0,  
AARCH64_NO_FEATURES)
   SYSREG ("fpexc32_el2",   CPENC (3,4,5,3,0),  0,  
AARCH64_NO_FEATURES)
   SYSREG ("fpsr",  CPENC (3,3,4,4,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("gcspr_el0", CPENC (3,3,2,5,1),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcspr_el1", CPENC (3,0,2,5,1),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcspr_el2", CPENC (3,4,2,5,1),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcspr_el12",CPENC (3,5,2,5,1),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcspr_el3", CPENC (3,6,2,5,1),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcscre0_el1",   CPENC (3,0,2,5,2),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcscr_el1", CPENC (3,0,2,5,0),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcscr_el2", CPENC (3,4,2,5,0),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcscr_el12",CPENC (3,5,2,5,0),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcscr_el3", CPENC (3,6,2,5,0),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
   SYSREG ("gcr_el1",   CPENC (3,0,1,0,6),  F_ARCHEXT,  
AARCH64_FEATURE (MEMTAG))
   SYSREG ("gmid_el1",  CPENC (3,1,0,0,4),  F_REG_READ|F_ARCHEXT,   
AARCH64_FEATURE (MEMTAG))
   SYSREG ("gpccr_el3", CPENC (3,6,2,1,6),  0,  
AARCH64_NO_FEATURES)
@@ -584,7 +594,7 @@
   SYSREG ("oslar_el1", CPENC (2,0,1,0,4),  F_REG_WRITE,
AARCH64_NO_FEATURES)
   SYSREG ("oslsr_el1", CPENC (2,0,1,1,4),  F_REG_READ, 
AARCH64_NO_FEATURES)
   SYSREG ("pan",   CPENC (3,0,4,2,3),  F_ARCHEXT,  
AARCH64_FEATURE (PAN))
-  SYSREG ("par_el1",   CPENC (3,0,7,4,0),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("par_el1",   CPENC (3,0,7,4,0),  F_REG_128,  
AARCH64_NO_FEATURES)
   SYSREG ("pmbidr_el1",CPENC (3,0,9,10,7), 
F_REG_READ|F_ARCHEXT,   AARCH64_FEATURE (PROFILE))
   SYSREG ("pmblimitr_el1", CPENC (3,0,9,10,0), F_ARCHEXT,  
AARCH64_FEATURE (PROFILE))
   SYSREG ("pmbptr_el1",CPENC (3,0,9,10,1), F_ARCHEXT,  
AARCH64_FEATURE (PROFILE))
@@ -746,6 +756,8 @@
   SYSREG ("prlar_el2", CPENC (3,4,6,8,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8R))
   SYSREG ("prselr_el1",CPENC (3,0,6,2,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8R))
   SYSREG ("prselr_el2",CPENC (3,4,6,2,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8R))
+  SYSREG ("rcwmask_el1",   CPENC (3,0,13,0,6), F_ARCHEXT|F_REG_128,
AARCH64_FEATURE (THE))
+  SYSREG ("rcwsmask_el1",  CPENC (3,0,13,0,3), F_ARCHEXT|F_REG_128,
AARCH64_FEATURE (THE))
   SYSREG ("revidr_el1",CPENC (3,0,0,0,6),  F_REG_READ, 
AARCH64_NO_FEATURES)
   SYSREG ("rgsr_el1",  CPENC (3,0,1,0,5),  F_ARCHEXT,  
AARCH64_FEATURE (MEMTAG))
   SYSREG ("rmr_el1",   CPENC (3,0,12,0,2), 0,  
AARCH64_NO_FEATURES)
@@ -1034,13 +1046,13 @@
   SYSREG ("trfcr_el1", CPENC (3,0,1,2,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8_4A))
   SYSREG ("trfcr_el12",CPENC (3,5,1,2,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8_4A))
   SYSREG ("trfcr_el2", CPENC (3,4,1,2,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8_4A))
-  SYSREG ("ttbr0_el1", CPENC (3,0,2,0,0),  0,  

[PATCH v2 1/5] aarch64: Add march flags for +the and +d128 arch extensions

2023-11-28 Thread Victor Do Nascimento
Given the introduction of optional 128-bit page table descriptor and
translation hardening extension support with the Armv9.4-a
architecture, this introduces the relevant flags to enable the reading
and writing of 128-bit system registers.

The `+d128' -march modifier enables the use of the following ACLE
builtin functions:

  * __uint128_t __arm_rsr128(const char *special_register);
  * void __arm_wsr128(const char *special_register, __uint128_t value);

and defines the __ARM_FEATURE_SYSREG128 macro to 1.

Finally, the `rcwmask_el1' and `rcwsmask_el1' 128-bit system register
implementations are also reliant on the enablement of the `+the' flag,
which is thus also implemented in this patch.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (armv8.9-a): New.
(armv9.4-a): Likewise.
* config/aarch64/aarch64-option-extensions.def (d128): Likewise.
(the): Likewise.
* config/aarch64/aarch64.h (AARCH64_ISA_V9_4A): Likewise.
(AARCH64_ISA_V8_9A): Likewise.
(TARGET_ARMV9_4): Likewise.
(AARCH64_ISA_D128): Likewise.
(AARCH64_ISA_THE): Likewise.
(TARGET_D128): Likewise.
* doc/invoke.texi (AArch64 Options): Document new -march flags
and extensions.
---
 gcc/config/aarch64/aarch64-arches.def|  2 ++
 gcc/config/aarch64/aarch64-c.cc  |  1 +
 gcc/config/aarch64/aarch64-option-extensions.def |  4 
 gcc/config/aarch64/aarch64.h | 15 +++
 gcc/doc/invoke.texi  |  6 ++
 5 files changed, 28 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-arches.def 
b/gcc/config/aarch64/aarch64-arches.def
index 6b9a19c490b..1fe6b796001 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -39,10 +39,12 @@ AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 
8,  (V8_4A, SB, SSBS
 AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
 AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A, LS64))
 AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, MOPS))
+AARCH64_ARCH("armv8.9-a", generic_armv8_a,   V8_9A, 8,  (V8_8A))
 AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
 AARCH64_ARCH("armv9-a",   generic_armv9_a,   V9A  , 9,  (V8_5A, SVE2))
 AARCH64_ARCH("armv9.1-a", generic_armv9_a,   V9_1A, 9,  (V8_6A, V9A))
 AARCH64_ARCH("armv9.2-a", generic_armv9_a,   V9_2A, 9,  (V8_7A, V9_1A))
 AARCH64_ARCH("armv9.3-a", generic_armv9_a,   V9_3A, 9,  (V8_8A, V9_2A))
+AARCH64_ARCH("armv9.4-a", generic_armv9_a,   V9_4A, 9,  (V8_9A, V9_3A))
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index be8b7236cf9..cacf8e8ed25 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -206,6 +206,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_LS64,
"__ARM_FEATURE_LS64", pfile);
   aarch64_def_or_undef (AARCH64_ISA_RCPC, "__ARM_FEATURE_RCPC", pfile);
+  aarch64_def_or_undef (TARGET_D128, "__ARM_FEATURE_SYSREG128", pfile);
 
   /* Not for ACLE, but required to keep "float.h" correct if we switch
  target between implementations that do or do not support ARMv8.2-A
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 825f3bf7758..da31f7c32d1 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -151,4 +151,8 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
 
+AARCH64_OPT_EXTENSION("d128", D128, (), (), (), "d128")
+
+AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index efe2036537e..5f0486cb128 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -219,13 +219,17 @@ enum class aarch64_feature : unsigned char {
 #define AARCH64_ISA_PAUTH (aarch64_isa_flags & AARCH64_FL_PAUTH)
 #define AARCH64_ISA_V8_7A (aarch64_isa_flags & AARCH64_FL_V8_7A)
 #define AARCH64_ISA_V8_8A (aarch64_isa_flags & AARCH64_FL_V8_8A)
+#define AARCH64_ISA_V8_9A (aarch64_isa_flags & AARCH64_FL_V8_9A)
 #define AARCH64_ISA_V9A   (aarch64_isa_flags & AARCH64_FL_V9A)
 #define AARCH64_ISA_V9_1A  (aarch64_isa_flags & AARCH64_FL_V9_1A)
 #define AARCH64_ISA_V9_2A  (aarch64_isa_flags & AARCH64_FL_V9_2A)
 #define AARCH64_ISA_V9_3A  (aarch64_isa_flags & AARCH64_FL_V9_3A)
+#define AARCH64_ISA_V9_4A  (aarch64_isa_flags & AARCH64_FL_V9_4A)
 #define AARCH64_ISA_MOPS  (aarch64_isa_flags & AARCH64_FL_MOPS)
 #define AARCH64_ISA_LS64  (aarch64_isa_flags & AARCH64_FL_LS64)
 

[PATCH v2 4/5] aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins

2023-11-28 Thread Victor Do Nascimento
Implement the ACLE builtins for 128-bit system register manipulation:

  * __uint128_t __arm_rsr128(const char *special_register);
  * void __arm_wsr128(const char *special_register, __uint128_t value);

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (AARCH64_RSR128): New
`enum aarch64_builtins' value.
(AARCH64_WSR128): Likewise.
(aarch64_init_rwsr_builtins): Init `__builtin_aarch64_rsr128'
and `__builtin_aarch64_wsr128' builtins.
(aarch64_expand_rwsr_builtin): Extend function to handle
`__builtin_aarch64_{rsr|wsr}128'.
* config/aarch64/aarch64-protos.h (aarch64_retrieve_sysreg):
Update function signature.
* config/aarch64/aarch64.cc (F_REG_128): New.
(aarch64_retrieve_sysreg): Add 128-bit register mode check.
* config/aarch64/aarch64.md (UNSPEC_SYSREG_RTI): New.
(UNSPEC_SYSREG_WTI): Likewise.
(aarch64_read_sysregti): Likewise.
(aarch64_write_sysregti): Likewise.
---
 gcc/config/aarch64/aarch64-builtins.cc | 50 +-
 gcc/config/aarch64/aarch64-protos.h|  2 +-
 gcc/config/aarch64/aarch64.cc  |  9 +++--
 gcc/config/aarch64/aarch64.md  | 18 ++
 gcc/config/aarch64/arm_acle.h  | 11 ++
 5 files changed, 79 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index c5f20f68bca..1f2b2721f5a 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -815,11 +815,13 @@ enum aarch64_builtins
   AARCH64_RSR64,
   AARCH64_RSRF,
   AARCH64_RSRF64,
+  AARCH64_RSR128,
   AARCH64_WSR,
   AARCH64_WSRP,
   AARCH64_WSR64,
   AARCH64_WSRF,
   AARCH64_WSRF64,
+  AARCH64_WSR128,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1842,6 +1844,10 @@ aarch64_init_rwsr_builtins (void)
 = build_function_type_list (double_type_node, const_char_ptr_type, NULL);
   AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
 
+  fntype
+= build_function_type_list (uint128_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR128, rsr128, fntype);
+
   fntype
 = build_function_type_list (void_type_node, const_char_ptr_type,
uint32_type_node, NULL);
@@ -1867,6 +1873,12 @@ aarch64_init_rwsr_builtins (void)
 = build_function_type_list (void_type_node, const_char_ptr_type,
double_type_node, NULL);
   AARCH64_INIT_RWSR_BUILTINS_DECL (WSRF64, wsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint128_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR128, wsr128, fntype);
+
 }
 
 /* Initialize the memory tagging extension (MTE) builtins.  */
@@ -2710,6 +2722,7 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int 
fcode)
   tree arg0, arg1;
   rtx const_str, input_val, subreg;
   enum machine_mode mode;
+  enum insn_code icode;
   class expand_operand ops[2];
 
   arg0 = CALL_EXPR_ARG (exp, 0);
@@ -2718,7 +2731,18 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int 
fcode)
   || fcode == AARCH64_WSRP
   || fcode == AARCH64_WSR64
   || fcode == AARCH64_WSRF
-  || fcode == AARCH64_WSRF64);
+  || fcode == AARCH64_WSRF64
+  || fcode == AARCH64_WSR128);
+
+  bool op128 = (fcode == AARCH64_RSR128 || fcode == AARCH64_WSR128);
+  enum machine_mode sysreg_mode = op128 ? TImode : DImode;
+
+  if (op128 && !TARGET_D128)
+{
+  error_at (EXPR_LOCATION (exp), "128-bit system register support requires"
+" the % extension");
+  return const0_rtx;
+}
 
   /* Argument 0 (system register name) must be a string literal.  */
   gcc_assert (TREE_CODE (arg0) == ADDR_EXPR
@@ -2741,7 +2765,7 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int 
fcode)
 sysreg_name[pos] = TOLOWER (sysreg_name[pos]);
 
   const char* name_output = aarch64_retrieve_sysreg ((const char *) 
sysreg_name,
-write_op);
+write_op, op128);
   if (name_output == NULL)
 {
   error_at (EXPR_LOCATION (exp), "invalid system register name provided");
@@ -2760,13 +2784,17 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int 
fcode)
   mode = TYPE_MODE (TREE_TYPE (arg1));
   input_val = copy_to_mode_reg (mode, expand_normal (arg1));
 
+  icode = (op128 ? CODE_FOR_aarch64_write_sysregti
+: CODE_FOR_aarch64_write_sysregdi);
+
   switch (fcode)
{
case AARCH64_WSR:
case AARCH64_WSRP:
case AARCH64_WSR64:
case AARCH64_WSRF64:
- subreg = lowpart_subreg (DImode, input_val, mode);
+   case AARCH64_WSR128:
+ subreg = lowpart_subreg (sysreg_mode, input_val, mode);
  

[PATCH v2 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-28 Thread Victor Do Nascimento
Extend existing unit tests for the ACLE system register manipulation
functions to include 128-bit tests.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/acle/rwsr.c (get_rsr128): New.
	(set_wsr128): Likewise.
---
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c 
b/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
index 93c48c4caf0..6feb0bef2d6 100644
--- a/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
@@ -6,6 +6,38 @@
 
 #include 
 
+#pragma GCC push_options
+#pragma GCC target ("arch=armv9.4-a+d128")
+
+#ifndef __ARM_FEATURE_SYSREG128
+#error "__ARM_FEATURE_SYSREG128 feature macro not defined."
+#endif
+
+/*
+** get_rsr128:
+** mrrsx0, x1, s3_0_c7_c4_0
+** ...
+*/
+__uint128_t
+get_rsr128 ()
+{
+  __arm_rsr128 ("par_el1");
+}
+
+/*
+** set_wsr128:
+** ...
+** msrrs3_0_c7_c4_0, x0, x1
+** ...
+*/
+void
+set_wsr128 (__uint128_t c)
+{
+  __arm_wsr128 ("par_el1", c);
+}
+
+#pragma GCC pop_options
+
 /*
 ** get_rsr:
 ** ...
-- 
2.42.0



[PATCH v2 0/5] aarch64: Add Armv9.4-a 128-bit system-register read/write support

2023-11-28 Thread Victor Do Nascimento
Changes from v1 -
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635531.html

  * [PATCH 4/5] - For `error_at' message, put feature name in quotes.
  * [PATCH 4/5] - For `aarch64_retrieve_sysreg' function, add
  description of new parameter to comments.
  * [PATCH 5/5] - Reduce the minimum arch requirements of the system
  register unit tests, selectively using `#pragma GCC target' when
  testing 128-bit sysreg r/w functions.

---

Given the introduction of optional 128-bit page table descriptor and
translation hardening extension support with the Armv9.4-a
architecture, this patch series introduces the necessary changes to
the aarch64-specific builtin code to enable the reading and writing of
128-bit system registers.  In so doing, the following ACLE builtins and
feature macro are made available to the compiler:

  * __uint128_t __arm_rsr128(const char *special_register);
  * void __arm_wsr128(const char *special_register, __uint128_t value);
  * __ARM_FEATURE_SYSREG128.

Finally, in order to bring the GCC system-register database in line
with Binutils, and in so doing add the relevant 128-bit system
registers to GCC, this series also introduces the Guarded Control
Stack (GCS) `+gcs' architecture modifier flag, allowing the inclusion
of the new GCS system registers which are now supported and present in
the `aarch64-sys-regs.def' system register database.

Victor Do Nascimento (5):
  aarch64: Add march flags for +the and +d128 arch extensions
  aarch64: Add support for GCS system registers with the +gcs modifier
  aarch64: Sync `aarch64-sys-regs.def' with Binutils.
  aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins
  aarch64: Add rsr128 and wsr128 ACLE tests

 gcc/config/aarch64/aarch64-arches.def |  2 +
 gcc/config/aarch64/aarch64-builtins.cc| 50 ---
 gcc/config/aarch64/aarch64-c.cc   |  1 +
 .../aarch64/aarch64-option-extensions.def |  6 +++
 gcc/config/aarch64/aarch64-protos.h   |  2 +-
 gcc/config/aarch64/aarch64-sys-regs.def   | 30 +++
 gcc/config/aarch64/aarch64.cc |  9 +++-
 gcc/config/aarch64/aarch64.h  | 21 
 gcc/config/aarch64/aarch64.md | 18 +++
 gcc/config/aarch64/arm_acle.h | 11 
 gcc/doc/invoke.texi   |  8 +++
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 32 
 12 files changed, 170 insertions(+), 20 deletions(-)

-- 
2.42.0



[PATCH v2 2/5] aarch64: Add support for GCS system registers with the +gcs modifier

2023-11-28 Thread Victor Do Nascimento
Given the introduction of system registers associated with the Guarded
Control Stack extension to Armv9.4-a in Binutils and their reliance on
the `+gcs' modifier, we implement the necessary changes in GCC to
allow them to be recognized by the compiler.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (gcs): New.
* config/aarch64/aarch64.h (AARCH64_ISA_GCS): New.
(TARGET_GCS): Likewise.
* doc/invoke.texi (AArch64 Options): Describe GCS.
---
 gcc/config/aarch64/aarch64-option-extensions.def | 2 ++
 gcc/config/aarch64/aarch64.h | 6 ++
 gcc/doc/invoke.texi  | 2 ++
 3 files changed, 10 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index da31f7c32d1..e72c039b612 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -155,4 +155,6 @@ AARCH64_OPT_EXTENSION("d128", D128, (), (), (), "d128")
 
 AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
 
+AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 5f0486cb128..d84ff3fc8ba 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -230,6 +230,7 @@ enum class aarch64_feature : unsigned char {
 #define AARCH64_ISA_CSSC  (aarch64_isa_flags & AARCH64_FL_CSSC)
 #define AARCH64_ISA_D128  (aarch64_isa_flags & AARCH64_FL_D128)
 #define AARCH64_ISA_THE   (aarch64_isa_flags & AARCH64_FL_THE)
+#define AARCH64_ISA_GCS   (aarch64_isa_flags & AARCH64_FL_GCS)
 
 /* AARCH64_FL options necessary for system register implementation.  */
 
@@ -403,6 +404,11 @@ enum class aarch64_feature : unsigned char {
 enabled through +the.  */
 #define TARGET_THE (AARCH64_ISA_THE)
 
+/*  Armv9.4-A Guarded Control Stack extension system registers are
+enabled through +gcs.  */
+#define TARGET_GCS (AARCH64_ISA_GCS)
+
+
 /* Standard register usage.  */
 
 /* 31 64-bit general purpose registers R0-R30:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c76205bf6e8..75551ef2ace 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2,6 +2,8 @@ Enable the Pointer Authentication Extension.
 Enable the Common Short Sequence Compression instructions.
 @item d128
 Enable support for 128-bit system register read/write instructions.
+@item gcs
+Enable support for Armv9.4-a Guarded Control Stack extension.
 @item the
 Enable support for Armv8.9-a/9.4-a translation hardening extension.
 
-- 
2.42.0



[PATCH] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2023-11-13 Thread Victor Do Nascimento
Continuing on from previously-proposed Libatomic enablement work [1],
the introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing both the Load-Acquire RCpc Pair
Ordered, and Store-Release Pair Ordered operations in the form of 
LDIAPP and STILP.

These operations are single-copy atomic on cores which also implement
LSE2 and, as such, support for these operations is added to Libatomic
and employed accordingly when the LSE2 and RCPC3 features are detected
in a given core at runtime.

The possibility that a core implements (beyond LSE & LSE2) both the
LSE128 and RCPC3 features has also required that support for up to 4
ifuncs (up from 3 before) be added, so that the lse128+rcpc option is
available for selection at runtime.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636287.html

libatomic/ChangeLog:

  * libatomic_i.h (GEN_SELECTOR): Define for
  IFUNC_NCOND(N) == 4.
  * configure.ac: Add call to LIBAT_TEST_FEAT_LRCPC3() test.
  * configure: Regenerate.
  * config/linux/aarch64/host-config.h (HAS_LRCPC3): New.
  (has_rcpc3): Likewise.
  * config/linux/aarch64/atomic_16.S (libat_load_16): Add
  LRCPC3 variant.
  (libat_store_16): Likewise.
  * acinclude.m4 (LIBAT_TEST_FEAT_LRCPC3): New.
  (HAVE_FEAT_LRCPC3): Likewise.
  (ARCH_AARCH64_HAVE_LRCPC3): Likewise.
  * Makefile.am (AM_CPPFLAGS): Conditionally append
  -DHAVE_FEAT_LRCPC3 flag.
---
 libatomic/Makefile.am|  6 +-
 libatomic/Makefile.in| 22 +++--
 libatomic/acinclude.m4   | 19 
 libatomic/auto-config.h.in   |  3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 94 +++-
 libatomic/config/linux/aarch64/host-config.h | 26 +-
 libatomic/configure  | 59 +++-
 libatomic/configure.ac   |  1 +
 libatomic/libatomic_i.h  | 18 
 9 files changed, 230 insertions(+), 18 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index 24e843db67d..dee38e46af9 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -130,8 +130,12 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix 
_$(s)_.lo,$(SIZEOBJS)))
 ## On a target-specific basis, include alternates to be selected by IFUNC.
 if HAVE_IFUNC
 if ARCH_AARCH64_LINUX
+AM_CPPFLAGS  =
 if ARCH_AARCH64_HAVE_LSE128
-AM_CPPFLAGS = -DHAVE_FEAT_LSE128
+AM_CPPFLAGS += -DHAVE_FEAT_LSE128
+endif
+if ARCH_AARCH64_HAVE_LRCPC3
+AM_CPPFLAGS+= -DHAVE_FEAT_LRCPC3
 endif
 IFUNC_OPTIONS   = -march=armv8-a+lse
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix 
_$(s)_1_.lo,$(SIZEOBJS)))
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index cd48fa21334..8e87d12907a 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -89,15 +89,17 @@ POST_UNINSTALL = :
 build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
-@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_1 = $(foreach 
s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
-@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2 = atomic_16.S
-@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(foreach \
+@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_1
 = -DHAVE_FEAT_LSE128
+@ARCH_AARCH64_HAVE_LRCPC3_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2
 = -DHAVE_FEAT_LRCPC3
+@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(foreach 
s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
+@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = atomic_16.S
+@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_5 = $(foreach \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ s,$(SIZES),$(addsuffix \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ _$(s)_1_.lo,$(SIZEOBJS))) \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ $(addsuffix \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ _8_2_.lo,$(SIZEOBJS))
-@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = $(addsuffix 
_8_1_.lo,$(SIZEOBJS))
-@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_5 = $(addsuffix 
_16_1_.lo,$(SIZEOBJS)) \
+@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_6 = $(addsuffix 
_8_1_.lo,$(SIZEOBJS))
+@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_7 = $(addsuffix 
_16_1_.lo,$(SIZEOBJS)) \
 @ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@   $(addsuffix 
_16_2_.lo,$(SIZEOBJS))
 
 subdir = .
@@ -424,7 +426,7 @@ libatomic_la_LDFLAGS = $(libatomic_version_info) 
$(libatomic_version_script) \
$(lt_host_flags) $(libatomic_darwin_rpath)
 
 libatomic_la_SOURCES = gload.c gstore.c gcas.c gexch.c glfree.c lock.c \
-   init.c fenv.c fence.c flag.c $(am__append_2)
+   init.c fenv.c fence.c flag.c $(am__append_4)
 SIZEOBJS = load store cas exch fadd fsub fand fior fxor fnand tas
 EXTRA_libatomic_la_SOURCES = 

[PATCH v2 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2023-11-13 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:

  * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
  value held in a pair of registers, with original data loaded into
  the same 2 registers.
  * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
  in a pair of registers, with original data loaded into the same 2
  registers.
  * SWPP - Atomic swap of one 128-bit value with 128-bit value held
  in a pair of registers.

This patch adds the logic required to make use of these instructions
when the architectural feature is present and a suitable assembler is
available.

In order to do this, the following changes are made:

  1. Add a configure-time check to check for LSE128 support in the
  assembler.
  2. Edit host-config.h so that when N == 16, nifunc = 2.
  3. Where available due to LSE128, implement the second ifunc, making
  use of the novel instructions.
  4. For atomic functions unable to make use of these new
  instructions, define a new alias which causes the _i1 function
  variant to point ahead to the corresponding _i2 implementation.

libatomic/ChangeLog:

* Makefile.am (AM_CPPFLAGS): Add conditional setting of
-DHAVE_FEAT_LSE128.
* acinclude.m4 (LIBAT_TEST_FEAT_LSE128): New.
* config/linux/aarch64/atomic_16.S (LSE128): New macro
definition.
(libat_exchange_16): New LSE128 variant.
(libat_fetch_or_16): Likewise.
(libat_or_fetch_16): Likewise.
(libat_fetch_and_16): Likewise.
(libat_and_fetch_16): Likewise.
* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
(IFUNC_NCOND): Add operand size checking.
(has_lse2): Renamed from `ifunc1`.
(has_lse128): New.
(HAS_LSE128): Likewise.
* libatomic/configure.ac: Add call to LIBAT_TEST_FEAT_LSE128.
* configure (ac_subst_vars): Regenerated via autoreconf.
* libatomic/Makefile.in: Likewise.
* libatomic/auto-config.h.in: Likewise.
---
 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 +++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 170 ++-
 libatomic/config/linux/aarch64/host-config.h |  27 ++-
 libatomic/configure  |  59 ++-
 libatomic/configure.ac   |   1 +
 8 files changed, 274 insertions(+), 9 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index c0b8dea5037..24e843db67d 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -130,6 +130,9 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix 
_$(s)_.lo,$(SIZEOBJS)))
 ## On a target-specific basis, include alternates to be selected by IFUNC.
 if HAVE_IFUNC
 if ARCH_AARCH64_LINUX
+if ARCH_AARCH64_HAVE_LSE128
+AM_CPPFLAGS = -DHAVE_FEAT_LSE128
+endif
 IFUNC_OPTIONS   = -march=armv8-a+lse
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix 
_$(s)_1_.lo,$(SIZEOBJS)))
 libatomic_la_SOURCES += atomic_16.S
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index dc2330b91fd..cd48fa21334 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -452,6 +452,7 @@ M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files)))
 libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \
_$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \
$(am__append_4) $(am__append_5)
+@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@AM_CPPFLAGS
 = -DHAVE_FEAT_LSE128
 @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv8-a+lse
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv7-a+fp 
-DHAVE_KERNEL64
 @ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=i586
diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index f35ab5b60a5..4197db8f404 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[
   ])
 ])
 
+dnl
+dnl Test if the host assembler supports armv9.4-a LSE128 insns.
+dnl
+AC_DEFUN([LIBAT_TEST_FEAT_LSE128],[
+  AC_CACHE_CHECK([for armv9.4-a LSE128 insn support],
+[libat_cv_have_feat_lse128],[
+AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])])
+if AC_TRY_EVAL(ac_link); then
+  eval libat_cv_have_feat_lse128=yes
+else
+  eval libat_cv_have_feat_lse128=no
+fi
+rm -f conftest*
+  ])
+  LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128],
+   [Have LSE128 support for 16 byte integers.])
+  AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128], [test x$libat_cv_have_feat_lse128 
= xyes])
+])
+
 dnl
 dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
 dnl
diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in
index ab3424a759e..7c78933b07d 100644
--- 

[PATCH v2 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2023-11-13 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with.  It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their suffixes possibly altered.

This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:

  #define LSE2 _i1

and reconstructs function names with the pre-processor's token
concatenation feature, such that for `MACRO(name)' we now have
`MACRO(name, feature)', and in the macro definition body we replace
`name' with `name##feature'.

libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (CORE): New macro.
(LSE2): Likewise.
(ENTRY): Modify macro to take in `arch' argument.
(END): Likewise.
(ALIAS): Likewise.
(ENTRY1): New macro.
(END1): Likewise.
(ALIAS1): Likewise.
---
 libatomic/config/linux/aarch64/atomic_16.S | 147 +++--
 1 file changed, 79 insertions(+), 68 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index 0485c284117..3f6225830e6 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -39,22 +39,34 @@
 
.arch   armv8-a+lse
 
-#define ENTRY(name)\
-   .global name;   \
-   .hidden name;   \
-   .type name,%function;   \
-   .p2align 4; \
-name:  \
-   .cfi_startproc; \
+#define ENTRY(name, feat)  \
+   ENTRY1(name, feat)
+
+#define ENTRY1(name, feat) \
+   .global name##feat; \
+   .hidden name##feat; \
+   .type name##feat,%function; \
+   .p2align 4; \
+name##feat:\
+   .cfi_startproc; \
hint34  // bti c
 
-#define END(name)  \
-   .cfi_endproc;   \
-   .size name, .-name;
+#define END(name, feat)\
+   END1(name, feat)
 
-#define ALIAS(alias,name)  \
-   .global alias;  \
-   .set alias, name;
+#define END1(name, feat)   \
+   .cfi_endproc;   \
+   .size name##feat, .-name##feat;
+
+#define ALIAS(alias, from, to) \
+   ALIAS1(alias,from,to)
+
+#define ALIAS1(alias, from, to)\
+   .global alias##from;\
+   .set alias##from, alias##to;
+
+#define CORE
+#define LSE2   _i1
 
 #define res0 x0
 #define res1 x1
@@ -89,7 +101,7 @@ name:\
 #define SEQ_CST 5
 
 
-ENTRY (libat_load_16)
+ENTRY (libat_load_16, CORE)
mov x5, x0
cbnzw1, 2f
 
@@ -104,10 +116,10 @@ ENTRY (libat_load_16)
stxpw4, res0, res1, [x5]
cbnzw4, 2b
ret
-END (libat_load_16)
+END (libat_load_16, CORE)
 
 
-ENTRY (libat_load_16_i1)
+ENTRY (libat_load_16, LSE2)
cbnzw1, 1f
 
/* RELAXED.  */
@@ -127,10 +139,10 @@ ENTRY (libat_load_16_i1)
ldp res0, res1, [x0]
dmb ishld
ret
-END (libat_load_16_i1)
+END (libat_load_16, LSE2)
 
 
-ENTRY (libat_store_16)
+ENTRY (libat_store_16, CORE)
cbnzw4, 2f
 
/* RELAXED.  */
@@ -144,10 +156,10 @@ ENTRY (libat_store_16)
stlxp   w4, in0, in1, [x0]
cbnzw4, 2b
ret
-END (libat_store_16)
+END (libat_store_16, CORE)
 
 
-ENTRY (libat_store_16_i1)
+ENTRY (libat_store_16, LSE2)
cbnzw4, 1f
 
/* RELAXED.  */
@@ -159,10 +171,10 @@ ENTRY (libat_store_16_i1)
stlxp   w4, in0, in1, [x0]
cbnzw4, 1b
ret
-END (libat_store_16_i1)
+END (libat_store_16, LSE2)
 
 
-ENTRY (libat_exchange_16)
+ENTRY (libat_exchange_16, CORE)
mov x5, x0
cbnzw4, 2f
 
@@ -186,10 +198,10 @@ ENTRY (libat_exchange_16)
stlxp   w4, in0, in1, [x5]
cbnzw4, 4b
ret
-END (libat_exchange_16)
+END (libat_exchange_16, CORE)
 
 
-ENTRY (libat_compare_exchange_16)
+ENTRY (libat_compare_exchange_16, CORE)
ldp exp0, exp1, [x1]
cbz w4, 3f
cmp w4, RELEASE
@@ -228,10 +240,10 @@ ENTRY (libat_compare_exchange_16)
cbnzw4, 4b
mov x0, 1
ret
-END (libat_compare_exchange_16)
+END (libat_compare_exchange_16, CORE)
 
 
-ENTRY (libat_compare_exchange_16_i1)
+ENTRY (libat_compare_exchange_16, LSE2)
ldp exp0, exp1, [x1]
mov tmp0, exp0
mov tmp1, exp1
@@ -264,10 +276,10 @@ ENTRY (libat_compare_exchange_16_i1)
/* ACQ_REL/SEQ_CST.  */
 4: caspal  exp0, exp1, in0, in1, [x0]
b   0b
-END (libat_compare_exchange_16_i1)
+END (libat_compare_exchange_16, LSE2)
 
 
-ENTRY (libat_fetch_add_16)
+ENTRY (libat_fetch_add_16, CORE)

[PATCH v2 0/2] Libatomic: Add LSE128 atomics support for AArch64

2023-11-13 Thread Victor Do Nascimento
v2 updates:

Move the previously unguarded definition of IFUNC_NCOND(N) in
`host-config.h' to within the scope of `#ifdef HWCAP_USCAT'.
This is done so that its definition is contingent not only on the
value of N but also on the definition of HWCAP_USCAT, as it was found
that building on systems where !HWCAP_USCAT and N == 16 led to a
previously-undetected build error.

---

Building upon Wilco Dijkstra's work on AArch64 128-bit atomics for
Libatomic, namely the patches from [1] and [2], this patch series
extends the library's capabilities to dynamically select and emit
Armv9.4-a LSE128 implementations of atomic operations via ifuncs at
run-time whenever architectural support is present.

Regression tested on the aarch64-linux-gnu target with LSE128 support.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html

Victor Do Nascimento (2):
  libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
  libatomic: Enable LSE128 128-bit atomics for armv9.4-a

 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 ++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 315 ++-
 libatomic/config/linux/aarch64/host-config.h |  27 +-
 libatomic/configure  |  59 +++-
 libatomic/configure.ac   |   1 +
 8 files changed, 352 insertions(+), 76 deletions(-)

-- 
2.42.0



[PATCH 5/5] aarch64: rcpc3: Add intrinsics tests

2023-11-09 Thread Victor Do Nascimento
Add a unit test to ensure that the added intrinsics compile to the
correct `LDAP1 {Vt.D}[lane],[Xn]' and `STL1 {Vt.D}[lane],[Xn]'
instructions.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/rcpc3.c: New.
---
 gcc/testsuite/gcc.target/aarch64/acle/rcpc3.c | 47 +++
 1 file changed, 47 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rcpc3.c

diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rcpc3.c 
b/gcc/testsuite/gcc.target/aarch64/acle/rcpc3.c
new file mode 100644
index 000..689d047ab91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rcpc3.c
@@ -0,0 +1,47 @@
+/* Test the rcpc3 ACLE intrinsics.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.2-a+rcpc3" } */
+#include 
+#include 
+
+#define TEST_LDAP(TYPE, T) \
+  TYPE##x##1_t T##1_test (TYPE##_t const * ptr, TYPE##x##1_t src) {\
+return vldap1_lane_##T##64 (ptr, src, 0);  \
+  }
+
+#define TEST_LDAPQ(TYPE, T)\
+  TYPE##x##2_t T##2_test (TYPE##_t const * ptr, TYPE##x##2_t src) {\
+return vldap1q_lane_##T##64 (ptr, src, 1); \
+  }
+
+#define TEST_STL(TYPE, T)  \
+  void T##1s_test (TYPE##_t * ptr, TYPE##x##1_t src) { \
+vstl1_lane_##T##64 (ptr, src, 0);  \
+  }
+
+#define TEST_STLQ(TYPE, T) \
+  void T##2s_test (TYPE##_t * ptr, TYPE##x##2_t src) { \
+vstl1q_lane_##T##64 (ptr, src, 1); \
+  }
+
+TEST_LDAP (uint64, u);
+TEST_LDAP (int64, s);
+TEST_LDAP (float64, f);
+TEST_LDAP (poly64, p);
+/* { dg-final { scan-assembler-times {ldap1\t\{v\d.d\}\[0\], \[x\d\]} 4 } } */
+TEST_LDAPQ (uint64, u);
+TEST_LDAPQ (int64, s);
+TEST_LDAPQ (float64, f);
+TEST_LDAPQ (poly64, p);
+/* { dg-final { scan-assembler-times {ldap1\t\{v\d.d\}\[1\], \[x\d\]} 4 } } */
+
+TEST_STL (uint64, u);
+TEST_STL (int64, s);
+TEST_STL (float64, f);
+TEST_STL (poly64, p);
+/* { dg-final { scan-assembler-times {stl1\t\{v\d.d\}\[0\], \[x\d\]} 4 } } */
+TEST_STLQ (uint64, u);
+TEST_STLQ (int64, s);
+TEST_STLQ (float64, f);
+TEST_STLQ (poly64, p);
+/* { dg-final { scan-assembler-times {stl1\t\{v\d.d\}\[1\], \[x\d\]} 4 } } */
-- 
2.41.0



[PATCH 1/5] aarch64: rcpc3: Add +rcpc3 extension

2023-11-09 Thread Victor Do Nascimento
Given the optional LRCPC3 target support for Armv8.2-a cores onwards,
the +rcpc3 arch feature modifier is added to GCC's command-line options.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (rcpc3): New.
* config/aarch64/aarch64.h (AARCH64_ISA_RCPC3): Likewise.
(TARGET_RCPC3): Likewise.
* doc/invoke.texi (rcpc3): Document feature in AArch64 Options.
---
 gcc/config/aarch64/aarch64-option-extensions.def | 1 +
 gcc/config/aarch64/aarch64.h | 4 
 gcc/doc/invoke.texi  | 4 
 3 files changed, 9 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 825f3bf7758..2ab94799d34 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -151,4 +151,5 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
 
+AARCH64_OPT_EXTENSION("rcpc3", RCPC3, (), (), (), "rcpc3")
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 2f0777a37ac..68bbaccef1a 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -213,6 +213,7 @@ enum class aarch64_feature : unsigned char {
 #define AARCH64_ISA_F64MM (aarch64_isa_flags & AARCH64_FL_F64MM)
 #define AARCH64_ISA_BF16  (aarch64_isa_flags & AARCH64_FL_BF16)
 #define AARCH64_ISA_SB(aarch64_isa_flags & AARCH64_FL_SB)
+#define AARCH64_ISA_RCPC3 (aarch64_isa_flags & AARCH64_FL_RCPC3)
 #define AARCH64_ISA_V8R   (aarch64_isa_flags & AARCH64_FL_V8R)
 #define AARCH64_ISA_PAUTH (aarch64_isa_flags & AARCH64_FL_PAUTH)
 #define AARCH64_ISA_V9A   (aarch64_isa_flags & AARCH64_FL_V9A)
@@ -344,6 +345,9 @@ enum class aarch64_feature : unsigned char {
and sign-extending versions.*/
 #define TARGET_RCPC2 (AARCH64_ISA_RCPC8_4)
 
+/* RCPC3 LDAP1/STL1 loads/stores from Armv8.2-a.  */
+#define TARGET_RCPC3 (AARCH64_ISA_RCPC3)
+
 /* Apply the workaround for Cortex-A53 erratum 835769.  */
 #define TARGET_FIX_ERR_A53_835769  \
   ((aarch64_fix_a53_err835769 == 2)\
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6e776a0faa1..ba28eb195ce 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21028,6 +21028,10 @@ Enable the Flag Manipulation instructions Extension.
 Enable the Pointer Authentication Extension.
 @item cssc
 Enable the Common Short Sequence Compression instructions.
+@item rcpc3
+Enable the RCpc3 extension.  This enables the use of the LDAP1 and
+STL1 instructions for loads/stores of 64-bit values to and from SIMD
+register lanes, passing these on to the assembler.
 
 @end table
 
-- 
2.41.0



[PATCH 4/5] aarch64: rcpc3: add Neon ACLE wrapper functions to `arm_neon.h'

2023-11-09 Thread Victor Do Nascimento
Create the necessary mappings from the ACLE-defined Neon intrinsic
names[1] to the internal builtin function names.

[1] https://arm-software.github.io/acle/neon_intrinsics/advsimd.html
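
The naming scheme being mapped can be summarized like this (a sketch inferred
from the patch hunks below; the `_usus'/`_psps' suffixes encode type-qualifier
information in GCC's builtin mangling and are treated here as opaque strings):

```python
def builtin_name(op, mode, qual=""):
    """Compose an internal builtin name of the shape used in this patch,
    e.g. vldap1q_lane_u64 forwards to __builtin_aarch64_vec_ldap1_lanev2di_usus."""
    return "__builtin_aarch64_vec_%s_lane%s%s" % (op, mode, qual)

assert builtin_name("ldap1", "v2di", "_usus") == \
    "__builtin_aarch64_vec_ldap1_lanev2di_usus"
assert builtin_name("stl1", "v1di") == "__builtin_aarch64_vec_stl1_lanev1di"
```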

gcc/ChangeLog:

* config/aarch64/arm_neon.h (vldap1_lane_u64): New.
(vldap1q_lane_u64): Likewise.
(vldap1_lane_s64): Likewise.
(vldap1q_lane_s64): Likewise.
(vldap1_lane_f64): Likewise.
(vldap1q_lane_f64): Likewise.
(vldap1_lane_p64): Likewise.
(vldap1q_lane_p64): Likewise.
(vstl1_lane_u64): Likewise.
(vstl1q_lane_u64): Likewise.
(vstl1_lane_s64): Likewise.
(vstl1q_lane_s64): Likewise.
(vstl1_lane_f64): Likewise.
(vstl1q_lane_f64): Likewise.
(vstl1_lane_p64): Likewise.
(vstl1q_lane_p64): Likewise.
---
 gcc/config/aarch64/arm_neon.h | 129 ++
 1 file changed, 129 insertions(+)

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 349f3167699..ef0d75e07ce 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -13446,6 +13446,135 @@ vld1q_lane_u64 (const uint64_t *__src, uint64x2_t 
__vec, const int __lane)
   return __aarch64_vset_lane_any (*__src, __vec, __lane);
 }
 
+#pragma GCC push_options
+#pragma GCC target ("+nothing+rcpc3+simd")
+
+/* vldap1_lane.  */
+
+__extension__ extern __inline uint64x1_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vldap1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane)
+{
+  return __builtin_aarch64_vec_ldap1_lanev1di_usus (
+ (__builtin_aarch64_simd_di *) __src, __vec, __lane);
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vldap1q_lane_u64 (const uint64_t *__src, uint64x2_t __vec, const int __lane)
+{
+  return __builtin_aarch64_vec_ldap1_lanev2di_usus (
+ (__builtin_aarch64_simd_di *) __src, __vec, __lane);
+}
+
+__extension__ extern __inline int64x1_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vldap1_lane_s64 (const int64_t *__src, int64x1_t __vec, const int __lane)
+{
+  return __builtin_aarch64_vec_ldap1_lanev1di (__src, __vec, __lane);
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vldap1q_lane_s64 (const int64_t *__src, int64x2_t __vec, const int __lane)
+{
+  return __builtin_aarch64_vec_ldap1_lanev2di (__src, __vec, __lane);
+}
+
+__extension__ extern __inline float64x1_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vldap1_lane_f64 (const float64_t *__src, float64x1_t __vec, const int __lane)
+{
+  return __builtin_aarch64_vec_ldap1_lanev1df (__src, __vec, __lane);
+}
+
+__extension__ extern __inline float64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vldap1q_lane_f64 (const float64_t *__src, float64x2_t __vec, const int __lane)
+{
+  return __builtin_aarch64_vec_ldap1_lanev2df (__src, __vec, __lane);
+}
+
+__extension__ extern __inline poly64x1_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vldap1_lane_p64 (const poly64_t *__src, poly64x1_t __vec, const int __lane)
+{
+  return __builtin_aarch64_vec_ldap1_lanev1di_psps (
+ (__builtin_aarch64_simd_di *) __src, __vec, __lane);
+}
+
+__extension__ extern __inline poly64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vldap1q_lane_p64 (const poly64_t *__src, poly64x2_t __vec, const int __lane)
+{
+  return __builtin_aarch64_vec_ldap1_lanev2di_psps (
+ (__builtin_aarch64_simd_di *) __src, __vec, __lane);
+}
+
+/* vstl1_lane.  */
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vstl1_lane_u64 (uint64_t *__src, uint64x1_t __vec, const int __lane)
+{
+  __builtin_aarch64_vec_stl1_lanev1di_sus ((__builtin_aarch64_simd_di *) __src,
+  __vec, __lane);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vstl1q_lane_u64 (uint64_t *__src, uint64x2_t __vec, const int __lane)
+{
+  __builtin_aarch64_vec_stl1_lanev2di_sus ((__builtin_aarch64_simd_di *) __src,
+  __vec, __lane);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vstl1_lane_s64 (int64_t *__src, int64x1_t __vec, const int __lane)
+{
+  __builtin_aarch64_vec_stl1_lanev1di (__src, __vec, __lane);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vstl1q_lane_s64 (int64_t *__src, int64x2_t __vec, const int __lane)
+{
+  __builtin_aarch64_vec_stl1_lanev2di (__src, __vec, __lane);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vstl1_lane_f64 

[PATCH 3/5] aarch64: rcpc3: Add Neon ACLE intrinsics

2023-11-09 Thread Victor Do Nascimento
Register the target specific builtins in `aarch64-simd-builtins.def'
and implement their associated backend patterns in `aarch64-simd.md'.

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def
(vec_ldap1_lane): New.
(vec_stl1_lane): Likewise.
* config/aarch64/aarch64-simd.md
(aarch64_vec_stl1_lanes<mode>_lane): New.
(aarch64_vec_stl1_lane<mode>): Likewise.
(aarch64_vec_ldap1_lanes<mode>_lane): Likewise.
(aarch64_vec_ldap1_lane<mode>): Likewise.
---
 gcc/config/aarch64/aarch64-simd-builtins.def |  7 +++
 gcc/config/aarch64/aarch64-simd.md   | 65 
 gcc/config/aarch64/aarch64.md|  2 +
 3 files changed, 74 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index e2b94ad8247..0ae6c4ad41a 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -43,6 +43,13 @@
help describe the attributes (for example, pure) for the intrinsic
function.  */
 
+  BUILTIN_V12DIF (LOADSTRUCT_LANE, vec_ldap1_lane, 0, ALL)
+  BUILTIN_V12DIUP (LOADSTRUCT_LANE_U, vec_ldap1_lane, 0, ALL)
+  BUILTIN_V12DIUP (LOADSTRUCT_LANE_P, vec_ldap1_lane, 0, ALL)
+  BUILTIN_V12DIF (STORESTRUCT_LANE, vec_stl1_lane, 0, ALL)
+  BUILTIN_V12DIUP (STORESTRUCT_LANE_U, vec_stl1_lane, 0, ALL)
+  BUILTIN_V12DIUP (STORESTRUCT_LANE_P, vec_stl1_lane, 0, ALL)
+
   BUILTIN_VDC (BINOP, combine, 0, AUTO_FP)
   BUILTIN_VD_I (BINOPU, combine, 0, NONE)
   BUILTIN_VDC_P (BINOPP, combine, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 81ff5bad03d..79697336f61 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7697,6 +7697,71 @@
   DONE;
 })
 
+;; Patterns for rcpc3 vector lane loads and stores.
+
+(define_insn "aarch64_vec_stl1_lanes<mode>_lane"
+  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Q")
+	(unspec:BLK [(match_operand:V12DIF 1 "register_operand" "w")
+		     (match_operand:SI 2 "immediate_operand" "i")]
+		     UNSPEC_STL1_LANE))]
+  "TARGET_RCPC3"
+  {
+    operands[2] = aarch64_endian_lane_rtx (<MODE>mode,
+					   INTVAL (operands[2]));
+    return "stl1\\t{%S1.<Vetype>}[%2], %0";
+  }
+  [(set_attr "type" "neon_store2_one_lane")]
+)
+
+(define_expand "aarch64_vec_stl1_lane<mode>"
+ [(match_operand:DI 0 "register_operand")
+  (match_operand:V12DIF 1 "register_operand")
+  (match_operand:SI 2 "immediate_operand")]
+  "TARGET_RCPC3"
+{
+  rtx mem = gen_rtx_MEM (BLKmode, operands[0]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)));
+
+  aarch64_simd_lane_bounds (operands[2], 0,
+			    GET_MODE_NUNITS (<MODE>mode).to_constant (), NULL);
+  emit_insn (gen_aarch64_vec_stl1_lanes<mode>_lane (mem,
+						    operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "aarch64_vec_ldap1_lanes<mode>_lane"
+  [(set (match_operand:V12DIF 0 "register_operand" "=w")
+	(unspec:V12DIF [
+	    (match_operand:BLK 1 "aarch64_simd_struct_operand" "Q")
+	    (match_operand:V12DIF 2 "register_operand" "0")
+	    (match_operand:SI 3 "immediate_operand" "i")]
+	    UNSPEC_LDAP1_LANE))]
+  "TARGET_RCPC3"
+  {
+    operands[3] = aarch64_endian_lane_rtx (<MODE>mode,
+					   INTVAL (operands[3]));
+    return "ldap1\\t{%S0.<Vetype>}[%3], %1";
+  }
+  [(set_attr "type" "neon_load2_one_lane")]
+)
+
+(define_expand "aarch64_vec_ldap1_lane<mode>"
+  [(match_operand:V12DIF 0 "register_operand")
+   (match_operand:DI 1 "register_operand")
+   (match_operand:V12DIF 2 "register_operand")
+   (match_operand:SI 3 "immediate_operand")]
+  "TARGET_RCPC3"
+{
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)));
+
+  aarch64_simd_lane_bounds (operands[3], 0,
+			    GET_MODE_NUNITS (<MODE>mode).to_constant (), NULL);
+  emit_insn (gen_aarch64_vec_ldap1_lanes<mode>_lane (operands[0],
+						     mem, operands[2],
+						     operands[3]));
+  DONE;
+})
+
(define_insn_and_split "aarch64_rev_reglist<mode>"
[(set (match_operand:VSTRUCT_QD 0 "register_operand" "=&w")
(unspec:VSTRUCT_QD
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5bb8c772be8..fb6de3b1fbf 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -295,6 +295,8 @@
 UNSPEC_LD1RO
 UNSPEC_SALT_ADDR
 UNSPECV_PATCHABLE_AREA
+UNSPEC_LDAP1_LANE
+UNSPEC_STL1_LANE
 ])
 
 (define_c_enum "unspecv" [
-- 
2.41.0



[PATCH 2/5] aarch64: rcpc3: Add relevant iterators to handle Neon intrinsics

2023-11-09 Thread Victor Do Nascimento
The LDAP1 and STL1 Neon ACLE intrinsics operate on 64-bit data values
in single-lane (Vt.1D) or twin-lane (Vt.2D) SIMD register
configurations, in either DI or DF mode.  This calls for a mode
iterator covering the V1DI, V1DF, V2DI and V2DF modes.

This patch therefore introduces the new V12DIF mode iterator with
which to generate functions operating on signed 64-bit integer and
float values and V12DIUP for generating the unsigned and
polynomial-type counterparts.  Along with this, we modify the
associated mode attributes accordingly in order to allow for the
implementation of the relevant backend patterns for the intrinsics.
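
The effect can be pictured as follows (a sketch of how GCC's iterator
expansion instantiates one pattern per mode, not actual gccint machinery; the
mode and attribute names come from the patch below):

```python
# Modes covered by the new V12DIF iterator, plus the element-mode (VEL)
# and element-suffix (Vetype) attributes the patch extends for them.
V12DIF = ["V1DI", "V1DF", "V2DI", "V2DF"]
VEL = {"V1DI": "DI", "V1DF": "DF", "V2DI": "DI", "V2DF": "DF"}
Vetype = {m: "d" for m in V12DIF}  # all four have 64-bit ("d") elements

# A pattern named with a <mode> placeholder is instantiated once per
# iterator entry, lowercasing the mode name:
names = ["aarch64_vec_ldap1_lane" + m.lower() for m in V12DIF]
assert names == ["aarch64_vec_ldap1_lanev1di", "aarch64_vec_ldap1_lanev1df",
                 "aarch64_vec_ldap1_lanev2di", "aarch64_vec_ldap1_lanev2df"]
```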

gcc/ChangeLog:

* config/aarch64/iterators.md (V12DIF): New.
(V12DIUP): Likewise.
(VEL): Add support for all V12DIF-associated modes.
(Vetype): Add support for V1DI and V1DF.
(Vel): Likewise.
---
 gcc/config/aarch64/iterators.md | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index f9e2210095e..471438e27be 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -314,6 +314,12 @@
 ;; All byte modes.
 (define_mode_iterator VB [V8QI V16QI])
 
+;; 1 and 2 lane DI and DF modes.
+(define_mode_iterator V12DIF [V1DI V1DF V2DI V2DF])
+
+;; 1 and 2 lane DI mode for unsigned and poly types.
+(define_mode_iterator V12DIUP [V1DI V2DI])
+
 ;; 2 and 4 lane SI modes.
 (define_mode_iterator VS [V2SI V4SI])
 
@@ -1195,10 +1201,10 @@
 (define_mode_attr Vetype [(V8QI "b") (V16QI "b")
  (V4HI "h") (V8HI  "h")
  (V2SI "s") (V4SI  "s")
- (V2DI "d")
+ (V2DI "d") (V1DI  "d")
  (V4HF "h") (V8HF  "h")
  (V2SF "s") (V4SF  "s")
- (V2DF "d")
+ (V2DF "d") (V1DF  "d")
  (V2x8QI "b") (V2x4HI "h")
  (V2x2SI "s") (V2x1DI "d")
  (V2x4HF "h") (V2x2SF "s")
@@ -1358,10 +1364,12 @@
 (define_mode_attr VEL [(V8QI  "QI") (V16QI "QI")
   (V4HI "HI") (V8HI  "HI")
   (V2SI "SI") (V4SI  "SI")
-  (DI   "DI") (V2DI  "DI")
+  (DI   "DI") (V1DI  "DI")
+  (V2DI  "DI")
   (V4HF "HF") (V8HF  "HF")
   (V2SF "SF") (V4SF  "SF")
-  (DF   "DF") (V2DF  "DF")
+  (DF   "DF") (V1DF  "DF")
+  (V2DF  "DF")
   (SI   "SI") (HI"HI")
   (QI   "QI")
   (V4BF "BF") (V8BF "BF")
@@ -1378,12 +1386,13 @@
 (define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
   (V4HI "hi") (V8HI "hi")
   (V2SI "si") (V4SI "si")
-  (DI   "di") (V2DI "di")
+  (DI   "di") (V1DI "di")
+  (V2DI "di")
   (V4HF "hf") (V8HF "hf")
   (V2SF "sf") (V4SF "sf")
-  (V2DF "df") (DF   "df")
-  (SI   "si") (HI   "hi")
-  (QI   "qi")
+  (V1DF "df") (V2DF "df")
+  (DF   "df") (SI   "si")
+  (HI   "hi") (QI   "qi")
   (V4BF "bf") (V8BF "bf")
   (VNx16QI "qi") (VNx8QI "qi") (VNx4QI "qi") (VNx2QI "qi")
   (VNx8HI "hi") (VNx4HI "hi") (VNx2HI "hi")
-- 
2.41.0



[PATCH 0/5] aarch64: Add ACLE intrinsics codegen support for lrcpc3 instructions

2023-11-09 Thread Victor Do Nascimento
Given the introduction of the third set of Release Consistency
processor consistent (RCpc) memory model-compliant instructions in
the form of FEAT_LRCPC3 as an optional extension from Armv8.2-a
onward, this patch series adds the RCPC3 ACLE Neon intrinsics,
thus enabling the use of the architectural feature in C.

These intrinsics enable the use of the new LDAP1 and STL1
instructions and are given single and twin-lane variants for unsigned,
signed and poly 64-bit values, in the form of the following
builtin functions:

  * vldap1_lane_{u|s|p}64
  * vldap1q_lane_{u|s|p}64
  * vstl1_lane_{u|s|p}64
  * vstl1q_lane_{u|s|p}64

Bootstrapped and regression tested on aarch64-none-linux-gnu.
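
As an illustration of the data movement involved, here is a hypothetical
pure-Python model of the twin-lane variants (the real intrinsics additionally
carry RCpc acquire/release memory-ordering semantics, which a sketch like this
cannot capture):

```python
def vldap1q_lane(mem, vec, lane):
    """Model of vldap1q_lane_*: load one 64-bit element from memory into
    the given lane of a two-lane vector, leaving the other lane untouched.
    (The real intrinsic is a load-acquire.)"""
    out = list(vec)
    out[lane] = mem[0]
    return out

def vstl1q_lane(mem, vec, lane):
    """Model of vstl1q_lane_*: store the given lane of a two-lane vector
    to memory.  (The real intrinsic is a store-release.)"""
    mem[0] = vec[lane]

mem = [0xDEADBEEF]
assert vldap1q_lane(mem, [1, 2], 1) == [1, 0xDEADBEEF]
vstl1q_lane(mem, [7, 8], 0)
assert mem == [7]
```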

Victor Do Nascimento (5):
  aarch64: rcpc3: Add +rcpc3 extension
  aarch64: rcpc3: Add relevant iterators to handle Neon intrinsics
  aarch64: rcpc3: Add Neon ACLE intrinsics
  aarch64: rcpc3: add Neon ACLE wrapper functions to `arm_neon.h'
  aarch64: rcpc3: Add intrinsics tests

 .../aarch64/aarch64-option-extensions.def |   1 +
 gcc/config/aarch64/aarch64-simd-builtins.def  |   7 +
 gcc/config/aarch64/aarch64-simd.md|  65 +
 gcc/config/aarch64/aarch64.h  |   4 +
 gcc/config/aarch64/aarch64.md |   2 +
 gcc/config/aarch64/arm_neon.h | 129 ++
 gcc/config/aarch64/iterators.md   |  25 ++--
 gcc/doc/invoke.texi   |   4 +
 gcc/testsuite/gcc.target/aarch64/acle/rcpc3.c |  47 +++
 9 files changed, 276 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rcpc3.c

-- 
2.41.0



[PATCH 4/5] aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins

2023-11-07 Thread Victor Do Nascimento
Implement the ACLE builtins for 128-bit system register manipulation:

  * __uint128_t __arm_rsr128(const char *special_register);
  * void __arm_wsr128(const char *special_register, __uint128_t value);

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (AARCH64_RSR128): New
`enum aarch64_builtins' value.
(AARCH64_WSR128): Likewise.
(aarch64_init_rwsr_builtins): Init `__builtin_aarch64_rsr128'
and `__builtin_aarch64_wsr128' builtins.
(aarch64_expand_rwsr_builtin): Extend function to handle
`__builtin_aarch64_{rsr|wsr}128'.
* config/aarch64/aarch64-protos.h (aarch64_retrieve_sysreg):
Update function signature.
* config/aarch64/aarch64.cc (F_REG_128): New.
(aarch64_retrieve_sysreg): Add 128-bit register mode check.
* config/aarch64/aarch64.md (UNSPEC_SYSREG_RTI): New.
(UNSPEC_SYSREG_WTI): Likewise.
(aarch64_read_sysregti): Likewise.
(aarch64_write_sysregti): Likewise.
---
 gcc/config/aarch64/aarch64-builtins.cc | 50 +-
 gcc/config/aarch64/aarch64-protos.h|  2 +-
 gcc/config/aarch64/aarch64.cc  |  6 +++-
 gcc/config/aarch64/aarch64.md  | 18 ++
 gcc/config/aarch64/arm_acle.h  | 11 ++
 5 files changed, 77 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index c5f20f68bca..40d3788b5e0 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -815,11 +815,13 @@ enum aarch64_builtins
   AARCH64_RSR64,
   AARCH64_RSRF,
   AARCH64_RSRF64,
+  AARCH64_RSR128,
   AARCH64_WSR,
   AARCH64_WSRP,
   AARCH64_WSR64,
   AARCH64_WSRF,
   AARCH64_WSRF64,
+  AARCH64_WSR128,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1842,6 +1844,10 @@ aarch64_init_rwsr_builtins (void)
 = build_function_type_list (double_type_node, const_char_ptr_type, NULL);
   AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
 
+  fntype
+= build_function_type_list (uint128_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR128, rsr128, fntype);
+
   fntype
 = build_function_type_list (void_type_node, const_char_ptr_type,
uint32_type_node, NULL);
@@ -1867,6 +1873,12 @@ aarch64_init_rwsr_builtins (void)
 = build_function_type_list (void_type_node, const_char_ptr_type,
double_type_node, NULL);
   AARCH64_INIT_RWSR_BUILTINS_DECL (WSRF64, wsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint128_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR128, wsr128, fntype);
+
 }
 
 /* Initialize the memory tagging extension (MTE) builtins.  */
@@ -2710,6 +2722,7 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int 
fcode)
   tree arg0, arg1;
   rtx const_str, input_val, subreg;
   enum machine_mode mode;
+  enum insn_code icode;
   class expand_operand ops[2];
 
   arg0 = CALL_EXPR_ARG (exp, 0);
@@ -2718,7 +2731,18 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int 
fcode)
   || fcode == AARCH64_WSRP
   || fcode == AARCH64_WSR64
   || fcode == AARCH64_WSRF
-  || fcode == AARCH64_WSRF64);
+  || fcode == AARCH64_WSRF64
+  || fcode == AARCH64_WSR128);
+
+  bool op128 = (fcode == AARCH64_RSR128 || fcode == AARCH64_WSR128);
+  enum machine_mode sysreg_mode = op128 ? TImode : DImode;
+
+  if (op128 && !TARGET_D128)
+{
+  error_at (EXPR_LOCATION (exp), "128-bit system register support requires "
+		"the +d128 Armv9.4-A extension");
+  return const0_rtx;
+}
 
   /* Argument 0 (system register name) must be a string literal.  */
   gcc_assert (TREE_CODE (arg0) == ADDR_EXPR
@@ -2741,7 +2765,7 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int 
fcode)
 sysreg_name[pos] = TOLOWER (sysreg_name[pos]);
 
   const char* name_output = aarch64_retrieve_sysreg ((const char *) 
sysreg_name,
-write_op);
+write_op, op128);
   if (name_output == NULL)
 {
   error_at (EXPR_LOCATION (exp), "invalid system register name provided");
@@ -2760,13 +2784,17 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int 
fcode)
   mode = TYPE_MODE (TREE_TYPE (arg1));
   input_val = copy_to_mode_reg (mode, expand_normal (arg1));
 
+  icode = (op128 ? CODE_FOR_aarch64_write_sysregti
+: CODE_FOR_aarch64_write_sysregdi);
+
   switch (fcode)
{
case AARCH64_WSR:
case AARCH64_WSRP:
case AARCH64_WSR64:
case AARCH64_WSRF64:
- subreg = lowpart_subreg (DImode, input_val, mode);
+   case AARCH64_WSR128:
+ subreg = lowpart_subreg (sysreg_mode, input_val, 

[PATCH 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-07 Thread Victor Do Nascimento
Extend existing unit tests for the ACLE system register manipulation
functions to include 128-bit tests.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/rwsr.c (get_rsr128): New.
(set_wsr128): Likewise.
---
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 30 +++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c 
b/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
index 3af4b960306..e7725022316 100644
--- a/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
@@ -1,11 +1,15 @@
 /* Test the __arm_[r,w]sr ACLE intrinsics family.  */
 /* Check that function variants for different data types handle types 
correctly.  */
 /* { dg-do compile } */
-/* { dg-options "-O1 -march=armv8.4-a" } */
+/* { dg-options "-O1 -march=armv9.4-a+d128" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 #include 
 
+#ifndef __ARM_FEATURE_SYSREG128
+#error "__ARM_FEATURE_SYSREG128 feature macro not defined."
+#endif
+
 /*
 ** get_rsr:
 ** ...
@@ -66,6 +70,17 @@ get_rsrf64 ()
   return __arm_rsrf64("trcseqstr");
 }
 
+/*
+** get_rsr128:
+**	mrrs	x0, x1, s3_0_c7_c4_0
+** ...
+*/
+__uint128_t
+get_rsr128 ()
+{
+  return __arm_rsr128("par_el1");
+}
+
 /*
 ** set_wsr32:
 ** ...
@@ -129,6 +144,18 @@ set_wsrf64(double a)
   __arm_wsrf64("trcseqstr", a);
 }
 
+/*
+** set_wsr128:
+** ...
+**	msrr	s3_0_c7_c4_0, x0, x1
+** ...
+*/
+void
+set_wsr128 (__uint128_t c)
+{
+  __arm_wsr128 ("par_el1", c);
+}
+
 /*
 ** set_custom:
 ** ...
@@ -142,3 +169,4 @@ void set_custom()
   __uint64_t b = __arm_rsr64("S1_2_C3_C4_5");
   __arm_wsr64("S1_2_C3_C4_5", b);
 }
+
-- 
2.41.0



[PATCH 0/5] aarch64: Add Armv9.4-a 128-bit system-register read/write support

2023-11-07 Thread Victor Do Nascimento
Given the introduction of optional 128-bit page table descriptor and
translation hardening extension support with the Armv9.4-a
architecture, this patch series introduces the necessary changes to
the aarch64-specific builtin code to enable the reading and writing of
128-bit system registers.  In so doing, the following ACLE builtins and
feature macro are made available to the compiler:

  * __uint128_t __arm_rsr128(const char *special_register);
  * void __arm_wsr128(const char *special_register, __uint128_t value);
  * __ARM_FEATURE_SYSREG128.
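
A quick sketch of the data movement these builtins imply (assuming, per the
Arm architecture's MRRS/MSRR instructions, that Xt carries bits [63:0] and
Xt+1 bits [127:64]; this is an illustration, not compiler code):

```python
MASK64 = (1 << 64) - 1

def wsr128_halves(value):
    """Split a 128-bit value into the (low, high) 64-bit pair that MSRR
    expects in Xt and Xt+1 respectively."""
    return value & MASK64, (value >> 64) & MASK64

def rsr128_join(lo, hi):
    """Reassemble the 128-bit result from the register pair MRRS returns."""
    return (hi << 64) | lo

v = (0x0123456789ABCDEF << 64) | 0xFEDCBA9876543210
lo, hi = wsr128_halves(v)
assert (lo, hi) == (0xFEDCBA9876543210, 0x0123456789ABCDEF)
assert rsr128_join(lo, hi) == v
```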

Finally, to update the GCC system-register database, bringing it in
line with Binutils and thereby adding the relevant 128-bit system
registers to GCC, this patch series also introduces the Guarded
Control Stack (GCS) `+gcs' architecture modifier flag, allowing the
GCS system registers now present in the `aarch64-sys-regs.def' system
register database to be recognized.

Victor Do Nascimento (5):
  aarch64: Add march flags for +the and +d128 arch extensions
  aarch64: Add support for GCS system registers with the +gcs modifier
  aarch64: Sync `aarch64-sys-regs.def' with Binutils.
  aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins
  aarch64: Add rsr128 and wsr128 ACLE tests

 gcc/config/aarch64/aarch64-arches.def |  2 +
 gcc/config/aarch64/aarch64-builtins.cc| 50 ---
 gcc/config/aarch64/aarch64-c.cc   |  1 +
 .../aarch64/aarch64-option-extensions.def |  6 +++
 gcc/config/aarch64/aarch64-protos.h   |  2 +-
 gcc/config/aarch64/aarch64-sys-regs.def   | 30 +++
 gcc/config/aarch64/aarch64.cc |  6 ++-
 gcc/config/aarch64/aarch64.h  | 21 
 gcc/config/aarch64/aarch64.md | 18 +++
 gcc/config/aarch64/arm_acle.h | 11 
 gcc/doc/invoke.texi   |  8 +++
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 30 ++-
 12 files changed, 165 insertions(+), 20 deletions(-)

-- 
2.41.0



[PATCH 1/5] aarch64: Add march flags for +the and +d128 arch extensions

2023-11-07 Thread Victor Do Nascimento
Given the introduction of optional 128-bit page table descriptor and
translation hardening extension support with the Armv9.4-a
architecture, this introduces the relevant flags to enable the reading
and writing of 128-bit system registers.

The `+d128' -march modifier enables the use of the following ACLE
builtin functions:

  * __uint128_t __arm_rsr128(const char *special_register);
  * void __arm_wsr128(const char *special_register, __uint128_t value);

and defines the __ARM_FEATURE_SYSREG128 macro to 1.

Finally, the `rcwmask_el1' and `rcwsmask_el1' 128-bit system register
implementations are also reliant on the enablement of the `+the' flag,
which is thus also implemented in this patch.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (armv8.9-a): New.
(armv9.4-a): Likewise.
* config/aarch64/aarch64-option-extensions.def (d128): Likewise.
(the): Likewise.
* config/aarch64/aarch64.h (AARCH64_ISA_V9_4A): Likewise.
(AARCH64_ISA_V8_9A): Likewise.
(TARGET_ARMV9_4): Likewise.
(AARCH64_ISA_D128): Likewise.
(AARCH64_ISA_THE): Likewise.
(TARGET_D128): Likewise.
* doc/invoke.texi (AArch64 Options): Document new -march flags
and extensions.
---
 gcc/config/aarch64/aarch64-arches.def|  2 ++
 gcc/config/aarch64/aarch64-c.cc  |  1 +
 gcc/config/aarch64/aarch64-option-extensions.def |  4 
 gcc/config/aarch64/aarch64.h | 15 +++
 gcc/doc/invoke.texi  |  6 ++
 5 files changed, 28 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-arches.def 
b/gcc/config/aarch64/aarch64-arches.def
index 7ae92aa8e98..becccb801d0 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -39,10 +39,12 @@ AARCH64_ARCH("armv8.5-a", generic,   V8_5A, 8,  
(V8_4A, SB, SSBS, PR
 AARCH64_ARCH("armv8.6-a", generic,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
 AARCH64_ARCH("armv8.7-a", generic,   V8_7A, 8,  (V8_6A, LS64))
 AARCH64_ARCH("armv8.8-a", generic,   V8_8A, 8,  (V8_7A, MOPS))
+AARCH64_ARCH("armv8.9-a", generic,   V8_9A, 8,  (V8_8A))
 AARCH64_ARCH("armv8-r",   generic,   V8R  , 8,  (V8_4A))
 AARCH64_ARCH("armv9-a",   generic,   V9A  , 9,  (V8_5A, SVE2))
 AARCH64_ARCH("armv9.1-a", generic,   V9_1A, 9,  (V8_6A, V9A))
 AARCH64_ARCH("armv9.2-a", generic,   V9_2A, 9,  (V8_7A, V9_1A))
 AARCH64_ARCH("armv9.3-a", generic,   V9_3A, 9,  (V8_8A, V9_2A))
+AARCH64_ARCH("armv9.4-a", generic,   V9_4A, 9,  (V8_9A, V9_3A))
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index be8b7236cf9..cacf8e8ed25 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -206,6 +206,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_LS64,
"__ARM_FEATURE_LS64", pfile);
   aarch64_def_or_undef (AARCH64_ISA_RCPC, "__ARM_FEATURE_RCPC", pfile);
+  aarch64_def_or_undef (TARGET_D128, "__ARM_FEATURE_SYSREG128", pfile);
 
   /* Not for ACLE, but required to keep "float.h" correct if we switch
  target between implementations that do or do not support ARMv8.2-A
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 825f3bf7758..da31f7c32d1 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -151,4 +151,8 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
 
+AARCH64_OPT_EXTENSION("d128", D128, (), (), (), "d128")
+
+AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 84e6f79ca83..1b3c800ec89 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -219,13 +219,17 @@ enum class aarch64_feature : unsigned char {
 #define AARCH64_ISA_PAUTH (aarch64_isa_flags & AARCH64_FL_PAUTH)
 #define AARCH64_ISA_V8_7A (aarch64_isa_flags & AARCH64_FL_V8_7A)
 #define AARCH64_ISA_V8_8A (aarch64_isa_flags & AARCH64_FL_V8_8A)
+#define AARCH64_ISA_V8_9A (aarch64_isa_flags & AARCH64_FL_V8_9A)
 #define AARCH64_ISA_V9A   (aarch64_isa_flags & AARCH64_FL_V9A)
 #define AARCH64_ISA_V9_1A  (aarch64_isa_flags & AARCH64_FL_V9_1A)
 #define AARCH64_ISA_V9_2A  (aarch64_isa_flags & AARCH64_FL_V9_2A)
 #define AARCH64_ISA_V9_3A  (aarch64_isa_flags & AARCH64_FL_V9_3A)
+#define AARCH64_ISA_V9_4A  (aarch64_isa_flags & AARCH64_FL_V9_4A)
 #define AARCH64_ISA_MOPS  (aarch64_isa_flags & AARCH64_FL_MOPS)
 #define AARCH64_ISA_LS64  (aarch64_isa_flags & AARCH64_FL_LS64)
 #define AARCH64_ISA_CSSC  

[PATCH 3/5] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2023-11-07 Thread Victor Do Nascimento
This patch updates `aarch64-sys-regs.def', bringing it into sync with
the Binutils source.

gcc/ChangeLog:

* config/aarch64/aarch64-sys-regs.def (par_el1): New.
(rcwmask_el1): Likewise.
(rcwsmask_el1): Likewise.
(ttbr0_el1): Likewise.
(ttbr0_el12): Likewise.
(ttbr0_el2): Likewise.
(ttbr1_el1): Likewise.
(ttbr1_el12): Likewise.
(ttbr1_el2): Likewise.
(vttbr_el2): Likewise.
(gcspr_el0): Likewise.
(gcspr_el1): Likewise.
(gcspr_el12): Likewise.
(gcspr_el2): Likewise.
(gcspr_el3): Likewise.
(gcscre0_el1): Likewise.
(gcscr_el1): Likewise.
(gcscr_el12): Likewise.
(gcscr_el2): Likewise.
(gcscr_el3): Likewise.
---
 gcc/config/aarch64/aarch64-sys-regs.def | 30 +
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sys-regs.def 
b/gcc/config/aarch64/aarch64-sys-regs.def
index d24a2455503..96bdadb0b0f 100644
--- a/gcc/config/aarch64/aarch64-sys-regs.def
+++ b/gcc/config/aarch64/aarch64-sys-regs.def
@@ -419,6 +419,16 @@
   SYSREG ("fpcr",  CPENC (3,3,4,4,0),  0,  
AARCH64_NO_FEATURES)
   SYSREG ("fpexc32_el2",   CPENC (3,4,5,3,0),  0,  
AARCH64_NO_FEATURES)
   SYSREG ("fpsr",  CPENC (3,3,4,4,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("gcspr_el0", CPENC (3,3,2,5,1),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcspr_el1", CPENC (3,0,2,5,1),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcspr_el2", CPENC (3,4,2,5,1),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcspr_el12",CPENC (3,5,2,5,1),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcspr_el3", CPENC (3,6,2,5,1),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcscre0_el1",   CPENC (3,0,2,5,2),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcscr_el1", CPENC (3,0,2,5,0),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcscr_el2", CPENC (3,4,2,5,0),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcscr_el12",CPENC (3,5,2,5,0),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
+  SYSREG ("gcscr_el3", CPENC (3,6,2,5,0),  F_ARCHEXT,  
AARCH64_FEATURE (GCS))
   SYSREG ("gcr_el1",   CPENC (3,0,1,0,6),  F_ARCHEXT,  
AARCH64_FEATURE (MEMTAG))
   SYSREG ("gmid_el1",  CPENC (3,1,0,0,4),  F_REG_READ|F_ARCHEXT,   
AARCH64_FEATURE (MEMTAG))
   SYSREG ("gpccr_el3", CPENC (3,6,2,1,6),  0,  
AARCH64_NO_FEATURES)
@@ -584,7 +594,7 @@
   SYSREG ("oslar_el1", CPENC (2,0,1,0,4),  F_REG_WRITE,
AARCH64_NO_FEATURES)
   SYSREG ("oslsr_el1", CPENC (2,0,1,1,4),  F_REG_READ, 
AARCH64_NO_FEATURES)
   SYSREG ("pan",   CPENC (3,0,4,2,3),  F_ARCHEXT,  
AARCH64_FEATURE (PAN))
-  SYSREG ("par_el1",   CPENC (3,0,7,4,0),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("par_el1",   CPENC (3,0,7,4,0),  F_REG_128,  
AARCH64_NO_FEATURES)
   SYSREG ("pmbidr_el1",CPENC (3,0,9,10,7), 
F_REG_READ|F_ARCHEXT,   AARCH64_FEATURE (PROFILE))
   SYSREG ("pmblimitr_el1", CPENC (3,0,9,10,0), F_ARCHEXT,  
AARCH64_FEATURE (PROFILE))
   SYSREG ("pmbptr_el1",CPENC (3,0,9,10,1), F_ARCHEXT,  
AARCH64_FEATURE (PROFILE))
@@ -746,6 +756,8 @@
   SYSREG ("prlar_el2", CPENC (3,4,6,8,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8R))
   SYSREG ("prselr_el1",CPENC (3,0,6,2,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8R))
   SYSREG ("prselr_el2",CPENC (3,4,6,2,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8R))
+  SYSREG ("rcwmask_el1",   CPENC (3,0,13,0,6), F_ARCHEXT|F_REG_128,
AARCH64_FEATURE (THE))
+  SYSREG ("rcwsmask_el1",  CPENC (3,0,13,0,3), F_ARCHEXT|F_REG_128,
AARCH64_FEATURE (THE))
   SYSREG ("revidr_el1",CPENC (3,0,0,0,6),  F_REG_READ, 
AARCH64_NO_FEATURES)
   SYSREG ("rgsr_el1",  CPENC (3,0,1,0,5),  F_ARCHEXT,  
AARCH64_FEATURE (MEMTAG))
   SYSREG ("rmr_el1",   CPENC (3,0,12,0,2), 0,  
AARCH64_NO_FEATURES)
@@ -1034,13 +1046,13 @@
   SYSREG ("trfcr_el1", CPENC (3,0,1,2,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8_4A))
   SYSREG ("trfcr_el12",CPENC (3,5,1,2,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8_4A))
   SYSREG ("trfcr_el2", CPENC (3,4,1,2,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8_4A))
-  SYSREG ("ttbr0_el1", CPENC (3,0,2,0,0),  0,  

[PATCH 2/5] aarch64: Add support for GCS system registers with the +gcs modifier

2023-11-07 Thread Victor Do Nascimento
Given the introduction in Binutils of the system registers associated
with the Armv9.4-A Guarded Control Stack (GCS) extension and their
reliance on the `+gcs' modifier, we implement the necessary changes
in GCC so that these registers are recognized by the compiler.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (gcs): New.
* config/aarch64/aarch64.h (AARCH64_ISA_GCS): New.
(TARGET_GCS): Likewise.
* doc/invoke.texi (AArch64 Options): Describe GCS.
---
 gcc/config/aarch64/aarch64-option-extensions.def | 2 ++
 gcc/config/aarch64/aarch64.h | 6 ++
 gcc/doc/invoke.texi  | 2 ++
 3 files changed, 10 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index da31f7c32d1..e72c039b612 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -155,4 +155,6 @@ AARCH64_OPT_EXTENSION("d128", D128, (), (), (), "d128")
 
 AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
 
+AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 1b3c800ec89..69ef54553d7 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -230,6 +230,7 @@ enum class aarch64_feature : unsigned char {
 #define AARCH64_ISA_CSSC  (aarch64_isa_flags & AARCH64_FL_CSSC)
 #define AARCH64_ISA_D128  (aarch64_isa_flags & AARCH64_FL_D128)
 #define AARCH64_ISA_THE   (aarch64_isa_flags & AARCH64_FL_THE)
+#define AARCH64_ISA_GCS   (aarch64_isa_flags & AARCH64_FL_GCS)
 
 /* AARCH64_FL options necessary for system register implementation.  */
 
@@ -403,6 +404,11 @@ enum class aarch64_feature : unsigned char {
 enabled through +the.  */
 #define TARGET_THE (AARCH64_ISA_THE)
 
+/*  Armv9.4-A Guarded Control Stack extension system registers are
+enabled through +gcs.  */
+#define TARGET_GCS (AARCH64_ISA_GCS)
+
+
 /* Standard register usage.  */
 
 /* 31 64-bit general purpose registers R0-R30:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 88327ce9681..88ee1fdb524 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21032,6 +21032,8 @@ Enable the Pointer Authentication Extension.
 Enable the Common Short Sequence Compression instructions.
 @item d128
 Enable support for 128-bit system register read/write instructions.
+@item gcs
+Enable support for Armv9.4-a Guarded Control Stack extension.
 @item the
 Enable support for Armv8.9-a/9.4-a translation hardening extension.
 
-- 
2.41.0



[PATCH 0/2] Libatomic: Add LSE128 atomics support for AArch64

2023-11-07 Thread Victor Do Nascimento
Building upon Wilco Dijkstra's work on AArch64 128-bit atomics for
Libatomic, namely the patches from [1] and [2], this patch series
extends the library's capabilities to dynamically select and emit
Armv9.4-A LSE128 implementations of atomic operations via ifuncs at
run-time whenever architectural support is present.

Regression tested on aarch64-linux-gnu target with LSE128 support.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html

Victor Do Nascimento (2):
  libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
  libatomic: Enable LSE128 128-bit atomics for armv9.4-a

 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 ++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 315 ++-
 libatomic/config/linux/aarch64/host-config.h |  23 +-
 libatomic/configure  |  59 +++-
 libatomic/configure.ac   |   1 +
 8 files changed, 349 insertions(+), 75 deletions(-)

-- 
2.41.0



[PATCH 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2023-11-07 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:

  * LDCLRP - Atomic AND NOT (bitclear) of a location with a 128-bit
  value held in a pair of registers, with the original data loaded
  into the same 2 registers.
  * LDSETP - Atomic OR (bitset) of a location with a 128-bit value
  held in a pair of registers, with the original data loaded into
  the same 2 registers.
  * SWPP - Atomic swap of one 128-bit value with a 128-bit value
  held in a pair of registers.

This patch adds the logic required to make use of these when the
architectural feature is present and a suitable assembler is
available.

In order to do this, the following changes are made:

  1. Add a configure-time check for LSE128 support in the
  assembler.
  2. Edit host-config.h so that when N == 16, nifunc = 2.
  3. Where LSE128 is available, implement the second ifunc, making
  use of the new instructions.
  4. For atomic functions unable to make use of these new
  instructions, define a new alias which causes the _i1 function
  variant to point ahead to the corresponding _i2 implementation.

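The selection scheme described in points 2-4 can be sketched as a
resolver that prefers the most capable implementation available.
This is a hypothetical, portable model: function names, return values
and the feature booleans are invented here, and the real library
detects features via HWCAP and dispatches through GNU ifuncs rather
than a plain function pointer.

```c
#include <assert.h>
#include <stdbool.h>

typedef int (*impl_fn) (void);

/* Stand-ins for the three variants of a 16-byte atomic op.  */
static int exchange_16_core (void)   { return 0; }  /* LL/SC fallback */
static int exchange_16_lse2 (void)   { return 1; }  /* _i1: LSE2 */
static int exchange_16_lse128 (void) { return 2; }  /* _i2: LSE128 */

/* Pretend-detected CPU features (really derived from HWCAP).  */
static bool has_lse2 = true;
static bool has_lse128 = true;

/* Mirrors the "N == 16 => nifunc = 2" idea: prefer LSE128, then
   LSE2, then the core fallback.  */
static impl_fn
resolve_exchange_16 (void)
{
  if (has_lse128)
    return exchange_16_lse128;
  if (has_lse2)
    return exchange_16_lse2;
  return exchange_16_core;
}

int
main (void)
{
  assert (resolve_exchange_16 () () == 2);
  has_lse128 = false;
  assert (resolve_exchange_16 () () == 1);
  has_lse2 = false;
  assert (resolve_exchange_16 () () == 0);
  return 0;
}
```
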
libatomic/ChangeLog:

* Makefile.am (AM_CPPFLAGS): Add conditional setting of
-DHAVE_FEAT_LSE128.
* acinclude.m4 (LIBAT_TEST_FEAT_LSE128): New.
* config/linux/aarch64/atomic_16.S (LSE128): New macro
definition.
(libat_exchange_16): New LSE128 variant.
(libat_fetch_or_16): Likewise.
(libat_or_fetch_16): Likewise.
(libat_fetch_and_16): Likewise.
(libat_and_fetch_16): Likewise.
* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
(IFUNC_NCOND): Add operand size checking.
(has_lse2): Renamed from `ifunc1`.
(has_lse128): New.
(HAS_LSE128): Likewise.
* configure.ac: Add call to LIBAT_TEST_FEAT_LSE128.
* configure (ac_subst_vars): Regenerated via autoreconf.
* Makefile.in: Likewise.
* auto-config.h.in: Likewise.
---
 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 +++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 170 ++-
 libatomic/config/linux/aarch64/host-config.h |  23 ++-
 libatomic/configure  |  59 ++-
 libatomic/configure.ac   |   1 +
 8 files changed, 271 insertions(+), 8 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index c0b8dea5037..24e843db67d 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -130,6 +130,9 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix 
_$(s)_.lo,$(SIZEOBJS)))
 ## On a target-specific basis, include alternates to be selected by IFUNC.
 if HAVE_IFUNC
 if ARCH_AARCH64_LINUX
+if ARCH_AARCH64_HAVE_LSE128
+AM_CPPFLAGS = -DHAVE_FEAT_LSE128
+endif
 IFUNC_OPTIONS   = -march=armv8-a+lse
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix 
_$(s)_1_.lo,$(SIZEOBJS)))
 libatomic_la_SOURCES += atomic_16.S
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index dc2330b91fd..cd48fa21334 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -452,6 +452,7 @@ M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files)))
 libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \
_$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \
$(am__append_4) $(am__append_5)
+@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@AM_CPPFLAGS
 = -DHAVE_FEAT_LSE128
 @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv8-a+lse
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv7-a+fp 
-DHAVE_KERNEL64
 @ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=i586
diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index f35ab5b60a5..4197db8f404 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[
   ])
 ])
 
+dnl
+dnl Test if the host assembler supports armv9.4-a LSE128 insns.
+dnl
+AC_DEFUN([LIBAT_TEST_FEAT_LSE128],[
+  AC_CACHE_CHECK([for armv9.4-a LSE128 insn support],
+[libat_cv_have_feat_lse128],[
+AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])])
+if AC_TRY_EVAL(ac_link); then
+  eval libat_cv_have_feat_lse128=yes
+else
+  eval libat_cv_have_feat_lse128=no
+fi
+rm -f conftest*
+  ])
+  LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128],
+   [Have LSE128 support for 16 byte integers.])
+  AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128], [test x$libat_cv_have_feat_lse128 
= xyes])
+])
+
 dnl
 dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
 dnl
diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in
index ab3424a759e..7c78933b07d 100644
--- 

[PATCH 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2023-11-07 Thread Victor Do Nascimento
The introduction of further architectural-feature-dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with.  It is awkward to remember which ifunc maps
onto which architectural feature, and the code becomes harder to
maintain when new ifuncs are added and their suffixes possibly
altered.

This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:

  #define LSE2 _i1

and reconstructs function names with the pre-processor's token
concatenation feature, such that for `MACRO(name)', we would now have
`MACRO(name, feature)' and in the macro definition body we replace
`name` with `name##feature`.

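The token-pasting scheme above can be illustrated in plain C.  The
`DEFINE_LOAD'/`DEFINE_LOAD1' names below are invented for this
sketch; they play the role of the two-level ENTRY/ENTRY1 macros, with
the extra indirection needed so that CORE/LSE2 expand to their suffix
before `##' pastes it onto the function name.

```c
#include <assert.h>
#include <string.h>

#define CORE
#define LSE2 _i1

/* Two-level expansion: DEFINE_LOAD expands its `feat' argument
   (CORE -> empty, LSE2 -> _i1) before DEFINE_LOAD1 pastes it.  */
#define DEFINE_LOAD(name, feat)  DEFINE_LOAD1 (name, feat)
#define DEFINE_LOAD1(name, feat) \
  const char *name##feat (void) { return #name #feat; }

DEFINE_LOAD (libat_load_16, CORE)  /* defines libat_load_16 */
DEFINE_LOAD (libat_load_16, LSE2)  /* defines libat_load_16_i1 */

int
main (void)
{
  assert (strcmp (libat_load_16 (), "libat_load_16") == 0);
  assert (strcmp (libat_load_16_i1 (), "libat_load_16_i1") == 0);
  return 0;
}
```
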
libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (CORE): New macro.
(LSE2): Likewise.
(ENTRY): Modify macro to take in `arch' argument.
(END): Likewise.
(ALIAS): Likewise.
(ENTRY1): New macro.
(END1): Likewise.
(ALIAS1): Likewise.
---
 libatomic/config/linux/aarch64/atomic_16.S | 147 +++--
 1 file changed, 79 insertions(+), 68 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index 0485c284117..3f6225830e6 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -39,22 +39,34 @@
 
.arch   armv8-a+lse
 
-#define ENTRY(name)\
-   .global name;   \
-   .hidden name;   \
-   .type name,%function;   \
-   .p2align 4; \
-name:  \
-   .cfi_startproc; \
+#define ENTRY(name, feat)  \
+   ENTRY1(name, feat)
+
+#define ENTRY1(name, feat) \
+   .global name##feat; \
+   .hidden name##feat; \
+   .type name##feat,%function; \
+   .p2align 4; \
+name##feat:\
+   .cfi_startproc; \
hint34  // bti c
 
-#define END(name)  \
-   .cfi_endproc;   \
-   .size name, .-name;
+#define END(name, feat)\
+   END1(name, feat)
 
-#define ALIAS(alias,name)  \
-   .global alias;  \
-   .set alias, name;
+#define END1(name, feat)   \
+   .cfi_endproc;   \
+   .size name##feat, .-name##feat;
+
+#define ALIAS(alias, from, to) \
+   ALIAS1(alias,from,to)
+
+#define ALIAS1(alias, from, to)\
+   .global alias##from;\
+   .set alias##from, alias##to;
+
+#define CORE
+#define LSE2   _i1
 
 #define res0 x0
 #define res1 x1
@@ -89,7 +101,7 @@ name:\
 #define SEQ_CST 5
 
 
-ENTRY (libat_load_16)
+ENTRY (libat_load_16, CORE)
mov x5, x0
cbnzw1, 2f
 
@@ -104,10 +116,10 @@ ENTRY (libat_load_16)
stxpw4, res0, res1, [x5]
cbnzw4, 2b
ret
-END (libat_load_16)
+END (libat_load_16, CORE)
 
 
-ENTRY (libat_load_16_i1)
+ENTRY (libat_load_16, LSE2)
cbnzw1, 1f
 
/* RELAXED.  */
@@ -127,10 +139,10 @@ ENTRY (libat_load_16_i1)
ldp res0, res1, [x0]
dmb ishld
ret
-END (libat_load_16_i1)
+END (libat_load_16, LSE2)
 
 
-ENTRY (libat_store_16)
+ENTRY (libat_store_16, CORE)
cbnzw4, 2f
 
/* RELAXED.  */
@@ -144,10 +156,10 @@ ENTRY (libat_store_16)
stlxp   w4, in0, in1, [x0]
cbnzw4, 2b
ret
-END (libat_store_16)
+END (libat_store_16, CORE)
 
 
-ENTRY (libat_store_16_i1)
+ENTRY (libat_store_16, LSE2)
cbnzw4, 1f
 
/* RELAXED.  */
@@ -159,10 +171,10 @@ ENTRY (libat_store_16_i1)
stlxp   w4, in0, in1, [x0]
cbnzw4, 1b
ret
-END (libat_store_16_i1)
+END (libat_store_16, LSE2)
 
 
-ENTRY (libat_exchange_16)
+ENTRY (libat_exchange_16, CORE)
mov x5, x0
cbnzw4, 2f
 
@@ -186,10 +198,10 @@ ENTRY (libat_exchange_16)
stlxp   w4, in0, in1, [x5]
cbnzw4, 4b
ret
-END (libat_exchange_16)
+END (libat_exchange_16, CORE)
 
 
-ENTRY (libat_compare_exchange_16)
+ENTRY (libat_compare_exchange_16, CORE)
ldp exp0, exp1, [x1]
cbz w4, 3f
cmp w4, RELEASE
@@ -228,10 +240,10 @@ ENTRY (libat_compare_exchange_16)
cbnzw4, 4b
mov x0, 1
ret
-END (libat_compare_exchange_16)
+END (libat_compare_exchange_16, CORE)
 
 
-ENTRY (libat_compare_exchange_16_i1)
+ENTRY (libat_compare_exchange_16, LSE2)
ldp exp0, exp1, [x1]
mov tmp0, exp0
mov tmp1, exp1
@@ -264,10 +276,10 @@ ENTRY (libat_compare_exchange_16_i1)
/* ACQ_REL/SEQ_CST.  */
 4: caspal  exp0, exp1, in0, in1, [x0]
b   0b
-END (libat_compare_exchange_16_i1)
+END (libat_compare_exchange_16, LSE2)
 
 
-ENTRY (libat_fetch_add_16)
+ENTRY (libat_fetch_add_16, CORE)

[PATCH V3 5/6] aarch64: Add front-end argument type checking for target builtins

2023-11-02 Thread Victor Do Nascimento
In implementing the ACLE read/write system register builtins, it was
observed that leaving argument type checking until expand time meant
that poorly-formed function calls were being "fixed" by certain
optimization passes, so that invalid code was not being properly
picked up during checking.

Example:

  const char *regname = "amcgcr_el0";
  long long a = __builtin_aarch64_rsr64 (regname);

is reduced by the ccp1 pass to

  long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");

As these functions require an argument of STRING_CST type, there needs
to be a check carried out by the front-end capable of picking this up.

The introduced `aarch64_check_general_builtin_call' function will be
called by the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a
builtin belonging to the AARCH64_BUILTIN_GENERAL category is
encountered, carrying out any appropriate checks associated with a
particular builtin function code.

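The shape of the tree that the check accepts can be sketched with a
miniature model.  This is purely illustrative: the `mini_tree' type,
its fields and the helper name are invented, and the real check also
verifies the pointer type; but the nested unwrapping mirrors the
NOP_EXPR -> ADDR_EXPR -> STRING_CST chain tested in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum tree_code { NOP_EXPR, ADDR_EXPR, STRING_CST, VAR_DECL };

/* Miniature tree node: a code plus an optional first operand.  */
struct mini_tree
{
  enum tree_code code;
  struct mini_tree *op0;
};

/* Accept only a pointer conversion wrapping the address of a string
   literal; anything else (e.g. a variable that an optimizer would
   later fold to a literal) is rejected up front.  */
static bool
first_arg_is_string_literal (const struct mini_tree *arg)
{
  return arg->code == NOP_EXPR
	 && arg->op0 != NULL
	 && arg->op0->code == ADDR_EXPR
	 && arg->op0->op0 != NULL
	 && arg->op0->op0->code == STRING_CST;
}

int
main (void)
{
  struct mini_tree lit = { STRING_CST, NULL };
  struct mini_tree addr = { ADDR_EXPR, &lit };
  struct mini_tree cast = { NOP_EXPR, &addr };	/* "amcgcr_el0" */
  struct mini_tree var = { VAR_DECL, NULL };
  struct mini_tree cast_var = { NOP_EXPR, &var };  /* regname */

  assert (first_arg_is_string_literal (&cast));
  assert (!first_arg_is_string_literal (&cast_var));
  return 0;
}
```
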
gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc
(aarch64_check_general_builtin_call): New.
* config/aarch64/aarch64-c.cc (aarch64_check_builtin_call):
Add aarch64_check_general_builtin_call call.
* config/aarch64/aarch64-protos.h
(aarch64_check_general_builtin_call): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/rwsr-3.c: New.
---
 gcc/config/aarch64/aarch64-builtins.cc| 31 +++
 gcc/config/aarch64/aarch64-c.cc   |  4 +--
 gcc/config/aarch64/aarch64-protos.h   |  4 +++
 .../gcc.target/aarch64/acle/rwsr-3.c  | 18 +++
 4 files changed, 55 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-3.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index dd76cca611b..c5f20f68bca 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2127,6 +2127,37 @@ aarch64_general_builtin_decl (unsigned code, bool)
   return aarch64_builtin_decls[code];
 }
 
+bool
+aarch64_check_general_builtin_call (location_t location, vec<location_t>,
+   unsigned int code, tree fndecl,
+   unsigned int nargs ATTRIBUTE_UNUSED, tree *args)
+{
+  switch (code)
+{
+case AARCH64_RSR:
+case AARCH64_RSRP:
+case AARCH64_RSR64:
+case AARCH64_RSRF:
+case AARCH64_RSRF64:
+case AARCH64_WSR:
+case AARCH64_WSRP:
+case AARCH64_WSR64:
+case AARCH64_WSRF:
+case AARCH64_WSRF64:
+  if (TREE_CODE (args[0]) != NOP_EXPR
+ || TREE_CODE (TREE_TYPE (args[0])) != POINTER_TYPE
+ || (TREE_CODE (TREE_OPERAND (TREE_OPERAND (args[0], 0) , 0))
+ != STRING_CST))
+   {
+ error_at (location, "first argument to %qD must be a string literal",
+   fndecl);
+ return false;
+   }
+}
+  /* Default behavior.  */
+  return true;
+}
+
 typedef enum
 {
   SIMD_ARG_COPY_TO_REG,
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index ab8844f6049..be8b7236cf9 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -339,8 +339,8 @@ aarch64_check_builtin_call (location_t loc, vec<location_t> arg_loc,
   switch (code & AARCH64_BUILTIN_CLASS)
 {
 case AARCH64_BUILTIN_GENERAL:
-  return true;
-
+  return aarch64_check_general_builtin_call (loc, arg_loc, subcode,
+orig_fndecl, nargs, args);
 case AARCH64_BUILTIN_SVE:
   return aarch64_sve::check_builtin_call (loc, arg_loc, subcode,
  orig_fndecl, nargs, args);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 5d6a1e75700..dbd486cfea4 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -990,6 +990,10 @@ tree aarch64_general_builtin_rsqrt (unsigned int);
 void handle_arm_acle_h (void);
 void handle_arm_neon_h (void);
 
+bool aarch64_check_general_builtin_call (location_t, vec<location_t>,
+unsigned int, tree, unsigned int,
+tree *);
+
 namespace aarch64_sve {
   void init_builtins ();
   void handle_arm_sve_h ();
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-3.c 
b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-3.c
new file mode 100644
index 000..17038fefbf6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-3.c
@@ -0,0 +1,18 @@
+/* Test the __arm_[r,w]sr ACLE intrinsics family.  */
+/* Ensure that illegal behavior is rejected by the compiler.  */
+
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -O3 -march=armv8.4-a" } */
+
+#include <arm_acle.h>
+
+void
+test_non_const_sysreg_name ()
+{
+  const char *regname = "trcseqstr";
+  long long a = __arm_rsr64 (regname); /* { dg-error "first argument to 
'__builtin_aarch64_rsr64' must be a string literal" } */
+  __arm_wsr64 (regname, a); /* { dg-error "first argument to 

[PATCH V3 4/6] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-11-02 Thread Victor Do Nascimento
Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:

uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm_rsrf(const char *special_register);
double __arm_rsrf64(const char *special_register);
void __arm_wsr(const char *special_register, uint32_t value);
void __arm_wsr64(const char *special_register, uint64_t value);
void __arm_wsrp(const char *special_register, const void *value);
void __arm_wsrf(const char *special_register, float value);
void __arm_wsrf64(const char *special_register, double value);

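The shape of this API can be modeled portably.  The sketch below is
hypothetical: the `model_*' names and the tiny register file are
invented, and the real intrinsics are AArch64-only and compile down
to single mrs/msr instructions rather than a table lookup; only the
signatures (a string literal selecting the register, 64-bit
read/write) mirror `__arm_rsr64'/`__arm_wsr64'.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct fake_reg { const char *name; uint64_t value; };

/* A stand-in register file; names are examples from the series.  */
static struct fake_reg regfile[] = {
  { "trcseqstr", 0 },
  { "tpidr_el0", 0 },
};

static uint64_t *
lookup (const char *name)
{
  for (size_t i = 0; i < sizeof regfile / sizeof *regfile; i++)
    if (strcmp (regfile[i].name, name) == 0)
      return &regfile[i].value;
  return 0;
}

/* Shapes mirror __arm_rsr64 / __arm_wsr64.  */
static uint64_t model_rsr64 (const char *reg) { return *lookup (reg); }
static void model_wsr64 (const char *reg, uint64_t v) { *lookup (reg) = v; }

int
main (void)
{
  model_wsr64 ("trcseqstr", 0x1234);
  assert (model_rsr64 ("trcseqstr") == 0x1234);
  return 0;
}
```
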
gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
Add enums for new builtins.
(aarch64_init_rwsr_builtins): New.
(aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
(aarch64_expand_rwsr_builtin): New.
(aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
* config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
(write_sysregdi): Likewise.
* config/aarch64/arm_acle.h (__arm_rsr): New.
(__arm_rsrp): Likewise.
(__arm_rsr64): Likewise.
(__arm_rsrf): Likewise.
(__arm_rsrf64): Likewise.
(__arm_wsr): Likewise.
(__arm_wsrp): Likewise.
(__arm_wsr64): Likewise.
(__arm_wsrf): Likewise.
(__arm_wsrf64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/rwsr.c: New.
* gcc.target/aarch64/acle/rwsr-1.c: Likewise.
* gcc.target/aarch64/acle/rwsr-2.c: Likewise.
* gcc.dg/pch/rwsr-pch.c: Likewise.
* gcc.dg/pch/rwsr-pch.hs: Likewise.
---
 gcc/config/aarch64/aarch64-builtins.cc| 191 ++
 gcc/config/aarch64/aarch64.md |  18 ++
 gcc/config/aarch64/arm_acle.h |  30 +++
 gcc/testsuite/gcc.dg/pch/rwsr-pch.c   |   7 +
 gcc/testsuite/gcc.dg/pch/rwsr-pch.hs  |  10 +
 .../gcc.target/aarch64/acle/rwsr-1.c  |  29 +++
 .../gcc.target/aarch64/acle/rwsr-2.c  |  25 +++
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
 8 files changed, 454 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pch/rwsr-pch.c
 create mode 100644 gcc/testsuite/gcc.dg/pch/rwsr-pch.hs
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..dd76cca611b 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -47,6 +47,7 @@
 #include "stringpool.h"
 #include "attribs.h"
 #include "gimple-fold.h"
+#include "builtins.h"
 
 #define v8qi_UP  E_V8QImode
 #define v8di_UP  E_V8DImode
@@ -808,6 +809,17 @@ enum aarch64_builtins
   AARCH64_RBIT,
   AARCH64_RBITL,
   AARCH64_RBITLL,
+  /* System register builtins.  */
+  AARCH64_RSR,
+  AARCH64_RSRP,
+  AARCH64_RSR64,
+  AARCH64_RSRF,
+  AARCH64_RSRF64,
+  AARCH64_WSR,
+  AARCH64_WSRP,
+  AARCH64_WSR64,
+  AARCH64_WSRF,
+  AARCH64_WSRF64,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1798,6 +1810,65 @@ aarch64_init_rng_builtins (void)
   AARCH64_BUILTIN_RNG_RNDRRS);
 }
 
+/* Add builtins for reading and writing system registers.  */
+static void
+aarch64_init_rwsr_builtins (void)
+{
+  tree fntype = NULL;
+  tree const_char_ptr_type
+= build_pointer_type (build_type_variant (char_type_node, true, false));
+
+#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
+  aarch64_builtin_decls[AARCH64_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
+
+  fntype
+= build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
+
+  fntype
+= build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
+
+  fntype
+= build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
+
+  fntype
+= build_function_type_list (float_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
+
+  fntype
+= build_function_type_list (double_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint32_type_node, NULL);
+
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   const_ptr_type_node, 

[PATCH V3 6/6] aarch64: Add system register duplication check selftest

2023-11-02 Thread Victor Do Nascimento
Add a build-time test to check whether the system register data, as
imported from `aarch64-sys-regs.def', has any duplicate entries.

Duplicate entries are defined as any two SYSREG entries in the .def
file which share the same encoding values (as specified by its `CPENC'
field) and where the relationship amongst the two does not fit into
one of the following categories:

* Simple aliasing: In some cases, it is observed that one
register name serves as an alias to another.  One example of
this is where TRCEXTINSELR aliases TRCEXTINSELR0.
* Expressing intent: When a given register serves two distinct
functions depending on how it is used, it may be given two
distinct names, each matching the context in which it is used.
Example: the Debug Data Transfer Register.  When used to
receive data it should be accessed as DBGDTRRX_EL0, while when
transmitting data it should be accessed via DBGDTRTX_EL0.
* Register deprecation: Some register names have been
deprecated and should no longer be used, but backwards-
compatibility requires that such names continue to be
recognized, as is the case for the SPSR_EL1 register, whose
access via the SPSR_SVC name is now deprecated.
* Same encoding, different target: Some encodings are given
different meanings depending on the target architecture and, as
such, are given different names in each of these contexts.
We see an example of this for CPENC(3,4,2,0,0), which
corresponds to TTBR0_EL2 for Armv8-A targets and VSCTLR_EL2
in Armv8-R targets.

A consequence of these observations is that `CPENC' duplication is
acceptable iff the `properties' or `arch_reqs' fields of the
`sysreg_t' structs associated with the two registers in question
differ; it is this condition that is checked by the new
`aarch64_test_sysreg_encoding_clashes' function.

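The acceptance condition can be sketched in miniature.  The
`mini_sysreg' type, helper name and field values below are invented;
only the predicate itself (properties or architecture requirements
must differ for a shared encoding to be valid) reflects what the
selftest asserts.

```c
#include <assert.h>
#include <stdbool.h>

struct mini_sysreg
{
  const char *name;
  unsigned encoding;    /* CPENC value */
  unsigned properties;  /* e.g. read/write flags */
  unsigned arch_reqs;   /* required architecture features */
};

/* Two entries sharing an encoding are acceptable only if their
   properties or architecture requirements differ.  */
static bool
valid_duplicate_p (const struct mini_sysreg *a, const struct mini_sysreg *b)
{
  return a->properties != b->properties || a->arch_reqs != b->arch_reqs;
}

int
main (void)
{
  /* Same encoding, different target architectures (cf. TTBR0_EL2 vs
     VSCTLR_EL2): acceptable.  Encoding values here are made up.  */
  struct mini_sysreg a = { "ttbr0_el2",  0x4200, 0, 1 };
  struct mini_sysreg b = { "vsctlr_el2", 0x4200, 0, 2 };
  assert (valid_duplicate_p (&a, &b));

  /* Two identical entries would be a genuine duplicate.  */
  struct mini_sysreg c = a;
  assert (!valid_duplicate_p (&a, &c));
  return 0;
}
```
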
gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_test_sysreg_encoding_clashes): New.
(aarch64_run_selftests): add call to
aarch64_test_sysreg_encoding_clashes selftest.
---
 gcc/config/aarch64/aarch64.cc | 44 +++
 1 file changed, 44 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index eaeab0be436..c0d75f167be 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22,6 +22,7 @@
 
 #define INCLUDE_STRING
 #define INCLUDE_ALGORITHM
+#define INCLUDE_VECTOR
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -28390,6 +28391,48 @@ aarch64_test_fractional_cost ()
   ASSERT_EQ (cf (1, 2).as_double (), 0.5);
 }
 
+/* Calculate whether our system register data, as imported from
+   `aarch64-sys-regs.def' has any duplicate entries.  */
+static void
+aarch64_test_sysreg_encoding_clashes (void)
+{
+  using dup_instances_t = hash_map<unsigned, std::vector<const sysreg_t *>>;
+
+  dup_instances_t duplicate_instances;
+
+  /* Every time an encoding is established to come up more than once
+ we add it to a "clash-analysis queue", which is then used to extract
+ necessary information from our hash map when establishing whether
+ repeated encodings are valid.  */
+
+  /* 1) Collect recurrence information.  */
+  for (unsigned i = 0; i < nsysreg; i++)
+{
+  const sysreg_t *reg = sysreg_structs + i;
+
+      std::vector<const sysreg_t *> *tmp
+	= &duplicate_instances.get_or_insert (reg->encoding);
+
+  tmp->push_back (reg);
+}
+
+  /* 2) Carry out analysis on collected data.  */
+  for (auto instance : duplicate_instances)
+{
+  unsigned nrep = instance.second.size ();
+  if (nrep > 1)
+   for (unsigned i = 0; i < nrep; i++)
+ for (unsigned j = i + 1; j < nrep; j++)
+   {
+ const sysreg_t *a = instance.second[i];
+ const sysreg_t *b = instance.second[j];
+ ASSERT_TRUE ((a->properties != b->properties)
+  || (a->arch_reqs != b->arch_reqs));
+   }
+}
+}
+
 /* Run all target-specific selftests.  */
 
 static void
@@ -28397,6 +28440,7 @@ aarch64_run_selftests (void)
 {
   aarch64_test_loading_full_dump ();
   aarch64_test_fractional_cost ();
+  aarch64_test_sysreg_encoding_clashes ();
 }
 
 } // namespace selftest
-- 
2.41.0



[PATCH V3 0/6] aarch64: Add support for __arm_rsr and __arm_wsr ACLE function family

2023-11-02 Thread Victor Do Nascimento
Implement changes resulting from upstream discussion about the
implementation as presented in V2 of this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633458.html

Note that patch 4/7 of the previous iteration of this series (Add
basic target_print_operand support for CONST_STRING) was resubmitted
and upstreamed separately due to its use in other work which had since
been submitted.

---

This patch series adds support for reading and writing to and from
system registers via the relevant ACLE-defined builtins [1].

The patch series makes a series of additions to the aarch64-specific
areas of the compiler to make this possible.

Firstly, a mechanism for defining system registers is established via a
new .def file and the new SYSREG macro.  This macro is the same as is
used in Binutils and system register entries are compatible with
either code-base.

Given the information contained in this system register definition
file, a compile-time validation mechanism is implemented, such that any
system register name passed as a string literal argument to these
builtins can be checked against known system registers and its use
for a given target architecture validated.

Finally, patterns for each of these builtins are added to the back-end
such that, if all validation criteria are met, the correct assembly is
emitted.

Thus, the following example of system register access is now valid for
GCC:

long long old = __arm_rsr("trcseqstr");
__arm_wsr("trcseqstr", new);

Testing:
 - Bootstrap/regtest on aarch64-linux-gnu done.

[1] https://arm-software.github.io/acle/main/acle.html

Victor Do Nascimento (6):
  aarch64: Sync system register information with Binutils
  aarch64: Add support for aarch64-sys-regs.def
  aarch64: Implement system register validation tools
  aarch64: Implement system register r/w arm ACLE intrinsic functions
  aarch64: Add front-end argument type checking for target builtins
  aarch64: Add system register duplication check selftest

 gcc/config/aarch64/aarch64-builtins.cc|  222 
 gcc/config/aarch64/aarch64-c.cc   |4 +-
 gcc/config/aarch64/aarch64-protos.h   |6 +
 gcc/config/aarch64/aarch64-sys-regs.def   | 1064 +
 gcc/config/aarch64/aarch64.cc |  244 
 gcc/config/aarch64/aarch64.h  |   22 +
 gcc/config/aarch64/aarch64.md |   18 +
 gcc/config/aarch64/arm_acle.h |   30 +
 gcc/config/aarch64/predicates.md  |4 +
 gcc/testsuite/gcc.dg/pch/rwsr-pch.c   |7 +
 gcc/testsuite/gcc.dg/pch/rwsr-pch.hs  |   10 +
 .../gcc.target/aarch64/acle/rwsr-1.c  |   29 +
 .../gcc.target/aarch64/acle/rwsr-2.c  |   25 +
 .../gcc.target/aarch64/acle/rwsr-3.c  |   18 +
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  |  144 +++
 15 files changed, 1845 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def
 create mode 100644 gcc/testsuite/gcc.dg/pch/rwsr-pch.c
 create mode 100644 gcc/testsuite/gcc.dg/pch/rwsr-pch.hs
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

-- 
2.41.0



[PATCH V3 3/6] aarch64: Implement system register validation tools

2023-11-02 Thread Victor Do Nascimento
Given the implementation of a mechanism for encoding system registers
in GCC, this patch provides the mechanism for validating their use by
the compiler.  In particular, this involves:

  1. Ensuring a supplied string corresponds to a known system
 register name.  System registers can be accessed either via their
 name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0').
 Register names are validated using a hash map, mapping known
 system register names to their corresponding `sysreg_t' structs,
 which are populated from the `aarch64-sys-regs.def' file.
 Register name validation is done via `lookup_sysreg_map', while
 the encoding naming convention is validated via a parser
 implemented in this patch - `is_implem_def_reg'.
  2. Once a given register name is deemed to be valid, it is checked
 against a further 2 criteria:
   a. Is the referenced register implemented in the target
  architecture?  This is achieved by comparing the ARCH field
   in the relevant SYSREG entry from `aarch64-sys-regs.def'
  against `aarch64_feature_flags' flags set at compile-time.
   b. Is the register being used correctly?  Check the requested
  operation against the FLAGS specified in SYSREG.
  This prevents operations like writing to a read-only system
  register.

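The two name-validation paths can be sketched as follows.  This is a
hypothetical model: the `known_name_p'/`implem_def_reg_p' helpers,
the tiny name table and the exact range checks are invented here (the
real logic lives in `aarch64_valid_sysreg_name_p' and
`is_implem_def_reg', and validates against the full .def data); only
the overall structure — a name lookup plus a parser for the
s<op0>_<op1>_c<cn>_c<cm>_<op2> encoding convention — is mirrored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static const char *known_regs[] = { "spsr_el1", "trcseqstr", "amcgcr_el0" };

/* Path 1: is the name a known system register?  */
static bool
known_name_p (const char *name)
{
  for (size_t i = 0; i < sizeof known_regs / sizeof *known_regs; i++)
    if (strcmp (known_regs[i], name) == 0)
      return true;
  return false;
}

/* Path 2: does the name follow the implementation-defined encoding
   convention (lowercased here), with fields in plausible ranges?  */
static bool
implem_def_reg_p (const char *name)
{
  unsigned op0, op1, cn, cm, op2;
  char tail;
  if (sscanf (name, "s%u_%u_c%u_c%u_%u%c",
	      &op0, &op1, &cn, &cm, &op2, &tail) != 5)
    return false;		/* wrong shape or trailing junk */
  return (op0 == 2 || op0 == 3)
	 && op1 <= 7 && cn <= 15 && cm <= 15 && op2 <= 7;
}

int
main (void)
{
  assert (known_name_p ("spsr_el1"));
  assert (!known_name_p ("not_a_reg"));
  assert (implem_def_reg_p ("s3_0_c4_c0_0"));
  assert (!implem_def_reg_p ("s9_0_c4_c0_0"));   /* op0 out of range */
  assert (!implem_def_reg_p ("s3_0_c4_c0_0x"));  /* trailing junk */
  return 0;
}
```
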
gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_valid_sysreg_name_p): New.
(aarch64_retrieve_sysreg): Likewise.
* config/aarch64/aarch64.cc (is_implem_def_reg): Likewise.
(aarch64_valid_sysreg_name_p): Likewise.
(aarch64_retrieve_sysreg): Likewise.
(aarch64_register_sysreg): Likewise.
(aarch64_init_sysregs): Likewise.
(aarch64_lookup_sysreg_map): Likewise.
* config/aarch64/predicates.md (aarch64_sysreg_string): New.
---
 gcc/config/aarch64/aarch64-protos.h |   2 +
 gcc/config/aarch64/aarch64.cc   | 147 
 gcc/config/aarch64/predicates.md|   4 +
 3 files changed, 153 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 60a55f4bc19..5d6a1e75700 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -830,6 +830,8 @@ bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
 bool aarch64_sve_ptrue_svpattern_p (rtx, struct simd_immediate_info *);
 bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
+bool aarch64_valid_sysreg_name_p (const char *);
+const char *aarch64_retrieve_sysreg (const char *, bool);
 rtx aarch64_check_zero_based_sve_index_immediate (rtx);
 bool aarch64_sve_index_immediate_p (rtx);
 bool aarch64_sve_arith_immediate_p (machine_mode, rtx, bool);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a4a9e2e51ea..eaeab0be436 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -85,6 +85,7 @@
 #include "config/arm/aarch-common.h"
 #include "config/arm/aarch-common-protos.h"
 #include "ssa.h"
+#include "hash-map.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2860,6 +2861,51 @@ const sysreg_t sysreg_structs[] =
 
 const unsigned nsysreg = ARRAY_SIZE (sysreg_structs);
 
+using sysreg_map_t = hash_map<nofree_string_hash, const sysreg_t *>;
+static sysreg_map_t *sysreg_map = nullptr;
+
+/* Map system register names to their hardware metadata: encoding,
+   feature flags and architectural feature requirements, all of which
+   are encoded in a sysreg_t struct.  */
+void
+aarch64_register_sysreg (const char *name, const sysreg_t *metadata)
+{
+  bool dup = sysreg_map->put (name, metadata);
+  gcc_checking_assert (!dup);
+}
+
+/* Lazily initialize hash table for system register validation,
+   checking the validity of supplied register name and returning
+   register's associated metadata.  */
+static void
+aarch64_init_sysregs (void)
+{
+  gcc_assert (!sysreg_map);
+  sysreg_map = new sysreg_map_t;
+
+  for (unsigned i = 0; i < nsysreg; i++)
+{
+  const sysreg_t *reg = sysreg_structs + i;
+  aarch64_register_sysreg (reg->name, reg);
+}
+}
+
+/* No direct access to the sysreg hash-map should be made.  Doing so
+   risks trying to access an uninitialized hash-map, and dereferencing
+   the returned double pointer without due care risks dereferencing a
+   null pointer.  */
+const sysreg_t *
+aarch64_lookup_sysreg_map (const char *regname)
+{
+  if (!sysreg_map)
+aarch64_init_sysregs ();
+
+  const sysreg_t **sysreg_entry = sysreg_map->get (regname);
+  if (sysreg_entry != NULL)
+return *sysreg_entry;
+  return NULL;
+}
+
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
@@ -28116,6 +28162,107 @@ aarch64_pars_overlap_p (rtx par1, rtx par2)
   return false;
 }
 
+/* Parse an implementation-defined system register name of
+   the form 

[PATCH V3 1/6] aarch64: Sync system register information with Binutils

2023-11-02 Thread Victor Do Nascimento
This patch adds the `aarch64-sys-regs.def' file, originally written
for Binutils, to GCC. In so doing, it provides GCC with the necessary
information for teaching the compiler about system registers known to
the assembler and how these can be used.

By aligning the representation of data common to different parts of
the toolchain, we can greatly reduce the duplication of work,
facilitating the maintenance of the aarch64 back-end across different
parts of the toolchain.  By keeping both copies of the file in sync,
any `SYSREG (...)' that is added in one project is automatically added
to its counterpart.  This being the case, no change should be made in
the GCC copy of the file.  Any modifications should first be made in
Binutils and the resulting file copied over to GCC.

GCC does not implement the full range of ISA flags present in
Binutils.  Where this is the case, aliases must be added to aarch64.h
with the unknown architectural extension being mapped to its
associated base architecture, such that any flag present in Binutils
and used in system register definitions is understood in GCC.  Again,
this is done such that flags can be used interchangeably between
projects making use of the aarch64-sys-regs.def file.  This is done
in the next patch in the series.

`.arch' directives missing from the emitted assembly files as a
consequence of this aliasing are accounted for by the compiler using
the S<op0>_<op1>_C<cn>_C<cm>_<op2> encoding of system registers when
issuing mrs/msr instructions.  This design choice ensures the
assembler will accept anything that was deemed acceptable by the
compiler.

gcc/ChangeLog:

* config/aarch64/aarch64-system-regs.def: New.
---
 gcc/config/aarch64/aarch64-sys-regs.def | 1064 +++
 1 file changed, 1064 insertions(+)
 create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def

diff --git a/gcc/config/aarch64/aarch64-sys-regs.def 
b/gcc/config/aarch64/aarch64-sys-regs.def
new file mode 100644
index 000..d24a2455503
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sys-regs.def
@@ -0,0 +1,1064 @@
+/* aarch64-system-regs.def -- AArch64 opcode support.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of the GNU opcodes library.
+
+   This library is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   It is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; see the file COPYING3.  If not,
+   see <http://www.gnu.org/licenses/>.  */
+
+/* Array of system registers and their associated arch features.
+
+   This file is also used by GCC.  Where necessary, any updates should
+   be made in Binutils and the updated file copied across to GCC, such
+   that the two projects are kept in sync at all times.
+
+   Before using #include to read this file, define a macro:
+
+ SYSREG (name, encoding, flags, features)
+
+  The NAME is the system register name, as recognized by the
+  assembler.  ENCODING provides the necessary information for the binary
+  encoding of the system register.  The FLAGS field is a bitmask of
+  relevant behavior information pertaining to the particular register.
+  For example: is it read/write-only? does it alias another register?
+  The FEATURES field maps onto ISA flags and specifies the architectural
+  feature requirements of the system register.  */
+
+  SYSREG ("accdata_el1",   CPENC (3,0,13,0,5), 0,  
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el1", CPENC (3,0,1,0,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el2", CPENC (3,4,1,0,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el3", CPENC (3,6,1,0,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el1", CPENC (3,0,5,1,0),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el12",CPENC (3,5,5,1,0),  F_ARCHEXT,  
AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr0_el2", CPENC (3,4,5,1,0),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el3", CPENC (3,6,5,1,0),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el1", CPENC (3,0,5,1,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el12",CPENC (3,5,5,1,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr1_el2", CPENC (3,4,5,1,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el3", CPENC (3,6,5,1,1),  0,  

[PATCH V3 2/6] aarch64: Add support for aarch64-sys-regs.def

2023-11-02 Thread Victor Do Nascimento
This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.

Entries in the aarch64-system-regs.def file should be as follows:

  SYSREG (NAME, CPENC (sn,op1,cn,cm,op2), FLAG1 | ... | FLAGn, ARCH)

Where the arguments to SYSREG correspond to:
  - NAME:  The system register name, as used in the assembly language.
  - CPENC: The system register encoding, mapping to:

   s<sn>_<op1>_c<cn>_c<cm>_<op2>

  - FLAG: The entries in the FLAGS field are bitwise-OR'd together to
  encode extra information required to ensure proper use of
  the system register.  For example, a read-only system
  register will have the flag F_REG_READ, while write-only
  registers will be labeled F_REG_WRITE.  Such flags are
  tested against at compile-time.
  - ARCH: The architectural features the system register is associated
  with.  This is encoded via one of three possible macros:
  1. When a system register is universally implemented, we say
  it has no feature requirements, so we tag it with the
  AARCH64_NO_FEATURES macro.
  2. When a register is only implemented for a single
  architectural extension EXT, the AARCH64_FEATURE (EXT), is
  used.
  3. When a given system register is made available by any of N
  possible architectural extensions, the AARCH64_FEATURES(N, ...)
  macro is used to combine them accordingly.

In order to enable proper interpretation of the SYSREG entries by the
compiler, flags defining system register behavior such as `F_REG_READ'
and `F_REG_WRITE' are also defined here, so they can later be used for
the validation of system register properties.

Finally, any architectural feature flags from Binutils missing from GCC
have appropriate aliases defined here so as to ensure
cross-compatibility of SYSREG entries across the toolchain.
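
As a self-contained illustration of the X-macro pattern described above, the sketch below inlines two entries rather than #include'ing the .def file; the flag values and struct layout are assumptions for illustration, not GCC's actual definitions:

```c
#include <assert.h>
#include <string.h>

/* Stringify a CPENC tuple into the s<sn>_<op1>_c<cn>_c<cm>_<op2> form.  */
#define CPENC(sn, op1, cn, cm, op2) \
  "s" #sn "_" #op1 "_c" #cn "_c" #cm "_" #op2

#define F_REG_READ (1 << 2)          /* illustrative flag value */
#define AARCH64_NO_FEATURES 0

typedef struct
{
  const char *name;       /* assembly-level register name */
  const char *encoding;   /* stringified CPENC form */
  unsigned flags;         /* usage restrictions, e.g. read-only */
  unsigned long long features;
} sysreg_t;

/* Expand SYSREG once per entry; the real code would #include the
   .def file between the #define and #undef instead of listing
   entries inline.  */
static const sysreg_t sysregs[] = {
#define SYSREG(NAME, ENC, FLAGS, FEATS) { NAME, ENC, FLAGS, FEATS },
  SYSREG ("actlr_el1", CPENC (3,0,1,0,1), 0,          AARCH64_NO_FEATURES)
  SYSREG ("midr_el1",  CPENC (3,0,0,0,0), F_REG_READ, AARCH64_NO_FEATURES)
#undef SYSREG
};
```

Because CPENC stringifies its arguments, the table carries both the human-readable name and the generic encoded form for each register.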

gcc/ChangeLog:

* config/aarch64/aarch64.cc (sysreg_t): New.
(sysreg_structs): Likewise.
(nsysreg): Likewise.
(AARCH64_FEATURE): Likewise.
(AARCH64_FEATURES): Likewise.
(AARCH64_NO_FEATURES): Likewise.
* config/aarch64/aarch64.h (AARCH64_ISA_V8A): Add missing
ISA flag.
(AARCH64_ISA_V8_1A): Likewise.
(AARCH64_ISA_V8_7A): Likewise.
(AARCH64_ISA_V8_8A): Likewise.
(AARCH64_NO_FEATURES): Likewise.
(AARCH64_FL_RAS): New ISA flag alias.
(AARCH64_FL_LOR): Likewise.
(AARCH64_FL_PAN): Likewise.
(AARCH64_FL_AMU): Likewise.
(AARCH64_FL_SCXTNUM): Likewise.
(AARCH64_FL_ID_PFR2): Likewise.
(F_DEPRECATED): New.
(F_REG_READ): Likewise.
(F_REG_WRITE): Likewise.
(F_ARCHEXT): Likewise.
(F_REG_ALIAS): Likewise.
---
 gcc/config/aarch64/aarch64.cc | 53 +++
 gcc/config/aarch64/aarch64.h  | 22 +++
 2 files changed, 75 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5fd7063663c..a4a9e2e51ea 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2806,6 +2806,59 @@ static const struct processor all_cores[] =
feature_deps::V8A ().enable, _tunings},
   {NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
 };
+/* Internal representation of system registers.  */
+typedef struct {
+  const char *name;
+  /* Stringified sysreg encoding values, represented as
+     s<sn>_<op1>_c<cn>_c<cm>_<op2>.  */
+  const char *encoding;
+  /* Flags affecting sysreg usage, such as read/write-only.  */
+  unsigned properties;
+  /* Architectural features implied by sysreg.  */
+  aarch64_feature_flags arch_reqs;
+} sysreg_t;
+
+/* An aarch64_feature_set initializer for a single feature,
+   AARCH64_FEATURE_.  */
+#define AARCH64_FEATURE(FEAT) AARCH64_FL_##FEAT
+
+/* Used by AARCH64_FEATURES.  */
+#define AARCH64_OR_FEATURES_1(X, F1) \
+  AARCH64_FEATURE (F1)
+#define AARCH64_OR_FEATURES_2(X, F1, F2) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_1 (X, F2))
+#define AARCH64_OR_FEATURES_3(X, F1, ...) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_2 (X, __VA_ARGS__))
+
+/* An aarch64_feature_set initializer for the N features listed in "...".  */
+#define AARCH64_FEATURES(N, ...) \
+  AARCH64_OR_FEATURES_##N (0, __VA_ARGS__)
+
+#define AARCH64_NO_FEATURES   0
+
+/* Flags associated with the properties of system registers.  It mainly serves
+   to mark particular registers as read or write only.  */
+#define F_DEPRECATED  (1 << 1)
+#define F_REG_READ(1 << 2)
+#define F_REG_WRITE   (1 << 3)
+#define F_ARCHEXT (1 << 4)
+/* Flag indicating register name is alias for another system register.  */
+#define F_REG_ALIAS   (1 << 5)
+
+/* Database of system registers, their encodings and architectural
+   requirements.  */
+const sysreg_t 

[PATCH V2] aarch64: Implement the ACLE instruction/data prefetch functions.

2023-10-30 Thread Victor Do Nascimento
Correct CV-qualification from being erroneously applied to the `addr'
pointer, applying it instead to its pointer target, as specified by
the ACLE standards.

---

Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:

  1. Data prefetch intrinsics:
  
  void __pldx (/*constant*/ unsigned int /*access_kind*/,
   /*constant*/ unsigned int /*cache_level*/,
   /*constant*/ unsigned int /*retention_policy*/,
   void const volatile *addr);

  void __pld (void const volatile *addr);

  2. Instruction prefetch intrinsics:
  ---
  void __plix (/*constant*/ unsigned int /*cache_level*/,
   /*constant*/ unsigned int /*retention_policy*/,
   void const volatile *addr);

  void __pli (void const volatile *addr);

`__pldx' affords the programmer more fine-grained control over the
data prefetch behaviour than the analogous GCC builtin
`__builtin_prefetch', and allows access to the "SLC" cache level.

While `__builtin_prefetch' chooses both cache-level and retention
policy automatically via the optional `locality' parameter, `__pldx'
expects 2 (mandatory) arguments to explicitly define the desired
cache-level and retention policies.

`__plix', on the other hand, generates a code prefetch instruction and
so extends functionality on aarch64 targets beyond that which is
exposed by `__builtin_prefetch'.

`__pld' and `__pli' prefetch data and instructions, respectively,
using default values for both cache-level and retention policies.
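
To make the parameter mapping concrete, here is a hedged sketch, not the patch's implementation, of how the three `__pldx' arguments could select an AArch64 PRFM operand mnemonic. The numeric encodings (0 = read access, 0 = L1, 0 = keep retention, and so on) are illustrative assumptions, and the SLC level is omitted for brevity:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compose a PRFM operand such as "PLDL1KEEP" from the argument
   triple.  Returns NULL for out-of-range arguments.  */
static const char *
prfm_operand (unsigned access_kind, unsigned cache_level,
              unsigned retention_policy, char *buf)
{
  static const char *const kinds[]    = { "PLD", "PST" };   /* read/write */
  static const char *const levels[]   = { "L1", "L2", "L3" };
  static const char *const policies[] = { "KEEP", "STRM" };
  if (access_kind > 1 || cache_level > 2 || retention_policy > 1)
    return NULL;
  sprintf (buf, "%s%s%s", kinds[access_kind], levels[cache_level],
           policies[retention_policy]);
  return buf;
}
```

The out-of-range check mirrors the compile-time argument validation the builtin expansion must perform before emitting a prefetch instruction.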

Bootstrapped and tested on aarch64-none-linux-gnu.

[1] 
https://arm-software.github.io/acle/main/acle.html#memory-prefetch-intrinsics

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc:
(AARCH64_PLD): New enum aarch64_builtins entry.
(AARCH64_PLDX): Likewise.
(AARCH64_PLI): Likewise.
(AARCH64_PLIX): Likewise.
(aarch64_init_prefetch_builtin): New.
(aarch64_general_init_builtins): Call prefetch init function.
(aarch64_expand_prefetch_builtin): New.
(aarch64_general_expand_builtin):  Add prefetch expansion.
* config/aarch64/aarch64.md (UNSPEC_PLDX): New.
(aarch64_pldx): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/builtin_pld_pli.c: New.
---
 gcc/config/aarch64/aarch64-builtins.cc| 161 ++
 gcc/config/aarch64/aarch64.md |  12 ++
 gcc/config/aarch64/arm_acle.h |  30 
 .../gcc.target/aarch64/builtin_pldx.c |  90 ++
 4 files changed, 293 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pldx.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..27a4c87b300 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,10 @@ enum aarch64_builtins
   AARCH64_RBIT,
   AARCH64_RBITL,
   AARCH64_RBITLL,
+  AARCH64_PLD,
+  AARCH64_PLDX,
+  AARCH64_PLI,
+  AARCH64_PLIX,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1798,6 +1802,34 @@ aarch64_init_rng_builtins (void)
   AARCH64_BUILTIN_RNG_RNDRRS);
 }
 
+/* Add builtins for data and instruction prefetch.  */
+static void
+aarch64_init_prefetch_builtin (void)
+{
+#define AARCH64_INIT_PREFETCH_BUILTIN(INDEX, N) \
+  aarch64_builtin_decls[INDEX] =   \
+aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX)
+
+  tree ftype;
+  tree cv_argtype;
+  cv_argtype = build_qualified_type (void_type_node, TYPE_QUAL_CONST
+| TYPE_QUAL_VOLATILE);
+  cv_argtype = build_pointer_type (cv_argtype);
+
+  ftype = build_function_type_list (void_type_node, cv_argtype, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLD, "pld");
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLI, "pli");
+
+  ftype = build_function_type_list (void_type_node, unsigned_type_node,
+   unsigned_type_node, unsigned_type_node,
+   cv_argtype, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLDX, "pldx");
+
+  ftype = build_function_type_list (void_type_node, unsigned_type_node,
+   unsigned_type_node, cv_argtype, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLIX, "plix");
+}
+
 /* Initialize the memory tagging extension (MTE) builtins.  */
 struct
 {
@@ -2019,6 +2051,8 @@ aarch64_general_init_builtins (void)
   aarch64_init_rng_builtins ();
   aarch64_init_data_intrinsics ();
 
+  aarch64_init_prefetch_builtin ();
+
   tree ftype_jcvt
 = build_function_type_list (intSI_type_node, double_type_node, NULL);
   aarch64_builtin_decls[AARCH64_JSCVT]
@@ -2599,6 +2633,127 @@ aarch64_expand_rng_builtin (tree exp, rtx target, int 
fcode, int ignore)
   return target;
 }
 
+/* Expand a 

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-27 Thread Victor Do Nascimento




On 10/27/23 14:18, Alex Coplan wrote:

On 26/10/2023 16:23, Richard Sandiford wrote:

Victor Do Nascimento  writes:

On 10/18/23 21:39, Richard Sandiford wrote:

Victor Do Nascimento  writes:

Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:

uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm_rsrf(const char *special_register);
double __arm_rsrf64(const char *special_register);
void __arm_wsr(const char *special_register, uint32_t value);
void __arm_wsr64(const char *special_register, uint64_t value);
void __arm_wsrp(const char *special_register, const void *value);
void __arm_wsrf(const char *special_register, float value);
void __arm_wsrf64(const char *special_register, double value);

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
Add enums for new builtins.
(aarch64_init_rwsr_builtins): New.
(aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
(aarch64_expand_rwsr_builtin):  New.
(aarch64_general_expand_builtin): Call aarch64_general_expand_builtin.
* gcc/config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
(write_sysregdi): Likewise.
* gcc/config/aarch64/arm_acle.h (__arm_rsr): New.
(__arm_rsrp): Likewise.
(__arm_rsr64): Likewise.
(__arm_rsrf): Likewise.
(__arm_rsrf64): Likewise.
(__arm_wsr): Likewise.
(__arm_wsrp): Likewise.
(__arm_wsr64): Likewise.
(__arm_wsrf): Likewise.
(__arm_wsrf64): Likewise.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr.c: New.
* gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c: Likewise.
---
   gcc/config/aarch64/aarch64-builtins.cc| 200 ++
   gcc/config/aarch64/aarch64.md |  17 ++
   gcc/config/aarch64/arm_acle.h |  30 +++
   .../gcc.target/aarch64/acle/rwsr-1.c  |  20 ++
   gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
   5 files changed, 411 insertions(+)
   create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
   create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..d8bb2a989a5 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,17 @@ enum aarch64_builtins
 AARCH64_RBIT,
 AARCH64_RBITL,
 AARCH64_RBITLL,
+  /* System register builtins.  */
+  AARCH64_RSR,
+  AARCH64_RSRP,
+  AARCH64_RSR64,
+  AARCH64_RSRF,
+  AARCH64_RSRF64,
+  AARCH64_WSR,
+  AARCH64_WSRP,
+  AARCH64_WSR64,
+  AARCH64_WSRF,
+  AARCH64_WSRF64,
 AARCH64_BUILTIN_MAX
   };
   
@@ -1798,6 +1809,65 @@ aarch64_init_rng_builtins (void)

   AARCH64_BUILTIN_RNG_RNDRRS);
   }
   
+/* Add builtins for reading system registers.  */

+static void
+aarch64_init_rwsr_builtins (void)
+{
+  tree fntype = NULL;
+  tree const_char_ptr_type
+= build_pointer_type (build_type_variant (char_type_node, true, false));
+
+#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
+  aarch64_builtin_decls[AARCH64_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
+
+  fntype
+= build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
+
+  fntype
+= build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
+
+  fntype
+= build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
+
+  fntype
+= build_function_type_list (float_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
+
+  fntype
+= build_function_type_list (double_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint32_type_node, NULL);
+
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   const_ptr_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRP, wsrp, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint64_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR64, wsr64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   float

[PATCH] aarch64: Implement ACLE instruction/data prefetch functions.

2023-10-27 Thread Victor Do Nascimento
Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:

  1. Data prefetch intrinsics:
  
  void __pldx (/*constant*/ unsigned int /*access_kind*/,
   /*constant*/ unsigned int /*cache_level*/,
   /*constant*/ unsigned int /*retention_policy*/,
   void const volatile *addr);

  void __pld (void const volatile *addr);

  2. Instruction prefetch intrinsics:
  ---
  void __plix (/*constant*/ unsigned int /*cache_level*/,
   /*constant*/ unsigned int /*retention_policy*/,
   void const volatile *addr);

  void __pli (void const volatile *addr);

`__pldx' affords the programmer more fine-grained control over the
data prefetch behavior than the analogous GCC builtin
`__builtin_prefetch', and allows access to the "SLC" cache level.

While `__builtin_prefetch' chooses both cache-level and retention
policy automatically via the optional `locality' parameter, `__pldx'
expects 2 (mandatory) arguments to explicitly define the desired
cache-level and retention policies.

`__plix', on the other hand, generates a code prefetch instruction and
so extends functionality on aarch64 targets beyond that which is
exposed by `__builtin_prefetch'.

`__pld' and `__pli' prefetch data and instructions, respectively,
using default values for both cache-level and retention policies.

Bootstrapped and tested on aarch64-none-linux-gnu.

[1] 
https://arm-software.github.io/acle/main/acle.html#memory-prefetch-intrinsics

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc:
(AARCH64_PLD): New enum aarch64_builtins entry.
(AARCH64_PLDX): Likewise.
(AARCH64_PLI): Likewise.
(AARCH64_PLIX): Likewise.
(aarch64_init_prefetch_builtin): New.
(aarch64_general_init_builtins): Call prefetch init function.
(aarch64_expand_prefetch_builtin): New.
(aarch64_general_expand_builtin):  Add prefetch expansion.
* config/aarch64/aarch64.md (UNSPEC_PLDX): New.
(aarch64_pldx): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/builtin_pld_pli.c: New.
---
 gcc/config/aarch64/aarch64-builtins.cc| 160 ++
 gcc/config/aarch64/aarch64.md |  12 ++
 gcc/config/aarch64/arm_acle.h |  30 
 .../gcc.target/aarch64/builtin_pldx.c |  90 ++
 4 files changed, 292 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pldx.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..307e7617548 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,10 @@ enum aarch64_builtins
   AARCH64_RBIT,
   AARCH64_RBITL,
   AARCH64_RBITLL,
+  AARCH64_PLD,
+  AARCH64_PLDX,
+  AARCH64_PLI,
+  AARCH64_PLIX,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1798,6 +1802,33 @@ aarch64_init_rng_builtins (void)
   AARCH64_BUILTIN_RNG_RNDRRS);
 }
 
+/* Add builtins for data and instruction prefetch.  */
+static void
+aarch64_init_prefetch_builtin (void)
+{
+#define AARCH64_INIT_PREFETCH_BUILTIN(INDEX, N) \
+  aarch64_builtin_decls[INDEX] =   \
+aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX)
+
+  tree ftype;
+  tree void_const_vol_ptr_type = build_type_variant (ptr_type_node, 1, 1);
+
+  ftype = build_function_type_list (void_type_node, void_const_vol_ptr_type,
+   NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLD, "pld");
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLI, "pli");
+
+  ftype = build_function_type_list (void_type_node, unsigned_type_node,
+   unsigned_type_node, unsigned_type_node,
+   void_const_vol_ptr_type, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLDX, "pldx");
+
+  ftype = build_function_type_list (void_type_node, unsigned_type_node,
+   unsigned_type_node, void_const_vol_ptr_type,
+   NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLIX, "plix");
+}
+
 /* Initialize the memory tagging extension (MTE) builtins.  */
 struct
 {
@@ -2019,6 +2050,8 @@ aarch64_general_init_builtins (void)
   aarch64_init_rng_builtins ();
   aarch64_init_data_intrinsics ();
 
+  aarch64_init_prefetch_builtin ();
+
   tree ftype_jcvt
 = build_function_type_list (intSI_type_node, double_type_node, NULL);
   aarch64_builtin_decls[AARCH64_JSCVT]
@@ -2599,6 +2632,127 @@ aarch64_expand_rng_builtin (tree exp, rtx target, int 
fcode, int ignore)
   return target;
 }
 
+/* Expand a prefetch builtin EXP.  */
+void
+aarch64_expand_prefetch_builtin (tree exp, int fcode)
+{
+
+#define EXPAND_CONST_INT(IN_IDX, OUT_IDX, ERRMSG)  \
+  if (TREE_CODE (args[IN_IDX]) != INTEGER_CST)   

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-26 Thread Victor Do Nascimento




On 10/26/23 16:23, Richard Sandiford wrote:

Victor Do Nascimento  writes:

On 10/18/23 21:39, Richard Sandiford wrote:

Victor Do Nascimento  writes:

Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:

uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm_rsrf(const char *special_register);
double __arm_rsrf64(const char *special_register);
void __arm_wsr(const char *special_register, uint32_t value);
void __arm_wsr64(const char *special_register, uint64_t value);
void __arm_wsrp(const char *special_register, const void *value);
void __arm_wsrf(const char *special_register, float value);
void __arm_wsrf64(const char *special_register, double value);

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
Add enums for new builtins.
(aarch64_init_rwsr_builtins): New.
(aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
(aarch64_expand_rwsr_builtin):  New.
(aarch64_general_expand_builtin): Call aarch64_general_expand_builtin.
* gcc/config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
(write_sysregdi): Likewise.
* gcc/config/aarch64/arm_acle.h (__arm_rsr): New.
(__arm_rsrp): Likewise.
(__arm_rsr64): Likewise.
(__arm_rsrf): Likewise.
(__arm_rsrf64): Likewise.
(__arm_wsr): Likewise.
(__arm_wsrp): Likewise.
(__arm_wsr64): Likewise.
(__arm_wsrf): Likewise.
(__arm_wsrf64): Likewise.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr.c: New.
* gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c: Likewise.
---
   gcc/config/aarch64/aarch64-builtins.cc| 200 ++
   gcc/config/aarch64/aarch64.md |  17 ++
   gcc/config/aarch64/arm_acle.h |  30 +++
   .../gcc.target/aarch64/acle/rwsr-1.c  |  20 ++
   gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
   5 files changed, 411 insertions(+)
   create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
   create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..d8bb2a989a5 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,17 @@ enum aarch64_builtins
 AARCH64_RBIT,
 AARCH64_RBITL,
 AARCH64_RBITLL,
+  /* System register builtins.  */
+  AARCH64_RSR,
+  AARCH64_RSRP,
+  AARCH64_RSR64,
+  AARCH64_RSRF,
+  AARCH64_RSRF64,
+  AARCH64_WSR,
+  AARCH64_WSRP,
+  AARCH64_WSR64,
+  AARCH64_WSRF,
+  AARCH64_WSRF64,
 AARCH64_BUILTIN_MAX
   };
   
@@ -1798,6 +1809,65 @@ aarch64_init_rng_builtins (void)

   AARCH64_BUILTIN_RNG_RNDRRS);
   }
   
+/* Add builtins for reading system registers.  */

+static void
+aarch64_init_rwsr_builtins (void)
+{
+  tree fntype = NULL;
+  tree const_char_ptr_type
+= build_pointer_type (build_type_variant (char_type_node, true, false));
+
+#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
+  aarch64_builtin_decls[AARCH64_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
+
+  fntype
+= build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
+
+  fntype
+= build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
+
+  fntype
+= build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
+
+  fntype
+= build_function_type_list (float_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
+
+  fntype
+= build_function_type_list (double_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint32_type_node, NULL);
+
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   const_ptr_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRP, wsrp, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint64_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR64, wsr64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   float_type_node, NULL);
+  AARCH64_INIT_RWSR_BUI

Re: [PATCH V2 7/7] aarch64: Add system register duplication check selftest

2023-10-26 Thread Victor Do Nascimento




On 10/18/23 22:30, Richard Sandiford wrote:

Victor Do Nascimento  writes:

Add a build-time test to check whether system register data, as
imported from `aarch64-sys-regs.def', has any duplicate entries.

Duplicate entries are defined as any two SYSREG entries in the .def
file which share the same encoding values (as specified by its `CPENC'
field) and where the relationship amongst the two does not fit into
one of the following categories:

* Simple aliasing: In some cases, it is observed that one
register name serves as an alias to another.  One example of
this is where TRCEXTINSELR aliases TRCEXTINSELR0.
* Expressing intent: It is possible that when a given register
serves two distinct functions depending on how it is used, it
is given two distinct names whose use should match the context
under which it is being used.  Example:  Debug Data Transfer
Register. When used to receive data, it should be accessed as
DBGDTRRX_EL0 while when transmitting data it should be
accessed via DBGDTRTX_EL0.
* Register deprecation: Some register names have been
deprecated and should no longer be used, but backwards-
compatibility requires that such names continue to be
recognized, as is the case for the SPSR_EL1 register, whose
access via the SPSR_SVC name is now deprecated.
* Same encoding different target: Some encodings are given
different meaning depending on the target architecture and, as
such, are given different names in each of these contexts.
We see an example of this for CPENC(3,4,2,0,0), which
corresponds to TTBR0_EL2 for Armv8-A targets and VSCTLR_EL2
in Armv8-R targets.

A consequence of these observations is that `CPENC' duplication is
acceptable iff at least one of the `properties' or `arch_reqs' fields
of the `sysreg_t' structs associated with the two registers in
question differ and it's this condition that is checked by the new
`aarch64_test_sysreg_encoding_clashes' function.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64.cc
(aarch64_test_sysreg_encoding_clashes): New.
(aarch64_run_selftests): Add call to
aarch64_test_sysreg_encoding_clashes selftest.
---
  gcc/config/aarch64/aarch64.cc | 53 +++
  1 file changed, 53 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d187e171beb..e0be2877ede 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22,6 +22,7 @@
  
  #define INCLUDE_STRING

  #define INCLUDE_ALGORITHM
+#define INCLUDE_VECTOR
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
@@ -28332,6 +28333,57 @@ aarch64_test_fractional_cost ()
ASSERT_EQ (cf (1, 2).as_double (), 0.5);
  }
  
+/* Calculate whether our system register data, as imported from
+   `aarch64-sys-reg.def' has any duplicate entries.  */
+static void
+aarch64_test_sysreg_encoding_clashes (void)
+{
+  using dup_counters_t = hash_map<nofree_string_hash, unsigned>;
+  using dup_instances_t
+    = hash_map<nofree_string_hash, std::vector<const sysreg_t *>>;
+
+  dup_counters_t duplicate_counts;
+  dup_instances_t duplicate_instances;
+
+  /* Every time an encoding is established to come up more than once
+  we add it to a "clash-analysis queue", which is then used to extract
+  necessary information from our hash map when establishing whether
+  repeated encodings are valid.  */


Formatting nit, sorry, but second and subsequent lines should be
indented to line up with the "E".


+
+  /* 1) Collect recurrence information.  */
+  std::vector<const char *> testqueue;
+
+  for (unsigned i = 0; i < nsysreg; i++)
+{
+  const sysreg_t *reg = sysreg_structs + i;
+
+  unsigned *tbl_entry = &duplicate_counts.get_or_insert (reg->encoding);
+  *tbl_entry += 1;
+
+  std::vector<const sysreg_t *> *tmp
+   = &duplicate_instances.get_or_insert (reg->encoding);
+
+  tmp->push_back (reg);
+  if (*tbl_entry > 1)
+ testqueue.push_back (reg->encoding);
+}


Do we need two hash maps here?  It looks like the length of the vector
is always equal to the count.  Also...



You're right.  Addressed in next iteration of patch series.


+
+  /* 2) Carry out analysis on collected data.  */
+  for (auto enc : testqueue)


...hash_map itself is iterable.  We could iterate over that instead,
which would avoid the need for the queue.



My rationale here is that I prefer to take up the extra little bit of 
memory to save on execution time.


`duplicate_instances' is an iterable of vectors, with one such vector 
for each encountered encoding value, irrespective of whether or not that 
encoding is duplicated.  Thus to iterate over this, we'd have to 1. 
iterate through every possible vector and 2. check each one's length. 
By having our `testqueue', we know immediately which encodings have 
duplicate sysreg entries and thus we can jump immediately to analyzing 

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-26 Thread Victor Do Nascimento




On 10/18/23 21:39, Richard Sandiford wrote:

Victor Do Nascimento  writes:

Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:

uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm_rsrf(const char *special_register);
double __arm_rsrf64(const char *special_register);
void __arm_wsr(const char *special_register, uint32_t value);
void __arm_wsr64(const char *special_register, uint64_t value);
void __arm_wsrp(const char *special_register, const void *value);
void __arm_wsrf(const char *special_register, float value);
void __arm_wsrf64(const char *special_register, double value);

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
Add enums for new builtins.
(aarch64_init_rwsr_builtins): New.
(aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
(aarch64_expand_rwsr_builtin): New.
(aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
* gcc/config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
(write_sysregdi): Likewise.
* gcc/config/aarch64/arm_acle.h (__arm_rsr): New.
(__arm_rsrp): Likewise.
(__arm_rsr64): Likewise.
(__arm_rsrf): Likewise.
(__arm_rsrf64): Likewise.
(__arm_wsr): Likewise.
(__arm_wsrp): Likewise.
(__arm_wsr64): Likewise.
(__arm_wsrf): Likewise.
(__arm_wsrf64): Likewise.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr.c: New.
* gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c: Likewise.
---
  gcc/config/aarch64/aarch64-builtins.cc| 200 ++
  gcc/config/aarch64/aarch64.md |  17 ++
  gcc/config/aarch64/arm_acle.h |  30 +++
  .../gcc.target/aarch64/acle/rwsr-1.c  |  20 ++
  gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
  5 files changed, 411 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..d8bb2a989a5 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,17 @@ enum aarch64_builtins
AARCH64_RBIT,
AARCH64_RBITL,
AARCH64_RBITLL,
+  /* System register builtins.  */
+  AARCH64_RSR,
+  AARCH64_RSRP,
+  AARCH64_RSR64,
+  AARCH64_RSRF,
+  AARCH64_RSRF64,
+  AARCH64_WSR,
+  AARCH64_WSRP,
+  AARCH64_WSR64,
+  AARCH64_WSRF,
+  AARCH64_WSRF64,
AARCH64_BUILTIN_MAX
  };
  
@@ -1798,6 +1809,65 @@ aarch64_init_rng_builtins (void)

   AARCH64_BUILTIN_RNG_RNDRRS);
  }
  
+/* Add builtins for reading system register.  */
+static void
+aarch64_init_rwsr_builtins (void)
+{
+  tree fntype = NULL;
+  tree const_char_ptr_type
+= build_pointer_type (build_type_variant (char_type_node, true, false));
+
+#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
+  aarch64_builtin_decls[AARCH64_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
+
+  fntype
+= build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
+
+  fntype
+= build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
+
+  fntype
+= build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
+
+  fntype
+= build_function_type_list (float_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
+
+  fntype
+= build_function_type_list (double_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint32_type_node, NULL);
+
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   const_ptr_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRP, wsrp, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint64_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR64, wsr64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   float_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRF, wsrf, fntype);
+
+  fntype
+= build_function_type_list (void

Re: [PATCH V2 2/7] aarch64: Add support for aarch64-sys-regs.def

2023-10-26 Thread Victor Do Nascimento




On 10/18/23 22:07, Richard Sandiford wrote:

Victor Do Nascimento  writes:

This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.

Entries in the aarch64-system-regs.def file should be as follows:

   SYSREG (NAME, CPENC (sn,op1,cn,cm,op2), FLAG1 | ... | FLAGn, ARCH)

Where the arguments to SYSREG correspond to:
   - NAME:  The system register name, as used in the assembly language.
   - CPENC: The system register encoding, mapping to:

   s<sn>_<op1>_c<cn>_c<cm>_<op2>

   - FLAG: The entries in the FLAGS field are bitwise-OR'd together to
  encode extra information required to ensure proper use of
  the system register.  For example, a read-only system
  register will have the flag F_REG_READ, while write-only
  registers will be labeled F_REG_WRITE.  Such flags are
  tested against at compile-time.
   - ARCH: The architectural features the system register is associated
  with.  This is encoded via one of three possible macros:
  1. When a system register is universally implemented, we say
  it has no feature requirements, so we tag it with the
  AARCH64_NO_FEATURES macro.
  2. When a register is only implemented for a single
  architectural extension EXT, the AARCH64_FEATURE (EXT) macro is
  used.
  3. When a given system register is made available by any of N
  possible architectural extensions, the AARCH64_FEATURES(N, ...)
  macro is used to combine them accordingly.

In order to enable proper interpretation of the SYSREG entries by the
compiler, flags defining system register behavior such as `F_REG_READ'
and `F_REG_WRITE' are also defined here, so they can later be used for
the validation of system register properties.

Finally, any architectural feature flags from Binutils missing from GCC
have appropriate aliases defined here so as to ensure
cross-compatibility of SYSREG entries across the toolchain.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64.cc (sysreg_t): New.
(sysreg_structs): Likewise.
(nsysreg): Likewise.
(AARCH64_FEATURE): Likewise.
(AARCH64_FEATURES): Likewise.
(AARCH64_NO_FEATURES): Likewise.
* gcc/config/aarch64/aarch64.h (AARCH64_ISA_V8A): Add missing
ISA flag.
(AARCH64_ISA_V8_1A): Likewise.
(AARCH64_ISA_V8_7A): Likewise.
(AARCH64_ISA_V8_8A): Likewise.
(AARCH64_NO_FEATURES): Likewise.
(AARCH64_FL_RAS): New ISA flag alias.
(AARCH64_FL_LOR): Likewise.
(AARCH64_FL_PAN): Likewise.
(AARCH64_FL_AMU): Likewise.
(AARCH64_FL_SCXTNUM): Likewise.
(AARCH64_FL_ID_PFR2): Likewise.
(F_DEPRECATED): New.
(F_REG_READ): Likewise.
(F_REG_WRITE): Likewise.
(F_ARCHEXT): Likewise.
(F_REG_ALIAS): Likewise.
---
  gcc/config/aarch64/aarch64.cc | 38 +++
  gcc/config/aarch64/aarch64.h  | 36 +
  2 files changed, 74 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9fbfc548a89..69de2366424 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2807,6 +2807,44 @@ static const struct processor all_cores[] =
{NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
  };
  
+typedef struct {
+  const char* name;
+  const char* encoding;


Formatting nit, but GCC style is:

   const char *foo

rather than:

   const char* foo;


+  const unsigned properties;
+  const unsigned long long arch_reqs;


I don't think these two should be const.  There's no reason in principle
why a sysreg_t can't be created and modified dynamically.

It would be useful to have some comments above the fields to say what
they represent.  E.g. the definition on its own doesn't make clear what
"properties" refers to.

arch_reqs should use aarch64_feature_flags rather than unsigned long long.
We're running out of feature flags in GCC too, so aarch64_feature_flags
is soon likely to be a C++ class.


+} sysreg_t;
+
+/* An aarch64_feature_set initializer for a single feature,
+   AARCH64_FEATURE_<FEAT>.  */
+#define AARCH64_FEATURE(FEAT) AARCH64_FL_##FEAT
+
+/* Used by AARCH64_FEATURES.  */
+#define AARCH64_OR_FEATURES_1(X, F1) \
+  AARCH64_FEATURE (F1)
+#define AARCH64_OR_FEATURES_2(X, F1, F2) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_1 (X, F2))
+#define AARCH64_OR_FEATURES_3(X, F1, ...) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_2 (X, __VA_ARGS__))
+
+/* An aarch64_feature_set initializer for the N features listed in "...".  */
+#define AARCH64_FEATURES(N, ...) \
+  AARCH64_OR_FEATURES_##N (0, __VA_ARGS__)
+
+/* Database of system registers, their encodings and architectural
+   requirements.  */
+const sysreg_t sysreg_structs[] =
+{
+#define CPENC(SN, OP1,

[PATCH] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-26 Thread Victor Do Nascimento
Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.

Consequently, an rtx such as:

  (set (reg/i:DI 0 x0)
 (unspec:DI [(const_string ("s3_3_c13_c2_2"))])

can now be output correctly using the following output pattern when
composing `define_insn's:

  "mrs\t%x0, %1"

Testing:
 - Bootstrap/regtest on aarch64-linux-gnu done.

gcc/ChangeLog

* config/aarch64/aarch64.cc (aarch64_print_operand): Add
support for CONST_STRING.
---
 gcc/config/aarch64/aarch64.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 62b1ae0652f..c715f6369bc 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12346,6 +12346,11 @@ aarch64_print_operand (FILE *f, rtx x, int code)
 
   switch (GET_CODE (x))
{
+   case CONST_STRING:
+ {
+   asm_fprintf (f, "%s", XSTR (x, 0));
+   break;
+ }
case REG:
  if (aarch64_sve_data_mode_p (GET_MODE (x)))
{
-- 
2.41.0



[PATCH V2 3/7] aarch64: Implement system register validation tools

2023-10-18 Thread Victor Do Nascimento
Given the implementation of a mechanism of encoding system registers
into GCC, this patch provides the mechanism of validating their use by
the compiler.  In particular, this involves:

  1. Ensuring a supplied string corresponds to a known system
 register name.  System registers can be accessed either via their
 name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0').
 Register names are validated using a hash map, mapping known
 system register names to their corresponding `sysreg_t' structs,
 which is populated from the `aarch64_system_regs.def' file.
 Register name validation is done via `lookup_sysreg_map', while
 the encoding naming convention is validated via a parser
 implemented in this patch - `is_implem_def_reg'.
  2. Once a given register name is deemed to be valid, it is checked
 against a further 2 criteria:
   a. Is the referenced register implemented in the target
  architecture?  This is achieved by comparing the ARCH field
  in the relevant SYSREG entry from `aarch64_system_regs.def'
  against `aarch64_feature_flags' flags set at compile-time.
   b. Is the register being used correctly?  Check the requested
  operation against the FLAGS specified in SYSREG.
  This prevents operations like writing to a read-only system
  register.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-protos.h (aarch64_valid_sysreg_name_p): 
New.
(aarch64_retrieve_sysreg): Likewise.
* gcc/config/aarch64/aarch64.cc (is_implem_def_reg): Likewise.
(aarch64_valid_sysreg_name_p): Likewise.
(aarch64_retrieve_sysreg): Likewise.
(aarch64_register_sysreg): Likewise.
(aarch64_init_sysregs): Likewise.
(aarch64_lookup_sysreg_map): Likewise.
* gcc/config/aarch64/predicates.md (aarch64_sysreg_string): New.
---
 gcc/config/aarch64/aarch64-protos.h |   2 +
 gcc/config/aarch64/aarch64.cc   | 146 
 gcc/config/aarch64/predicates.md|   4 +
 3 files changed, 152 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 60a55f4bc19..a134e2fcf8e 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -830,6 +830,8 @@ bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
 bool aarch64_sve_ptrue_svpattern_p (rtx, struct simd_immediate_info *);
 bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
+bool aarch64_valid_sysreg_name_p (const char *);
+const char *aarch64_retrieve_sysreg (char *, bool);
 rtx aarch64_check_zero_based_sve_index_immediate (rtx);
 bool aarch64_sve_index_immediate_p (rtx);
 bool aarch64_sve_arith_immediate_p (machine_mode, rtx, bool);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 69de2366424..816c4b69fc8 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -85,6 +85,7 @@
 #include "config/arm/aarch-common.h"
 #include "config/arm/aarch-common-protos.h"
 #include "ssa.h"
+#include "hash-map.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2845,6 +2846,52 @@ const sysreg_t sysreg_structs[] =
 const unsigned nsysreg = TOTAL_ITEMS;
 #undef TOTAL_ITEMS
 
+using sysreg_map_t = hash_map<nofree_string_hash, const sysreg_t *>;
+static sysreg_map_t *sysreg_map = nullptr;
+
+/* Map system register names to their hardware metadata: Encoding,
+   feature flags and architectural feature requirements, all of which
+   are encoded in a sysreg_t struct.  */
+void
+aarch64_register_sysreg (const char *name, const sysreg_t *metadata)
+{
+  bool dup = sysreg_map->put (name, metadata);
+  gcc_checking_assert (!dup);
+}
+
+/* Lazily initialize hash table for system register validation,
+   checking the validity of supplied register name and returning
+   register's associated metadata.  */
+static void
+aarch64_init_sysregs (void)
+{
+  gcc_assert (!sysreg_map);
+  sysreg_map = new sysreg_map_t;
+  gcc_assert (sysreg_map);
+
+  for (unsigned i = 0; i < nsysreg; i++)
+{
+  const sysreg_t *reg = sysreg_structs + i;
+  aarch64_register_sysreg (reg->name, reg);
+}
+}
+
+/* No direct access to the sysreg hash-map should be made.  Doing so
+   risks trying to access an uninitialized hash-map, and dereferencing
+   the returned double pointer without due care risks dereferencing a
+   null-pointer.  */
+const sysreg_t *
+aarch64_lookup_sysreg_map (const char *regname)
+{
+  if (!sysreg_map)
+aarch64_init_sysregs ();
+
+  const sysreg_t **sysreg_entry = sysreg_map->get (regname);
+  if (sysreg_entry != NULL)
+return *sysreg_entry;
+  return NULL;
+}
+
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
@@ -28053,6 +28100,105 @@ aarch64_pars_overlap_p (rtx par1, rtx par2)
   return false;
 }
 
+/* Parse an implementation-defined system register name of

[PATCH V2 6/7] aarch64: Add front-end argument type checking for target builtins

2023-10-18 Thread Victor Do Nascimento
In implementing the ACLE read/write system register builtins it was
observed that leaving argument type checking to be done at expand-time
meant that poorly-formed function calls were being "fixed" by certain
optimization passes, meaning bad code wasn't being properly picked up
in checking.

Example:

  const char *regname = "amcgcr_el0";
  long long a = __builtin_aarch64_rsr64 (regname);

is reduced by the ccp1 pass to

  long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");

As these functions require an argument of STRING_CST type, there needs
to be a check carried out by the front-end capable of picking this up.

The introduced `check_general_builtin_call' function will be called by
the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
carrying out any appropriate checks associated with a particular
builtin function code.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (check_general_builtin_call):
New.
* gcc/config/aarch64/aarch64-c.cc (aarch64_check_builtin_call):
Add check_general_builtin_call call.
* gcc/config/aarch64/aarch64-protos.h (check_general_builtin_call):
New.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c: New.
---
 gcc/config/aarch64/aarch64-builtins.cc| 33 +++
 gcc/config/aarch64/aarch64-c.cc   |  4 +--
 gcc/config/aarch64/aarch64-protos.h   |  3 ++
 .../gcc.target/aarch64/acle/rwsr-2.c  | 15 +
 4 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index d8bb2a989a5..6734361f4f4 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2126,6 +2126,39 @@ aarch64_general_builtin_decl (unsigned code, bool)
   return aarch64_builtin_decls[code];
 }
 
+bool
+check_general_builtin_call (location_t location, vec<location_t> arg_loc,
+   unsigned int code, tree fndecl,
+   unsigned int nargs ATTRIBUTE_UNUSED, tree *args)
+{
+  switch (code)
+{
+case AARCH64_RSR:
+case AARCH64_RSRP:
+case AARCH64_RSR64:
+case AARCH64_RSRF:
+case AARCH64_RSRF64:
+case AARCH64_WSR:
+case AARCH64_WSRP:
+case AARCH64_WSR64:
+case AARCH64_WSRF:
+case AARCH64_WSRF64:
+  if (TREE_CODE (args[0]) == VAR_DECL
+ || TREE_CODE (TREE_TYPE (args[0])) != POINTER_TYPE
+ || TREE_CODE (TREE_OPERAND (TREE_OPERAND (args[0], 0) , 0))
+ != STRING_CST)
+   {
+ const char  *fn_name, *err_msg;
+ fn_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+ err_msg = "first argument to %<%s%> must be a string literal";
+ error_at (location, err_msg, fn_name);
+ return false;
+   }
+}
+  /* Default behavior.  */
+  return true;
+}
+
 typedef enum
 {
   SIMD_ARG_COPY_TO_REG,
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index ab8844f6049..c2a9a59df73 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -339,8 +339,8 @@ aarch64_check_builtin_call (location_t loc, vec<location_t> arg_loc,
   switch (code & AARCH64_BUILTIN_CLASS)
 {
 case AARCH64_BUILTIN_GENERAL:
-  return true;
-
+  return check_general_builtin_call (loc, arg_loc, subcode, orig_fndecl,
+nargs, args);
 case AARCH64_BUILTIN_SVE:
   return aarch64_sve::check_builtin_call (loc, arg_loc, subcode,
  orig_fndecl, nargs, args);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index a134e2fcf8e..9ef96ff511f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -990,6 +990,9 @@ tree aarch64_general_builtin_rsqrt (unsigned int);
 void handle_arm_acle_h (void);
 void handle_arm_neon_h (void);
 
+bool check_general_builtin_call (location_t, vec<location_t>, unsigned int,
+ tree, unsigned int, tree *);
+
 namespace aarch64_sve {
   void init_builtins ();
   void handle_arm_sve_h ();
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c 
b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
new file mode 100644
index 000..72e5fb75b21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
@@ -0,0 +1,15 @@
+/* Test the __arm_[r,w]sr ACLE intrinsics family.  */
+/* Ensure that illegal behavior is rejected by the compiler.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.4-a" } */
+
+#include <arm_acle.h>
+
+void
+test_non_const_sysreg_name ()
+{
+  const char *regname = "trcseqstr";
+  long long a = __arm_rsr64 (regname); /* { dg-error "first argument to 
'__builtin_aarch64_rsr64' must be a string literal" } */
+  __arm_wsr64 (regname, 

[PATCH V2 2/7] aarch64: Add support for aarch64-sys-regs.def

2023-10-18 Thread Victor Do Nascimento
This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.

Entries in the aarch64-system-regs.def file should be as follows:

  SYSREG (NAME, CPENC (sn,op1,cn,cm,op2), FLAG1 | ... | FLAGn, ARCH)

Where the arguments to SYSREG correspond to:
  - NAME:  The system register name, as used in the assembly language.
  - CPENC: The system register encoding, mapping to:

   s<sn>_<op1>_c<cn>_c<cm>_<op2>

  - FLAG: The entries in the FLAGS field are bitwise-OR'd together to
  encode extra information required to ensure proper use of
  the system register.  For example, a read-only system
  register will have the flag F_REG_READ, while write-only
  registers will be labeled F_REG_WRITE.  Such flags are
  tested against at compile-time.
  - ARCH: The architectural features the system register is associated
  with.  This is encoded via one of three possible macros:
  1. When a system register is universally implemented, we say
  it has no feature requirements, so we tag it with the
  AARCH64_NO_FEATURES macro.
  2. When a register is only implemented for a single
  architectural extension EXT, the AARCH64_FEATURE (EXT) macro is
  used.
  3. When a given system register is made available by any of N
  possible architectural extensions, the AARCH64_FEATURES(N, ...)
  macro is used to combine them accordingly.

In order to enable proper interpretation of the SYSREG entries by the
compiler, flags defining system register behavior such as `F_REG_READ'
and `F_REG_WRITE' are also defined here, so they can later be used for
the validation of system register properties.

Finally, any architectural feature flags from Binutils missing from GCC
have appropriate aliases defined here so as to ensure
cross-compatibility of SYSREG entries across the toolchain.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64.cc (sysreg_t): New.
(sysreg_structs): Likewise.
(nsysreg): Likewise.
(AARCH64_FEATURE): Likewise.
(AARCH64_FEATURES): Likewise.
(AARCH64_NO_FEATURES): Likewise.
* gcc/config/aarch64/aarch64.h (AARCH64_ISA_V8A): Add missing
ISA flag.
(AARCH64_ISA_V8_1A): Likewise.
(AARCH64_ISA_V8_7A): Likewise.
(AARCH64_ISA_V8_8A): Likewise.
(AARCH64_NO_FEATURES): Likewise.
(AARCH64_FL_RAS): New ISA flag alias.
(AARCH64_FL_LOR): Likewise.
(AARCH64_FL_PAN): Likewise.
(AARCH64_FL_AMU): Likewise.
(AARCH64_FL_SCXTNUM): Likewise.
(AARCH64_FL_ID_PFR2): Likewise.
(F_DEPRECATED): New.
(F_REG_READ): Likewise.
(F_REG_WRITE): Likewise.
(F_ARCHEXT): Likewise.
(F_REG_ALIAS): Likewise.
---
 gcc/config/aarch64/aarch64.cc | 38 +++
 gcc/config/aarch64/aarch64.h  | 36 +
 2 files changed, 74 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9fbfc548a89..69de2366424 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2807,6 +2807,44 @@ static const struct processor all_cores[] =
   {NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
 };
 
+typedef struct {
+  const char* name;
+  const char* encoding;
+  const unsigned properties;
+  const unsigned long long arch_reqs;
+} sysreg_t;
+
+/* An aarch64_feature_set initializer for a single feature,
+   AARCH64_FEATURE_<FEAT>.  */
+#define AARCH64_FEATURE(FEAT) AARCH64_FL_##FEAT
+
+/* Used by AARCH64_FEATURES.  */
+#define AARCH64_OR_FEATURES_1(X, F1) \
+  AARCH64_FEATURE (F1)
+#define AARCH64_OR_FEATURES_2(X, F1, F2) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_1 (X, F2))
+#define AARCH64_OR_FEATURES_3(X, F1, ...) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_2 (X, __VA_ARGS__))
+
+/* An aarch64_feature_set initializer for the N features listed in "...".  */
+#define AARCH64_FEATURES(N, ...) \
+  AARCH64_OR_FEATURES_##N (0, __VA_ARGS__)
+
+/* Database of system registers, their encodings and architectural
+   requirements.  */
+const sysreg_t sysreg_structs[] =
+{
+#define CPENC(SN, OP1, CN, CM, OP2) "s"#SN"_"#OP1"_c"#CN"_c"#CM"_"#OP2
+#define SYSREG(NAME, ENC, FLAGS, ARCH) \
+  { NAME, ENC, FLAGS, ARCH },
+#include "aarch64-sys-regs.def"
+#undef CPENC
+};
+
+#define TOTAL_ITEMS (sizeof sysreg_structs / sizeof sysreg_structs[0])
+const unsigned nsysreg = TOTAL_ITEMS;
+#undef TOTAL_ITEMS
+
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index d74e9116fc5..cf3969a11aa 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -179,6 +179,8 @@ enum class aarch64_feature : unsigned char {
 
 /* Macros to test ISA flags.  */
 

[PATCH V2 7/7] aarch64: Add system register duplication check selftest

2023-10-18 Thread Victor Do Nascimento
Add a build-time test to check whether system register data, as
imported from `aarch64-sys-reg.def' has any duplicate entries.

Duplicate entries are defined as any two SYSREG entries in the .def
file which share the same encoding values (as specified by its `CPENC'
field) and where the relationship amongst the two does not fit into
one of the following categories:

* Simple aliasing: In some cases, it is observed that one
register name serves as an alias to another.  One example of
this is where TRCEXTINSELR aliases TRCEXTINSELR0.
* Expressing intent: It is possible that when a given register
serves two distinct functions depending on how it is used, it
is given two distinct names whose use should match the context
under which it is being used.  Example:  Debug Data Transfer
Register. When used to receive data, it should be accessed as
DBGDTRRX_EL0 while when transmitting data it should be
accessed via DBGDTRTX_EL0.
* Register deprecation: Some register names have been
deprecated and should no longer be used, but backwards-
compatibility requires that such names continue to be
recognized, as is the case for the SPSR_EL1 register, whose
access via the SPSR_SVC name is now deprecated.
* Same encoding different target: Some encodings are given
different meaning depending on the target architecture and, as
such, are given different names in each of these contexts.
We see an example of this for CPENC(3,4,2,0,0), which
corresponds to TTBR0_EL2 for Armv8-A targets and VSCTLR_EL2
in Armv8-R targets.

A consequence of these observations is that `CPENC' duplication is
acceptable iff at least one of the `properties' or `arch_reqs' fields
of the `sysreg_t' structs associated with the two registers in
question differ and it's this condition that is checked by the new
`aarch64_test_sysreg_encoding_clashes' function.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64.cc
(aarch64_test_sysreg_encoding_clashes): New.
(aarch64_run_selftests): Add call to
aarch64_test_sysreg_encoding_clashes selftest.
---
 gcc/config/aarch64/aarch64.cc | 53 +++
 1 file changed, 53 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d187e171beb..e0be2877ede 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22,6 +22,7 @@
 
 #define INCLUDE_STRING
 #define INCLUDE_ALGORITHM
+#define INCLUDE_VECTOR
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -28332,6 +28333,57 @@ aarch64_test_fractional_cost ()
   ASSERT_EQ (cf (1, 2).as_double (), 0.5);
 }
 
+/* Calculate whether our system register data, as imported from
+   `aarch64-sys-reg.def' has any duplicate entries.  */
+static void
+aarch64_test_sysreg_encoding_clashes (void)
+{
+  using dup_counters_t = hash_map<nofree_string_hash, unsigned>;
+  using dup_instances_t
+    = hash_map<nofree_string_hash, std::vector<const sysreg_t *>>;
+
+  dup_counters_t duplicate_counts;
+  dup_instances_t duplicate_instances;
+
+  /* Every time an encoding is established to come up more than once
+  we add it to a "clash-analysis queue", which is then used to extract
+  necessary information from our hash map when establishing whether
+  repeated encodings are valid.  */
+
+  /* 1) Collect recurrence information.  */
+  std::vector<const char *> testqueue;
+
+  for (unsigned i = 0; i < nsysreg; i++)
+{
+  const sysreg_t *reg = sysreg_structs + i;
+
+  unsigned *tbl_entry = &duplicate_counts.get_or_insert (reg->encoding);
+  *tbl_entry += 1;
+
+  std::vector<const sysreg_t *> *tmp
+   = &duplicate_instances.get_or_insert (reg->encoding);
+
+  tmp->push_back (reg);
+  if (*tbl_entry > 1)
+ testqueue.push_back (reg->encoding);
+}
+
+  /* 2) Carry out analysis on collected data.  */
+  for (auto enc : testqueue)
+{
+  unsigned nrep = *duplicate_counts.get (enc);
+  for (unsigned i = 0; i < nrep; i++)
+   for (unsigned j = i+1; j < nrep; j++)
+ {
+   std::vector<const sysreg_t *> *tmp2 = duplicate_instances.get (enc);
+   const sysreg_t *a = (*tmp2)[i];
+   const sysreg_t *b = (*tmp2)[j];
+   ASSERT_TRUE ((a->properties != b->properties)
+|| (a->arch_reqs != b->arch_reqs));
+ }
+}
+}
+
 /* Run all target-specific selftests.  */
 
 static void
@@ -28339,6 +28391,7 @@ aarch64_run_selftests (void)
 {
   aarch64_test_loading_full_dump ();
   aarch64_test_fractional_cost ();
+  aarch64_test_sysreg_encoding_clashes ();
 }
 
 } // namespace selftest
-- 
2.41.0



[PATCH V2 1/7] aarch64: Sync system register information with Binutils

2023-10-18 Thread Victor Do Nascimento
This patch adds the `aarch64-sys-regs.def' file, originally written
for Binutils, to GCC. In so doing, it provides GCC with the necessary
information for teaching the compiler about system registers known to
the assembler and how these can be used.

By aligning the representation of data common to different parts of
the toolchain we can greatly reduce the duplication of work,
facilitating the maintenance of the aarch64 back-end across different
parts of the toolchain; By keeping both copies of the file in sync,
any `SYSREG (...)' that is added in one project is automatically added
to its counterpart.  This being the case, no change should be made in
the GCC copy of the file.  Any modifications should first be made in
Binutils and the resulting file copied over to GCC.

GCC does not implement the full range of ISA flags present in
Binutils.  Where this is the case, aliases must be added to aarch64.h
with the unknown architectural extension being mapped to its
associated base architecture, such that any flag present in Binutils
and used in system register definitions is understood in GCC.  Again,
this is done such that flags can be used interchangeably between
projects making use of the aarch64-system-regs.def file.  This is done
in the next patch in the series.

`.arch' directives missing from the emitted assembly files as a
consequence of this aliasing are accounted for by the compiler using
the S<op0>_<op1>_<Cn>_<Cm>_<op2> encoding of system registers when
issuing mrs/msr instructions.  This design choice ensures the
assembler will accept anything that was deemed acceptable by the
compiler.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-sys-regs.def: New.
---
 gcc/config/aarch64/aarch64-sys-regs.def | 1064 +++
 1 file changed, 1064 insertions(+)
 create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def

diff --git a/gcc/config/aarch64/aarch64-sys-regs.def b/gcc/config/aarch64/aarch64-sys-regs.def
new file mode 100644
index 000..d24a2455503
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sys-regs.def
@@ -0,0 +1,1064 @@
+/* aarch64-system-regs.def -- AArch64 opcode support.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of the GNU opcodes library.
+
+   This library is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   It is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; see the file COPYING3.  If not,
+   see <http://www.gnu.org/licenses/>.  */
+
+/* Array of system registers and their associated arch features.
+
+   This file is also used by GCC.  Where necessary, any updates should
+   be made in Binutils and the updated file copied across to GCC, such
+   that the two projects are kept in sync at all times.
+
+   Before using #include to read this file, define a macro:
+
+ SYSREG (name, encoding, flags, features)
+
+  The NAME is the system register name, as recognized by the
+  assembler.  ENCODING provides the necessary information for the binary
+  encoding of the system register.  The FLAGS field is a bitmask of
+  relevant behavior information pertaining to the particular register.
+  For example: is it read/write-only? does it alias another register?
+  The FEATURES field maps onto ISA flags and specifies the architectural
+  feature requirements of the system register.  */
+
+  SYSREG ("accdata_el1",	CPENC (3,0,13,0,5),	0,		AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el1",		CPENC (3,0,1,0,1),	0,		AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el2",		CPENC (3,4,1,0,1),	0,		AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el3",		CPENC (3,6,1,0,1),	0,		AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el1",		CPENC (3,0,5,1,0),	0,		AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el12",		CPENC (3,5,5,1,0),	F_ARCHEXT,	AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr0_el2",		CPENC (3,4,5,1,0),	0,		AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el3",		CPENC (3,6,5,1,0),	0,		AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el1",		CPENC (3,0,5,1,1),	0,		AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el12",		CPENC (3,5,5,1,1),	F_ARCHEXT,	AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr1_el2",		CPENC (3,4,5,1,1),	0,		AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el3", CPENC (3,6,5,1,1),  0,  

[PATCH V2 4/7] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-18 Thread Victor Do Nascimento
Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.

Consequently, an rtx such as:

  (set (reg/i:DI 0 x0)
 (unspec:DI [(const_string ("s3_3_c13_c2_2"))])

can now be output correctly using the following output pattern when
composing `define_insn's:

  "mrs\t%x0, %1"

gcc/ChangeLog:

* gcc/config/aarch64/aarch64.cc (aarch64_print_operand): Add
support for CONST_STRING.
---
 gcc/config/aarch64/aarch64.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 816c4b69fc8..d187e171beb 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12430,6 +12430,12 @@ aarch64_print_operand (FILE *f, rtx x, int code)
 
   switch (GET_CODE (x))
{
+   case CONST_STRING:
+ {
+   const char *output_op = XSTR (x, 0);
+   asm_fprintf (f, "%s", output_op);
+   break;
+ }
case REG:
  if (aarch64_sve_data_mode_p (GET_MODE (x)))
{
-- 
2.41.0



[PATCH V2 0/7] aarch64: Add support for __arm_rsr and __arm_wsr ACLE function family

2023-10-18 Thread Victor Do Nascimento
This revision of the patch series addresses the following key pieces
of upstream feedback:

  * `aarch64-sys-regs.def', being identical in content to the file with
  the same name in Binutils, now retains the copyright header from
  Binutils.
  * We migrate away from the binary search handling of system-register
  lookups in favour of a hashmap approach, relaxing the requirement
  that all entries in `aarch64-sys-regs.def' be kept in alphabetical
  order.
  * A static selftest is added for sanity-checking of the contents of
  `aarch64-sys-regs.def'.  Given the move to a hashmap lookup mechanism,
  no testing is needed for the preservation of alphabetical order, but
  a test is added to detect spurious duplicate register definitions.

---

This patch series adds support for reading and writing to and from
system registers via the relevant ACLE-defined builtins [1].

The patch series makes a series of additions to the aarch64-specific
areas of the compiler to make this possible.

Firstly, a mechanism for defining system registers is established via a
new .def file and the new SYSREG macro.  This macro is the same as is
used in Binutils and system register entries are compatible with
either code-base.

Given the information contained in this system register definition
file, a compile-time validation mechanism is implemented, such that any
system register name passed as a string literal argument to these
builtins can be checked against known system registers and its use
for a given target architecture validated.

Finally, patterns for each of these builtins are added to the back-end
such that, if all validation criteria are met, the correct assembly is
emitted.

Thus, the following example of system register access is now valid for
GCC:

long long old = __arm_rsr("trcseqstr");
__arm_wsr("trcseqstr", new);

Testing:
 - Bootstrap/regtest on aarch64-linux-gnu done.

[1] https://arm-software.github.io/acle/main/acle.html

Victor Do Nascimento (7):
  aarch64: Sync system register information with Binutils
  aarch64: Add support for aarch64-sys-regs.def
  aarch64: Implement system register validation tools
  aarch64: Add basic target_print_operand support for CONST_STRING
  aarch64: Implement system register r/w arm ACLE intrinsic functions
  aarch64: Add front-end argument type checking for target builtins
  aarch64: Add system register duplication check selftest

 gcc/config/aarch64/aarch64-builtins.cc|  233 
 gcc/config/aarch64/aarch64-c.cc   |4 +-
 gcc/config/aarch64/aarch64-protos.h   |5 +
 gcc/config/aarch64/aarch64-sys-regs.def   | 1064 +
 gcc/config/aarch64/aarch64.cc |  243 
 gcc/config/aarch64/aarch64.h  |   36 +
 gcc/config/aarch64/aarch64.md |   17 +
 gcc/config/aarch64/arm_acle.h |   30 +
 gcc/config/aarch64/predicates.md  |4 +
 .../gcc.target/aarch64/acle/rwsr-1.c  |   20 +
 .../gcc.target/aarch64/acle/rwsr-2.c  |   15 +
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  |  144 +++
 12 files changed, 1813 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

-- 
2.41.0



[PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-18 Thread Victor Do Nascimento
Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:

uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm_rsrf(const char *special_register);
double __arm_rsrf64(const char *special_register);
void __arm_wsr(const char *special_register, uint32_t value);
void __arm_wsr64(const char *special_register, uint64_t value);
void __arm_wsrp(const char *special_register, const void *value);
void __arm_wsrf(const char *special_register, float value);
void __arm_wsrf64(const char *special_register, double value);

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
Add enums for new builtins.
(aarch64_init_rwsr_builtins): New.
(aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
(aarch64_expand_rwsr_builtin): New.
(aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
* gcc/config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
(write_sysregdi): Likewise.
* gcc/config/aarch64/arm_acle.h (__arm_rsr): New.
(__arm_rsrp): Likewise.
(__arm_rsr64): Likewise.
(__arm_rsrf): Likewise.
(__arm_rsrf64): Likewise.
(__arm_wsr): Likewise.
(__arm_wsrp): Likewise.
(__arm_wsr64): Likewise.
(__arm_wsrf): Likewise.
(__arm_wsrf64): Likewise.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr.c: New.
* gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c: Likewise.
---
 gcc/config/aarch64/aarch64-builtins.cc| 200 ++
 gcc/config/aarch64/aarch64.md |  17 ++
 gcc/config/aarch64/arm_acle.h |  30 +++
 .../gcc.target/aarch64/acle/rwsr-1.c  |  20 ++
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
 5 files changed, 411 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..d8bb2a989a5 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,17 @@ enum aarch64_builtins
   AARCH64_RBIT,
   AARCH64_RBITL,
   AARCH64_RBITLL,
+  /* System register builtins.  */
+  AARCH64_RSR,
+  AARCH64_RSRP,
+  AARCH64_RSR64,
+  AARCH64_RSRF,
+  AARCH64_RSRF64,
+  AARCH64_WSR,
+  AARCH64_WSRP,
+  AARCH64_WSR64,
+  AARCH64_WSRF,
+  AARCH64_WSRF64,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1798,6 +1809,65 @@ aarch64_init_rng_builtins (void)
   AARCH64_BUILTIN_RNG_RNDRRS);
 }
 
+/* Add builtins for reading system register.  */
+static void
+aarch64_init_rwsr_builtins (void)
+{
+  tree fntype = NULL;
+  tree const_char_ptr_type
+= build_pointer_type (build_type_variant (char_type_node, true, false));
+
+#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
+  aarch64_builtin_decls[AARCH64_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
+
+  fntype
+= build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
+
+  fntype
+= build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
+
+  fntype
+= build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
+
+  fntype
+= build_function_type_list (float_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
+
+  fntype
+= build_function_type_list (double_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint32_type_node, NULL);
+
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   const_ptr_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRP, wsrp, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint64_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR64, wsr64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   float_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRF, wsrf, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   double_type_node, NULL);
+  
