[PATCH v2 2/4] Libatomic: Define per-file identifier macros

2024-06-11 Thread Victor Do Nascimento
In order to facilitate the fine-tuning of how `libatomic_i.h' and `host-config.h' headers are used by different atomic functions, we define distinct identifier macros for each file which, in implementing atomic operations, imports these headers. The idea is that different parts of these headers

[PATCH v2 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file

2024-06-11 Thread Victor Do Nascimento
At present, `atomic_16.S' groups different implementations of the same functions together in the file. Therefore, as an example, the LSE2 implementation of `load_16' follows on immediately from its core implementation, as does the `store_16' LSE2 implementation. Such architectural

[PATCH v2 3/4] Libatomic: Make ifunc selector behavior contingent on importing file

2024-06-11 Thread Victor Do Nascimento
By querying previously-defined file-identifier macros, `host-config.h' is able to get information about its environment and, based on this information, select more appropriate function-specific ifunc selectors. This reduces the number of unnecessary feature tests that need to be carried out in

[PATCH v2 1/4] Libatomic: AArch64: Convert all lse128 assembly to .insn directives

2024-06-11 Thread Victor Do Nascimento
Given the lack of support for the LSE128 instructions in all but the most up-to-date version of Binutils (2.42), having the build-time test for assembler support for these instructions often leads to the building of Libatomic without support for LSE128-dependent atomic function

[PATCH v2 0/4] Libatomic: Cleanup ifunc selector and aliasing

2024-06-11 Thread Victor Do Nascimento
on armv9.4-a target with LRCPC3 and LSE128 support and without. Victor Do Nascimento (4): Libatomic: AArch64: Convert all lse128 assembly to .insn directives Libatomic: Define per-file identifier macros Libatomic: Make ifunc selector behavior contingent on importing file Libatomic: Clean up

[PATCH v2] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-06-11 Thread Victor Do Nascimento
At present the autovectorizer fails to vectorize simple loops involving calls to `__builtin_prefetch'. A simple example of such loop is given below: void foo(double * restrict a, double * restrict b, int n){ int i; for(i=0; i *references) clobbers_memory = true;
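
The example in the snippet above is cut off in this listing. As a rough illustration only, a loop of the kind described (a simple element-wise update containing a `__builtin_prefetch' call) might look like the sketch below. The function name `add_with_prefetch' and the prefetch distance of 8 elements are assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>

/* Hypothetical example: an element-wise loop that also issues a
   prefetch hint on every iteration.  The name and the prefetch
   distance (8 elements ahead) are illustrative assumptions.  */
void add_with_prefetch(double *restrict a, double *restrict b, int n)
{
  assert(n >= 0);
  for (int i = 0; i < n; i++)
    {
      a[i] = a[i] + b[i];
      /* Hint that b[i + 8] will be read soon.  The caller is assumed to
         provide at least n + 8 elements in b, so that forming this
         address stays within the array.  */
      __builtin_prefetch(&b[i + 8]);
    }
}
```

The arithmetic in the loop body is trivially vectorizable on its own; the prefetch call is the only part that a vectorizer would need to handle specially.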

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Victor Do Nascimento
Christina wrote: -Original Message- From: Richard Biener Sent: Friday, May 17, 2024 10:46 AM To: Tamar Christina Cc: Victor Do Nascimento ; gcc- patc...@gcc.gnu.org; Richard Sandiford ; Richard Earnshaw ; Victor Do Nascimento Subject: Re: [PATCH] middle-end: Expand {u|s}dot produ

[PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Victor Do Nascimento
From: Victor Do Nascimento At present, the compiler offers the `{u|s|us}dot_prod_optab' direct optabs for dealing with vectorizable dot product code sequences. The consequence of using a direct optab for this is that backend-pattern selection is only ever able to match against one datatype

Re: [PATCH] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-05-16 Thread Victor Do Nascimento
On 5/16/24 15:16, Andrew Pinski wrote: On Thu, May 16, 2024, 3:58 PM Victor Do Nascimento wrote: At present the autovectorizer fails to vectorize simple loops involving calls to `__builtin_prefetch'. A simple example of such loop is

[PATCH] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-05-16 Thread Victor Do Nascimento
At present the autovectorizer fails to vectorize simple loops involving calls to `__builtin_prefetch'. A simple example of such loop is given below: void foo(double * restrict a, double * restrict b, int n){ int i; for(i=0; i *references) clobbers_memory = true;

[PATCH] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-05-16 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for Armv8.2-A upwards provides additional support for the release consistency model, introducing the Load-Acquire RCpc Pair Ordered, and Store-Release Pair Ordered operations in the form of LDIAPP and STILP. These operations are

[PATCH 1/4] Libatomic: Define per-file identifier macros

2024-05-16 Thread Victor Do Nascimento
In order to facilitate the fine-tuning of how `libatomic_i.h' and `host-config.h' headers are used by different atomic functions, we define distinct identifier macros for each file which, in implementing atomic operations, imports these headers. The idea is that different parts of these headers

[PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file

2024-05-16 Thread Victor Do Nascimento
At present, `atomic_16.S' groups different implementations of the same functions together in the file. Therefore, as an example, the LSE128 implementation of `exchange_16' follows on immediately from its core implementation, as does the `fetch_or_16' LSE128 implementation. Such architectural

[PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file

2024-05-16 Thread Victor Do Nascimento
By querying previously-defined file-identifier macros, `host-config.h' is able to get information about its environment and, based on this information, select more appropriate function-specific ifunc selectors. This reduces the number of unnecessary feature tests that need to be carried out in

[PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing

2024-05-16 Thread Victor Do Nascimento
Following improvements to the way ifuncs are selected based on detected architectural features, we are able to do away with many of the aliases that were previously needed for subsets of atomic functions that were not implemented in a given extension. This may be clarified by virtue of an

[PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing

2024-05-16 Thread Victor Do Nascimento
on armv9.4-a target with LRCPC3 and LSE128 support and without. Victor Do Nascimento (4): Libatomic: Define per-file identifier macros Libatomic: Make ifunc selector behavior contingent on importing file Libatomic: Clean up AArch64 ifunc aliasing Libatomic: Clean up AArch64 `atomic_16.S

[gcc r14-9684] aarch64: Add +lse128 architectural extension command-line flag

2024-03-27 Thread Victor Do Nascimento via Gcc-cvs
https://gcc.gnu.org/g:d9ea5181800fee3c0c21c33a463d8ca90b004ff6 commit r14-9684-gd9ea5181800fee3c0c21c33a463d8ca90b004ff6 Author: Victor Do Nascimento Date: Tue Mar 5 20:38:26 2024 + aarch64: Add +lse128 architectural extension command-line flag Given how, at present

[gcc r14-9685] aarch64: Align lrcpc3 FEAT_STRING with /proc/cpuinfo 'Features' entry

2024-03-27 Thread Victor Do Nascimento via Gcc-cvs
https://gcc.gnu.org/g:258defbdee985c8f27c33e9f533c19d059941972 commit r14-9685-g258defbdee985c8f27c33e9f533c19d059941972 Author: Victor Do Nascimento Date: Tue Mar 5 20:38:26 2024 + aarch64: Align lrcpc3 FEAT_STRING with /proc/cpuinfo 'Features' entry Due to the Linux kernel

Re: [PATCH] aarch64: Add +lse128 architectural extension command-line flag

2024-03-27 Thread Victor Do Nascimento
On 3/26/24 12:26, Richard Sandiford wrote: Victor Do Nascimento writes: Given how, at present, the choice of using LSE128 atomic instructions by the toolchain is delegated to run-time selection in the form of Libatomic ifuncs, responsible for querying target support, the `+lse128' target

[PATCH] aarch64: Align lrcpc3 FEAT_STRING with /proc/cpuinfo 'Features' entry

2024-03-25 Thread Victor Do Nascimento
Due to the Linux kernel exposing the lrcpc3 architectural feature as "lrcpc3", this patch corrects the relevant FEATURE_STRING entry in the "rcpc3" AARCH64_OPT_FMV_EXTENSION macro, such that the feature can be correctly detected when doing native compilation on rcpc3-enabled targets. Regtested on

[PATCH] aarch64: Add +lse128 architectural extension command-line flag

2024-03-15 Thread Victor Do Nascimento
Given how, at present, the choice of using LSE128 atomic instructions by the toolchain is delegated to run-time selection in the form of Libatomic ifuncs, responsible for querying target support, the `+lse128' target architecture compile-time flag is absent from GCC. This, however, contrasts with

Re: [libatomic PATCH] PR other/113336: Fix libatomic testsuite regressions on ARM.

2024-02-14 Thread Victor Do Nascimento
-gnueabihf with --with-arch=armv6 with make bootstrap and make -k check where it fixes all of the FAILs in libatomic. Ok for mainline? 2024-01-28 Roger Sayle Victor Do Nascimento libatomic/ChangeLog PR other/113336 * Makefile.am: Build tas_1_2_.o

[PATCH] AArch64: Update system register database.

2024-02-06 Thread Victor Do Nascimento
With the release of Binutils 2.42, this brings the level of system-register support in GCC in line with the current state-of-the-art in Binutils, ensuring everything available in Binutils is plainly accessible from GCC. Where Binutils uses a more detailed description of which features are

Re: [PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-26 Thread Victor Do Nascimento
On 1/26/24 10:53, Richard Sandiford wrote: > Victor Do Nascimento writes: >> @@ -712,6 +760,27 @@ ENTRY (libat_test_and_set_16) >> END (libat_test_and_set_16) >> >> >> +/* Alias all LSE128_LRCPC3 ifuncs to their specific implementations, >> + t

Re: [libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].

2024-01-25 Thread Victor Do Nascimento
On 1/11/24 15:55, Roger Sayle wrote: Hi Richard, As you've recommended, this issue has now been filed in bugzilla as PR other/113336. As explained in the new PR, libatomic's testsuite used to pass on armv6 (raspberry pi) in previous GCC releases, but the code was incorrect/non-synchronous;

[PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-24 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for Armv8.2-A upwards provides additional support for the release consistency model, introducing the Load-Acquire RCpc Pair Ordered, and Store-Release Pair Ordered operations in the form of LDIAPP and STILP. These operations are

[PATCH v2 1/2] libatomic: Increase max IFUNC_NCOND(N) from 3 to 4.

2024-01-24 Thread Victor Do Nascimento
libatomic/ChangeLog: * libatomic_i.h: Add GEN_SELECTOR implementation for IFUNC_NCOND(N) == 4. --- libatomic/libatomic_i.h | 18 ++ 1 file changed, 18 insertions(+) diff --git a/libatomic/libatomic_i.h b/libatomic/libatomic_i.h index 861a22da152..0a854fd908c

[PATCH v2 0/2] libatomic: AArch64 rcpc3 128-bit atomic operation enablement

2024-01-24 Thread Victor Do Nascimento
/gcc-patches/2024-January/643841.html Victor Do Nascimento (2): libatomic: Increase max IFUNC_NCOND(N) from 3 to 4. libatomic: Add rcpc3 128-bit atomic operations for AArch64 libatomic/Makefile.am| 6 +- libatomic/Makefile.in| 22

[PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-24 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their

[PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

2024-01-24 Thread Victor Do Nascimento
At present, evaluation of both `has_lse2(hwcap)' and `has_lse128(hwcap)' may require issuing an `mrs' instruction to query a system register. This instruction, when issued from user-space, results in a trap by the kernel, which then returns the value read from the system register. Given the

[PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-24 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset)
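
As an aid to reading the descriptions above, the data flow of the two operations named in the snippet can be modelled in plain C. This is a non-atomic reference model only: the type `pair128' and the functions `model_ldclrp' and `model_ldsetp' are hypothetical names introduced here, and the real instructions perform the read-modify-write as a single atomic operation.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit quantity held as two 64-bit halves, mirroring a register pair.  */
typedef struct { uint64_t lo, hi; } pair128;

/* Model of LDCLRP: memory becomes (memory AND NOT regs), and the register
   pair receives the original memory contents.  NOT atomic; for
   illustration of the data flow only.  */
void model_ldclrp(pair128 *mem, pair128 *regs)
{
  assert(mem && regs);
  pair128 old = *mem;
  mem->lo = old.lo & ~regs->lo;
  mem->hi = old.hi & ~regs->hi;
  *regs = old;
}

/* Model of LDSETP: memory becomes (memory OR regs), and the register
   pair receives the original memory contents.  NOT atomic.  */
void model_ldsetp(pair128 *mem, pair128 *regs)
{
  assert(mem && regs);
  pair128 old = *mem;
  mem->lo = old.lo | regs->lo;
  mem->hi = old.hi | regs->hi;
  *regs = old;
}
```

In both cases the same register pair serves as input operand and as destination for the old value, which is the point the snippet makes about "the same 2 registers".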

[PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver

2024-01-24 Thread Victor Do Nascimento
With support for new atomic features in Armv9.4-a being indicated by HWCAP2 bits, Libatomic's ifunc resolver must now query its second argument, of type __ifunc_arg_t*. We therefore make this argument known to libatomic, allowing us to query hwcap2 bits in the following manner: bool resolver
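
The code in the snippet above is cut off in this listing. A minimal, portable sketch of the general idea (checking a HWCAP2 bit through the resolver's second argument) is given below. Everything prefixed `sketch_' or `SKETCH_' is a stand-in defined here so that the example builds on any host: the struct imitates the layout of glibc's `__ifunc_arg_t', and the two bit positions are assumed values for illustration rather than definitions taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the second ifunc resolver argument (hypothetical type,
   modelled on glibc's __ifunc_arg_t).  */
typedef struct
{
  uint64_t _size;
  uint64_t _hwcap;
  uint64_t _hwcap2;
} sketch_ifunc_arg_t;

/* Assumed flag in the first argument signalling that the second
   argument is present and valid.  */
#define SKETCH_IFUNC_ARG_HWCAP (UINT64_C(1) << 62)
/* Assumed HWCAP2 bit advertising LSE128 support.  */
#define SKETCH_HWCAP2_LSE128   (UINT64_C(1) << 47)

/* Return true only if the second argument is usable and its hwcap2
   field has the LSE128 bit set.  */
bool sketch_has_lse128(uint64_t hwcap, const sketch_ifunc_arg_t *arg)
{
  if (!(hwcap & SKETCH_IFUNC_ARG_HWCAP) || arg == NULL)
    return false;
  return (arg->_hwcap2 & SKETCH_HWCAP2_LSE128) != 0;
}
```

Guarding on the flag in the first argument before dereferencing the second keeps the check safe on systems that pass only one argument to the resolver.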

[PATCH v4 0/4] Libatomic: Add LSE128 atomics support for AArch64

2024-01-24 Thread Victor Do Nascimento
ion tested on aarch64-linux-gnu target with LSE128-support. [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html Victor Do Nascimento (4): libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface liba

Re: [PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-08 Thread Victor Do Nascimento
On 1/5/24 11:10, Richard Sandiford wrote: Victor Do Nascimento writes: The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch

Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Victor Do Nascimento
On 1/5/24 11:47, Richard Sandiford wrote: Victor Do Nascimento writes: The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers

[PATCH v3 3/3] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

2024-01-02 Thread Victor Do Nascimento
At present, evaluation of both `has_lse2(hwcap)' and `has_lse128(hwcap)' may require issuing an `mrs' instruction to query a system register. This instruction, when issued from user-space, results in a trap by the kernel, which then returns the value read from the system register. Given the

[PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-02 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their

[PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-02 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset)

[PATCH v3 0/3] Libatomic: Add LSE128 atomics support for AArch64

2024-01-02 Thread Victor Do Nascimento
/620529.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html Victor Do Nascimento (3): libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface libatomic: Enable LSE128 128-bit atomics for armv9.4-a aarch64: Add explicit checks for implicit LSE/LSE2

[PATCH] aarch64: arm_neon.h - Fix -Wincompatible-pointer-types errors

2023-12-09 Thread Victor Do Nascimento
In the Linux kernel, u64/s64 are [un]signed long long, not [un]signed long. This means that when the `arm_neon.h' header is used by the kernel, any use of the `uint64_t' / `int64_t' types needs to be cast to the correct `__builtin_aarch64_simd_di' / `__builtin_aarch64_simd_df' types when

[PATCH v3] aarch64: Implement the ACLE instruction/data prefetch functions.

2023-12-05 Thread Victor Do Nascimento
Key changes in v3: * Implement the `require_const_argument' function to ensure the nth argument in EXP represents a const-type argument in the valid range given by [minval, maxval), forgoing expansion altogether when an invalid argument is detected early on. * Whereas in the previous

[PATCH v2 3/5] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2023-11-28 Thread Victor Do Nascimento
This patch updates `aarch64-sys-regs.def', bringing it into sync with the Binutils source. gcc/ChangeLog: * config/aarch64/aarch64-sys-regs.def (par_el1): New. (rcwmask_el1): Likewise. (rcwsmask_el1): Likewise. (ttbr0_el1): Likewise. (ttbr0_el12):

[PATCH v2 1/5] aarch64: Add march flags for +the and +d128 arch extensions

2023-11-28 Thread Victor Do Nascimento
Given the introduction of optional 128-bit page table descriptor and translation hardening extension support with the Armv9.4-a architecture, this introduces the relevant flags to enable the reading and writing of 128-bit system registers. The `+d128' -march modifier enables the use of the

[PATCH v2 4/5] aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins

2023-11-28 Thread Victor Do Nascimento
Implement the ACLE builtins for 128-bit system register manipulation: * __uint128_t __arm_rsr128(const char *special_register); * void __arm_wsr128(const char *special_register, __uint128_t value); gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (AARCH64_RSR128): New

[PATCH v2 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-28 Thread Victor Do Nascimento
Extend existing unit tests for the ACLE system register manipulation functions to include 128-bit tests. gcc/testsuite/ChangeLog: * gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New. (set_wsr128): Likewise. --- gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 32

[PATCH v2 0/5] aarch64: Add Armv9.4-a 128-bit system-register read/write support

2023-11-28 Thread Victor Do Nascimento
Control Stack (GCS) `+gcs' architecture modifier flag, allowing the inclusion of the novel GCS system registers which are now supported and also present in the `aarch64-sys-regs.def' system register database. Victor Do Nascimento (5): aarch64: Add march flags for +the and +d128 arch extensions

[PATCH v2 2/5] aarch64: Add support for GCS system registers with the +gcs modifier

2023-11-28 Thread Victor Do Nascimento
Given the introduction of system registers associated with the Guarded Control Stack extension to Armv9.4-a in Binutils and their reliance on the `+gcs' modifier, we implement the necessary changes in GCC to allow for them to be recognized by the compiler. gcc/ChangeLog: *

[PATCH] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2023-11-13 Thread Victor Do Nascimento
Continuing on from previously-proposed Libatomic enablement work [1], the introduction of the optional RCPC3 architectural extension for Armv8.2-A upwards provides additional support for the release consistency model, introducing both the Load-Acquire RCpc Pair Ordered, and Store-Release Pair

[PATCH v2 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2023-11-13 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset)

[PATCH v2 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2023-11-13 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their

[PATCH v2 0/2] Libatomic: Add LSE128 atomics support for AArch64

2023-11-13 Thread Victor Do Nascimento
/626358.html Victor Do Nascimento (2): libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface libatomic: Enable LSE128 128-bit atomics for armv9.4-a libatomic/Makefile.am| 3 + libatomic/Makefile.in| 1 + libatomic/acinclude.m4

[PATCH 5/5] aarch64: rcpc3: Add intrinsics tests

2023-11-09 Thread Victor Do Nascimento
Add unit test to ensure that added intrinsics compile to the correct `LDAP1 {Vt.D}[lane],[Xn]' and `STL1 {Vt.d}[lane],[Xn]' instructions. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/rcpc3.c: New. --- gcc/testsuite/gcc.target/aarch64/acle/rcpc3.c | 47 +++ 1 file

[PATCH 1/5] aarch64: rcpc3: Add +rcpc3 extension

2023-11-09 Thread Victor Do Nascimento
Given the optional LRCPC3 target support for Armv8.2-a cores onwards, the +rcpc3 arch feature modifier is added to GCC's command-line options. gcc/ChangeLog: * config/aarch64/aarch64-option-extensions.def (rcpc3): New. * config/aarch64/aarch64.h (AARCH64_ISA_RCPC3): Likewise.

[PATCH 4/5] aarch64: rcpc3: add Neon ACLE wrapper functions to `arm_neon.h'

2023-11-09 Thread Victor Do Nascimento
Create the necessary mappings from the ACLE-defined Neon intrinsics names[1] to the internal builtin function names. [1] https://arm-software.github.io/acle/neon_intrinsics/advsimd.html gcc/ChangeLog: * gcc/config/aarch64/arm_neon.h (vldap1_lane_u64): New. (vldap1q_lane_u64):

[PATCH 3/5] aarch64: rcpc3: Add Neon ACLE intrinsics

2023-11-09 Thread Victor Do Nascimento
Register the target specific builtins in `aarch64-simd-builtins.def' and implement their associated backend patterns in `aarch64-simd.md'. gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (vec_ldap1_lane): New. (vec_stl1_lane): Likewise. *

[PATCH 2/5] aarch64: rcpc3: Add relevant iterators to handle Neon intrinsics

2023-11-09 Thread Victor Do Nascimento
The LDAP1 and STL1 Neon ACLE intrinsics, operating on 64-bit data values, operate on single-lane (Vt.1D) or twin-lane (Vt.2D) SIMD register configurations, either in the DI or DF modes. This leads to the need for a mode iterator accounting for the V1DI, V1DF, V2DI and V2DF modes. This patch

[PATCH 0/5] aarch64: Add ACLE intrinsics codegen support for lrcpc3 instructions

2023-11-09 Thread Victor Do Nascimento
}64 * ldp1q_lane_{u|s|p}64 Bootstrapped and regression tested on aarch64-none-linux-gnu. Victor Do Nascimento (5): aarch64: rcpc3: Add +rcpc3 extension aarch64: rcpc3: Add relevant iterators to handle Neon intrinsics aarch64: rcpc3: Add Neon ACLE intrinsics aarch64: rcpc3: add Neon ACLE

[PATCH 4/5] aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins

2023-11-07 Thread Victor Do Nascimento
Implement the ACLE builtins for 128-bit system register manipulation: * __uint128_t __arm_rsr128(const char *special_register); * void __arm_wsr128(const char *special_register, __uint128_t value); gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (AARCH64_RSR128): New

[PATCH 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-07 Thread Victor Do Nascimento
Extend existing unit tests for the ACLE system register manipulation functions to include 128-bit tests. gcc/testsuite/ChangeLog: * gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New. (set_wsr128): Likewise. --- gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 30

[PATCH 0/5] aarch64: Add Armv9.4-a 128-bit system-register read/write support

2023-11-07 Thread Victor Do Nascimento
present in the `aarch64-sys-regs.def' system register database. Victor Do Nascimento (5): aarch64: Add march flags for +the and +d128 arch extensions aarch64: Add support for GCS system registers with the +gcs modifier aarch64: Sync `aarch64-sys-regs.def' with Binutils. aarch64: Implement

[PATCH 1/5] aarch64: Add march flags for +the and +d128 arch extensions

2023-11-07 Thread Victor Do Nascimento
Given the introduction of optional 128-bit page table descriptor and translation hardening extension support with the Armv9.4-a architecture, this introduces the relevant flags to enable the reading and writing of 128-bit system registers. The `+d128' -march modifier enables the use of the

[PATCH 3/5] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2023-11-07 Thread Victor Do Nascimento
This patch updates `aarch64-sys-regs.def', bringing it into sync with the Binutils source. gcc/ChangeLog: * config/aarch64/aarch64-sys-regs.def (par_el1): New. (rcwmask_el1): Likewise. (rcwsmask_el1): Likewise. (ttbr0_el1): Likewise. (ttbr0_el12):

[PATCH 2/5] aarch64: Add support for GCS system registers with the +gcs modifier

2023-11-07 Thread Victor Do Nascimento
Given the introduction of system registers associated with the Guarded Control Stack extension to Armv9.4-a in Binutils and their reliance on the `+gcs' modifier, we implement the necessary changes in GCC to allow for them to be recognized by the compiler. gcc/ChangeLog: *

[PATCH 0/2] Libatomic: Add LSE128 atomics support for AArch64

2023-11-07 Thread Victor Do Nascimento
architectural support is present. Regression tested on aarch64-linux-gnu target with LSE128-support. [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html Victor Do Nascimento (2): libatomic: atomic_16.S: Improve ENTRY

[PATCH 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2023-11-07 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset)

[PATCH 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2023-11-07 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their

[PATCH V3 5/6] aarch64: Add front-end argument type checking for target builtins

2023-11-02 Thread Victor Do Nascimento
In implementing the ACLE read/write system register builtins it was observed that leaving argument type checking to be done at expand-time meant that poorly-formed function calls were being "fixed" by certain optimization passes, meaning bad code wasn't being properly picked up in checking.

[PATCH V3 4/6] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-11-02 Thread Victor Do Nascimento
Implement the aarch64 intrinsics for reading and writing system registers with the following signatures: uint32_t __arm_rsr(const char *special_register); uint64_t __arm_rsr64(const char *special_register); void* __arm_rsrp(const char *special_register); float

[PATCH V3 6/6] aarch64: Add system register duplication check selftest

2023-11-02 Thread Victor Do Nascimento
Add a build-time test to check whether system register data, as imported from `aarch64-sys-regs.def', has any duplicate entries. Duplicate entries are defined as any two SYSREG entries in the .def file which share the same encoding values (as specified by its `CPENC' field) and where the

[PATCH V3 0/6] aarch64: Add support for __arm_rsr and __arm_wsr ACLE function family

2023-11-02 Thread Victor Do Nascimento
__arm_wsr("trcseqstr", new); Testing: - Bootstrap/regtest on aarch64-linux-gnu done. [1] https://arm-software.github.io/acle/main/acle.html Victor Do Nascimento (6): aarch64: Sync system register information with Binutils aarch64: Add support for aarch64-sys-regs.def aarch64: Imple

[PATCH V3 3/6] aarch64: Implement system register validation tools

2023-11-02 Thread Victor Do Nascimento
Given the implementation of a mechanism of encoding system registers into GCC, this patch provides the mechanism of validating their use by the compiler. In particular, this involves: 1. Ensuring a supplied string corresponds to a known system register name. System registers can be

[PATCH V3 1/6] aarch64: Sync system register information with Binutils

2023-11-02 Thread Victor Do Nascimento
This patch adds the `aarch64-sys-regs.def' file, originally written for Binutils, to GCC. In so doing, it provides GCC with the necessary information for teaching the compiler about system registers known to the assembler and how these can be used. By aligning the representation of data common to

[PATCH V3 2/6] aarch64: Add support for aarch64-sys-regs.def

2023-11-02 Thread Victor Do Nascimento
This patch defines the structure of a new .def file used for representing the aarch64 system registers, what information it should hold and the basic framework in GCC to process this file. Entries in the aarch64-sys-regs.def file should be as follows: SYSREG (NAME, CPENC (sn,op1,cn,cm,op2),

[PATCH V2] aarch64: Implement the ACLE instruction/data prefetch functions.

2023-10-30 Thread Victor Do Nascimento
Correct CV-qualification from being erroneously applied to the `addr' pointer, applying it instead to its pointer target, as specified by the ACLE standards. --- Implement the ACLE data and instruction prefetch functions[1] with the following signatures: 1. Data prefetch intrinsics:

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-27 Thread Victor Do Nascimento
On 10/27/23 14:18, Alex Coplan wrote: On 26/10/2023 16:23, Richard Sandiford wrote: Victor Do Nascimento writes: On 10/18/23 21:39, Richard Sandiford wrote: Victor Do Nascimento writes: Implement the aarch64 intrinsics for reading and writing system registers with the following

[PATCH] aarch64: Implement ACLE instruction/data prefetch functions.

2023-10-27 Thread Victor Do Nascimento
Implement the ACLE data and instruction prefetch functions[1] with the following signatures: 1. Data prefetch intrinsics: void __pldx (/*constant*/ unsigned int /*access_kind*/, /*constant*/ unsigned int /*cache_level*/,

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-26 Thread Victor Do Nascimento
On 10/26/23 16:23, Richard Sandiford wrote: Victor Do Nascimento writes: On 10/18/23 21:39, Richard Sandiford wrote: Victor Do Nascimento writes: Implement the aarch64 intrinsics for reading and writing system registers with the following signatures: uint32_t __arm_rsr(const

Re: [PATCH V2 7/7] aarch64: Add system register duplication check selftest

2023-10-26 Thread Victor Do Nascimento
On 10/18/23 22:30, Richard Sandiford wrote: Victor Do Nascimento writes: Add a build-time test to check whether system register data, as imported from `aarch64-sys-regs.def', has any duplicate entries. Duplicate entries are defined as any two SYSREG entries in the .def file which share

Re: [PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-26 Thread Victor Do Nascimento
On 10/18/23 21:39, Richard Sandiford wrote: Victor Do Nascimento writes: Implement the aarch64 intrinsics for reading and writing system registers with the following signatures: uint32_t __arm_rsr(const char *special_register); uint64_t __arm_rsr64(const char

Re: [PATCH V2 2/7] aarch64: Add support for aarch64-sys-regs.def

2023-10-26 Thread Victor Do Nascimento
On 10/18/23 22:07, Richard Sandiford wrote: Victor Do Nascimento writes: This patch defines the structure of a new .def file used for representing the aarch64 system registers, what information it should hold and the basic framework in GCC to process this file. Entries in the aarch64

[PATCH] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-26 Thread Victor Do Nascimento
Motivated by the need to print system register names in output assembly, this patch adds the required logic to `aarch64_print_operand' to accept rtxs of type CONST_STRING and process these accordingly. Consequently, an rtx such as: (set (reg/i:DI 0 x0) (unspec:DI [(const_string

[PATCH V2 3/7] aarch64: Implement system register validation tools

2023-10-18 Thread Victor Do Nascimento
Given the implementation of a mechanism for encoding system registers into GCC, this patch provides the mechanism for validating their use by the compiler. In particular, this involves: 1. Ensuring a supplied string corresponds to a known system register name. System registers can be

[PATCH V2 6/7] aarch64: Add front-end argument type checking for target builtins

2023-10-18 Thread Victor Do Nascimento
In implementing the ACLE read/write system register builtins it was observed that leaving argument type checking to be done at expand-time meant that poorly-formed function calls were being "fixed" by certain optimization passes, meaning bad code wasn't being properly picked up in checking.

[PATCH V2 2/7] aarch64: Add support for aarch64-sys-regs.def

2023-10-18 Thread Victor Do Nascimento
This patch defines the structure of a new .def file used for representing the aarch64 system registers, what information it should hold and the basic framework in GCC to process this file. Entries in the aarch64-sys-regs.def file should be as follows: SYSREG (NAME, CPENC (sn,op1,cn,cm,op2),

[PATCH V2 7/7] aarch64: Add system register duplication check selftest

2023-10-18 Thread Victor Do Nascimento
Add a build-time test to check whether system register data, as imported from `aarch64-sys-regs.def', has any duplicate entries. Duplicate entries are defined as any two SYSREG entries in the .def file which share the same encoding values (as specified by its `CPENC' field) and where the

[PATCH V2 1/7] aarch64: Sync system register information with Binutils

2023-10-18 Thread Victor Do Nascimento
This patch adds the `aarch64-sys-regs.def' file, originally written for Binutils, to GCC. In so doing, it provides GCC with the necessary information for teaching the compiler about system registers known to the assembler and how these can be used. By aligning the representation of data common to

[PATCH V2 4/7] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-18 Thread Victor Do Nascimento
Motivated by the need to print system register names in output assembly, this patch adds the required logic to `aarch64_print_operand' to accept rtxs of type CONST_STRING and process these accordingly. Consequently, an rtx such as: (set (reg/i:DI 0 x0) (unspec:DI [(const_string

[PATCH V2 0/7] aarch64: Add support for __arm_rsr and __arm_wsr ACLE function family

2023-10-18 Thread Victor Do Nascimento
("trcseqstr"); __arm_wsr("trcseqstr", new); Testing: - Bootstrap/regtest on aarch64-linux-gnu done. [1] https://arm-software.github.io/acle/main/acle.html Victor Do Nascimento (7): aarch64: Sync system register information with Binutils aarch64: Add support for

[PATCH V2 5/7] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-18 Thread Victor Do Nascimento
Implement the aarch64 intrinsics for reading and writing system registers with the following signatures: uint32_t __arm_rsr(const char *special_register); uint64_t __arm_rsr64(const char *special_register); void* __arm_rsrp(const char *special_register); float

Re: [PATCH 6/6] aarch64: Add front-end argument type checking for target builtins

2023-10-09 Thread Victor Do Nascimento
On 10/7/23 12:53, Richard Sandiford wrote: Richard Earnshaw writes: On 03/10/2023 16:18, Victor Do Nascimento wrote: In implementing the ACLE read/write system register builtins it was observed that leaving argument type checking to be done at expand-time meant that poorly-formed function

Re: [PATCH 1/6] aarch64: Sync system register information with Binutils

2023-10-09 Thread Victor Do Nascimento
On 10/9/23 01:02, Ramana Radhakrishnan wrote: On 5 Oct 2023, at 14:04, Victor Do Nascimento wrote: On 10/5/23 12:42, Richard Earnshaw wrote: On 03/10/2023 16:18, Victor Do Nascimento wrote: This patch adds the `aarch64-sys

Re: [PATCH 1/6] aarch64: Sync system register information with Binutils

2023-10-05 Thread Victor Do Nascimento
On 10/5/23 12:42, Richard Earnshaw wrote: On 03/10/2023 16:18, Victor Do Nascimento wrote: This patch adds the `aarch64-sys-regs.def' file to GCC, teaching the compiler about system registers known to the assembler and how these can be used. The macros used to hold system register

Re: [PATCH 4/6] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-05 Thread Victor Do Nascimento
On 10/5/23 13:26, Richard Earnshaw wrote: On 03/10/2023 16:18, Victor Do Nascimento wrote: Motivated by the need to print system register names in output assembly, this patch adds the required logic to `aarch64_print_operand' to accept rtxs of type CONST_STRING and process these accordingly

[PATCH 3/6] aarch64: Implement system register validation tools

2023-10-03 Thread Victor Do Nascimento
Given the implementation of a mechanism for encoding system registers into GCC, this patch provides the mechanism for validating their use by the compiler. In particular, this involves: 1. Ensuring a supplied string corresponds to a known system register name. System registers can be

[PATCH 1/6] aarch64: Sync system register information with Binutils

2023-10-03 Thread Victor Do Nascimento
This patch adds the `aarch64-sys-regs.def' file to GCC, teaching the compiler about system registers known to the assembler and how these can be used. The macros used to hold system register information reflect those in use by binutils, a design choice made to facilitate the sharing of data

[PATCH 5/6] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-03 Thread Victor Do Nascimento
Implement the aarch64 intrinsics for reading and writing system registers with the following signatures: uint32_t __arm_rsr(const char *special_register); uint64_t __arm_rsr64(const char *special_register); void* __arm_rsrp(const char *special_register); float

[PATCH 2/6] aarch64: Add support for aarch64-sys-regs.def

2023-10-03 Thread Victor Do Nascimento
This patch defines the structure of a new .def file used for representing the aarch64 system registers, what information it should hold and the basic framework in GCC to process this file. Entries in the aarch64-sys-regs.def file should be as follows: SYSREG (NAME, CPENC (sn,op1,cn,cm,op2),

[PATCH 4/6] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-03 Thread Victor Do Nascimento
Motivated by the need to print system register names in output assembly, this patch adds the required logic to `aarch64_print_operand' to accept rtxs of type CONST_STRING and process these accordingly. Consequently, an rtx such as: (set (reg/i:DI 0 x0) (unspec:DI [(const_string

[PATCH 6/6] aarch64: Add front-end argument type checking for target builtins

2023-10-03 Thread Victor Do Nascimento
In implementing the ACLE read/write system register builtins it was observed that leaving argument type checking to be done at expand-time meant that poorly-formed function calls were being "fixed" by certain optimization passes, meaning bad code wasn't being properly picked up in checking.

[PATCH 0/6] aarch64: Add support for __arm_rsr and __arm_wsr ACLE function family

2023-10-03 Thread Victor Do Nascimento
/acle/main/acle.html Victor Do Nascimento (6): aarch64: Sync system register information with Binutils aarch64: Add support for aarch64-sys-regs.def aarch64: Implement system register validation tools aarch64: Add basic target_print_operand support for CONST_STRING aarch64: Implement sys
