Re: [PATCH] AArch64: Add ACLE MOPS support
Hi Wilco,

On Fri, May 31, 2024 at 6:38 PM Wilco Dijkstra wrote:
> Hi Richard,
>
> > I think this should be in a push_options/pop_options block, as for other
> > intrinsics that require certain features.
>
> But then the intrinsic would always be defined, which is contrary to what
> the ACLE spec demands - it would not give a compilation error at the
> callsite but give assembler errors (potentially in different functions
> after inlining).
>
> > What was the reason for using an inline asm rather than a builtin?
> > Feels a bit old school. :) Using a builtin should mean that the
> > RTL optimisers see the extent of the write.
>
> Given this intrinsic will be used very rarely, if ever, it does not make
> sense to provide anything more than the basic functionality.
>
> Cheers,
> Wilco

I agree that it's unlikely to get much use. IMO we should be moving the
arm_acle.h header to be implemented in the #pragma GCC aarch64 "arm_acle.h"
at the top as much as possible. So I'd expect handle_arm_acle_h to be
extended to inject these definitions when appropriate and during expansion
it'd just generate the RTL pattern for it, which needn't be exposed as an
implementation-defined builtin.

Thanks,
Kyrill
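The point about conditional definition can be pictured with a small, hypothetical C sketch (the names and macros here are illustrative stand-ins, not the actual ACLE intrinsic or GCC implementation). When the declaration only exists while the feature macro is defined, a call compiled without the feature is rejected by the front end at the call site, instead of compiling cleanly and failing later in the assembler:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the two header styles under discussion.
   Unconditionally defining the intrinsic (e.g. inside a
   push_options/pop_options block) would let a call site compile and
   only fail at assembly time; guarding the definition on the feature
   macro makes the call an ordinary undeclared-identifier error at
   compile time, which is what the ACLE spec asks for.  */

#if defined(__ARM_FEATURE_MOPS)
/* On a real MOPS target this would expand to the MOPS instructions.  */
void *my_memset_intrinsic (void *p, int c, size_t n);
#else
/* Portable stand-in so this sketch can run anywhere.  Without this
   fallback, any call to my_memset_intrinsic would fail to compile -
   exactly the early diagnostic the guarded definition provides.  */
static void *my_memset_intrinsic (void *p, int c, size_t n)
{
  return memset (p, c, n);
}
#endif
```

This also shows why a builtin-based expansion could help the optimisers: a builtin's write extent is visible to the RTL passes, whereas an inline asm is opaque.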
Re: [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax
Hi Tamar, On Wed, 15 May 2024 at 11:28, Tamar Christina wrote: > Hi All, > > This converts the single alternative patterns to the new compact syntax > such > that when I add the new alternatives it's clearer what's being changed. > > Note that this will spew out a bunch of warnings from geninsn as it'll > warn that > @ is useless for a single alternative pattern. These are not fatal so > won't > break the build and are only temporary. > > No change in functionality is expected with this patch. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? Ok. Thanks, Kyrill > > Thanks, > Tamar > > gcc/ChangeLog: > > * config/aarch64/aarch64-sve.md (and3, > @aarch64_pred__z, *3_cc, > *3_ptest, aarch64_pred__z, > *3_cc, *3_ptest, > aarch64_pred__z, *3_cc, > *3_ptest, *cmp_ptest, > @aarch64_pred_cmp_wide, > *aarch64_pred_cmp_wide_cc, > *aarch64_pred_cmp_wide_ptest, > *aarch64_brk_cc, > *aarch64_brk_ptest, @aarch64_brk, *aarch64_brkn_cc, > *aarch64_brkn_ptest, *aarch64_brk_cc, > *aarch64_brk_ptest, aarch64_rdffr_z, > *aarch64_rdffr_z_ptest, > *aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): > Convert > to compact syntax. > * config/aarch64/aarch64-sve2.md > (@aarch64_pred_): Likewise. > > --- > diff --git a/gcc/config/aarch64/aarch64-sve.md > b/gcc/config/aarch64/aarch64-sve.md > index > 0434358122d2fde71bd0e0f850338e739e9be02c..839ab0627747d7a49bef7b0192ee9e7a42587ca0 > 100644 > --- a/gcc/config/aarch64/aarch64-sve.md > +++ b/gcc/config/aarch64/aarch64-sve.md > @@ -1156,76 +1156,86 @@ (define_insn "aarch64_rdffr" > > ;; Likewise with zero predication. 
> (define_insn "aarch64_rdffr_z" > - [(set (match_operand:VNx16BI 0 "register_operand" "=Upa") > + [(set (match_operand:VNx16BI 0 "register_operand") > (and:VNx16BI > (reg:VNx16BI FFRT_REGNUM) > - (match_operand:VNx16BI 1 "register_operand" "Upa")))] > + (match_operand:VNx16BI 1 "register_operand")))] >"TARGET_SVE && TARGET_NON_STREAMING" > - "rdffr\t%0.b, %1/z" > + {@ [ cons: =0, 1 ] > + [ Upa , Upa ] rdffr\t%0.b, %1/z > + } > ) > > ;; Read the FFR to test for a fault, without using the predicate result. > (define_insn "*aarch64_rdffr_z_ptest" >[(set (reg:CC_NZC CC_REGNUM) > (unspec:CC_NZC > - [(match_operand:VNx16BI 1 "register_operand" "Upa") > + [(match_operand:VNx16BI 1 "register_operand") >(match_dup 1) >(match_operand:SI 2 "aarch64_sve_ptrue_flag") >(and:VNx16BI > (reg:VNx16BI FFRT_REGNUM) > (match_dup 1))] > UNSPEC_PTEST)) > - (clobber (match_scratch:VNx16BI 0 "=Upa"))] > + (clobber (match_scratch:VNx16BI 0))] >"TARGET_SVE && TARGET_NON_STREAMING" > - "rdffrs\t%0.b, %1/z" > + {@ [ cons: =0, 1 , 2 ] > + [ Upa , Upa, ] rdffrs\t%0.b, %1/z > + } > ) > > ;; Same for unpredicated RDFFR when tested with a known PTRUE. > (define_insn "*aarch64_rdffr_ptest" >[(set (reg:CC_NZC CC_REGNUM) > (unspec:CC_NZC > - [(match_operand:VNx16BI 1 "register_operand" "Upa") > + [(match_operand:VNx16BI 1 "register_operand") >(match_dup 1) >(const_int SVE_KNOWN_PTRUE) >(reg:VNx16BI FFRT_REGNUM)] > UNSPEC_PTEST)) > - (clobber (match_scratch:VNx16BI 0 "=Upa"))] > + (clobber (match_scratch:VNx16BI 0))] >"TARGET_SVE && TARGET_NON_STREAMING" > - "rdffrs\t%0.b, %1/z" > + {@ [ cons: =0, 1 ] > + [ Upa , Upa ] rdffrs\t%0.b, %1/z > + } > ) > > ;; Read the FFR with zero predication and test the result. 
> (define_insn "*aarch64_rdffr_z_cc" >[(set (reg:CC_NZC CC_REGNUM) > (unspec:CC_NZC > - [(match_operand:VNx16BI 1 "register_operand" "Upa") > + [(match_operand:VNx16BI 1 "register_operand") >(match_dup 1) >(match_operand:SI 2 "aarch64_sve_ptrue_flag") >(and:VNx16BI > (reg:VNx16BI FFRT_REGNUM) > (match_dup 1))] > UNSPEC_PTEST)) > - (set (match_operand:VNx16BI 0 "register_operand" "=Upa") > + (set (match_operand:VNx16BI 0 "register_operand") > (and:VNx16BI > (reg:VNx16BI FFRT_REGNUM) > (match_dup 1)))] >"TARGET_SVE && TARGET_NON_STREAMING" > - "rdffrs\t%0.b, %1/z" > + {@ [ cons: =0, 1 , 2 ] > + [ Upa , Upa, ] rdffrs\t%0.b, %1/z > + } > ) > > ;; Same for unpredicated RDFFR when tested with a known PTRUE. > (define_insn "*aarch64_rdffr_cc" >[(set (reg:CC_NZC CC_REGNUM) > (unspec:CC_NZC > - [(match_operand:VNx16BI 1 "register_operand" "Upa") > + [(match_operand:VNx16BI 1 "register_operand") >(match_dup 1) >(const_int SVE_KNOWN_PTRUE) >(reg:VNx16BI FFRT_REGNUM)] > UNSPEC_PTEST)) > - (set (match_operand:VNx16BI 0 "register_operan
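For readers unfamiliar with the compact syntax being converted to above, the shape of the change is easiest to see on a minimal made-up pattern (this is an illustration, not one of the patterns from the patch): the constraint strings move out of the individual match_operands into a single {@ ... } table, with one row per alternative.

```lisp
;; Traditional syntax: constraints embedded in each match_operand.
(define_insn "*example"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (not:SI (match_operand:SI 1 "register_operand" "r")))]
  ""
  "mvn\t%w0, %w1"
)

;; Compact syntax: constraints collected in a table, one row per
;; alternative, so adding a new alternative is a one-line change.
(define_insn "*example"
  [(set (match_operand:SI 0 "register_operand")
        (not:SI (match_operand:SI 1 "register_operand")))]
  ""
  {@ [ cons: =0, 1 ]
     [ r       , r ] mvn\t%w0, %w1
  }
)
```

This is why the patch is pure churn for single-alternative patterns (hence the temporary "@ is useless" warnings) but pays off once extra alternatives are added.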
Re: [PATCH] AARCH64: Add Qualcomm oryon-1 core
Hi Andrew, On Fri, May 3, 2024 at 8:50 PM Andrew Pinski wrote: > This patch adds Qualcomm's new oryon-1 core; this is enough > to recognize the core and later on will add the tuning structure. > > gcc/ChangeLog: > > * config/aarch64/aarch64-cores.def (oryon-1): New entry. > * config/aarch64/aarch64-tune.md: Regenerate. > * doc/invoke.texi (AArch64 Options): Document oryon-1. > > Signed-off-by: Andrew Pinski > Co-authored-by: Joel Jones > Co-authored-by: Wei Zhao > --- > gcc/config/aarch64/aarch64-cores.def | 5 + > gcc/config/aarch64/aarch64-tune.md | 2 +- > gcc/doc/invoke.texi | 1 + > 3 files changed, 7 insertions(+), 1 deletion(-) > > diff --git a/gcc/config/aarch64/aarch64-cores.def > b/gcc/config/aarch64/aarch64-cores.def > index f69fc212d56..be60929e400 100644 > --- a/gcc/config/aarch64/aarch64-cores.def > +++ b/gcc/config/aarch64/aarch64-cores.def > @@ -151,6 +151,11 @@ AARCH64_CORE("neoverse-512tvb", neoverse512tvb, > cortexa57, V8_4A, (SVE, I8MM, B > /* Qualcomm ('Q') cores. */ > AARCH64_CORE("saphira", saphira,saphira,V8_4A, (CRYPTO), > saphira, 0x51, 0xC01, -1) > > +/* ARMv8.6-A Architecture Processors. */ > + > +/* Qualcomm ('Q') cores. */ > +AARCH64_CORE("oryon-1", oryon1, cortexa57, V8_6A, (CRYPTO, SM4, SHA3, > F16), cortexa72, 0x51, 0x001, -1) > + > /* ARMv8-A big.LITTLE implementations. 
*/ > > AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, > V8A, (CRC), cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1) > diff --git a/gcc/config/aarch64/aarch64-tune.md > b/gcc/config/aarch64/aarch64-tune.md > index abd3c9e0822..ba940f1c890 100644 > --- a/gcc/config/aarch64/aarch64-tune.md > +++ b/gcc/config/aarch64/aarch64-tune.md > @@ -1,5 +1,5 @@ > ;; -*- buffer-read-only: t -*- > ;; Generated automatically by gentune.sh from aarch64-cores.def > (define_attr "tune" > - > > "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a" > + > > 
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a" > (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 9456ced468a..eabe09dc28f 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -21323,6 +21323,7 @@ performance of the code. Permissible values for > this option are: > @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34}, > @samp{cortex-a78}, @samp{cortex-a78ae}, @samp{cortex-a78c}, > @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor}, > +@samp{oyron-1}, Typo in the name. LGTM with that fixed. Thanks, Kyrill > > @samp{neoverse-512tvb}, @samp{neoverse-e1}, @samp{neoverse-n1}, > @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{neoverse-v2}, > @samp{qdf24xx}, > @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan}, > -- > 2.43.0 > >
Re: [PATCH v3][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
Hi Srinath, On 3/10/20 6:19 PM, Srinath Parvathaneni wrote: Hello Kyrill, This patch addresses all the comments in patch version v2. (version v2) https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540417.html Hello, This patch is part of MVE ACLE intrinsics framework. The patch supports the use of emulation for the single-precision arithmetic operations for MVE. These changes support the MVE ACLE intrinsics, which operate on vector floating-point arithmetic operations. Please refer to Arm reference manual [1] for more details. [1] https://developer.arm.com/docs/ddi0553/latest Regression tested on target arm-none-eabi and armeb-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2020-03-06 Andre Vieira Srinath Parvathaneni * config/arm/arm.c (arm_libcall_uses_aapcs_base): Modify function to add emulator calls for double precision arithmetic operations for MVE. 2020-03-06 Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/mve_libcall1.c: New test. * gcc.target/arm/mve/intrinsics/mve_libcall2.c: Likewise. ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index c28a475629c7fbad48730beed5550e0cffdf2e1b..40db35a2a8b6dedb4f536b4995e80c8b9a38b588 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -5754,9 +5754,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall) /* Values from double-precision helper functions are returned in core registers if the selected core only supports single-precision arithmetic, even if we are using the hard-float ABI. The same is - true for single-precision helpers, but we will never be using the - hard-float ABI on a CPU which doesn't support single-precision - operations in hardware. */ + true for single-precision helpers except in case of MVE, because in + MVE we will be using the hard-float ABI on a CPU which doesn't support + single-precision operations in hardware. 
In MVE the following check + enables use of emulation for the single-precision arithmetic + operations. */ + if (TARGET_HAVE_MVE) + { + add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode)); + } add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode)); add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode)); add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode)); diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c new file mode 100644 index ..f89301228c577291fc3095420df1937e1a0c7104 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c @@ -0,0 +1,70 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -mthumb -mfpu=auto" } */ + +float +foo (float a, float b, float c) +{ + return a + b + c; +} + +/* { dg-final { scan-assembler "bl\\t__aeabi_fadd" } } */ +/* { dg-final { scan-assembler-times "bl\\t__aeabi_fadd" 2 } } */ What is the point of repeating the scan-assembler directives here? The first scan-assembler should be redundant given the scan-assembler-times ? Otherwise ok. 
Thanks, Kyrill + +float +foo1 (float a, float b, float c) +{ + return a - b - c; +} + +/* { dg-final { scan-assembler "bl\\t__aeabi_fsub" } } */ +/* { dg-final { scan-assembler-times "bl\\t__aeabi_fsub" 2 } } */ + +float +foo2 (float a, float b, float c) +{ + return a * b * c; +} + +/* { dg-final { scan-assembler "bl\\t__aeabi_fmul" } } */ +/* { dg-final { scan-assembler-times "bl\\t__aeabi_fmul" 2 } } */ + +float +foo3 (float b, float c) +{ + return b / c; +} + +/* { dg-final { scan-assembler "bl\\t__aeabi_fdiv" } } */ + +int +foo4 (float b, float c) +{ + return b < c; +} + +/* { dg-final { scan-assembler "bl\\t__aeabi_fcmplt" } } */ + +int +foo5 (float b, float c) +{ + return b > c; +} + +/* { dg-final { scan-assembler "bl\\t__aeabi_fcmpgt" } } */ + +int +foo6 (flo
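The redundancy Kyrill points out follows from how the directives compose: scan-assembler passes whenever the regex matches at least once, so it is already implied by a scan-assembler-times directive for the same regex with a count of 2. An illustrative fragment (not the actual test file) of the trimmed-down check:

```
/* A single counting directive is sufficient: it fails both when the
   libcall is missing entirely and when it appears the wrong number
   of times, so the plain scan-assembler adds no extra coverage.  */
/* { dg-final { scan-assembler-times "bl\\t__aeabi_fadd" 2 } } */
```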
Re: [PATCH v3][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
Hi Srinath, On 3/10/20 6:19 PM, Srinath Parvathaneni wrote: Hello Kyrill, This patch addresses all the comments in patch version v2. (version v2) https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540416.html Hello, This patch is part of MVE ACLE intrinsics framework. This patch adds support to update (read/write) the APSR (Application Program Status Register) register and FPSCR (Floating-point Status and Control Register) register for MVE. This patch also enables thumb2 mov RTL patterns for MVE. A new feature bit vfp_base is added. This bit is enabled for all VFP, MVE and MVE with floating point extensions. This bit is used to enable the macro TARGET_VFP_BASE. For all the VFP instructions, RTL patterns, status and control registers are guarded by TARGET_HAVE_FLOAT. But this patch modifies that, and the common instructions, RTL patterns, status and control registers between MVE and VFP are guarded by the TARGET_VFP_BASE macro. The RTL patterns set_fpscr and get_fpscr are updated to use VFPCC_REGNUM because a few MVE intrinsics set/get the carry bit of the FPSCR register. Please refer to Arm reference manual [1] for more details. [1] https://developer.arm.com/docs/ddi0553/latest Regression tested on target arm-none-eabi and armeb-none-eabi and found no regressions. Ok for trunk? Ok, but make sure it bootstraps on arm-none-linux-gnueabihf (as with the other patches in this series) Thanks, Kyrill Thanks, Srinath gcc/ChangeLog: 2020-03-06 Andre Vieira Mihail Ionescu Srinath Parvathaneni * common/config/arm/arm-common.c (arm_asm_auto_mfpu): When vfp_base feature bit is on and -mfpu=auto is passed as compiler option, do not generate error on not finding any match fpu. Because in this case fpu is not required. * config/arm/arm-cpus.in (vfp_base): Define feature bit, this bit is enabled for MVE and also for all VFP extensions. (VFPv2): Modify fgroup to enable vfp_base feature bit whenever VFPv2 is enabled. (MVE): Define fgroup to enable feature bits mve, vfp_base and armv7em. 
(MVE_FP): Define fgroup to enable feature bits is fgroup MVE and FPv5 along with feature bits mve_float. (mve): Modify add options in armv8.1-m.main arch for MVE. (mve.fp): Modify add options in armv8.1-m.main arch for MVE with floating point. * config/arm/arm.c (use_return_insn): Replace the check with TARGET_VFP_BASE. (thumb2_legitimate_index_p): Replace TARGET_HARD_FLOAT with TARGET_VFP_BASE. (arm_rtx_costs_internal): Replace "TARGET_HARD_FLOAT || TARGET_HAVE_MVE" with TARGET_VFP_BASE, to allow cost calculations for copies in MVE as well. (arm_get_vfp_saved_size): Replace TARGET_HARD_FLOAT with TARGET_VFP_BASE, to allow space calculation for VFP registers in MVE as well. (arm_compute_frame_layout): Likewise. (arm_save_coproc_regs): Likewise. (arm_fixed_condition_code_regs): Modify to enable using VFPCC_REGNUM in MVE as well. (arm_hard_regno_mode_ok): Replace "TARGET_HARD_FLOAT || TARGET_HAVE_MVE" with equivalent macro TARGET_VFP_BASE. (arm_expand_epilogue_apcs_frame): Likewise. (arm_expand_epilogue): Likewise. (arm_conditional_register_usage): Likewise. (arm_declare_function_name): Add check to skip printing .fpu directive in assembly file when TARGET_VFP_BASE is enabled and fpu_to_print is "softvfp". * config/arm/arm.h (TARGET_VFP_BASE): Define. * config/arm/arm.md (arch): Add "mve" to arch. (eq_attr "arch" "mve"): Enable on TARGET_HAVE_MVE is true. (vfp_pop_multiple_with_writeback): Replace "TARGET_HARD_FLOAT || TARGET_HAVE_MVE" with equivalent macro TARGET_VFP_BASE. * config/arm/constraints.md (Uf): Define to allow modification to FPCCR in MVE. * config/arm/thumb2.md (thumb2_movsfcc_soft_insn): Modify target guard to not allow for MVE. * config/arm/unspecs.md (UNSPEC_GET_FPSCR): Move to volatile unspecs enum. (VUNSPEC_GET_FPSCR): Define. * config/arm/vfp.md (thumb2_movhi_vfp): Add support for VMSR and VMRS instructions which move to general-purpose Register from Floating-point Special register and vice-versa. (thumb2_movhi_fp16): Likewise. 
(thumb2_movsi_vfp): Add support for VMSR and VMRS instructions along with MCR and MRC instructions which set and get Floating-point Status and Control Register (FPSCR). (movdi_vfp): Modify pattern to enable Single-precision scalar float move in MVE. (thumb2_movdf_vfp): Modify pattern to enable Double-precision scalar float move patterns in MVE. (thumb2_movsfcc_vfp): Modify pattern to enable single float conditional code
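The umbrella macro the ChangeLog describes can be pictured with a small, hypothetical sketch (the flag variables below are stand-ins for GCC's target feature tests, which in the real compiler are derived from the selected architecture and FPU feature bits): TARGET_VFP_BASE holds whenever either plain VFP hard-float or MVE is available, so it can replace the repeated "TARGET_HARD_FLOAT || TARGET_HAVE_MVE" checks with a single guard.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for GCC's target feature tests (illustrative only).  */
static bool target_hard_float; /* VFP hard-float available.  */
static bool target_have_mve;   /* MVE available.  */

/* Sketch of the umbrella macro: true for VFP, MVE, and MVE with
   floating point alike, so patterns and registers shared between
   VFP and MVE need only one guard.  */
#define TARGET_VFP_BASE (target_hard_float || target_have_mve)
```

Collapsing the disjunction into one named macro also means later patches (e.g. new shared VMSR/VMRS patterns) cannot accidentally guard on only one of the two conditions.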
Re: [PATCH v3][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
Hi Srinath, On 3/10/20 6:19 PM, Srinath Parvathaneni wrote: Hello Kyrill, This patch addresses all the comments in patch version v2. (version v2) https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html Hello, This patch creates the required framework for MVE ACLE intrinsics. The following changes are done in this patch to support MVE ACLE intrinsics. Header file arm_mve.h is added to source code, which contains the definitions of MVE ACLE intrinsics and different data types used in MVE. Machine description file mve.md is also added which contains the RTL patterns defined for MVE. A new register "p0" is added, which is used by MVE predicated patterns. A new register class "VPR_REG" is added and its contents are defined in REG_CLASS_CONTENTS. The vec-common.md file is modified to support the standard move patterns. The prefix of neon functions which are also used by MVE is changed from "neon_" to "simd_". e.g.: neon_immediate_valid_for_move changed to simd_immediate_valid_for_move. In the patch standard patterns mve_move, mve_store and mve_load for MVE are added and neon.md and vfp.md files are modified to support these common patterns. Please refer to Arm reference manual [1] for more details. [1] https://developer.arm.com/docs/ddi0553/latest Regression tested on target arm-none-eabi and armeb-none-eabi and found no regressions. Ok for trunk? This is ok but please bootstrap it on arm-none-linux-gnueabihf as well. Thanks, Kyrill Thanks, Srinath gcc/ChangeLog: 2020-03-06 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config.gcc (arm_mve.h): Include mve intrinsics header file. * config/arm/aout.h (p0): Add new register name for MVE predicated cases. * config/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define macro common to Neon and MVE. (ARM_BUILTIN_NEON_LANE_CHECK): Renamed to ARM_BUILTIN_SIMD_LANE_CHECK. (arm_init_simd_builtin_types): Disable poly types for MVE. (arm_init_neon_builtins): Move a check to arm_init_builtins function. 
(arm_init_builtins): Use ARM_BUILTIN_SIMD_LANE_CHECK instead of ARM_BUILTIN_NEON_LANE_CHECK. (mve_dereference_pointer): Add function. (arm_expand_builtin_args): Call to mve_dereference_pointer when MVE is enabled. (arm_expand_neon_builtin): Moved to arm_expand_builtin function. (arm_expand_builtin): Moved from arm_expand_neon_builtin function. * config/arm/arm-c.c (__ARM_FEATURE_MVE): Define macro for MVE and MVE with floating point enabled. * config/arm/arm-protos.h (neon_immediate_valid_for_move): Renamed to simd_immediate_valid_for_move. (simd_immediate_valid_for_move): Renamed from neon_immediate_valid_for_move function. * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Generate error if vfpv2 feature bit is disabled and mve feature bit is also disabled for HARD_FLOAT_ABI. (use_return_insn): Check to not push VFP regs for MVE. (aapcs_vfp_allocate): Add MVE check to have same Procedure Call Standard as Neon. (aapcs_vfp_allocate_return_reg): Likewise. (thumb2_legitimate_address_p): Check to return 0 on valid Thumb-2 address operand for MVE. (arm_rtx_costs_internal): MVE check to determine cost of rtx. (neon_valid_immediate): Rename to simd_valid_immediate. (simd_valid_immediate): Rename from neon_valid_immediate. (simd_valid_immediate): MVE check on size of vector is 128 bits. (neon_immediate_valid_for_move): Rename to simd_immediate_valid_for_move. (simd_immediate_valid_for_move): Rename from neon_immediate_valid_for_move. (neon_immediate_valid_for_logic): Modify call to neon_valid_immediate function. (neon_make_constant): Modify call to neon_valid_immediate function. (neon_vector_mem_operand): Return VFP register for POST_INC or PRE_DEC for MVE. (output_move_neon): Add MVE check to generate vldm/vstm instructions. (arm_compute_frame_layout): Calculate space for saved VFP registers for MVE. (arm_save_coproc_regs): Save coproc registers for MVE. (arm_print_operand): Add case 'E' to print memory operands for MVE. 
(arm_print_operand_address): Check to print register number for MVE. (arm_hard_regno_mode_ok): Check for arm hard regno mode ok for MVE. (arm_modes_tieable_p): Check to allow structure mode for MVE. (arm_regno_class): Add VPR_REGNUM check. (arm_expand_epilogue_apcs_frame): MVE check to calculate epilogue code for APCS frame. (arm_expand_epilogue): MVE check for enabling pop instructions in epilogue. (arm_print_asm_arch_directives): Modify function to disable print of .arch_exten
Re: [GCC][Patch]Bug fix: cannot convert 'const short int*' to 'const __bf16*'
On 3/11/20 5:59 PM, Kyrill Tkachov wrote: Hi Delia, On 3/11/20 5:49 PM, Delia Burduv wrote: This patch fixes a bug introduced by my earlier patch ( https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541680.html ). It introduces a new scalar builtin type that was missing in the original patch. Bootstrapped cleanly on arm-none-linux-gnueabihf. Tested for regression on arm-none-linux-gnueabihf. No regression from before the original patch. Tests that failed or became unsupported because of the original tests now work as they did before it. gcc/ChangeLog: 2020-03-11 Delia Burduv * config/arm/arm-builtins.c (arm_init_simd_builtin_scalar_types): New * config/arm/arm_neon.h (vld2_bf16): Used new builtin type (vld2q_bf16): Used new builtin type (vld3_bf16): Used new builtin type (vld3q_bf16): Used new builtin type (vld4_bf16): Used new builtin type (vld4q_bf16): Used new builtin type (vld2_dup_bf16): Used new builtin type (vld2q_dup_bf16): Used new builtin type (vld3_dup_bf16): Used new builtin type (vld3q_dup_bf16): Used new builtin type (vld4_dup_bf16): Used new builtin type (vld4q_dup_bf16): Used new builtin type ChangeLog entries should have a full stop after each entry. The patch is ok. Thanks for the quick fix, To be clear, I've pushed it to master with a fixed ChangeLog as 1c43ee69f4f6148fff4b5ace80d709d7f8b250d7 Kyrill Kyrill
Re: [AArch64] Backporting -moutline-atomics to gcc 9.x and 8.x
ch64/atomic.md (atomic_): Fully expand LSE operations here. (atomic_fetch_): Likewise. (atomic__fetch): Likewise. (aarch64_atomic__lse): Drop atomic_op iterator and use ATOMIC_LDOP instead; use register_operand for the input; drop the split and emit insns directly. (aarch64_atomic_fetch__lse): Likewise. (aarch64_atomic__fetch_lse): Remove. (@aarch64_atomic_load): Remove. From-SVN: r265660 From 53de1ea800db54b47290d578c43892799b66c8dc Mon Sep 17 00:00:00 2001 From: Richard Henderson Date: Wed, 31 Oct 2018 23:11:22 + Subject: [PATCH] aarch64: Remove early clobber from ATOMIC_LDOP scratch * config/aarch64/atomics.md (aarch64_atomic__lse): The scratch register need not be early-clobber. Document the reason why we cannot use ST. From-SVN: r265703 On 2/27/20, 12:06 PM, "Kyrill Tkachov" wrote: Hi Sebastian, On 2/27/20 4:53 PM, Pop, Sebastian wrote: > > Hi, > > is somebody already working on backporting -moutline-atomics to gcc > 8.x and 9.x branches? > I'm not aware of such work going on. Thanks, Kyrill > Thanks, > > Sebastian >
Re: [PATCH] aarch64: Fix ICE in aarch64_add_offset_1 [PR94121]
Hi Jakub, On 3/11/20 7:22 AM, Jakub Jelinek wrote: Hi! abs_hwi asserts that the argument is not HOST_WIDE_INT_MIN and as the (invalid) testcase shows, the function can be called with such an offset. The following patch is IMHO minimal fix, absu_hwi unlike abs_hwi allows even that value and will return (unsigned HOST_WIDE_INT) HOST_WIDE_INT_MIN in that case. The function then uses moffset in two spots which wouldn't care if the value is (unsigned HOST_WIDE_INT) HOST_WIDE_INT_MIN or HOST_WIDE_INT_MIN and wouldn't accept it (!moffset and aarch64_uimm12_shift (moffset)), then in one spot where the signedness of moffset does matter and using unsigned is the right thing - moffset < 0x100 - and finally has code which will handle even this value right; the assembler doesn't really care for DImode immediates if mov x1, -9223372036854775808 or mov x1, 9223372036854775808 is used and similarly it doesn't matter if we add or sub it in DImode. Bootstrapped/regtested on aarch64-linux, ok for trunk? Ok. Thanks, Kyrill 2020-03-10 Jakub Jelinek PR target/94121 * config/aarch64/aarch64.c (aarch64_add_offset_1): Use absu_hwi instead of abs_hwi, change moffset type to unsigned HOST_WIDE_INT. * gcc.dg/pr94121.c: New test. 
--- gcc/config/aarch64/aarch64.c.jj 2020-02-28 17:33:03.414258503 +0100 +++ gcc/config/aarch64/aarch64.c 2020-03-10 17:01:39.435302124 +0100 @@ -3713,7 +3713,7 @@ aarch64_add_offset_1 (scalar_int_mode mo gcc_assert (emit_move_imm || temp1 != NULL_RTX); gcc_assert (temp1 == NULL_RTX || !reg_overlap_mentioned_p (temp1, src)); - HOST_WIDE_INT moffset = abs_hwi (offset); + unsigned HOST_WIDE_INT moffset = absu_hwi (offset); rtx_insn *insn; if (!moffset) --- gcc/testsuite/gcc.dg/pr94121.c.jj 2020-03-10 16:58:40.246974306 +0100 +++ gcc/testsuite/gcc.dg/pr94121.c 2020-03-10 16:58:40.246974306 +0100 @@ -0,0 +1,16 @@ +/* PR target/94121 */ +/* { dg-do compile { target pie } } */ +/* { dg-options "-O2 -fpie -w" } */ + +#define DIFF_MAX __PTRDIFF_MAX__ +#define DIFF_MIN (-DIFF_MAX - 1) + +extern void foo (char *); +extern char v[]; + +void +bar (void) +{ + char *p = v; + foo (&p[DIFF_MIN]); +} Jakub
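The reason abs_hwi asserts on HOST_WIDE_INT_MIN is that negating the most negative two's-complement value overflows in signed arithmetic, which is undefined behaviour; computing the magnitude into an unsigned type, as absu_hwi does, is well defined for every input. A minimal stand-alone sketch of the same idea (absu64 here is an illustrative stand-in, not GCC's absu_hwi):

```c
#include <assert.h>
#include <stdint.h>

/* Magnitude of a 64-bit signed value, returned as unsigned.
   Unlike a signed abs(), this is well defined even for INT64_MIN:
   the operand is converted to uint64_t first, and unsigned negation
   wraps modulo 2^64 instead of overflowing.  */
static uint64_t absu64 (int64_t x)
{
  return x < 0 ? -(uint64_t) x : (uint64_t) x;
}
```

For INT64_MIN this yields 2^63, which has no representation as an int64_t; that is exactly why the offset code above switches moffset to unsigned HOST_WIDE_INT rather than clamping or asserting.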
[PATCH][AArch64][SVE] Add missing movprfx attribute to some ternary arithmetic patterns
Hi all, The two affected SVE2 patterns in this patch output a movprfx'ed instruction in their second alternative but don't set the "movprfx" attribute, which will result in the wrong instruction length being assumed by the midend. This patch fixes that in the same way as the other SVE patterns in the backend. Bootstrapped and tested on aarch64-none-linux-gnu. Committing to trunk. Thanks, Kyrill 2020-03-06 Kyrylo Tkachov * config/aarch64/aarch64-sve2.md (@aarch64_sve_: Specify movprfx attribute. (@aarch64_sve__lane_): Likewise. commit b9694320e1bfbfc92255b30cc108a81a243770c6 Author: Kyrylo Tkachov Date: Fri Mar 6 15:26:20 2020 + [AArch64] Add movprfx attribute to a couple of SVE2 patterns diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index f82e60e25c7..e18b9fef16e 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -690,6 +690,7 @@ "@ \t%0., %2., %3. movprfx\t%0, %1\;\t%0., %2., %3." + [(set_attr "movprfx" "*,yes")] ) (define_insn "@aarch64_sve__lane_" @@ -706,6 +707,7 @@ "@ \t%0., %2., %3.[%4] movprfx\t%0, %1\;\t%0., %2., %3.[%4]" + [(set_attr "movprfx" "*,yes")] ) ;; -
Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
Hi Delia, On 3/5/20 4:38 PM, Delia Burduv wrote: Hi, This is the latest version of the patch. I am forcing -mfloat-abi=hard because the code generated is slightly differently depending on the float-abi used. Thanks, I've pushed it with an updated ChangeLog. 2020-03-06 Delia Burduv * config/arm/arm_neon.h (vld2_bf16): New. (vld2q_bf16): New. (vld3_bf16): New. (vld3q_bf16): New. (vld4_bf16): New. (vld4q_bf16): New. (vld2_dup_bf16): New. (vld2q_dup_bf16): New. (vld3_dup_bf16): New. (vld3q_dup_bf16): New. (vld4_dup_bf16): New. (vld4q_dup_bf16): New. * config/arm/arm_neon_builtins.def (vld2): Changed to VAR13 and added v4bf, v8bf (vld2_dup): Changed to VAR8 and added v4bf, v8bf (vld3): Changed to VAR13 and added v4bf, v8bf (vld3_dup): Changed to VAR8 and added v4bf, v8bf (vld4): Changed to VAR13 and added v4bf, v8bf (vld4_dup): Changed to VAR8 and added v4bf, v8bf * config/arm/iterators.md (VDXBF2): New iterator. *config/arm/neon.md (neon_vld2): Use new iterators. (neon_vld2_dup): Likewise. (neon_vld3qa): Likewise. (neon_vld3qb): Likewise. (neon_vld3_dup): Likewise. (neon_vld4): Likewise. (neon_vld4qa): Likewise. (neon_vld4qb): Likewise. (neon_vld4_dup): Likewise. (neon_vld2_dupv8bf): New. (neon_vld3_dupv8bf): Likewise. (neon_vld4_dupv8bf): Likewise. Kyrill Thanks, Delia On 3/4/20 5:20 PM, Kyrill Tkachov wrote: Hi Delia, On 3/4/20 2:05 PM, Delia Burduv wrote: Hi, The previous version of this patch shared part of its code with the store intrinsics patch (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed any duplicated code. This patch now depends on the previously mentioned store intrinsics patch. Here is the latest version and the updated ChangeLog. gcc/ChangeLog: 2019-03-04 Delia Burduv * config/arm/arm_neon.h (bfloat16_t): New typedef. (vld2_bf16): New. (vld2q_bf16): New. (vld3_bf16): New. (vld3q_bf16): New. (vld4_bf16): New. (vld4q_bf16): New. (vld2_dup_bf16): New. (vld2q_dup_bf16): New. (vld3_dup_bf16): New. (vld3q_dup_bf16): New. 
(vld4_dup_bf16): New. (vld4q_dup_bf16): New. * config/arm/arm_neon_builtins.def (vld2): Changed to VAR13 and added v4bf, v8bf (vld2_dup): Changed to VAR8 and added v4bf, v8bf (vld3): Changed to VAR13 and added v4bf, v8bf (vld3_dup): Changed to VAR8 and added v4bf, v8bf (vld4): Changed to VAR13 and added v4bf, v8bf (vld4_dup): Changed to VAR8 and added v4bf, v8bf * config/arm/iterators.md (VDXBF): New iterator. (VQ2BF): New iterator. *config/arm/neon.md (vld2): Used new iterators. (vld2_dup): Used new iterators. (vld2_dupv8bf): New. (vst3): Used new iterators. (vst3qa): Used new iterators. (vst3qb): Used new iterators. (vld3_dup): Used new iterators. (vld3_dupv8bf): New. (vst4): Used new iterators. (vst4qa): Used new iterators. (vst4qb): Used new iterators. (vld4_dup): Used new iterators. (vld4_dupv8bf): New. gcc/testsuite/ChangeLog: 2019-03-04 Delia Burduv * gcc.target/arm/simd/bf16_vldn_1.c: New test. Thanks, Delia On 2/19/20 5:25 PM, Delia Burduv wrote: > > Hi, > > Here is the latest version of the patch. It just has some minor > formatting changes that were brought up by Richard Sandiford in the > AArch64 patches > > Thanks, > Delia > > On 1/22/20 5:31 PM, Delia Burduv wrote: >> Ping. >> >> I will change the tests to use the exact input and output registers as >> Richard Sandiford suggested for the AArch64 patches. >> >> On 12/20/19 6:48 PM, Delia Burduv wrote: >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics >>> vld{q}_bf16 as part of the BFloat16 extension. >>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) >>> >>> The intrinsics are declared in arm_neon.h . >>> A new test is added to check assembler output. >>> >>> This patch depends on the Arm back-end patche. >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) >>> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't >>> have commit rights, so if this is ok can someone please commit it for >>> me? 
>>> >>> gcc/ChangeLog: >>> >>> 2019-11-14 Delia Burduv >>> >>> * config/arm/arm_neon.h (bfloat16_t): New typedef. >>> (bfloat16x4x2_t): New typedef. >>> (bfloat16x8x2_t): New typedef. >>> (bfloat16x4x3_t): New typedef. >>>
Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32
Hi Delia, On 3/5/20 3:53 PM, Delia Burduv wrote: Hi, This is the latest version of the patch. I am forcing -mfloat-abi=hard because the register allocator behaves differently depending on the float-abi used. Thanks, I've pushed it to master with an updated ChangeLog reflecting the recent changes. In the future, please send an updated ChangeLog whenever something changes in the patches. Thanks again! Kyrill 2020-03-06 Delia Burduv * config/arm/arm_neon.h (bfloat16x4x2_t): New typedef. (bfloat16x8x2_t): New typedef. (bfloat16x4x3_t): New typedef. (bfloat16x8x3_t): New typedef. (bfloat16x4x4_t): New typedef. (bfloat16x8x4_t): New typedef. (vst2_bf16): New. (vst2q_bf16): New. (vst3_bf16): New. (vst3q_bf16): New. (vst4_bf16): New. (vst4q_bf16): New. * config/arm/arm-builtins.c (v2bf_UP): Define. (VAR13): New. (arm_init_simd_builtin_types): Init Bfloat16x2_t eltype. * config/arm/arm-modes.def (V2BF): New mode. * config/arm/arm-simd-builtin-types.def (Bfloat16x2_t): New entry. * config/arm/arm_neon_builtins.def (vst2): Changed to VAR13 and added v4bf, v8bf (vst3): Changed to VAR13 and added v4bf, v8bf (vst4): Changed to VAR13 and added v4bf, v8bf * config/arm/iterators.md (VDXBF): New iterator. (VQ2BF): New iterator. *config/arm/neon.md (neon_vst2): Used new iterators. (neon_vst2): Used new iterators. (neon_vst3): Used new iterators. (neon_vst3): Used new iterators. (neon_vst3qa): Used new iterators. (neon_vst3qb): Used new iterators. (neon_vst4): Used new iterators. (neon_vst4): Used new iterators. (neon_vst4qa): Used new iterators. (neon_vst4qb): Used new iterators. Thanks, Delia On 3/4/20 5:20 PM, Kyrill Tkachov wrote: Hi Delia, On 3/3/20 5:23 PM, Delia Burduv wrote: Hi, I noticed that the patch doesn't apply cleanly. I fixed it and this is the latest version. Thanks, Delia On 3/3/20 4:23 PM, Delia Burduv wrote: Sorry, I forgot the attachment. On 3/3/20 4:20 PM, Delia Burduv wrote: Hi, I made a mistake in the previous patch. This is the latest version. 
Please let me know if it is ok. Thanks, Delia On 2/21/20 3:18 PM, Delia Burduv wrote: Hi Kyrill, The arm_bf16.h is only used for scalar operations. That is how the aarch64 versions are implemented too. Thanks, Delia On 2/21/20 2:06 PM, Kyrill Tkachov wrote: Hi Delia, On 2/19/20 5:25 PM, Delia Burduv wrote: Hi, Here is the latest version of the patch. It just has some minor formatting changes that were brought up by Richard Sandiford in the AArch64 patches Thanks, Delia On 1/22/20 5:29 PM, Delia Burduv wrote: > Ping. > > I will change the tests to use the exact input and output registers as > Richard Sandiford suggested for the AArch64 patches. > > On 12/20/19 6:46 PM, Delia Burduv wrote: >> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics >> vst{q}_bf16 as part of the BFloat16 extension. >> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) >> >> The intrinsics are declared in arm_neon.h . >> A new test is added to check assembler output. >> >> This patch depends on the Arm back-end patche. >> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) >> >> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't >> have commit rights, so if this is ok can someone please commit it for me? >> >> gcc/ChangeLog: >> >> 2019-11-14 Delia Burduv >> >> * config/arm/arm_neon.h (bfloat16_t): New typedef. >> (bfloat16x4x2_t): New typedef. >> (bfloat16x8x2_t): New typedef. >> (bfloat16x4x3_t): New typedef. >> (bfloat16x8x3_t): New typedef. >> (bfloat16x4x4_t): New typedef. >> (bfloat16x8x4_t): New typedef. >> (vst2_bf16): New. >> (vst2q_bf16): New. >> (vst3_bf16): New. >> (vst3q_bf16): New. >> (vst4_bf16): New. >> (vst4q_bf16): New. >> * config/arm/arm-builtins.c (E_V2BFmode): New mode. >> (VAR13): New. >> (arm_simd_types[Bfloat16x2_t]):New type. >> * config/arm/arm-modes.def (V2BF): New mode. >> * config/arm/arm-simd-builtin-types.def >> (Bfloat16x2_t): New entry. 
>> * config/arm/arm_neon_builtins.def >> (vst2): Changed to VAR13 and added v4bf, v8bf >> (vst3): Changed to VAR13 and added v4bf, v8bf >> (vst4): Changed to VAR13 and added v4bf, v8bf >> * config/arm/iterators.md (VDXBF): New iterator. >> (VQ2BF): New iterator. >> (V_elem): Added V4BF, V8BF. >> (V_sz_elem): Added V4BF, V8BF. >> (V_mode_nu
Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD
On 3/5/20 11:22 AM, Kyrill Tkachov wrote: Hi Delia, On 3/4/20 5:20 PM, Delia Burduv wrote: Hi, This is the latest version of the patch. Thanks, Delia On 2/21/20 11:41 AM, Kyrill Tkachov wrote: Hi Delia, On 2/19/20 5:23 PM, Delia Burduv wrote: Hi, Here is the latest version of the patch. It just has some minor formatting changes that were brought up by Richard Sandiford in the AArch64 patches Thanks, Delia On 1/31/20 3:23 PM, Delia Burduv wrote: Here is the updated patch. The changes are minor, so let me know if there is anything else to fix or if it can be committed. Thank you, Delia On 1/30/20 2:55 PM, Kyrill Tkachov wrote: Hi Delia, On 1/28/20 4:44 PM, Delia Burduv wrote: Ping. *From:* Delia Burduv *Sent:* 22 January 2020 17:26 *To:* gcc-patches@gcc.gnu.org *Cc:* ni...@redhat.com ; Richard Earnshaw ; Ramana Radhakrishnan ; Kyrylo Tkachov *Subject:* Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD Ping. I have read Richard Sandiford's comments on the AArch64 patches and I will apply what is relevant to this patch as well. Particularly, I will change the tests to use the exact input and output registers and I will change the types of the rtl patterns. Please send the updated patches so that someone can commit them for you once they're reviewed. Thanks, Kyrill On 12/20/19 6:44 PM, Delia Burduv wrote: > This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab and vfmat > as part of the BFloat16 extension. > (https://developer.arm.com/docs/101028/latest.) > The intrinsics are declared in arm_neon.h and the RTL patterns are > defined in neon.md. > Two new tests are added to check assembler output and lane indices. > > This patch depends on the Arm back-end patche. > (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) > > Tested for regression on arm-none-eabi and armeb-none-eabi. I don't have > commit rights, so if this is ok can someone please commit it for me? 
> > gcc/ChangeLog: > > 2019-11-12 Delia Burduv > > * config/arm/arm_neon.h (vbfmmlaq_f32): New. > (vbfmlalbq_f32): New. > (vbfmlaltq_f32): New. > (vbfmlalbq_lane_f32): New. > (vbfmlaltq_lane_f32): New. > (vbfmlalbq_laneq_f32): New. > (vbfmlaltq_laneq_f32): New. > * config/arm/arm_neon_builtins.def (vbfmmla): New. > (vbfmab): New. > (vbfmat): New. > (vbfmab_lane): New. > (vbfmat_lane): New. > (vbfmab_laneq): New. > (vbfmat_laneq): New. > * config/arm/iterators.md (BF_MA): New int iterator. > (bt): New int attribute. > (VQXBF): Copy of VQX with V8BF. > (V_HALF): Added V8BF. > * config/arm/neon.md (neon_vbfmmlav8hi): New insn. > (neon_vbfmav8hi): New insn. > (neon_vbfma_lanev8hi): New insn. > (neon_vbfma_laneqv8hi): New expand. > (neon_vget_high): Changed iterator to VQXBF. > * config/arm/unspecs.md (UNSPEC_BFMMLA): New UNSPEC. > (UNSPEC_BFMAB): New UNSPEC. > (UNSPEC_BFMAT): New UNSPEC. > > 2019-11-12 Delia Burduv > > * gcc.target/arm/simd/bf16_ma_1.c: New test. > * gcc.target/arm/simd/bf16_ma_2.c: New test. > * gcc.target/arm/simd/bf16_mmla_1.c: New test. This looks good, a few minor things though... 
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 3c78f435009ab027f92693d00ab5b40960d5419d..81f8008ea6a5fb11eb09f6685ba24bb0c54fb248 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -18742,6 +18742,64 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b, �� return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index); �} +#pragma GCC push_options +#pragma GCC target ("arch=armv8.2-a+bf16") + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b) +{ +� return __builtin_neon_vbfmmlav8bf (__r, __a, __b); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfmlalbq_f32 (f
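For readers following the review: vbfmlalbq_f32 widens the even-numbered ("bottom") bfloat16 lanes of each source vector to float32 and accumulates their products into the float32 accumulator (the "t" variant uses the odd lanes). A portable scalar reference model of those semantics — a sketch only, not the GCC implementation; bfloat16 lanes are modelled as raw uint16_t bit patterns, and the float-to-bf16 helper truncates rather than rounds, which is exact for the small integers used here:

```c
#include <stdint.h>
#include <string.h>

/* bfloat16 is the top half of an IEEE-754 binary32.  */
static float bf16_to_f32 (uint16_t h)
{
  uint32_t u = (uint32_t) h << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Truncating narrow; fine for values exactly representable in bf16.  */
static uint16_t f32_to_bf16_trunc (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return (uint16_t) (u >> 16);
}

/* Reference model of vbfmlalbq_f32: for each of the four float32
   accumulator lanes, multiply the corresponding even-numbered (bottom)
   bf16 lanes of a and b and accumulate.  */
static void bfmlalb_ref (float r[4], const uint16_t a[8], const uint16_t b[8])
{
  for (int i = 0; i < 4; i++)
    r[i] += bf16_to_f32 (a[2 * i]) * bf16_to_f32 (b[2 * i]);
}
```

The top-lane variant (vbfmlaltq_f32) is the same loop with `a[2 * i + 1]` and `b[2 * i + 1]`.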
Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32
Hi Delia, On 3/3/20 5:23 PM, Delia Burduv wrote: Hi, I noticed that the patch doesn't apply cleanly. I fixed it and this is the latest version. Thanks, Delia On 3/3/20 4:23 PM, Delia Burduv wrote: Sorry, I forgot the attachment. On 3/3/20 4:20 PM, Delia Burduv wrote: Hi, I made a mistake in the previous patch. This is the latest version. Please let me know if it is ok. Thanks, Delia On 2/21/20 3:18 PM, Delia Burduv wrote: Hi Kyrill, The arm_bf16.h is only used for scalar operations. That is how the aarch64 versions are implemented too. Thanks, Delia On 2/21/20 2:06 PM, Kyrill Tkachov wrote: Hi Delia, On 2/19/20 5:25 PM, Delia Burduv wrote: Hi, Here is the latest version of the patch. It just has some minor formatting changes that were brought up by Richard Sandiford in the AArch64 patches Thanks, Delia On 1/22/20 5:29 PM, Delia Burduv wrote: > Ping. > > I will change the tests to use the exact input and output registers as > Richard Sandiford suggested for the AArch64 patches. > > On 12/20/19 6:46 PM, Delia Burduv wrote: >> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics >> vst{q}_bf16 as part of the BFloat16 extension. >> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) >> >> The intrinsics are declared in arm_neon.h . >> A new test is added to check assembler output. >> >> This patch depends on the Arm back-end patche. >> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) >> >> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't >> have commit rights, so if this is ok can someone please commit it for me? >> >> gcc/ChangeLog: >> >> 2019-11-14 Delia Burduv >> >> * config/arm/arm_neon.h (bfloat16_t): New typedef. >> (bfloat16x4x2_t): New typedef. >> (bfloat16x8x2_t): New typedef. >> (bfloat16x4x3_t): New typedef. >> (bfloat16x8x3_t): New typedef. >> (bfloat16x4x4_t): New typedef. >> (bfloat16x8x4_t): New typedef. >> (vst2_bf16): New. >> (vst2q_bf16): New. >> (vst3_bf16): New. 
>> (vst3q_bf16): New. >> (vst4_bf16): New. >> (vst4q_bf16): New. >> * config/arm/arm-builtins.c (E_V2BFmode): New mode. >> (VAR13): New. >> (arm_simd_types[Bfloat16x2_t]):New type. >> * config/arm/arm-modes.def (V2BF): New mode. >> * config/arm/arm-simd-builtin-types.def >> (Bfloat16x2_t): New entry. >> * config/arm/arm_neon_builtins.def >> (vst2): Changed to VAR13 and added v4bf, v8bf >> (vst3): Changed to VAR13 and added v4bf, v8bf >> (vst4): Changed to VAR13 and added v4bf, v8bf >> * config/arm/iterators.md (VDXBF): New iterator. >> (VQ2BF): New iterator. >> (V_elem): Added V4BF, V8BF. >> (V_sz_elem): Added V4BF, V8BF. >> (V_mode_nunits): Added V4BF, V8BF. >> (q): Added V4BF, V8BF. >> *config/arm/neon.md (vst2): Used new iterators. >> (vst3): Used new iterators. >> (vst3qa): Used new iterators. >> (vst3qb): Used new iterators. >> (vst4): Used new iterators. >> (vst4qa): Used new iterators. >> (vst4qb): Used new iterators. >> >> >> gcc/testsuite/ChangeLog: >> >> 2019-11-14 Delia Burduv >> >> * gcc.target/arm/simd/bf16_vstn_1.c: New test. One thing I just noticed in this and the other arm bfloat16 patches... diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b, return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index); } +#pragma GCC push_options +#pragma GCC target ("arch=armv8.2-a+bf16") + +typedef struct bfloat16x4x2_t +{ + bfloat16x4_t val[2]; +} bfloat16x4x2_t; These should be in a new arm_bf16.h file that gets included in the main arm_neon.h file, right? I believe the aarch64 versions are implemented that way. Otherwise the patch looks good to me. Thanks! 
Kyrill + +typedef struct bfloat16x8x2_t +{ + bfloat16x8_t val[2]; +} bfloat16x8x2_t; + diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c new file mode 100644 index ..b52ecfb959776fd04c7c33908cb7f8898ec3fe0b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/bf
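As background on the typedefs under discussion: the vector-tuple types are plain structs wrapping an array of vectors, so bfloat16x4x2_t is exactly the register pair that vst2_bf16 stores. A portable layout stand-in — using uint16_t for the raw bfloat16 lanes, since these are NOT the real arm_neon.h definitions, just a sketch of the shape:

```c
#include <stdint.h>

/* Stand-in for bfloat16x4_t: four bf16 lanes, one 64-bit D register.  */
typedef struct { uint16_t lane[4]; } bf16x4_sim;

/* Stand-in for bfloat16x4x2_t: the two-register tuple vst2_bf16 takes.  */
typedef struct { bf16x4_sim val[2]; } bf16x4x2_sim;
```

The review point above is purely about placement (arm_bf16.h vs. arm_neon.h), not about this layout, which is common to both.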
Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
Hi Delia, On 3/4/20 2:05 PM, Delia Burduv wrote: Hi, The previous version of this patch shared part of its code with the store intrinsics patch (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed any duplicated code. This patch now depends on the previously mentioned store intrinsics patch. Here is the latest version and the updated ChangeLog. gcc/ChangeLog: 2019-03-04 Delia Burduv * config/arm/arm_neon.h (bfloat16_t): New typedef. (vld2_bf16): New. (vld2q_bf16): New. (vld3_bf16): New. (vld3q_bf16): New. (vld4_bf16): New. (vld4q_bf16): New. (vld2_dup_bf16): New. (vld2q_dup_bf16): New. (vld3_dup_bf16): New. (vld3q_dup_bf16): New. (vld4_dup_bf16): New. (vld4q_dup_bf16): New. * config/arm/arm_neon_builtins.def (vld2): Changed to VAR13 and added v4bf, v8bf (vld2_dup): Changed to VAR8 and added v4bf, v8bf (vld3): Changed to VAR13 and added v4bf, v8bf (vld3_dup): Changed to VAR8 and added v4bf, v8bf (vld4): Changed to VAR13 and added v4bf, v8bf (vld4_dup): Changed to VAR8 and added v4bf, v8bf * config/arm/iterators.md (VDXBF): New iterator. (VQ2BF): New iterator. *config/arm/neon.md (vld2): Used new iterators. (vld2_dup): Used new iterators. (vld2_dupv8bf): New. (vst3): Used new iterators. (vst3qa): Used new iterators. (vst3qb): Used new iterators. (vld3_dup): Used new iterators. (vld3_dupv8bf): New. (vst4): Used new iterators. (vst4qa): Used new iterators. (vst4qb): Used new iterators. (vld4_dup): Used new iterators. (vld4_dupv8bf): New. gcc/testsuite/ChangeLog: 2019-03-04 Delia Burduv * gcc.target/arm/simd/bf16_vldn_1.c: New test. Thanks, Delia On 2/19/20 5:25 PM, Delia Burduv wrote: > > Hi, > > Here is the latest version of the patch. It just has some minor > formatting changes that were brought up by Richard Sandiford in the > AArch64 patches > > Thanks, > Delia > > On 1/22/20 5:31 PM, Delia Burduv wrote: >> Ping. >> >> I will change the tests to use the exact input and output registers as >> Richard Sandiford suggested for the AArch64 patches. 
>> >> On 12/20/19 6:48 PM, Delia Burduv wrote: >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics >>> vld{q}_bf16 as part of the BFloat16 extension. >>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) >>> >>> The intrinsics are declared in arm_neon.h . >>> A new test is added to check assembler output. >>> >>> This patch depends on the Arm back-end patche. >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) >>> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't >>> have commit rights, so if this is ok can someone please commit it for >>> me? >>> >>> gcc/ChangeLog: >>> >>> 2019-11-14 Delia Burduv >>> >>> * config/arm/arm_neon.h (bfloat16_t): New typedef. >>> (bfloat16x4x2_t): New typedef. >>> (bfloat16x8x2_t): New typedef. >>> (bfloat16x4x3_t): New typedef. >>> (bfloat16x8x3_t): New typedef. >>> (bfloat16x4x4_t): New typedef. >>> (bfloat16x8x4_t): New typedef. >>> (vld2_bf16): New. >>> (vld2q_bf16): New. >>> (vld3_bf16): New. >>> (vld3q_bf16): New. >>> (vld4_bf16): New. >>> (vld4q_bf16): New. >>> (vld2_dup_bf16): New. >>> (vld2q_dup_bf16): New. >>> (vld3_dup_bf16): New. >>> (vld3q_dup_bf16): New. >>> (vld4_dup_bf16): New. >>> (vld4q_dup_bf16): New. >>> * config/arm/arm-builtins.c (E_V2BFmode): New mode. >>> (VAR13): New. >>> (arm_simd_types[Bfloat16x2_t]):New type. >>> * config/arm/arm-modes.def (V2BF): New mode. >>> * config/arm/arm-simd-builtin-types.def >>> (Bfloat16x2_t): New entry. >>> * config/arm/arm_neon_builtins.def >>> (vld2): Changed to VAR13 and added v4bf, v8bf >>> (vld2_dup): Changed to VAR8 and added v4bf, v8bf >>> (vld3): Changed to VAR13 and added v4bf, v8bf >>> (vld3_dup): Changed to VAR8 and added v4bf, v8bf >>> (vld4): Changed to VAR13 and added v4bf, v8bf >>> (vld4_dup): Changed to VAR8 and added v4bf, v8bf >>> * config/arm/iterators.md (VDXBF): New iterator. >>> (VQ2BF): New iterator. >>> (V_elem): Added V4BF, V8BF. >>> (V_sz_elem): Added V4BF, V8BF. 
>>> (V_mode_nunits): Added V4BF, V8BF. >>> (q): Added V4BF, V8BF. >>> *config/arm/neon.md (vld2): Used new iterators. >>> (vld2_dup): Used new iterators. >>> (vld2_dupv8bf): New. >>> (vst3): Used new iterators. >>> (vst3qa): Used new iterators. >>> (vst3qb): Used new iterators. >>> (vld3_dup): Used new iterators. >>>
Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
On 3/4/20 2:14 PM, Tamar Christina wrote: Hi Kyrill, Ok for backporting this patch to GCC 8 and GCC 9? Ok assuming bootstrap and test shows no problems. Thanks, Kyrill Thanks, Tamar -Original Message- From: gcc-patches-ow...@gcc.gnu.org On Behalf Of Kyrill Tkachov Sent: Thursday, January 30, 2020 14:55 To: Stam Markianos-Wright ; gcc- patc...@gcc.gnu.org Cc: ni...@redhat.com; Ramana Radhakrishnan ; Richard Earnshaw Subject: Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816) On 1/30/20 2:42 PM, Stam Markianos-Wright wrote: On 1/28/20 10:35 AM, Kyrill Tkachov wrote: Hi Stam, On 1/8/20 3:18 PM, Stam Markianos-Wright wrote: On 12/10/19 5:03 PM, Kyrill Tkachov wrote: Hi Stam, On 11/15/19 5:26 PM, Stam Markianos-Wright wrote: Pinging with more correct maintainers this time :) Also would need to backport to gcc7,8,9, but need to get this approved first! Sorry for the delay. Same here now! Sorry totally forget about this in the lead up to Xmas! Done the changes marked below and also removed the unnecessary extra #defines from the test. This is ok with a nit on the testcase... diff --git a/gcc/testsuite/gcc.target/arm/pr91816.c b/gcc/testsuite/gcc.target/arm/pr91816.c new file mode 100644 index ..757c897e9c0db32709227b3fdf 1 b4a8033428232 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr91816.c @@ -0,0 +1,61 @@ +/* { dg-do compile } */ +/* { dg-options "-march=armv7-a -mthumb -mfpu=vfpv3-d16" } */ int +printf(const char *, ...); + I think this needs a couple of effective target checks like arm_hard_vfp_ok and arm_thumb2_ok. See other tests in gcc.target/arm that add -mthumb to the options. Hmm, looking back at this now, is there any reason why it can't just be: /* { dg-do compile } */ /* { dg-require-effective-target arm_thumb2_ok } */ /* { dg-additional-options "-mthumb" } */ were we don't override the march or fpu options at all, but just use `require-effective-target arm_thumb2_ok` to make sure that thumb2 is supported? 
The attached new diff does just that. Works for me, there are plenty of configurations run with fpu that it should get the right coverage. Ok (make sure commit the updated, if needed, ChangeLog as well) Thanks! Kyrill Cheers :) Stam. Thanks, Kyrill
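The directive pattern agreed on above can be sketched as follows — a minimal skeleton with a placeholder body, not the actual pr91816.c test, showing how the effective-target check replaces hard-coded -march/-mfpu options:

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_thumb2_ok } */
/* { dg-additional-options "-mthumb" } */

/* Placeholder; the real test materializes a conditional branch whose
   target is out of Thumb-2 conditional-branch range.  */
int f (int x) { return x ? 1 : 2; }
```

Letting dejagnu pick the float ABI and FPU means the test runs in every multilib configuration where Thumb-2 is available, instead of being skipped whenever the forced options conflict with the testsuite's own flags.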
Re: [Ping][PATCH][Arm] ACLE intrinsics: AdvSIMD BFloat16 convert instructions
Hi Dennis, On 3/2/20 5:41 PM, Dennis Zhang wrote: Hi all, On 17/01/2020 16:46, Dennis Zhang wrote: > Hi all, > > This patch is part of a series adding support for Armv8.6-A features. > It depends on Arm BFMode patch > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html > > This patch implements intrinsics to convert between bfloat16 and float32 > formats. > ACLE documents are at https://developer.arm.com/docs/101028/latest > ISA documents are at https://developer.arm.com/docs/ddi0596/latest > > Regression tested. > > Is it OK for trunk please? Ok. Thanks, Kyrill > > Thanks, > Dennis > > gcc/ChangeLog: > > 2020-01-17 Dennis Zhang > > * config/arm/arm_bf16.h (vcvtah_f32_bf16, vcvth_bf16_f32): New. > * config/arm/arm_neon.h (vcvt_f32_bf16, vcvtq_low_f32_bf16): New. > (vcvtq_high_f32_bf16, vcvt_bf16_f32): New. > (vcvtq_low_bf16_f32, vcvtq_high_bf16_f32): New. > * config/arm/arm_neon_builtins.def (vbfcvt, vbfcvt_high): New entries. > (vbfcvtv4sf, vbfcvtv4sf_high): Likewise. > * config/arm/iterators.md (VBFCVT, VBFCVTM): New mode iterators. > (V_bf_low, V_bf_cvt_m): New mode attributes. > * config/arm/neon.md (neon_vbfcvtv4sf): New. > (neon_vbfcvtv4sf_highv8bf, neon_vbfcvtsf): New. > (neon_vbfcvt, neon_vbfcvt_highv8bf): New. > (neon_vbfcvtbf_cvtmode, neon_vbfcvtbf): New > * config/arm/unspecs.md (UNSPEC_BFCVT, UNSPEC_BFCVT_HIG): New. > > gcc/testsuite/ChangeLog: > > 2020-01-17 Dennis Zhang > > * gcc.target/arm/simd/bf16_cvt_1.c: New test. > > The tests are updated in this patch for assembly test. Rebased to trunk top. Is it OK to commit please? Cheers Dennis
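The scalar conversions being wrapped in this patch have simple bit-level semantics: bfloat16 is the top 16 bits of an IEEE-754 binary32, and the narrowing convert rounds to nearest-even. A portable reference model — a sketch that ignores NaN quieting and other corner cases the hardware instruction handles:

```c
#include <stdint.h>
#include <string.h>

/* bfloat16 -> float32: widen by appending 16 zero fraction bits.  */
static float bf16_to_f32 (uint16_t h)
{
  uint32_t u = (uint32_t) h << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* float32 -> bfloat16 with round-to-nearest-even.  NaN inputs would
   need explicit quieting; elided in this sketch.  */
static uint16_t f32_to_bf16 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  /* Add 0x7FFF, plus 1 when the result's low bit is set, so that exact
     ties round to the even 16-bit pattern.  */
  uint32_t round = 0x7FFFu + ((u >> 16) & 1);
  return (uint16_t) ((u + round) >> 16);
}
```

The widening direction is lossless, which is why the intrinsic reduces to a shift; only the narrowing direction involves rounding.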
Re: [GCC][PATCH][ARM] Add multilib mapping for Armv8.1-M+MVE with -mfloat-abi=hard
Hi Mihail, On 2/20/20 4:15 PM, Mihail Ionescu wrote: Hi, This patch adds a new multilib for armv8.1-m.main+mve with hard float abi. For armv8.1-m.main+mve soft and softfp, the v8-M multilibs will be reused. The following mappings are also updated: "-mfloat-abi=hard -march=armv8.1-m.main+mve.fp -> armv8-m.main+fp/hard" "-mfloat-abi=softfp -march=armv8.1-m.main+mve.fp -> armv8-m.main+fp/softfp" "-mfloat-abi=soft -march=armv8.1-m.main+mve.fp -> armv8-m.main/nofp" The patch also includes a libgcc change to prevent cmse_nonsecure_call.S from being compiled for v8.1-M. v8.1-M doesn't need it since the same behaviour is achieved during code generation by using the new instructions[1]. [1] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01654.html Tested on arm-none-eabi. gcc/ChangeLog: 2020-02-20 Mihail Ionescu * config/arm/t-rmprofile: create new multilib for armv8.1-m.main+mve hard float and reuse v8-m.main ones for v8.1-m.main+mve . gcc/testsuite/ChangeLog: 2020-02-20 Mihail Ionescu * testsuite/gcc.target/arm/multilib.exp: Add new v8.1-M entry. No testsuite/ in the prefix here. 2020-02-20 Mihail Ionescu libgcc/ChangLog: * config/arm/t-arm: Do not compile cmse_nonsecure_call.S for v8.1-m. Ok for trunk? Ok. 
Thanks, Kyrill Regards, Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile index 0fb3084c8b20f16ccadba632fc55162b196651d5..16e368f25cc2e3ad341adc2752120ad0defdf2a4 100644 --- a/gcc/config/arm/t-rmprofile +++ b/gcc/config/arm/t-rmprofile @@ -27,8 +27,8 @@ # Arch and FPU variants to build libraries with -MULTI_ARCH_OPTS_RM = march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp -MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main v8-m.main+fp v8-m.main+dp +MULTI_ARCH_OPTS_RM = march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve +MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main v8-m.main+fp v8-m.main+dp v8.1-m.main+mve # Base M-profile (no fp) MULTILIB_REQUIRED += mthumb/march=armv6s-m/mfloat-abi=soft @@ -48,8 +48,7 @@ MULTILIB_REQUIRED += mthumb/march=armv8-m.main+fp/mfloat-abi=hard MULTILIB_REQUIRED += mthumb/march=armv8-m.main+fp/mfloat-abi=softfp MULTILIB_REQUIRED += mthumb/march=armv8-m.main+fp.dp/mfloat-abi=hard MULTILIB_REQUIRED += mthumb/march=armv8-m.main+fp.dp/mfloat-abi=softfp - - +MULTILIB_REQUIRED += mthumb/march=armv8.1-m.main+mve/mfloat-abi=hard # Arch Matches MULTILIB_MATCHES += march?armv6s-m=march?armv6-m @@ -66,12 +65,14 @@ MULTILIB_MATCHES += march?armv7e-m+fp=march?armv7e-m+fpv5 MULTILIB_REUSE += $(foreach ARCH, armv6s-m armv7-m armv7e-m armv8-m\.base armv8-m\.main, \ mthumb/march.$(ARCH)/mfloat-abi.soft=mthumb/march.$(ARCH)/mfloat-abi.softfp) + # Map v8.1-M to v8-M. 
MULTILIB_MATCHES += march?armv8-m.main=march?armv8.1-m.main MULTILIB_MATCHES += march?armv8-m.main=march?armv8.1-m.main+dsp -MULTILIB_MATCHES += march?armv8-m.main=march?armv8.1-m.main+mve +MULTILIB_REUSE += mthumb/march.armv8-m\.main/mfloat-abi.soft=mthumb/march.armv8\.1-m\.main+mve/mfloat-abi.soft +MULTILIB_REUSE += mthumb/march.armv8-m\.main/mfloat-abi.soft=mthumb/march.armv8\.1-m\.main+mve/mfloat-abi.softfp -v8_1m_sp_variants = +fp +dsp+fp +mve.fp +v8_1m_sp_variants = +fp +dsp+fp +mve.fp +fp+mve v8_1m_dp_variants = +fp.dp +dsp+fp.dp +fp.dp+mve +fp.dp+mve.fp # Map all v8.1-m.main FP sp variants down to v8-m. diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp b/gcc/testsuite/gcc.target/arm/multilib.exp index 67d00266f6b5e69aa2a7831cfb9a4353ac4f4340..42aaebfabdf76c45a1909b2aaa1651d3c42ee4b7 100644 --- a/gcc/testsuite/gcc.target/arm/multilib.exp +++ b/gcc/testsuite/gcc.target/arm/multilib.exp @@ -813,6 +813,9 @@ if {[multilib_config "rmprofile"] } { {-march=armv8.1-m.main+mve.fp -mfpu=auto -mfloat-abi=soft} "thumb/v8-m.main/nofp" {-march=armv8.1-m.main+mve -mfpu=auto -mfloat-abi=softfp} "thumb/v8-m.main/nofp" {-march=armv8.1-m.main+mve.fp -mfpu=auto -mfloat-abi=softfp} "thumb/v8-m.main+fp/softfp" + {-march=armv8.1-m.main+mve -mfpu=auto -mfloat-abi=hard} "thumb/v8.1-m.main+mve/hard" + {-march=armv8.1-m.main+mve+fp -mfpu=auto -mfloat-abi=hard} "thumb/v8-m.main+fp/hard" + {-march=armv8.1-m.main+mve+fp -mfpu=auto -mfloat-abi=softfp} "thumb/v8-m.main+fp/softfp" {-march=armv8.1-m.main+mve.fp -mfpu=auto -mfloat-abi=hard} "thumb/v8-m.main+fp/hard" {-march=armv8.1-m.main+mve+fp.dp -mfpu=auto -mfloat-abi=soft} "thumb/v8-m.main
Re: [GCC][PATCH][ARM] Add vreinterpret, vdup, vget and vset bfloat16 intrinsic
Hi Mihail, On 2/27/20 2:44 PM, Mihail Ionescu wrote: Hi Kyrill, On 02/27/2020 11:09 AM, Kyrill Tkachov wrote: Hi Mihail, On 2/27/20 10:27 AM, Mihail Ionescu wrote: Hi, This patch adds support for the bf16 vector create, get, set, duplicate and reinterpret intrinsics. ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest Regression tested on arm-none-eabi. gcc/ChangeLog: 2020-02-27 Mihail Ionescu * (__ARM_NUM_LANES, __arm_lane, __arm_lane_q): Move to the beginning of the file. (vcreate_bf16, vcombine_bf16): New. (vdup_n_bf16, vdupq_n_bf16): New. (vdup_lane_bf16, vdup_laneq_bf16): New. (vdupq_lane_bf16, vdupq_laneq_bf16): New. (vduph_lane_bf16, vduph_laneq_bf16): New. (vset_lane_bf16, vsetq_lane_bf16): New. (vget_lane_bf16, vgetq_lane_bf16): New. (vget_high_bf16, vget_low_bf16): New. (vreinterpret_bf16_u8, vreinterpretq_bf16_u8): New. (vreinterpret_bf16_u16, vreinterpretq_bf16_u16): New. (vreinterpret_bf16_u32, vreinterpretq_bf16_u32): New. (vreinterpret_bf16_u64, vreinterpretq_bf16_u64): New. (vreinterpret_bf16_s8, vreinterpretq_bf16_s8): New. (vreinterpret_bf16_s16, vreinterpretq_bf16_s16): New. (vreinterpret_bf16_s32, vreinterpretq_bf16_s32): New. (vreinterpret_bf16_s64, vreinterpretq_bf16_s64): New. (vreinterpret_bf16_p8, vreinterpretq_bf16_p8): New. (vreinterpret_bf16_p16, vreinterpretq_bf16_p16): New. (vreinterpret_bf16_p64, vreinterpretq_bf16_p64): New. (vreinterpret_bf16_f32, vreinterpretq_bf16_f32): New. (vreinterpret_bf16_f64, vreinterpretq_bf16_f64): New. (vreinterpretq_bf16_p128): New. (vreinterpret_s8_bf16, vreinterpretq_s8_bf16): New. (vreinterpret_s16_bf16, vreinterpretq_s16_bf16): New. (vreinterpret_s32_bf16, vreinterpretq_s32_bf16): New. (vreinterpret_s64_bf16, vreinterpretq_s64_bf16): New. (vreinterpret_u8_bf16, vreinterpretq_u8_bf16): New. (vreinterpret_u16_bf16, vreinterpretq_u16_bf16): New. (vreinterpret_u32_bf16, vreinterpretq_u32_bf16): New. 
(vreinterpret_u64_bf16, vreinterpretq_u64_bf16): New. (vreinterpret_p8_bf16, vreinterpretq_p8_bf16): New. (vreinterpret_p16_bf16, vreinterpretq_p16_bf16): New. (vreinterpret_p64_bf16, vreinterpretq_p64_bf16): New. (vreinterpret_f32_bf16, vreinterpretq_f32_bf16): New. (vreinterpretq_p128_bf16): New. * config/arm/arm_neon_builtins.def (VDX): Add V4BF. (V_elem): Likewise. (V_elem_l): Likewise. (VD_LANE): Likewise. (VQX) Add V8BF. (V_DOUBLE): Likewise. (VDQX): Add V4BF and V8BF. (V_two_elem, V_three_elem, V_four_elem): Likewise. (V_reg): Likewise. (V_HALF): Likewise. (V_double_vector_mode): Likewise. (V_cmp_result): Likewise. (V_uf_sclr): Likewise. (V_sz_elem): Likewise. (Is_d_reg): Likewise. (V_mode_nunits): Likewise. * config/arm/neon.md (neon_vdup_lane): Enable for BFloat. gcc/testsuite/ChangeLog: 2020-02-27 Mihail Ionescu * gcc.target/arm/bf16_dup.c: New test. * gcc.target/arm/bf16_reinterpret.c: Likewise. Is it ok for trunk? This looks mostly ok with a few nits... Regards, Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 09297831cdcd6e695843c17b7724c114f3a129fe..5901a8f1fb84f204ae95f0ccc97bf5ae944c482c 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -42,6 +42,15 @@ extern "C" { #include #include +#ifdef __ARM_BIG_ENDIAN +#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0])) +#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1)) +#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1)) +#else +#define __arm_lane(__vec, __idx) __idx +#define __arm_laneq(__vec, __idx) __idx +#endif + typedef __simd64_int8_t int8x8_t; typedef __simd64_int16_t int16x4_t; typedef __simd64_int32_t int32x2_t; @@ -6147,14 +6156,6 @@ vget_lane_s32 (int32x2_t __a, const int __b) /* For big-endian, GCC's vector indices are reversed within each 64 bits compared to the architectural lane indices used by Neon intrinsics. 
*/ Please move this comment as well. -#ifdef __ARM_BIG_ENDIAN -#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0])) -#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1)) -#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1)) -#else -#define __arm_lane(__vec, __idx) __idx -#define __arm_laneq(__vec, __idx) __idx -#endif #define vget_lane_f16(__v, __idx) \ __extension__ \ @@ -14476,6 +14477,15 @@ vrein
Re: [GCC] Fix misleading aarch64 mcpu/march warning string
Hi Joel, On 2/27/20 2:31 PM, Joel Hutton wrote: The message for conflicting mcpu and march previously printed the architecture of the CPU instead of the CPU name, as well as omitting the extensions to the march string. This patch corrects both errors. This patch fixes PR target/87612. before: $ aarch64-unknown-linux-gnu-gcc -S -O3 -march=armv8-a+sve -mcpu=cortex-a76 foo.c cc1: warning: switch '-mcpu=armv8.2-a' conflicts with '-march=armv8-a' switch after: $ aarch64-unknown-linux-gnu-gcc -S -O3 -march=armv8-a+sve -mcpu=cortex-a76 foo.c cc1: warning: switch '-mcpu=cortex-a76' conflicts with '-march=armv8-a+sve' switch gcc/ChangeLog: 2020-02-27 Joel Hutton PR target/87612 * config/aarch64/aarch64.c (aarch64_override_options): Fix misleading warning string. Newline after the Name/email line in the ChangeLog. This is okay for trunk. Do you have commit access? If not, please follow the steps at https://gcc.gnu.org/gitwrite.html#authenticated listing myself as approver. Then you can commit this yourself. Thanks, Kyrill
Re: [GCC][PATCH][ARM] Add vreinterpret, vdup, vget and vset bfloat16 intrinsic
Hi Mihail, On 2/27/20 10:27 AM, Mihail Ionescu wrote: Hi, This patch adds support for the bf16 vector create, get, set, duplicate and reinterpret intrinsics. ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest Regression tested on arm-none-eabi. gcc/ChangeLog: 2020-02-27 Mihail Ionescu * (__ARM_NUM_LANES, __arm_lane, __arm_lane_q): Move to the beginning of the file. (vcreate_bf16, vcombine_bf16): New. (vdup_n_bf16, vdupq_n_bf16): New. (vdup_lane_bf16, vdup_laneq_bf16): New. (vdupq_lane_bf16, vdupq_laneq_bf16): New. (vduph_lane_bf16, vduph_laneq_bf16): New. (vset_lane_bf16, vsetq_lane_bf16): New. (vget_lane_bf16, vgetq_lane_bf16): New. (vget_high_bf16, vget_low_bf16): New. (vreinterpret_bf16_u8, vreinterpretq_bf16_u8): New. (vreinterpret_bf16_u16, vreinterpretq_bf16_u16): New. (vreinterpret_bf16_u32, vreinterpretq_bf16_u32): New. (vreinterpret_bf16_u64, vreinterpretq_bf16_u64): New. (vreinterpret_bf16_s8, vreinterpretq_bf16_s8): New. (vreinterpret_bf16_s16, vreinterpretq_bf16_s16): New. (vreinterpret_bf16_s32, vreinterpretq_bf16_s32): New. (vreinterpret_bf16_s64, vreinterpretq_bf16_s64): New. (vreinterpret_bf16_p8, vreinterpretq_bf16_p8): New. (vreinterpret_bf16_p16, vreinterpretq_bf16_p16): New. (vreinterpret_bf16_p64, vreinterpretq_bf16_p64): New. (vreinterpret_bf16_f32, vreinterpretq_bf16_f32): New. (vreinterpret_bf16_f64, vreinterpretq_bf16_f64): New. (vreinterpretq_bf16_p128): New. (vreinterpret_s8_bf16, vreinterpretq_s8_bf16): New. (vreinterpret_s16_bf16, vreinterpretq_s16_bf16): New. (vreinterpret_s32_bf16, vreinterpretq_s32_bf16): New. (vreinterpret_s64_bf16, vreinterpretq_s64_bf16): New. (vreinterpret_u8_bf16, vreinterpretq_u8_bf16): New. (vreinterpret_u16_bf16, vreinterpretq_u16_bf16): New. (vreinterpret_u32_bf16, vreinterpretq_u32_bf16): New. (vreinterpret_u64_bf16, vreinterpretq_u64_bf16): New. (vreinterpret_p8_bf16, vreinterpretq_p8_bf16): New. 
(vreinterpret_p16_bf16, vreinterpretq_p16_bf16): New. (vreinterpret_p64_bf16, vreinterpretq_p64_bf16): New. (vreinterpret_f32_bf16, vreinterpretq_f32_bf16): New. (vreinterpretq_p128_bf16): New. * config/arm/arm_neon_builtins.def (VDX): Add V4BF. (V_elem): Likewise. (V_elem_l): Likewise. (VD_LANE): Likewise. (VQX) Add V8BF. (V_DOUBLE): Likewise. (VDQX): Add V4BF and V8BF. (V_two_elem, V_three_elem, V_four_elem): Likewise. (V_reg): Likewise. (V_HALF): Likewise. (V_double_vector_mode): Likewise. (V_cmp_result): Likewise. (V_uf_sclr): Likewise. (V_sz_elem): Likewise. (Is_d_reg): Likewise. (V_mode_nunits): Likewise. * config/arm/neon.md (neon_vdup_lane): Enable for BFloat. gcc/testsuite/ChangeLog: 2020-02-27 Mihail Ionescu * gcc.target/arm/bf16_dup.c: New test. * gcc.target/arm/bf16_reinterpret.c: Likewise. Is it ok for trunk? This looks mostly ok with a few nits... Regards, Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 09297831cdcd6e695843c17b7724c114f3a129fe..5901a8f1fb84f204ae95f0ccc97bf5ae944c482c 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -42,6 +42,15 @@ extern "C" { #include #include +#ifdef __ARM_BIG_ENDIAN +#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0])) +#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1)) +#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1)) +#else +#define __arm_lane(__vec, __idx) __idx +#define __arm_laneq(__vec, __idx) __idx +#endif + typedef __simd64_int8_t int8x8_t; typedef __simd64_int16_t int16x4_t; typedef __simd64_int32_t int32x2_t; @@ -6147,14 +6156,6 @@ vget_lane_s32 (int32x2_t __a, const int __b) /* For big-endian, GCC's vector indices are reversed within each 64 bits compared to the architectural lane indices used by Neon intrinsics. */ Please move this comment as well. 
-#ifdef __ARM_BIG_ENDIAN -#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0])) -#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1)) -#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1)) -#else -#define __arm_lane(__vec, __idx) __idx -#define __arm_laneq(__vec, __idx) __idx -#endif #define vget_lane_f16(__v, __idx) \ __extension__ \ @@ -14476,6 +14477,15 @@ vreinterpret_p16_u32 (uint32x2_t __a) #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE) __extension__ extern
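As a side note for readers, the `__arm_lane`/`__arm_laneq` macros being moved above have simple arithmetic semantics that can be checked in isolation. Here is a minimal Python model of them (the function names are invented; the XOR formulas are taken verbatim from the macros): on big-endian, GCC's element indices are reversed within each 64-bit chunk, so a D-register index is XORed with `lanes - 1` and a Q-register index with `lanes/2 - 1`.

```python
# Model of the big-endian lane remapping done by __arm_lane/__arm_laneq.

def arm_lane(num_lanes, idx):
    """__arm_lane: remap an index into a 64-bit (D) vector."""
    return idx ^ (num_lanes - 1)

def arm_laneq(num_lanes, idx):
    """__arm_laneq: remap an index into a 128-bit (Q) vector; indices
    are reversed within each 64-bit half, hence lanes/2 - 1."""
    return idx ^ (num_lanes // 2 - 1)

# bfloat16x4_t: 4 x 16-bit lanes in a D register -> fully reversed.
print([arm_lane(4, i) for i in range(4)])   # [3, 2, 1, 0]
# bfloat16x8_t: 8 x 16-bit lanes in a Q register -> reversed per half.
print([arm_laneq(8, i) for i in range(8)])  # [3, 2, 1, 0, 7, 6, 5, 4]
```

On little-endian both macros are the identity, which is why the `#else` branch just expands to `__idx`.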
Re: [Ping][PATCH][Arm] ACLE intrinsics for AdvSIMD bfloat16 dot product
Hi Dennis, On 2/25/20 5:18 PM, Dennis Zhang wrote: Hi Kyrill, On 25/02/2020 12:18, Kyrill Tkachov wrote: Hi Dennis, On 2/25/20 11:54 AM, Dennis Zhang wrote: Hi all, On 07/01/2020 12:12, Dennis Zhang wrote: > Hi all, > > This patch is part of a series adding support for Armv8.6-A features. > It depends on the patch enabling Arm BFmode > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html > > This patch adds intrinsics for brain half-precision float-point dot > product. > ACLE documents are at https://developer.arm.com/docs/101028/latest > ISA documents are at https://developer.arm.com/docs/ddi0596/latest > > Regression tested for arm-none-linux-gnueabi-armv8-a. > > Is it OK for trunk please? > > Thanks, > Dennis > > gcc/ChangeLog: > > 2020-01-03 Dennis Zhang > > * config/arm/arm_neon.h (vbfdot_f32, vbfdotq_f32): New > (vbfdot_lane_f32, vbfdotq_laneq_f32): New. > (vbfdot_laneq_f32, vbfdotq_lane_f32): New. > * config/arm/arm_neon_builtins.def (vbfdot): New. > (vbfdot_lanev4bf, vbfdot_lanev8bf): New. > * config/arm/iterators.md (VSF2BF): New mode attribute. > * config/arm/neon.md (neon_vbfdot): New. > (neon_vbfdot_lanev4bf): New. > (neon_vbfdot_lanev8bf): New. > > gcc/testsuite/ChangeLog: > > 2020-01-03 Dennis Zhang > > * gcc.target/arm/simd/bf16_dot_1.c: New test. > * gcc.target/arm/simd/bf16_dot_2.c: New test. > This patch updates tests in bf16_dot_1.c to make proper assembly check. Is it OK for trunk, please? Cheers Dennis Looks ok but... 
new file mode 100644 index 000..c533f9d0b2f --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_dot_2.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ + +#include "arm_neon.h" + +float32x2_t +test_vbfdot_lane_f32 (float32x2_t r, bfloat16x4_t a, bfloat16x4_t b) +{ + return __builtin_neon_vbfdot_lanev4bfv2sf (r, a, b, 2); /* { dg-error {out of range 0 - 1} } */ +} + +float32x4_t +test_vbfdotq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) +{ + return __builtin_neon_vbfdot_lanev4bfv4sf (r, a, b, 2); /* { dg-error {out of range 0 - 1} } */ +} + +float32x2_t +test_vbfdot_laneq_f32 (float32x2_t r, bfloat16x4_t a, bfloat16x8_t b) +{ + return __builtin_neon_vbfdot_lanev8bfv2sf (r, a, b, 4); /* { dg-error {out of range 0 - 3} } */ +} + +float32x4_t +test_vbfdotq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) +{ + return __builtin_neon_vbfdot_lanev8bfv4sf (r, a, b, 4); /* { dg-error {out of range 0 - 3} } */ +} These tests shouldn't be calling the __builtin* directly, they are just an implementation detail. What we want to test is the intrinsic itself. Thanks, Kyrill Many thanks for the review. The issue is fixed in the updated patch. Is it ready please? Ok. Thanks, Kyrill Dennis Cheers gcc/ChangeLog: 2020-02-25 Dennis Zhang * config/arm/arm_neon.h (vbfdot_f32, vbfdotq_f32): New (vbfdot_lane_f32, vbfdotq_laneq_f32): New. (vbfdot_laneq_f32, vbfdotq_lane_f32): New. * config/arm/arm_neon_builtins.def (vbfdot): New entry. (vbfdot_lanev4bf, vbfdot_lanev8bf): Likewise. * config/arm/iterators.md (VSF2BF): New attribute. * config/arm/neon.md (neon_vbfdot): New entry. (neon_vbfdot_lanev4bf): Likewise. (neon_vbfdot_lanev8bf): Likewise. gcc/testsuite/ChangeLog: 2020-02-25 Dennis Zhang * gcc.target/arm/simd/bf16_dot_1.c: New test. * gcc.target/arm/simd/bf16_dot_2.c: New test. * gcc.target/arm/simd/bf16_dot_3.c: New test.
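For context on what the vbfdot intrinsics under review compute, here is a rough Python model of `vbfdot_f32` (not GCC code; it ignores BFDOT's actual fused/rounding behaviour and treats a bfloat16 simply as the top 16 bits of an IEEE binary32): each 32-bit accumulator lane receives the dot product of one adjacent pair of bf16 elements from each input.

```python
import struct

def bf16_to_f32(bits16):
    """A bfloat16 value is the top 16 bits of an IEEE binary32."""
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]

def f32_to_bf16(x):
    """Truncate a float to bfloat16 bits (truncation, for this model)."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16

def vbfdot_f32_model(r, a, b):
    """Per f32 lane i: r[i] += a[2i]*b[2i] + a[2i+1]*b[2i+1]."""
    return [acc
            + bf16_to_f32(a[2 * i]) * bf16_to_f32(b[2 * i])
            + bf16_to_f32(a[2 * i + 1]) * bf16_to_f32(b[2 * i + 1])
            for i, acc in enumerate(r)]

# Values chosen to be exactly representable in bfloat16.
a = [f32_to_bf16(x) for x in (1.0, 2.0, 3.0, 4.0)]
b = [f32_to_bf16(x) for x in (0.5, 0.5, 1.0, 1.0)]
print(vbfdot_f32_model([0.0, 0.0], a, b))  # [1.5, 7.0]
```

The `_lane`/`_laneq` variants reviewed above differ only in that the pair taken from `b` is selected by the (range-checked) lane index rather than marching along with `i`.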
Re: [ARM] Fix -mpure-code for v6m
Hi Christophe,

On 2/24/20 2:16 PM, Christophe Lyon wrote:

Ping? I'd also like to backport this and the main patch (svn r279463, r10-5505-ge24f6408df1e4c5e8c09785d7b488c492dfb68b3) to the gcc-9 branch. I found the problem addressed by this patch while validating the backport to gcc-9: although the patch applies cleanly except for the testcases' dg directives, there were some failures which I could finally reproduce on trunk with -fdisable-rtl-fwprop2.

Here is a summary of the validations I ran using --target arm-eabi:

* without my patches:
(1) --with-cpu cortex-m0
(2) --with-cpu cortex-m4
(3) --with-cpu cortex-m4 CFLAGS_FOR_TARGET=-mpure-code (to build the libs with -mpure-code)
(4) --with-cpu cortex-m4 CFLAGS_FOR_TARGET=-mpure-code --target-board=-mpure-code (to also run the tests with -mpure-code)

* with my patches:
(5) --with-cpu cortex-m0 CFLAGS_FOR_TARGET=-mpure-code --target-board=-mpure-code
(6) --with-cpu cortex-m4 CFLAGS_FOR_TARGET=-mpure-code --target-board=-mpure-code

Comparing (4) and (6) ensured that my (v6m) patches introduce no regression in v7m cases. Comparison of (1) vs (5) gave results similar to (2) vs (6); there's a bit of noise because some test cases don't cope well with -mpure-code despite my previous testsuite-only patch (svn r277828). Comparison of (1) vs (2) gave similar results to (5) vs (6).

Ideally, we may also want to backport svn r277828 (testsuite-only patch, to handle -mpure-code better), but that's not mandatory.

In summary, is this patch OK for trunk? Are this patch and r279463, r10-5505-ge24f6408df1e4c5e8c09785d7b488c492dfb68b3 OK to backport to gcc-9?

This is okay with me. I don't think any of the branches are frozen at the moment, so it should be okay to backport it.
Thanks, Kyrill Thanks, Christophe On Thu, 13 Feb 2020 at 11:14, Christophe Lyon wrote: > > On Mon, 10 Feb 2020 at 17:45, Richard Earnshaw (lists) > wrote: > > > > On 10/02/2020 09:27, Christophe Lyon wrote: > > > On Fri, 7 Feb 2020 at 17:55, Richard Earnshaw (lists) > > > wrote: > > >> > > >> On 07/02/2020 16:43, Christophe Lyon wrote: > > >>> On Fri, 7 Feb 2020 at 14:49, Richard Earnshaw (lists) > > >>> wrote: > > > > On 07/02/2020 13:19, Christophe Lyon wrote: > > > When running the testsuite with -fdisable-rtl-fwprop2 and -mpure-code > > > for cortex-m0, I noticed that some testcases were failing because we > > > still generate "ldr rX, .LCY", which is what we want to avoid with > > > -mpure-code. This is latent since a recent improvement in fwprop > > > (PR88833). > > > > > > In this patch I change the thumb1_movsi_insn pattern so that it emits > > > the desired instruction sequence when arm_disable_literal_pool is set. > > > > > > I tried to add a define_split instead, but couldn't make it work: the > > > compiler then complains it cannot split the instruction, while my new > > > define_split accepts the same operand types as thumb1_movsi_insn: > > > > > > c-c++-common/torture/complex-sign-mixed-add.c:41:1: error: could not split insn > > > (insn 2989 425 4844 (set (reg/f:SI 3 r3 [1342]) > > > (symbol_ref/u:SI ("*.LC6") [flags 0x2])) 836 {*thumb1_movsi_insn} > > > (expr_list:REG_EQUIV (symbol_ref/u:SI ("*.LC6") [flags 0x2]) > > > (nil))) > > > during RTL pass: final > > > > > > (define_split > > > [(set (match_operand:SI 0 "register_operand" "") > > > (match_operand:SI 1 "general_operand" ""))] > > > "TARGET_THUMB1 > > > && arm_disable_literal_pool > > > && GET_CODE (operands[1]) == SYMBOL_REF" > > > [(clobber (const_int 0))] > > > " > > > gen_thumb1_movsi_symbol_ref(operands[0], operands[1]); > > > DONE; > > > " > > > ) > > > and I put this in thumb1_movsi_insn: > > > if (GET_CODE (operands[1]) == SYMBOL_REF && arm_disable_literal_pool) > > > { > > > return 
\"#\"; > > > } > > > return \"ldr\\t%0, %1\"; > > > > > > 2020-02-07 Christophe Lyon > > > > > > * config/arm/thumb1.md (thumb1_movsi_insn): Fix ldr alternative to > > > work with -mpure-code. > > > > > > > + case 0: > > + case 1: > > + return \"movs %0, %1\"; > > + case 2: > > + return \"movw %0, %1\"; > > > > This is OK, but please replace the hard tab in the strings for MOVS/MOVW > > with \\t. > > > > >>> > > >>> OK that was merely a cut & paste from the existing code. > > >>> > > >>> I'm concerned that the length attribute is becoming wrong with my > > >>> patch, isn't this a problem? > > >>> > > >> > > >> Potentially yes. The branch range code needs this to handle overly long > > >> jumps correctly. > > >> > > > > > > Do you mean that the probability of problems due to that shortcoming > > > is low enough tha
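To illustrate what -mpure-code is avoiding in the discussion above: instead of a literal-pool load (`ldr rX, .LCY`, i.e. data embedded in the text section), a Thumb-1 target can materialize a 32-bit constant with an immediate-move followed by shift/add steps. The sketch below is a Python model of that general technique, not GCC's exact expansion:

```python
def build_const_pure_code(value):
    """Illustrative model of materializing a 32-bit constant without a
    literal-pool load: movs an 8-bit immediate, then repeatedly shift
    left by 8 (lsls) and add the next byte (adds)."""
    assert 0 <= value < (1 << 32)
    hi, *rest = [(value >> s) & 0xFF for s in (24, 16, 8, 0)]
    reg = hi                            # movs rX, #<top byte>
    for b in rest:
        reg = (reg << 8) & 0xFFFFFFFF   # lsls rX, rX, #8
        reg = (reg + b) & 0xFFFFFFFF    # adds rX, rX, #<next byte>
    return reg

print(hex(build_const_pure_code(0xDEADBEEF)))  # 0xdeadbeef
```

The patch under discussion makes `thumb1_movsi_insn` emit such a sequence for SYMBOL_REF operands when `arm_disable_literal_pool` is set, so fwprop re-materializing a `ldr rX, .LCY` can no longer sneak a literal pool back in.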
Re: [PATCH] [arm] Implement Armv8.1-M low overhead loops
Hi Roman,

On 2/21/20 3:49 PM, Roman Zhuykov wrote:

11.02.2020 14:00, Richard Earnshaw (lists) wrote:

+(define_insn "*doloop_end"
+  [(parallel [(set (pc)
+                   (if_then_else
+                    (ne (reg:SI LR_REGNUM) (const_int 1))
+                    (label_ref (match_operand 0 "" ""))
+                    (pc)))
+              (set (reg:SI LR_REGNUM)
+                   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
+  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
+  "le\tlr, %l0")

Is it deliberate that this pattern name has a '*' prefix? doloop_end is a named expansion pattern according to md.texi.

R.

21.02.2020 18:30, Kyrill Tkachov wrote:

+;; Originally expanded by 'doloop_end'.
+(define_insn "doloop_end_internal"

We usually prefer to name these patterns with a '*' in front to prevent the gen* machinery from generating unneeded gen_* expanders for them if they're not used.

It seems you and Richard are asking Andrea to do opposite things. :)

Almost, but not exactly incompatible things ;) doloop_end is a standard name, and if we wanted to use it directly it cannot have a '*', which Richard is right to point out. Once "doloop_end" is moved to its own expander and the define_insn is doloop_end_internal, there is no reason for it not to have a '*', as its gen_* form is never called.

Thanks,
Kyrill

Roman

PS. I have no idea which approach is correct.
Re: [PATCH] [arm] Implement Armv8.1-M low overhead loops
Hi Andrea, On 2/19/20 1:01 PM, Andrea Corallo wrote: Hi all, Second version of the patch here addressing comments. This patch enables the Armv8.1-M Mainline LOB (low overhead branch) extension low overhead loops (LOL) feature by using the 'loop-doloop' pass. Given the following function: void loop (int *a) { for (int i = 0; i < 1000; i++) a[i] = i; } 'doloop_begin' and 'doloop_end' patterns translates into 'dls' and 'le' giving: loop: movw r2, #1 movs r3, #0 subs r0, r0, #4 push {lr} dls lr, r2 .L2: str r3, [r0, #4]! adds r3, r3, #1 le lr, .L2 ldr pc, [sp], #4 SMS is disabled in tests not to break them when SMS does loop versioning. bootstrapped arm-none-linux-gnueabihf, do not introduce testsuite regressions. This should be aimed at GCC 11 at this point. Some comments inline... Andrea gcc/ChangeLog: 2020-??-?? Andrea Corallo Mihail-Calin Ionescu Iain Apreotesei * config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP): (arm_invalid_within_doloop): Implement invalid_within_doloop hook. * config/arm/arm.h (TARGET_HAVE_LOB): Add new macro. * config/arm/thumb2.md (*doloop_end, doloop_begin, dls_insn): Add new patterns. * config/arm/unspecs.md: Add new unspec. gcc/testsuite/ChangeLog: 2020-??-?? Andrea Corallo Mihail-Calin Ionescu Iain Apreotesei * gcc.target/arm/lob.h: New header. * gcc.target/arm/lob1.c: New testcase. * gcc.target/arm/lob2.c: Likewise. * gcc.target/arm/lob3.c: Likewise. * gcc.target/arm/lob4.c: Likewise. * gcc.target/arm/lob5.c: Likewise. * gcc.target/arm/lob6.c: Likewise. lol.patch diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index e07cf03538c5..1269f40bd77c 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -586,6 +586,9 @@ extern int arm_arch_bf16; /* Target machine storage Layout. */ +/* Nonzero if this chip provides Armv8.1-M Mainline + LOB (low overhead branch features) extension instructions. 
*/ +#define TARGET_HAVE_LOB (arm_arch8_1m_main) /* Define this macro if it is advisable to hold scalars in registers in a wider mode than that declared by the program. In such cases, diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 9cc7bc0e5621..7c2a7b7e9e97 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -833,6 +833,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_CONSTANT_ALIGNMENT #define TARGET_CONSTANT_ALIGNMENT arm_constant_alignment +#undef TARGET_INVALID_WITHIN_DOLOOP +#define TARGET_INVALID_WITHIN_DOLOOP arm_invalid_within_doloop + #undef TARGET_MD_ASM_ADJUST #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust @@ -32937,6 +32940,27 @@ arm_ge_bits_access (void) return true; } +/* NULL if INSN insn is valid within a low-overhead loop. + Otherwise return why doloop cannot be applied. */ + +static const char * +arm_invalid_within_doloop (const rtx_insn *insn) +{ + if (!TARGET_HAVE_LOB) +return default_invalid_within_doloop (insn); + + if (CALL_P (insn)) +return "Function call in the loop."; + + if (tablejump_p (insn, NULL, NULL) || computed_jump_p (insn)) +return "Computed branch in the loop."; + + if (reg_mentioned_p (gen_rtx_REG (SImode, LR_REGNUM), insn)) +return "LR is used inside loop."; + + return NULL; +} + #if CHECKING_P namespace selftest { diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index b0d3bd1cf1c4..4aff1a0838d8 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1555,8 +1555,11 @@ using a certain 'count' register and (2) the loop count can be adjusted by modifying this register prior to the loop. ??? The possible introduction of a new block to initialize the - new IV can potentially affect branch optimizations. */ - if (optimize > 0 && flag_modulo_sched) + new IV can potentially affect branch optimizations. + + Also used to implement the low over head loops feature, which is part of + the Armv8.1-M Mainline Low Overhead Branch (LOB) extension. 
*/ + if (optimize > 0 && (flag_modulo_sched || TARGET_HAVE_LOB)) { rtx s0; rtx bcomp; @@ -1569,6 +1572,11 @@ FAIL; s0 = operands [0]; + + /* Low over head loop instructions require the first operand to be LR. */ + if (TARGET_HAVE_LOB) + s0 = gen_rtx_REG (SImode, LR_REGNUM); + if (TARGET_THUMB2) insn = emit_insn (gen_thumb2_addsi3_compare0 (s0, s0, GEN_INT (-1))); else @@ -1650,3 +1658,30 @@ "TARGET_HAVE_MVE" "lsrl%?\\t%Q0, %R0, %1" [(set_attr "predicable" "yes")]) + +;; Originally expanded by 'doloop_end'. +(define_insn "doloop_end_internal" We usually prefer to name these patterns with a '*' in front to prevent the gen* mach
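To make the `dls`/`le` semantics above concrete, here is a small Python model (illustrative only) of the low-overhead loop generated for the example function: `dls lr, <count>` loads the trip count into LR, and each `le lr, .L2` decrements LR and branches back while the result is non-zero — exactly what the doloop_end pattern expresses as "branch if LR != 1, and set LR = LR - 1".

```python
def loop_le_model(n):
    """Model of 'for (int i = 0; i < n; i++) a[i] = i;' lowered to a
    dls/le low-overhead loop (n must be >= 1, as doloop guarantees)."""
    a = [None] * n
    lr = n              # dls lr, <count>
    i = 0
    while True:         # .L2:
        a[i] = i        #   str r3, [r0, #4]!
        i += 1          #   adds r3, r3, #1
        lr -= 1         #   le lr, .L2 -- decrement LR and
        if lr == 0:     #   branch back unless it reached zero
            break
    return a

print(loop_le_model(5))  # [0, 1, 2, 3, 4]
```

This also shows why `arm_invalid_within_doloop` has to reject anything that clobbers LR inside the body: LR *is* the loop counter.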
Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32
Hi Delia, On 2/19/20 5:25 PM, Delia Burduv wrote: Hi, Here is the latest version of the patch. It just has some minor formatting changes that were brought up by Richard Sandiford in the AArch64 patches Thanks, Delia On 1/22/20 5:29 PM, Delia Burduv wrote: > Ping. > > I will change the tests to use the exact input and output registers as > Richard Sandiford suggested for the AArch64 patches. > > On 12/20/19 6:46 PM, Delia Burduv wrote: >> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics >> vst{q}_bf16 as part of the BFloat16 extension. >> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) >> >> The intrinsics are declared in arm_neon.h . >> A new test is added to check assembler output. >> >> This patch depends on the Arm back-end patche. >> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) >> >> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't >> have commit rights, so if this is ok can someone please commit it for me? >> >> gcc/ChangeLog: >> >> 2019-11-14 Delia Burduv >> >> * config/arm/arm_neon.h (bfloat16_t): New typedef. >> (bfloat16x4x2_t): New typedef. >> (bfloat16x8x2_t): New typedef. >> (bfloat16x4x3_t): New typedef. >> (bfloat16x8x3_t): New typedef. >> (bfloat16x4x4_t): New typedef. >> (bfloat16x8x4_t): New typedef. >> (vst2_bf16): New. >> (vst2q_bf16): New. >> (vst3_bf16): New. >> (vst3q_bf16): New. >> (vst4_bf16): New. >> (vst4q_bf16): New. >> * config/arm/arm-builtins.c (E_V2BFmode): New mode. >> (VAR13): New. >> (arm_simd_types[Bfloat16x2_t]):New type. >> * config/arm/arm-modes.def (V2BF): New mode. >> * config/arm/arm-simd-builtin-types.def >> (Bfloat16x2_t): New entry. >> * config/arm/arm_neon_builtins.def >> (vst2): Changed to VAR13 and added v4bf, v8bf >> (vst3): Changed to VAR13 and added v4bf, v8bf >> (vst4): Changed to VAR13 and added v4bf, v8bf >> * config/arm/iterators.md (VDXBF): New iterator. >> (VQ2BF): New iterator. >> (V_elem): Added V4BF, V8BF. 
>> (V_sz_elem): Added V4BF, V8BF. >> (V_mode_nunits): Added V4BF, V8BF. >> (q): Added V4BF, V8BF. >> *config/arm/neon.md (vst2): Used new iterators. >> (vst3): Used new iterators. >> (vst3qa): Used new iterators. >> (vst3qb): Used new iterators. >> (vst4): Used new iterators. >> (vst4qa): Used new iterators. >> (vst4qb): Used new iterators. >> >> >> gcc/testsuite/ChangeLog: >> >> 2019-11-14 Delia Burduv >> >> * gcc.target/arm/simd/bf16_vstn_1.c: New test. One thing I just noticed in this and the other arm bfloat16 patches... diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b, return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index); } +#pragma GCC push_options +#pragma GCC target ("arch=armv8.2-a+bf16") + +typedef struct bfloat16x4x2_t +{ + bfloat16x4_t val[2]; +} bfloat16x4x2_t; These should be in a new arm_bf16.h file that gets included in the main arm_neon.h file, right? I believe the aarch64 versions are implemented that way. Otherwise the patch looks good to me. Thanks! Kyrill + +typedef struct bfloat16x8x2_t +{ + bfloat16x8_t val[2]; +} bfloat16x8x2_t; +
Re: [PATCH, GCC/ARM] Fix MVE scalar shift tests
On 2/21/20 11:51 AM, Kyrill Tkachov wrote: Hi Mihail, On 2/19/20 4:27 PM, Mihail Ionescu wrote: Hi Christophe, On 01/23/2020 09:34 AM, Christophe Lyon wrote: > On Mon, 20 Jan 2020 at 19:01, Mihail Ionescu > wrote: >> >> Hi, >> >> This patch fixes the scalar shifts tests added in: >> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01195.html >> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01196.html >> By adding mthumb and ensuring that the target supports >> thumb2 instructions. >> >> >> *** gcc/testsuite/ChangeLog *** >> >> 2020-01-20 Mihail-Calin Ionescu >> >> * gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c: Add mthumb and target check. >> * gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c: Likewise. >> >> >> Is this ok for trunk? >> > > Why not add a new entry in check_effective_target_arm_arch_FUNC_ok? > (there are already plenty, including v8m_main for instance) > Sorry for the delay, we were going to add the check_effective_target to the MVE framework patches and then update this one. But I came across some big endian issues and decided to update this now. I've added the target check and changed the patch so it also disables the scalar shift patterns when generating big endian code. At the moment they are broken because the MVE shift instructions have the restriction of having an even gp register specified first, followed by the odd one, which requires swapping the data twice in big endian. In this case, the previous code gen is preferred. *** gcc/ChangeLog *** 2020-02-19 Mihail-Calin Ionescu * config/arm/arm.md (ashldi3, ashrdi3, lshrdi3): Prevent scalar shifts from being used on when big endian is enabled. *** gcc/testsuite/ChangeLog *** 2020-02-19 Mihail-Calin Ionescu * gcc.target/arm/armv8_1m-shift-imm-1.c: Add MVE target checks. * gcc.target/arm/armv8_1m-shift-reg-1.c: Likewise. * lib/target-supports.exp (check_effective_target_arm_v8_1m_mve_ok_nocache): New. (check_effective_target_arm_v8_1m_mve_ok): New. (add_options_for_v8_1m_mve): New. 
Is this ok for trunk? This is ok, but please do a follow up patch to add the new effective target check to sourcebuild.texi (I know, we tend to forget to do it!) I should say that such a patch is pre-approved. Thanks, Kyrill > Christophe > >> >> Regards, >> Mihail >> >> >> ### Attachment also inlined for ease of reply ### >> >> >> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c >> index 5ffa3769e6ba42466242d3038857734e87b2f1fc..9822f59643c662c9302ad43c09057c59f3cbe07a 100644 >> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c >> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c >> @@ -1,5 +1,6 @@ >> /* { dg-do compile } */ >> -/* { dg-options "-O2 -march=armv8.1-m.main+mve -mfloat-abi=softfp" } */ >> +/* { dg-options "-O2 -mthumb -march=armv8.1-m.main+mve -mfloat-abi=softfp" } */ >> +/* { dg-require-effective-target arm_thumb2_ok } */ >> >> long long longval1; >> long long unsigned longval2; >> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c >> index a97e9d687ef66e9642dd1d735125c8ee941fb151..a9aa7ed3ad9204c03d2c15dc6920ca3159403fa0 100644 >> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c >> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c >> @@ -1,5 +1,6 @@ >> /* { dg-do compile } */ >> -/* { dg-options "-O2 -march=armv8.1-m.main+mve -mfloat-abi=softfp" } */ >> +/* { dg-options "-O2 -mthumb -march=armv8.1-m.main+mve -mfloat-abi=softfp" } */ >> +/* { dg-require-effective-target arm_thumb2_ok } */ >> >> long long longval2; >> int intval2; >> Regards, Mihail
Re: [Ping][PATCH][Arm] ACLE 8-bit integer matrix multiply-accumulate intrinsics
Hi Dennis, On 2/11/20 12:03 PM, Dennis Zhang wrote: Hi all, On 16/12/2019 13:45, Dennis Zhang wrote: > Hi all, > > This patch is part of a series adding support for Armv8.6-A features. > It depends on the Arm Armv8.6-A CLI patch, > https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html. > It also depends on the Armv8.6-A effective target checking patch, > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html. > It also depends on the ARMv8.6-A I8MM dot product patch for using the > same builtin qualifier > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00945.html. > > This patch adds intrinsics for matrix multiply-accumulate operations > including vmmlaq_s32, vmmlaq_u32, and vusmmlaq_s32. > > ACLE documents are at https://developer.arm.com/docs/101028/latest > ISA documents are at https://developer.arm.com/docs/ddi0596/latest > > Regtested for arm-none-linux-gnueabi-armv8.2-a. > > Is it OK for trunk please? > This is ok. Thanks, Kyrill > Thanks, > Dennis > > gcc/ChangeLog: > > 2019-12-10 Dennis Zhang > > * config/arm/arm_neon.h (vmmlaq_s32, vmmlaq_u32, vusmmlaq_s32): New. > * config/arm/arm_neon_builtins.def (smmla, ummla, usmmla): New. > * config/arm/iterators.md (MATMUL): New. > (sup): Add UNSPEC_MATMUL_S, UNSPEC_MATMUL_U, and UNSPEC_MATMUL_US. > (mmla_sfx): New. > * config/arm/neon.md (neon_mmlav16qi): New. > * config/arm/unspecs.md (UNSPEC_MATMUL_S): New. > (UNSPEC_MATMUL_U, UNSPEC_MATMUL_US): New. > > gcc/testsuite/ChangeLog: > > 2019-12-10 Dennis Zhang > > * gcc.target/arm/simd/vmmla_1.c: New test. This patch has been updated according to the feedback on related AArch64 version at https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01591.html Regtested. OK to commit please? Many thanks Dennis gcc/ChangeLog: 2020-02-11 Dennis Zhang * config/arm/arm-builtins.c (USTERNOP_QUALIFIERS): New macro. * config/arm/arm_neon.h (vmmlaq_s32, vmmlaq_u32, vusmmlaq_s32): New. * config/arm/arm_neon_builtins.def (smmla, ummla, usmmla): New. 
* config/arm/iterators.md (MATMUL): New iterator. (sup): Add UNSPEC_MATMUL_S, UNSPEC_MATMUL_U, and UNSPEC_MATMUL_US. (mmla_sfx): New attribute. * config/arm/neon.md (neon_mmlav16qi): New. * config/arm/unspecs.md (UNSPEC_MATMUL_S, UNSPEC_MATMUL_U): New. (UNSPEC_MATMUL_US): New. gcc/testsuite/ChangeLog: 2020-02-11 Dennis Zhang * gcc.target/arm/simd/vmmla_1.c: New test.
Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD
Hi Delia, On 2/19/20 5:23 PM, Delia Burduv wrote: Hi, Here is the latest version of the patch. It just has some minor formatting changes that were brought up by Richard Sandiford in the AArch64 patches Thanks, Delia On 1/31/20 3:23 PM, Delia Burduv wrote: Here is the updated patch. The changes are minor, so let me know if there is anything else to fix or if it can be committed. Thank you, Delia On 1/30/20 2:55 PM, Kyrill Tkachov wrote: Hi Delia, On 1/28/20 4:44 PM, Delia Burduv wrote: Ping. *From:* Delia Burduv *Sent:* 22 January 2020 17:26 *To:* gcc-patches@gcc.gnu.org *Cc:* ni...@redhat.com ; Richard Earnshaw ; Ramana Radhakrishnan ; Kyrylo Tkachov *Subject:* Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD Ping. I have read Richard Sandiford's comments on the AArch64 patches and I will apply what is relevant to this patch as well. Particularly, I will change the tests to use the exact input and output registers and I will change the types of the rtl patterns. Please send the updated patches so that someone can commit them for you once they're reviewed. Thanks, Kyrill On 12/20/19 6:44 PM, Delia Burduv wrote: > This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab and vfmat > as part of the BFloat16 extension. > (https://developer.arm.com/docs/101028/latest.) > The intrinsics are declared in arm_neon.h and the RTL patterns are > defined in neon.md. > Two new tests are added to check assembler output and lane indices. > > This patch depends on the Arm back-end patch. > (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) > > Tested for regression on arm-none-eabi and armeb-none-eabi. I don't have > commit rights, so if this is ok can someone please commit it for me? > > gcc/ChangeLog: > > 2019-11-12 Delia Burduv > > * config/arm/arm_neon.h (vbfmmlaq_f32): New. > (vbfmlalbq_f32): New. > (vbfmlaltq_f32): New. > (vbfmlalbq_lane_f32): New. > (vbfmlaltq_lane_f32): New. 
> (vbfmlalbq_laneq_f32): New. > (vbfmlaltq_laneq_f32): New. > * config/arm/arm_neon_builtins.def (vbfmmla): New. > (vbfmab): New. > (vbfmat): New. > (vbfmab_lane): New. > (vbfmat_lane): New. > (vbfmab_laneq): New. > (vbfmat_laneq): New. > * config/arm/iterators.md (BF_MA): New int iterator. > (bt): New int attribute. > (VQXBF): Copy of VQX with V8BF. > (V_HALF): Added V8BF. > * config/arm/neon.md (neon_vbfmmlav8hi): New insn. > (neon_vbfmav8hi): New insn. > (neon_vbfma_lanev8hi): New insn. > (neon_vbfma_laneqv8hi): New expand. > (neon_vget_high): Changed iterator to VQXBF. > * config/arm/unspecs.md (UNSPEC_BFMMLA): New UNSPEC. > (UNSPEC_BFMAB): New UNSPEC. > (UNSPEC_BFMAT): New UNSPEC. > > 2019-11-12 Delia Burduv > > * gcc.target/arm/simd/bf16_ma_1.c: New test. > * gcc.target/arm/simd/bf16_ma_2.c: New test. > * gcc.target/arm/simd/bf16_mmla_1.c: New test. This looks good, a few minor things though... 
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 3c78f435009ab027f92693d00ab5b40960d5419d..81f8008ea6a5fb11eb09f6685ba24bb0c54fb248 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -18742,6 +18742,64 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b, return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index); } +#pragma GCC push_options +#pragma GCC target ("arch=armv8.2-a+bf16") + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b) +{ + return __builtin_neon_vbfmmlav8bf (__r, __a, __b); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b) +{ + return __builtin_neon_vbfmabv8bf (__r, __a, __b); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
Re: [PATCH v2][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
On 2/14/20 4:34 PM, Srinath Parvathaneni wrote: Hello Kyrill, In this patch (v2) all the review comments mentioned in the previous patch (v1) are addressed. (v1) https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01401.html # Hello, This patch is part of the MVE ACLE intrinsics framework. The patch supports the use of emulation for the single-precision arithmetic operations for MVE. These changes are to support the MVE ACLE intrinsics which operate on vector floating-point arithmetic operations. Please refer to the Arm reference manual [1] for more details. [1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914 Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Ok. Thanks, Kyrill Thanks, Srinath. gcc/ChangeLog: 2019-11-11 Andre Vieira Srinath Parvathaneni * config/arm/arm.c (arm_libcall_uses_aapcs_base): Modify function to add emulator calls for single precision arithmetic operations for MVE. ### Attachment also inlined for ease of reply ### >From af9d1eb4470c26564b69518bbec3fce297501fdd Mon Sep 17 00:00:00 2001 From: Srinath Parvathaneni Date: Tue, 11 Feb 2020 18:42:20 + Subject: [PATCH] [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch. --- gcc/config/arm/arm.c | 22 ++- .../gcc.target/arm/mve/intrinsics/mve_libcall1.c | 70 ++ .../gcc.target/arm/mve/intrinsics/mve_libcall2.c | 70 ++ 3 files changed, 159 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall2.c diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 037f298..e00024b 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -5754,9 +5754,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall) /* Values from double-precision helper functions are returned in core registers if the selected core only supports single-precision arithmetic, even if we are using the hard-float ABI. 
The same is - true for single-precision helpers, but we will never be using the - hard-float ABI on a CPU which doesn't support single-precision - operations in hardware. */ + true for single-precision helpers except in case of MVE, because in + MVE we will be using the hard-float ABI on a CPU which doesn't support + single-precision operations in hardware. In MVE the following check + enables use of emulation for the single-precision arithmetic + operations. */ + if (TARGET_HAVE_MVE) + { + add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode)); + } add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode)); add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode)); add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode)); diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c new file mode 100644 index 000..45f46b1 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c @@ -0,0 +1,70 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ +/* { dg-add-options arm_v8_1m_mve } */ + +float +foo (float a, float b, float c) +{ + return a + b + c; +} + +/* { dg-final { scan-assembler "bl\\t__aeabi_fadd" } } */ +/* { dg-final { scan-assembler-times "bl\\t__aeabi_fadd" 2 } } */ + +float 
+foo1 (float a, float b, float c) +{ + return a - b - c; +} + +/* { dg-final { scan-assembler "bl\\t__aeabi_fsub" } } */ +/* { dg-final { scan-assembler-times "bl\\t__aeabi_fsub" 2 } } */ + +float +foo2 (float a, float b, float c) +{ + return a * b * c; +} + +/* { dg-final { scan-assembler "bl\\t__aeabi_fmul" } } */ +/* { dg-final { scan-assembler-times "bl\\t__aeabi_fmul" 2 } } */ + +float +foo3 (float b, float c) +{ + return b / c; +} + +/* { dg-final { scan-assembler "bl\\t__aeabi_fdiv" } } */ + +int +foo4 (float b, float c) +{ + return
Re: [PATCH v2][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
Hi Srinath, On 2/14/20 4:34 PM, Srinath Parvathaneni wrote: Hello Kyrill, In this patch (v2) all the review comments mentioned in the previous patch (v1) are addressed. (v1) https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01395.html # Hello, This patch is part of the MVE ACLE intrinsics framework. This patch adds support to update (read/write) the APSR (Application Program Status Register) register and FPSCR (Floating-point Status and Control Register) register for MVE. This patch also enables thumb2 mov RTL patterns for MVE. A new feature bit vfp_base is added. This bit is enabled for all VFP, MVE and MVE with floating point extensions. This bit is used to enable the macro TARGET_VFP_BASE. For all the VFP instructions, RTL patterns, status and control registers are guarded by TARGET_HARD_FLOAT. But this patch modifies that: the common instructions, RTL patterns, status and control registers between MVE and VFP are guarded by the TARGET_VFP_BASE macro. The RTL patterns set_fpscr and get_fpscr are updated to use VFPCC_REGNUM because a few MVE intrinsics set/get the carry bit of the FPSCR register. Please refer to the Arm reference manual [1] for more details. [1] https://developer.arm.com/docs/ddi0553/latest Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Ok (please test a big-endian target as well, as per the 1st framework patch). Thanks, Kyrill Thanks, Srinath gcc/ChangeLog: 2020-20-11 Andre Vieira Mihail Ionescu Srinath Parvathaneni * common/config/arm/arm-common.c (arm_asm_auto_mfpu): When vfp_base feature bit is on and -mfpu=auto is passed as compiler option, do not generate error on not finding any matching fpu. Because in this case fpu is not required. * config/arm/arm-cpus.in (vfp_base): Define feature bit, this bit is enabled for MVE and also for all VFP extensions. (VFPv2): Modify fgroup to enable vfp_base feature bit whenever VFPv2 is enabled. (MVE): Define fgroup to enable feature bits mve, vfp_base and armv7em. 
(MVE_FP): Define fgroup to enable feature bits is fgroup MVE and FPv5 along with feature bits mve_float. (mve): Modify add options in armv8.1-m.main arch for MVE. (mve.fp): Modify add options in armv8.1-m.main arch for MVE with floating point. * config/arm/arm.c (use_return_insn): Replace the check with TARGET_VFP_BASE. (thumb2_legitimate_index_p): Replace TARGET_HARD_FLOAT with TARGET_VFP_BASE. (arm_rtx_costs_internal): Replace "TARGET_HARD_FLOAT || TARGET_HAVE_MVE" with TARGET_VFP_BASE, to allow cost calculations for copies in MVE as well. (arm_get_vfp_saved_size): Replace TARGET_HARD_FLOAT with TARGET_VFP_BASE, to allow space calculation for VFP registers in MVE as well. (arm_compute_frame_layout): Likewise. (arm_save_coproc_regs): Likewise. (arm_fixed_condition_code_regs): Modify to enable using VFPCC_REGNUM in MVE as well. (arm_hard_regno_mode_ok): Replace "TARGET_HARD_FLOAT || TARGET_HAVE_MVE" with equivalent macro TARGET_VFP_BASE. (arm_expand_epilogue_apcs_frame): Likewise. (arm_expand_epilogue): Likewise. (arm_conditional_register_usage): Likewise. (arm_declare_function_name): Add check to skip printing .fpu directive in assembly file when TARGET_VFP_BASE is enabled and fpu_to_print is "softvfp". * config/arm/arm.h (TARGET_VFP_BASE): Define. * config/arm/arm.md (arch): Add "mve" to arch. (eq_attr "arch" "mve"): Enable on TARGET_HAVE_MVE is true. (vfp_pop_multiple_with_writeback): Replace "TARGET_HARD_FLOAT || TARGET_HAVE_MVE" with equivalent macro TARGET_VFP_BASE. * config/arm/constraints.md (Uf): Define for MVE. * config/arm/thumb2.md (thumb2_movsfcc_soft_insn): Modify target guard to not allow for MVE. * config/arm/unspecs.md (UNSPEC_GET_FPSCR): Move to volatile unspecs enum. (VUNSPEC_GET_FPSCR): Define. * config/arm/vfp.md (thumb2_movhi_vfp): Add support for VMSR and VMRS instructions which move to general-purpose Register from Floating-point Special register and vice-versa. (thumb2_movhi_fp16): Likewise. 
(thumb2_movsi_vfp): Add support for VMSR and VMRS instructions along with MCR and MRC instructions which set and get Floating-point Status and Control Register (FPSCR). (movdi_vfp): Modify pattern to enable Single-precision scalar float move in MVE. (thumb2_movdf_vfp): Modify pattern to enable Double-precision scalar float move patterns in MVE. (thumb2_movsfcc_vfp): Modify pattern to enable single float conditional code move patterns of VFP also in MVE by adding TARGET_VFP_BASE chec
Re: [PATCH v2][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
Hi Srinath, On 2/14/20 4:26 PM, Srinath Parvathaneni wrote: Hi Kyrill, > This patch series depends on upstream patches "Armv8.1-M Mainline Security Extension" [4], > "CLI and multilib support for Armv8.1-M Mainline MVE extensions" [5] and "support for Armv8.1-M > Mainline scalar shifts" [6]. Patch (version v1) was approved before. The above patches on which this patch (version v1) depends were committed to trunk last month. (version v1) https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01338.html This patch (version v2) is re-based on latest trunk, resolving a few conflicts. Regression tested on arm-none-eabi and found no regressions. Can you please also test armeb-none-eabi to make sure big-endian works. Ok for trunk? If ok, please commit on my behalf. I don't have the commit rights. Ok, thanks. Please apply for commit access using the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi using my name/email as approver. More details at https://gcc.gnu.org/gitwrite.html Then you can commit them yourself :) Thanks, Kyrill Thanks, Srinath ## Hello, This patch creates the required framework for MVE ACLE intrinsics. The following changes are done in this patch to support MVE ACLE intrinsics. Header file arm_mve.h is added to the source code, which contains the definitions of MVE ACLE intrinsics and different data types used in MVE. Machine description file mve.md is also added, which contains the RTL patterns defined for MVE. A new register "p0" is added, which is used by MVE predicated patterns. A new register class "VPR_REG" is added and its contents are defined in REG_CLASS_CONTENTS. The vec-common.md file is modified to support the standard move patterns. The prefix of Neon functions which are also used by MVE is changed from "neon_" to "simd_", e.g. neon_immediate_valid_for_move is changed to simd_immediate_valid_for_move. 
In the patch, the standard patterns mve_move, mve_store and mve_load are added for MVE, and the neon.md and vfp.md files are modified to support these common patterns. Please refer to the Arm reference manual [1] for more details. [1] https://developer.arm.com/docs/ddi0553/latest Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath gcc/ChangeLog: 2020-02-10 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config.gcc (arm_mve.h): Include mve intrinsics header file. * config/arm/aout.h (p0): Add new register name for MVE predicated cases. * config/arm/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define macro common to Neon and MVE. (ARM_BUILTIN_NEON_LANE_CHECK): Renamed to ARM_BUILTIN_SIMD_LANE_CHECK. (arm_init_simd_builtin_types): Disable poly types for MVE. (arm_init_neon_builtins): Move a check to arm_init_builtins function. (arm_init_builtins): Use ARM_BUILTIN_SIMD_LANE_CHECK instead of ARM_BUILTIN_NEON_LANE_CHECK. (mve_dereference_pointer): Add function. (arm_expand_builtin_args): Call to mve_dereference_pointer when MVE is enabled. (arm_expand_neon_builtin): Moved to arm_expand_builtin function. (arm_expand_builtin): Moved from arm_expand_neon_builtin function. * config/arm/arm-c.c (__ARM_FEATURE_MVE): Define macro for MVE and MVE with floating point enabled. * config/arm/arm-protos.h (neon_immediate_valid_for_move): Renamed to simd_immediate_valid_for_move. (simd_immediate_valid_for_move): Renamed from neon_immediate_valid_for_move function. * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Generate error if vfpv2 feature bit is disabled and mve feature bit is also disabled for HARD_FLOAT_ABI. (use_return_insn): Check to not push VFP regs for MVE. (aapcs_vfp_allocate): Add MVE check to have same Procedure Call Standard as Neon. (aapcs_vfp_allocate_return_reg): Likewise. (thumb2_legitimate_address_p): Check to return 0 on valid Thumb-2 address operand for MVE. (arm_rtx_costs_internal): MVE check to determine cost of rtx. 
(neon_valid_immediate): Rename to simd_valid_immediate. (simd_valid_immediate): Rename from neon_valid_immediate. (simd_valid_immediate): MVE check on size of vector is 128 bits. (neon_immediate_valid_for_move): Rename to simd_immediate_valid_for_move. (simd_immediate_valid_for_move): Rename from neon_immediate_valid_for_move. (neon_immediate_valid_for_logic): Modify call to neon_valid_immediate function. (neon_make_constant): Modify call to neon_valid_immediate function. (neon_vector_mem_operand): Return VFP register for POST_INC or PRE_DEC for MVE. (output_move_neon): Add MVE check to generate vldm/vstm instructions. (arm_compute_frame_layout): Ca
Re: [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
Hi Stam, On 2/10/20 1:35 PM, Stam Markianos-Wright wrote: On 2/3/20 11:20 AM, Stam Markianos-Wright wrote: > > > On 1/27/20 3:54 PM, Stam Markianos-Wright wrote: >> >> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote: >>> >>> >>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: > > > On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >> Hi all, >> >> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >> operations (vector/by element) to the ARM back-end. >> >> These are: >> usdot (vector), dot (by element). >> >> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >> for ARM they remain optional as of ARMv8.6-a. >> >> The functions are declared in arm_neon.h, RTL patterns are defined to >> generate assembler and tests are added to verify and perform adequate checks. >> >> Regression testing on arm-none-eabi passed successfully. >> >> This patch depends on: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >> >> for ARM CLI updates, and on: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >> >> for testsuite effective_target update. >> >> Ok for trunk? > New diff addressing review comments from the AArch64 version of the patch. _Change of order of operands in RTL patterns. _Change tests to use check-function-bodies, compile with optimisation and check for exact registers. _Rename tests to remove "-compile-" in filename. >>> > .Ping! Ping :) Diff re-attached in this ping email is the same as the one posted on 10/01 Thank you! Sorry for the delay. This is ok. Thanks, Kyrill > . >>> >>> Cheers, >>> Stam >>> >> >> >> ACLE documents are at https://developer.arm.com/docs/101028/latest >> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >> >> PS. 
I don't have commit rights, so if someone could commit on my behalf, >> that would be great :) >> >> >> gcc/ChangeLog: >> >> 2019-11-28 Stam Markianos-Wright >> >> * config/arm/arm-builtins.c (enum arm_type_qualifiers): >> (USTERNOP_QUALIFIERS): New define. >> (USMAC_LANE_QUADTUP_QUALIFIERS): New define. >> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >> (arm_expand_builtin_args): >> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. >> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >> * config/arm/arm_neon.h (vusdot_s32): New. >> (vusdot_lane_s32): New. >> (vusdotq_lane_s32): New. >> (vsudot_lane_s32): New. >> (vsudotq_lane_s32): New. >> * config/arm/arm_neon_builtins.def >> (usdot,usdot_lane,sudot_lane): New. >> * config/arm/iterators.md (DOTPROD_I8MM): New. >> (sup, opsuffix): Add . >> * config/arm/neon.md (neon_usdot, dot_lane: New. >> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >> >> >> gcc/testsuite/ChangeLog: >> >> 2019-12-12 Stam Markianos-Wright >> >> * gcc.target/arm/simd/vdot-2-1.c: New test. >> * gcc.target/arm/simd/vdot-2-2.c: New test. >> * gcc.target/arm/simd/vdot-2-3.c: New test. >> * gcc.target/arm/simd/vdot-2-4.c: New test. >> >>
Re: [GCC][PATCH][ARM] Regenerate arm-tables.opt for Armv8.1-M patch
On 2/3/20 5:18 PM, Mihail Ionescu wrote: Hi all, I've regenerated arm-tables.opt in config/arm to replace the improperly generated arm-tables.opt file from "[PATCH, GCC/ARM, 2/10] Add command line support for Armv8.1-M Mainline" (9722215a027b68651c3c7a8af9204d033197e9c0). 2020-02-03 Mihail Ionescu * config/arm/arm-tables.opt: Regenerate. Ok for trunk? Ok. I would consider it obvious too. Thanks, Kyrill Regards, Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index f295a4cffa2bbb3f8163fb9cef784b5af59aee12..a51a131505d184f120a3cfc51273b419bb0cb103 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -353,13 +353,16 @@ EnumValue Enum(arm_arch) String(armv8-m.main) Value(28) EnumValue -Enum(arm_arch) String(armv8.1-m.main) Value(29) +Enum(arm_arch) String(armv8-r) Value(29) EnumValue -Enum(arm_arch) String(iwmmxt) Value(30) +Enum(arm_arch) String(armv8.1-m.main) Value(30) EnumValue -Enum(arm_arch) String(iwmmxt2) Value(31) +Enum(arm_arch) String(iwmmxt) Value(31) + +EnumValue +Enum(arm_arch) String(iwmmxt2) Value(32) Enum Name(arm_fpu) Type(enum fpu_type)
Re: [GCC][PATCH][ARM] Set profile to M for Armv8.1-M
On 2/4/20 1:49 PM, Christophe Lyon wrote: On Mon, 3 Feb 2020 at 18:20, Mihail Ionescu wrote: > > Hi, > > We noticed that the profile for armv8.1-m.main was not set in arm-cpus.in > , which led to TARGET_ARM_ARCH_PROFILE and _ARM_ARCH_PROFILE not being > defined properly. > > > > gcc/ChangeLog: > > 2020-02-03 Mihail Ionescu > > * config/arm/arm-cpus.in: Set profile M > for armv8.1-m.main. > > > Ok for trunk? > > Regards, > Mihail > > > ### Attachment also inlined for ease of reply ### > > > diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in > index 1805b2b1cd8d6f65a967b4e3945257854a7e0fc1..96f584da325172bd1460251e2de0ad679589d312 100644 > --- a/gcc/config/arm/arm-cpus.in > +++ b/gcc/config/arm/arm-cpus.in > @@ -692,6 +692,7 @@ begin arch armv8.1-m.main > tune for cortex-m7 > tune flags CO_PROC > base 8M_MAIN > + profile M > isa ARMv8_1m_main > # fp => FPv5-sp-d16; fp.dp => FPv5-d16 > option dsp add armv7em > I'm wondering whether this is obvious? OTOH, what's the impact of missing this (or why didn't we notice the problem via a failing testcase?) It's only used to set the __ARM_ARCH_PROFILE macro in arm-c.c I do agree that the patch is obvious, so go ahead and commit this please, Mihail. Thanks, Kyrill Christophe
Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD
Hi Delia, On 1/28/20 4:44 PM, Delia Burduv wrote: Ping. *From:* Delia Burduv *Sent:* 22 January 2020 17:26 *To:* gcc-patches@gcc.gnu.org *Cc:* ni...@redhat.com ; Richard Earnshaw ; Ramana Radhakrishnan ; Kyrylo Tkachov *Subject:* Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD Ping. I have read Richard Sandiford's comments on the AArch64 patches and I will apply what is relevant to this patch as well. Particularly, I will change the tests to use the exact input and output registers and I will change the types of the rtl patterns. Please send the updated patches so that someone can commit them for you once they're reviewed. Thanks, Kyrill On 12/20/19 6:44 PM, Delia Burduv wrote: > This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab and vfmat > as part of the BFloat16 extension. > (https://developer.arm.com/docs/101028/latest.) > The intrinsics are declared in arm_neon.h and the RTL patterns are > defined in neon.md. > Two new tests are added to check assembler output and lane indices. > > This patch depends on the Arm back-end patch. > (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) > > Tested for regression on arm-none-eabi and armeb-none-eabi. I don't have > commit rights, so if this is ok can someone please commit it for me? > > gcc/ChangeLog: > > 2019-11-12 Delia Burduv > > * config/arm/arm_neon.h (vbfmmlaq_f32): New. > (vbfmlalbq_f32): New. > (vbfmlaltq_f32): New. > (vbfmlalbq_lane_f32): New. > (vbfmlaltq_lane_f32): New. > (vbfmlalbq_laneq_f32): New. > (vbfmlaltq_laneq_f32): New. > * config/arm/arm_neon_builtins.def (vbfmmla): New. > (vbfmab): New. > (vbfmat): New. > (vbfmab_lane): New. > (vbfmat_lane): New. > (vbfmab_laneq): New. > (vbfmat_laneq): New. > * config/arm/iterators.md (BF_MA): New int iterator. > (bt): New int attribute. > (VQXBF): Copy of VQX with V8BF. > (V_HALF): Added V8BF. > * config/arm/neon.md (neon_vbfmmlav8hi): New insn. > (neon_vbfmav8hi): New insn. 
> (neon_vbfma_lanev8hi): New insn. > (neon_vbfma_laneqv8hi): New expand. > (neon_vget_high): Changed iterator to VQXBF. > * config/arm/unspecs.md (UNSPEC_BFMMLA): New UNSPEC. > (UNSPEC_BFMAB): New UNSPEC. > (UNSPEC_BFMAT): New UNSPEC. > > 2019-11-12 Delia Burduv > > * gcc.target/arm/simd/bf16_ma_1.c: New test. > * gcc.target/arm/simd/bf16_ma_2.c: New test. > * gcc.target/arm/simd/bf16_mmla_1.c: New test.
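For readers unfamiliar with what the vbfmmlaq_f32 intrinsic computes, the semantics can be sketched as a small reference model: the 128-bit destination holds a 2x2 float32 matrix that accumulates the product of a 2x4 bfloat16 matrix with the transpose of another 2x4 bfloat16 matrix. The sketch below is only an illustrative model of those semantics (per the Arm documents linked above), not the intrinsic or its RTL pattern; it treats a bfloat16 value as the top 16 bits of a float32 and ignores the instruction's exact rounding behaviour.

```python
import struct

def bf16_to_f32(h):
    """Interpret a 16-bit bfloat16 pattern as float32 (bf16 is the top half of f32)."""
    return struct.unpack('<f', struct.pack('<I', (h & 0xFFFF) << 16))[0]

def bfmmla(acc, a, b):
    """Reference model of BFMMLA: acc (2x2 f32, row-major) += A . B^T,
    where a and b each hold a 2x4 bfloat16 matrix in row-major order."""
    out = list(acc)
    for i in range(2):
        for j in range(2):
            out[i * 2 + j] += sum(bf16_to_f32(a[i * 4 + k]) * bf16_to_f32(b[j * 4 + k])
                                  for k in range(4))
    return out
```

For example, with every input element equal to 1.0 (bf16 pattern 0x3F80) and a zero accumulator, each of the four result elements is the dot product of a row of ones with a row of ones, i.e. 4.0.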
Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
On 1/30/20 2:42 PM, Stam Markianos-Wright wrote: On 1/28/20 10:35 AM, Kyrill Tkachov wrote: Hi Stam, On 1/8/20 3:18 PM, Stam Markianos-Wright wrote: On 12/10/19 5:03 PM, Kyrill Tkachov wrote: Hi Stam, On 11/15/19 5:26 PM, Stam Markianos-Wright wrote: Pinging with more correct maintainers this time :) Also would need to backport to gcc7,8,9, but need to get this approved first! Sorry for the delay. Same here now! Sorry, I totally forgot about this in the lead-up to Xmas! Done the changes marked below and also removed the unnecessary extra #defines from the test. This is ok with a nit on the testcase... diff --git a/gcc/testsuite/gcc.target/arm/pr91816.c b/gcc/testsuite/gcc.target/arm/pr91816.c new file mode 100644 index ..757c897e9c0db32709227b3fdf1b4a8033428232 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr91816.c @@ -0,0 +1,61 @@ +/* { dg-do compile } */ +/* { dg-options "-march=armv7-a -mthumb -mfpu=vfpv3-d16" } */ +int printf(const char *, ...); + I think this needs a couple of effective target checks like arm_hard_vfp_ok and arm_thumb2_ok. See other tests in gcc.target/arm that add -mthumb to the options. Hmm, looking back at this now, is there any reason why it can't just be: /* { dg-do compile } */ /* { dg-require-effective-target arm_thumb2_ok } */ /* { dg-additional-options "-mthumb" } */ where we don't override the march or fpu options at all, but just use `require-effective-target arm_thumb2_ok` to make sure that thumb2 is supported? The attached new diff does just that. Works for me, there are plenty of configurations run with fpu that it should get the right coverage. Ok (make sure to commit the updated ChangeLog as well, if needed) Thanks! Kyrill Cheers :) Stam. Thanks, Kyrill
Re: [PATCH, GCC/ARM, 1/2] Add support for ASRL(reg) and LSLL(reg) instructions for Armv8.1-M Mainline
On 12/18/19 1:23 PM, Mihail Ionescu wrote: Hi Kyrill, On 12/11/2019 05:50 PM, Kyrill Tkachov wrote: > Hi Mihail, > > On 11/14/19 1:54 PM, Mihail Ionescu wrote: >> Hi, >> >> This patch adds the new scalar shift instructions for Armv8.1-M >> Mainline to the arm backend. >> This patch is adding the following instructions: >> >> ASRL (reg) >> LSLL (reg) >> > > Sorry for the delay, very busy time for GCC development :( > > >> >> ChangeLog entry are as follow: >> >> *** gcc/ChangeLog *** >> >> >> 2019-11-14 Mihail-Calin Ionescu >> 2019-11-14 Sudakshina Das >> >> * config/arm/arm.h (TARGET_MVE): New macro for MVE support. > > > I don't see this hunk in the patch... There's a lot of v8.1-M-related > patches in flight. Is it defined elsewhere? Thanks for having a look at this. Yes, I forgot to remove that bit from the ChangeLog and mention that the patch depends on the Armv8.1-M MVE CLI -- https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00641.htm which introduces the required TARGET_* macros needed. I've updated the ChangeLog to reflect that: *** gcc/ChangeLog *** 2019-12-18 Mihail-Calin Ionescu 2019-12-18 Sudakshina Das * config/arm/arm.md (ashldi3): Generate thumb2_lsll for TARGET_HAVE_MVE. (ashrdi3): Generate thumb2_asrl for TARGET_HAVE_MVE. * config/arm/arm.c (arm_hard_regno_mode_ok): Allocate even odd register pairs for doubleword quantities for ARMv8.1M-Mainline. * config/arm/thumb2.md (thumb2_asrl): New. (thumb2_lsll): Likewise. > > >> * config/arm/arm.md (ashldi3): Generate thumb2_lsll for >> TARGET_MVE. >> (ashrdi3): Generate thumb2_asrl for TARGET_MVE. >> * config/arm/arm.c (arm_hard_regno_mode_ok): Allocate even odd >> register pairs for doubleword quantities for ARMv8.1M-Mainline. >> * config/arm/thumb2.md (thumb2_asrl): New. >> (thumb2_lsll): Likewise. >> >> *** gcc/testsuite/ChangeLog *** >> >> 2019-11-14 Mihail-Calin Ionescu >> 2019-11-14 Sudakshina Das >> >> * gcc.target/arm/armv8_1m-shift-reg_1.c: New test. 
>> >> Testsuite shows no regression when run for arm-none-eabi targets. >> >> Is this ok for trunk? >> >> Thanks >> Mihail >> >> >> ### Attachment also inlined for ease of reply >> ### >> >> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> index >> be51df7d14738bc1addeab8ac5a3806778106bce..bf788087a30343269b30cf7054ec29212ad9c572 >> 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -24454,14 +24454,15 @@ arm_hard_regno_mode_ok (unsigned int regno, >> machine_mode mode) >> >> /* We allow almost any value to be stored in the general registers. >> Restrict doubleword quantities to even register pairs in ARM state >> - so that we can use ldrd. Do not allow very large Neon structure >> - opaque modes in general registers; they would use too many. */ >> + so that we can use ldrd and Armv8.1-M Mainline instructions. >> + Do not allow very large Neon structure opaque modes in general >> + registers; they would use too many. */ > > > This comment now reads: > > "Restrict doubleword quantities to even register pairs in ARM state > so that we can use ldrd and Armv8.1-M Mainline instructions." > > Armv8.1-M Mainline is not ARM mode though, so please clarify this > comment further. > > Looks ok to me otherwise (I may even have merged this with the second > patch, but I'm not complaining about keeping it simple :) ) > > Thanks, > > Kyrill > I've now updated the comment to read: "Restrict doubleword quantities to even register pairs in ARM state so that we can use ldrd. The same restriction applies for MVE." Ok. 
Thanks, Kyrill Regards, Mihail > >> if (regno <= LAST_ARM_REGNUM) >> { >> if (ARM_NUM_REGS (mode) > 4) >> return false; >> >> - if (TARGET_THUMB2) >> + if (TARGET_THUMB2 && !TARGET_HAVE_MVE) >> return true; >> >> return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) >> != 0); >> diff --git a/gcc/config/arm/arm.md >> index >> a91a4b941c3f9d2c3d443f9f4639069ae953fb3b..b735f858a6a5c94d02a6765c1b349cdcb5e77ee3 >> 100644 >> --- a/gcc/config/arm/arm.md >> +++ b/gcc/config/arm/arm.md >> @@ -3503,6 +3503,22 @@ >>
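The even-register-pair restriction being discussed can be made concrete with a simplified model of the core-register part of arm_hard_regno_mode_ok after the hunk above: doubleword values must start on an even core register so that ldrd (and, with MVE, the new shift instructions) can address them, and with MVE the early "always true" Thumb-2 path is no longer taken. This is an illustrative Python sketch of that logic, not the GCC function itself, and it deliberately ignores the VFP-register cases.

```python
def hard_regno_ok(regno, mode_size, thumb2, have_mve, have_ldrd, last_arm_regnum=15):
    """Simplified model of the core-register part of arm_hard_regno_mode_ok."""
    if regno > last_arm_regnum:
        return False  # only modelling core registers here (real code handles VFP regs too)
    num_regs = (mode_size + 3) // 4  # 32-bit registers needed for the mode
    if num_regs > 4:
        return False  # very large modes don't go in core registers
    if thumb2 and not have_mve:
        return True   # pre-MVE Thumb-2 allowed any core register
    # Doubleword (or larger) values must start on an even register
    # when ldrd / MVE even-pair constraints apply.
    return not (have_ldrd and mode_size > 4 and (regno & 1) != 0)
```

So, for example, an 8-byte value in r1 is rejected under MVE while the same value in r2 is allowed, which is exactly the pairing the comment in the patch describes.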
Re: [PATCH][AARCH64] Set jump-align=4 for neoversen1
Hi Richard, Wilco, On 1/17/20 8:43 AM, Richard Sandiford wrote: Wilco Dijkstra writes: > Testing shows the setting of 32:16 for jump alignment has a significant codesize > cost, however it doesn't make a difference in performance. So set jump-align > to 4 to get 1.6% codesize improvement. I was leaving this to others in case it was obvious to them. On the basis that silence suggests it wasn't, :-) could you go into more details? Is it expected on first principles that jump alignment doesn't matter for Neoverse N1, or is this purely based on experimentation? If it's expected, are we sure that the other "32:16" entries are still worthwhile? When you say it doesn't make a difference in performance, does that mean that no individual test's performance changed significantly, or just that the aggregate score didn't? Did you experiment with anything in between the current 32:16 and 4, such as 32:8 or even 32:4? Sorry for dragging my feet on this one, as I put in those numbers last year and I've been trying to recall my experiments from then. The Neoverse N1 Software Optimization guide recommends aligning branch targets to 32 bytes within the bounds of code density requirements. From my benchmarking last year I do seem to remember function and loop alignment mattering. I probably added the jump alignment for completeness as it's a good idea from first principles. But if the code size hit is too large we could look to decrease it. I'd also be interested in seeing the impact of 32:8 and 32:4. Thanks, Kyrill The problem with applying the patch only with the explanation above is that if someone in future has evidence that jump alignment can make a difference for their testcase, it would be very hard for them to reproduce the reasoning that led to this change. Thanks, Richard > OK for commit? > > ChangeLog > 2019-12-24 Wilco Dijkstra > > * config/aarch64/aarch64.c (neoversen1_tunings): Set jump_align to 4. 
> > -- > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 1646ed1d9a3de8ee2f0abff385a1ea145e234475..209ed8ebbe81104d9d8cff0df31946ab7704fb33 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -1132,7 +1132,7 @@ static const struct tune_params neoversen1_tunings = > 3, /* issue_rate */ > (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ > "32:16",/* function_align. */ > - "32:16",/* jump_align. */ > + "4",/* jump_align. */ > "32:16",/* loop_align. */ > 2,/* int_reassoc_width. */ > 4,/* fp_reassoc_width. */
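For context on the strings being changed above: GCC's alignment parameters take the form "m" or "m:n", where code is padded out to an m-byte boundary only when that can be done by skipping fewer than n bytes (a plain "4" simply means a 4-byte boundary). A hedged sketch of that padding rule, useful for reasoning about the codesize cost being discussed:

```python
def aligned_start(addr, m, n=None):
    """Model of GCC's -falign-* 'm:n' rule: pad addr up to the next m-byte
    boundary, but only when fewer than n padding bytes are needed
    (n defaults to m, i.e. always align)."""
    if n is None:
        n = m
    pad = (-addr) % m          # bytes needed to reach the next m-byte boundary
    return addr + pad if pad < n else addr

# "32:16": align a branch target to 32 bytes only when that costs at most
# 15 bytes of padding; "4" pads at most 3 bytes -- hence the codesize saving.
```

This makes the trade-off in the thread concrete: "32:16" can insert up to 15 padding bytes per jump target, while "4" inserts at most 3.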
Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)
On 1/14/20 1:50 PM, Christophe Lyon wrote: On Mon, 13 Jan 2020 at 14:49, Kyrill Tkachov wrote: Hi Christophe, On 12/17/19 3:31 PM, Kyrill Tkachov wrote: On 12/17/19 2:33 PM, Christophe Lyon wrote: On Tue, 17 Dec 2019 at 11:34, Kyrill Tkachov wrote: Hi Christophe, On 11/18/19 9:00 AM, Christophe Lyon wrote: On Wed, 13 Nov 2019 at 15:46, Christophe Lyon wrote: On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists) wrote: On 18/10/2019 14:18, Christophe Lyon wrote: + bool not_supported = arm_arch_notm || flag_pic || TARGET_NEON; This is a poor name in the context of the function as a whole. What's not supported? Please think of a better name so that I have some idea what the intention is. That's to keep most of the code common when checking if -mpure-code and -mslow-flash-data are supported. These 3 cases are common to the two compilation flags, and -mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition. Would "common_unsupported_modes" work better for you? Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in the two tests. Hi, Here is an updated version, using "common_unsupported_modes" instead of "not_supported", and fixing the typo reported by Kyrill. The ChangeLog is still the same. OK? The name looks ok to me. Richard had a concern about Armv8-M Baseline, but I do see it being supported as you pointed out. So I believe all the concerns are addressed. OK, thanks! Thus the code is ok. However, please also update the documentation for -mpure-code in invoke.texi (it currently states that a MOVT instruction is needed). I didn't think about this :( It currently says: "This option is only available when generating non-pic code for M-profile targets with the MOVT instruction." I suggest to remove the "with the MOVT instruction" part. Is that OK if I commit my patch and this doc change? Yes, I think that is the simplest correct change to make. 
Can you also send a patch to the changes.html page for GCC 10 making users aware that this restriction is now lifted? Sure. I should have thought of it when I submitted the GCC patch... How about the attached? I'm not sure about the right upper/lower case and markers Thanks, Christophe commit ba2a354c9ed6c75ec00bf21dd6938b89a113a96e Author: Christophe Lyon Date: Tue Jan 14 13:48:19 2020 + [arm] Document -mpure-code support for v6m in gcc-10 diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html index caa9df7..26cdf66 100644 --- a/htdocs/gcc-10/changes.html +++ b/htdocs/gcc-10/changes.html @@ -417,7 +417,11 @@ a work-in-progress. data-processing intrinsics to include 32-bit SIMD, saturating arithmetic, 16-bit multiplication and other related intrinsics aimed at DSP algorithm optimization. - + + Support for -mpure-code in Thumb-1 (v6m) has been + added: this M-profile feature is no longer restricted to targets + with MOTV. For instance, Cortex-M0 is now + supported Typo in MOVT. Let's make the last sentence. "For example, -mcpu=cortex-m0 now supports this option." Ok with those changes. Thanks, Kyrill AVR
Re: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM
On 12/18/19 1:26 PM, Mihail Ionescu wrote: Hi Kyrill, On 12/17/2019 10:26 AM, Kyrill Tkachov wrote: Hi Mihail, On 12/16/19 6:29 PM, Mihail Ionescu wrote: Hi Kyrill, On 11/12/2019 09:55 AM, Kyrill Tkachov wrote: Hi Mihail, On 10/23/19 10:26 AM, Mihail Ionescu wrote: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM Hi, === Context === This patch is part of a patch series to add support for Armv8.1-M Mainline Security Extensions architecture. Its purpose is to improve code density of functions with the cmse_nonsecure_entry attribute and when calling functions with the cmse_nonsecure_call attribute by using CLRM to do all the general purpose registers clearing as well as clearing the APSR register. === Patch description === This patch adds a new pattern for the CLRM instruction and guards the current clearing code in output_return_instruction() and thumb_exit() on Armv8.1-M Mainline instructions not being present. cmse_clear_registers () is then modified to use the new CLRM instruction when targeting Armv8.1-M Mainline while keeping Armv8-M register clearing code for VFP registers. For the CLRM instruction, which does not mandate APSR in the register list, checking whether it is the right volatile unspec or a clearing register is done in clear_operation_p. Note that load/store multiple were deemed sufficiently different in terms of RTX structure compared to the CLRM pattern for a different function to be used to validate the match_parallel. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * config/arm/arm-protos.h (clear_operation_p): Declare. * config/arm/arm.c (clear_operation_p): New function. (cmse_clear_registers): Generate clear_multiple instruction pattern if targeting Armv8.1-M Mainline or successor. (output_return_instruction): Only output APSR register clearing if Armv8.1-M Mainline instructions not available. (thumb_exit): Likewise. 
* config/arm/predicates.md (clear_multiple_operation): New predicate. * config/arm/thumb2.md (clear_apsr): New define_insn. (clear_multiple): Likewise. * config/arm/unspecs.md (VUNSPEC_CLRM_APSR): New volatile unspec. *** gcc/testsuite/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * gcc.target/arm/cmse/bitfield-1.c: Add check for CLRM. * gcc.target/arm/cmse/bitfield-2.c: Likewise. * gcc.target/arm/cmse/bitfield-3.c: Likewise. * gcc.target/arm/cmse/struct-1.c: Likewise. * gcc.target/arm/cmse/cmse-14.c: Likewise. * gcc.target/arm/cmse/cmse-1.c: Likewise. Restrict checks for Armv8-M GPR clearing when CLRM is not available. * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: likewise. * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise. 
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/union-1.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise. Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no regression. Is this ok for trunk? Best regards, Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-prot
Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [2/2]
Hi Stam, On 1/10/20 6:47 PM, Stam Markianos-Wright wrote: Hi all, This patch is part 2 of Bfloat16_t enablement in the ARM back-end. This new type is constrained using target hooks TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP so that it may only be used through ACLE intrinsics (will be provided in later patches). Regression testing on arm-none-eabi passed successfully. Ok for trunk? Ok. Thanks, Kyrill Cheers, Stam ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest Details on ARM Bfloat can be found here: https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a gcc/ChangeLog: 2020-01-10 Stam Markianos-Wright * config/arm/arm.c (arm_invalid_conversion): New function for target hook. (arm_invalid_unary_op): New function for target hook. (arm_invalid_binary_op): New function for target hook. 2020-01-10 Stam Markianos-Wright * gcc.target/arm/bfloat16_scalar_typecheck.c: New test. * gcc.target/arm/bfloat16_vector_typecheck_1.c: New test. * gcc.target/arm/bfloat16_vector_typecheck_2.c: New test.
Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]
Hi Stam, On 1/10/20 6:45 PM, Stam Markianos-Wright wrote: Hi all, This is a respin of patch: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html which has now been split into two (similar to the Aarch64 version). This is patch 1 of 2 and adds Bfloat type support to the ARM back-end. It also adds a new machine_mode (BFmode) for this type and accompanying Vector modes V4BFmode and V8BFmode. The second patch in this series uses existing target hooks to restrict type use. Regression testing on arm-none-eabi passed successfully. This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html for test suite effective_target update. Ok for trunk? This is ok, thanks. You can commit it once the git conversion goes through :) Kyrill Cheers, Stam ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest Details on ARM Bfloat can be found here: https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a gcc/ChangeLog: 2020-01-10 Stam Markianos-Wright * config.gcc: Add arm_bf16.h. * config/arm/arm-builtins.c (arm_mangle_builtin_type): Fix comment. (arm_simd_builtin_std_type): Add BFmode. (arm_init_simd_builtin_types): Define element types for vector types. (arm_init_bf16_types): New function. (arm_init_builtins): Add arm_init_bf16_types function call. * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes. * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF. * config/arm/arm.c (aapcs_vfp_sub_candidate): Add BFmode. (arm_hard_regno_mode_ok): Add BFmode and tidy up statements. (arm_vector_mode_supported_p): Add V4BF, V8BF. (arm_mangle_type): * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE, VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node, arm_bf16_ptr_type_node. * config/arm/arm.md: New enabled_for_bfmode_scalar, enabled_for_bfmode_vector attributes. 
Add BFmode to movhf expand. pattern and define_split between ARM registers. * config/arm/arm_bf16.h: New file. * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types. * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): New. (VQXMOV): Add V8BF. * config/arm/neon.md: Add BF vector types to NEON move patterns. * config/arm/vfp.md: Add BFmode to movhf patterns. gcc/testsuite/ChangeLog: 2020-01-10 Stam Markianos-Wright * g++.dg/abi/mangle-neon.C: Add Bfloat vector types. * g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test. * gcc.target/arm/bfloat16_scalar_1_1.c: New test. * gcc.target/arm/bfloat16_scalar_1_2.c: New test. * gcc.target/arm/bfloat16_scalar_2_1.c: New test. * gcc.target/arm/bfloat16_scalar_2_2.c: New test. * gcc.target/arm/bfloat16_scalar_3_1.c: New test. * gcc.target/arm/bfloat16_scalar_3_2.c: New test. * gcc.target/arm/bfloat16_scalar_4.c: New test. * gcc.target/arm/bfloat16_simd_1_1.c: New test. * gcc.target/arm/bfloat16_simd_1_2.c: New test. * gcc.target/arm/bfloat16_simd_2_1.c: New test. * gcc.target/arm/bfloat16_simd_2_2.c: New test. * gcc.target/arm/bfloat16_simd_3_1.c: New test. * gcc.target/arm/bfloat16_simd_3_2.c: New test.
Re: [Patch 0/X] HWASAN v3
On 1/8/20 11:26 AM, Matthew Malcomson wrote: Hi everyone, I'm writing this email to summarise & publicise the state of this patch series, especially the difficulties around approval for GCC 10 mentioned on IRC. The main obstacle seems to be that no maintainer feels they have enough knowledge about hwasan and justification that it's worthwhile to approve the patch series. Similarly, Martin has given a review of the parts of the code he can (thanks!), but doesn't feel he can do a deep review of the code related to the RTL hooks and stack expansion -- hence that part is as yet not reviewed in-depth. The questions around justification raised on IRC are mainly that it seems like a proof-of-concept for MTE rather than a stand-alone usable sanitizer. Especially since in the GNU world hwasan instrumented code is not really ready for production since we can only use the "interceptor ABI" rather than the "platform ABI". This restriction is because there is no version of glibc with the required modifications to provide the "platform ABI". (n.b. that since https://reviews.llvm.org/D69574 the code-generation for these ABIs is the same). From my perspective the reasons that make HWASAN useful in itself are: 1) Much less memory usage. From a back-of-the-envelope calculation based on the hwasan paper's table of memory overhead from over-alignment https://arxiv.org/pdf/1802.09517.pdf I guess hwasan instrumented code has an overhead of about 1.1x (~4% from overalignment and ~6.25% from shadow memory), while asan seems to have an overhead somewhere in the range 1.5x - 3x. Maybe there's some data out there comparing total overheads that I haven't found? (I'd appreciate a reference if anyone has that info). 2) Available on more architectures than MTE. HWASAN only requires TBI, which is a feature of all AArch64 machines, while MTE will be an optional extension and only available on certain architectures. 3) This enables using hwasan in the kernel. 
While instrumented user-space applications will be using the "interceptor ABI" and hence are likely not production-quality, the biggest aim of implementing hwasan in GCC is to allow building the Linux kernel with tag-based sanitization using GCC. Instrumented kernel code uses hooks in the kernel itself, so this ABI distinction is no longer relevant, and this sanitizer should produce a production-quality kernel binary. I'm hoping I can find a maintainer willing to review and ACK this patch series -- especially with stage3 coming to a close soon. If there's anything else I could do to help get someone willing up-to-speed then please just ask. FWIW I've reviewed the aarch64 parts over the lifetime of the patch series and I am okay with them. Given the reviews of the sanitiser, library and aarch64 backend components, and the data at https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00387.html how can we move forward with commit approval ? Is this something a global reviewer can help with, Jeff ? :) Thanks, Kyrill Cheers, Matthew On 07/01/2020 15:14, Martin Liška wrote: > On 12/12/19 4:18 PM, Matthew Malcomson wrote: > > Hello. > > I've just sent few comments that are related to the v3 of the patch set. > Based on the HWASAN (limited) knowledge the patch seems reasonable to me. > I haven't looked much at the newly introduced RTL-hooks. > But these seems to me isolated to the aarch64 port. > > I can also verify that the patchset works on my aarch64 linux machine and > hwasan.exp and asan.exp tests succeed. > >> I haven't gotten ASAN_MARK to print as HWASAN_MARK when using memory >> tagging, >> since I'm not sure the way I found to implement this would be >> acceptable. The >> inlined patch below works but it requires a special declaration >> instead of just >> an ~#include~. > > Knowing that, I would not bother with the printing of HWASAN_MARK. > > Thanks for the series, > Martin
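The back-of-the-envelope overhead figure quoted in the email above can be made explicit. The assumptions here are taken from the email and the linked paper, not measured independently: HWASAN keeps a 1-byte tag per 16-byte granule of shadow memory (6.25%) and over-aligns stack objects to granule boundaries (roughly 4%).

```python
GRANULE = 16                      # bytes of memory covered per 1-byte shadow tag
shadow_overhead = 1 / GRANULE     # 6.25% for shadow/tag memory
overalign_overhead = 0.04         # ~4% from over-alignment (paper's estimate)
hwasan_factor = 1 + shadow_overhead + overalign_overhead
print(f"estimated HWASAN memory overhead: {hwasan_factor:.2f}x")
```

That lands at roughly 1.10x, which is where the "about 1.1x" figure in the email comes from, against ASAN's reported 1.5x-3x.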
[PATCH][wwwdocs] GCC 10 changes.html for arm and aarch64
Hi all, This patch adds initial entries for notable features that went in to GCC 10 on the arm and aarch64 front. The list is by no means complete so if you'd like your contribution called please shout or post a follow-up patch. It is, nevertheless, a decent start at the relevant sections in changes.html Thanks, Kyrill commit b539d38b322883ed5aa6563ac879af6a5ebabd96 Author: Kyrylo Tkachov Date: Thu Nov 7 17:58:45 2019 + [arm/aarch64] GCC 10 changes.html diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html index d6108269..8f498017 100644 --- a/htdocs/gcc-10/changes.html +++ b/htdocs/gcc-10/changes.html @@ -322,17 +322,102 @@ a work-in-progress. New Targets and Target Specific Improvements - +AArch64 & arm + + The AArch64 and arm ports now support condition flag output constraints + in inline assembly, as indicated by the __GCC_ASM_FLAG_OUTPUTS__. + On arm this feature is only available for A32 and T32 targets. + Please refer to the documentation for more details. + + +AArch64 + + The -mbranch-protection=pac-ret option now accepts the + optional argument +b-key extension to perform return address + signing with the B-key instead of the A-key. + + The Transactional Memory Extension is now supported through ACLE + intrinsics. It can be enabled through the +tme option + extension (for example, -march=armv8.5-a+tme). + + Initial autovectorization support for SVE2 has been added and can be + enabled through the +sve2 option extension (for example, + -march=armv8.5-a+sve2). Additional extensions can be enabled + through +sve2-sm4, +sve2=aes, + +sve2-sha3, +sve2-bitperm. + + A number of features from the Armv8.5-a are now supported through ACLE + intrinsics. These include: + + The random number instructions that can be enabled + through the (already present in GCC 9.1) +rng option + extension. + Floating-point intrinsics to round to integer instructions from + Armv8.5-a when targeting -march=armv8.5-a or later. 
+ Memory Tagging Extension intrinsics enabled through the + +memtag option extension. + + + The option -moutline-atomics has been added to aid + deployment of the Large System Extensions (LSE) on GNU/Linux systems built + with a baseline architecture targeting Armv8-A. When the option is + specified code is emitted to detect the presence of LSE instructions at + runtime and use them for standard atomic operations. + For more information please refer to the documentation. + + + Support has been added for the following processors + (GCC identifiers in parentheses): + + Arm Cortex-A77 (cortex-a77). + Arm Cortex-A76AE (cortex-a76ae). + Arm Cortex-A65 (cortex-a65). + Arm Cortex-A65AE (cortex-a65ae). + Arm Cortex-A34 (cortex-a34). + + The GCC identifiers can be used + as arguments to the -mcpu or -mtune options, + for example: -mcpu=cortex-a77 or + -mtune=cortex-a65ae or as arguments to the equivalent target + attributes and pragmas. + + -ARM +arm Support for the FDPIC ABI has been added. It uses 64-bit function descriptors to represent pointers to functions, and enables code sharing on MMU-less systems. The corresponding target triple is arm-uclinuxfdpiceabi, and the C library is uclibc-ng. + Support has been added for the Arm EABI on NetBSD through the + arm*-*-netbsdelf-*eabi* triplet. + + The handling of 64-bit integer operations has been significantly reworked + and improved leading to improved performance and reduced stack usage when using + 64-bit integral data types. The option -mneon-for-64bits is now + deprecated and will be removed in a future release. + + Support has been added for the following processors + (GCC identifiers in parentheses): + + Arm Cortex-A77 (cortex-a77). + Arm Cortex-A76AE (cortex-a76ae). + Arm Cortex-M35P (cortex-m35p). + + The GCC identifiers can be used + as arguments to the -mcpu or -mtune options, + for example: -mcpu=cortex-a77 or + -mtune=cortex-m35p. 
+ + Support has been extended for the ACLE + https://developer.arm.com/docs/101028/0009/data-processing-intrinsics + data-processing intrinsics to include 32-bit SIMD, saturating arithmetic, + 16-bit multiplication and other related intrinsics aimed at DSP algorithm + optimization. +
Re: [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16
Hi Dennis, On 12/12/19 5:30 PM, Dennis Zhang wrote: Hi all, On 22/11/2019 14:33, Dennis Zhang wrote: > Hi all, > > This patch is part of a series adding support for Armv8.6-A features. > It enables options including -march=armv8.6-a, +i8mm and +bf16. > The +i8mm and +bf16 features are optional for Armv8.2-a and onward. > Documents are at https://developer.arm.com/docs/ddi0596/latest > > Regtested for arm-none-linux-gnueabi-armv8-a. > This is an update to rebase the patch to the top. Some issues are fixed according to the recent CLI patch for AArch64. ChangeLog is updated as follows: gcc/ChangeLog: 2019-12-12 Dennis Zhang * config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC, __ARM_FEATURE_BF16_SCALAR_ARITHMETIC, and __ARM_BF16_FORMAT_ALTERNATIVE when enabled. * config/arm/arm-cpus.in (armv8_6, i8mm, bf16): New features. * config/arm/arm-tables.opt: Regenerated. * config/arm/arm.c (arm_option_reconfigure_globals): Initialize arm_arch_i8mm and arm_arch_bf16 when enabled. * config/arm/arm.h (TARGET_I8MM): New macro. (TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise. * config/arm/t-aprofile: Add matching rules for -march=armv8.6-a. * config/arm/t-arm-elf (all_v8_archs): Add armv8.6-a. * config/arm/t-multilib: Add matching rules for -march=armv8.6-a. (v8_6_a_simd_variants): New. (v8_*_a_simd_variants): Add i8mm and bf16. * doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options. gcc/testsuite/ChangeLog: 2019-12-12 Dennis Zhang * gcc.target/arm/multilib.exp: Add combination tests for armv8.6-a. Is it OK for trunk? This is ok for trunk. Please follow the steps at https://gcc.gnu.org/svnwrite.html to get write permission to the repo (listing me as approver). You can then commit it yourself :) Thanks, Kyrill Many thanks! Dennis
Re: [PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.
Hi Srinath, On 11/14/19 7:13 PM, Srinath Parvathaneni wrote: Hello, This patch supports the following MVE ACLE intrinsics with binary operands. vsubq_n_f16, vsubq_n_f32, vbrsrq_n_f16, vbrsrq_n_f32, vcvtq_n_f16_s16, vcvtq_n_f32_s32, vcvtq_n_f16_u16, vcvtq_n_f32_u32, vcreateq_f16, vcreateq_f32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics In this patch a new constraint "Rd" is added, which checks that the constant is within the range of 1 to 16. Also a new predicate "mve_imm_16" is added, to check the matching constraint Rd. Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (BINOP_NONE_NONE_NONE_QUALIFIERS): Define qualifier for binary operands. (BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise. (BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise. (BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vsubq_n_f16): Define macro. (vsubq_n_f32): Likewise. (vbrsrq_n_f16): Likewise. (vbrsrq_n_f32): Likewise. (vcvtq_n_f16_s16): Likewise. (vcvtq_n_f32_s32): Likewise. (vcvtq_n_f16_u16): Likewise. (vcvtq_n_f32_u32): Likewise. (vcreateq_f16): Likewise. (vcreateq_f32): Likewise. (__arm_vsubq_n_f16): Define intrinsic. (__arm_vsubq_n_f32): Likewise. (__arm_vbrsrq_n_f16): Likewise. (__arm_vbrsrq_n_f32): Likewise. (__arm_vcvtq_n_f16_s16): Likewise. (__arm_vcvtq_n_f32_s32): Likewise. (__arm_vcvtq_n_f16_u16): Likewise. (__arm_vcvtq_n_f32_u32): Likewise. (__arm_vcreateq_f16): Likewise. (__arm_vcreateq_f32): Likewise. (vsubq): Define polymorphic variant. (vbrsrq): Likewise. (vcvtq_n): Likewise. * config/arm/arm_mve_builtins.def (BINOP_NONE_NONE_NONE_QUALIFIERS): Use it. (BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise. (BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise. (BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise. 
* config/arm/constraints.md (Rd): Define constraint to check constant is in the range of 1 to 16. * config/arm/mve.md (mve_vsubq_n_f): Define RTL pattern. mve_vbrsrq_n_f: Likewise. mve_vcvtq_n_to_f_: Likewise. mve_vcreateq_f: Likewise. * config/arm/predicates.md (mve_imm_16): Define predicate to check the matching constraint Rd. gcc/testsuite/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c: New test. * gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcreateq_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcreateq_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsubq_n_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vsubq_n_f32.c: Likewise. 
### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index cd82aa159089c288607e240de02a85dcbb134a14..c2dad057d1365914477c64d559aa1fd1c32bbf19 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -349,6 +349,30 @@ arm_unop_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define UNOP_UNONE_IMM_QUALIFIERS \ (arm_unop_unone_imm_qualifiers) +static enum arm_type_qualifiers +arm_binop_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none }; +#define BINOP_NONE_NONE_NONE_QUALIFIERS \ + (arm_binop_none_none_none_qualifiers) + +static enum arm_type_qualifiers +arm_binop_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_immediate }; +#define BINOP_NONE_NONE_IMM_QUALIFIERS \ + (arm_binop_none_none_imm_qualifiers) + +static enum arm_type_qualifiers +arm_binop_none_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_unsigned, qualifier_immediate }; +#define BINOP_NONE_UNONE_IMM_QUALIFIERS \ + (arm_binop_none_unone_imm_qualifiers) + +static enum arm_type_qualifiers +arm_binop_none_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_unsigned, qualifier_unsigned }; +#define BINOP_NONE_UNONE_UNONE_QUALIFIERS \ + (arm_binop_none_unone_unone_qualifiers) + /* End of Qualifier for MVE builtins. */ /* void ([T eleme
Re: [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.
Hi Srinath, On 11/14/19 7:13 PM, Srinath Parvathaneni wrote: Hello, This patch supports the following MVE ACLE intrinsics with a unary operand. vctp16q, vctp32q, vctp64q, vctp8q, vpnot. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics There are a few conflicts in defining the machine registers, resolved by re-ordering VPR_REGNUM, APSRQ_REGNUM and APSRGE_REGNUM. Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-12 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (hi_UP): Define mode. * config/arm/arm.h (IS_VPR_REGNUM): Move. * config/arm/arm.md (VPR_REGNUM): Define before APSRQ_REGNUM. (APSRQ_REGNUM): Modify. (APSRGE_REGNUM): Modify. * config/arm/arm_mve.h (vctp16q): Define macro. (vctp32q): Likewise. (vctp64q): Likewise. (vctp8q): Likewise. (vpnot): Likewise. (__arm_vctp16q): Define intrinsic. (__arm_vctp32q): Likewise. (__arm_vctp64q): Likewise. (__arm_vctp8q): Likewise. (__arm_vpnot): Likewise. * config/arm/arm_mve_builtins.def (UNOP_UNONE_UNONE): Use builtin qualifier. * config/arm/mve.md (mve_vctpqhi): Define RTL pattern. (mve_vpnothi): Likewise. gcc/testsuite/ChangeLog: 2019-11-12 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vctp16q.c: New test. * gcc.target/arm/mve/intrinsics/vctp32q.c: Likewise. * gcc.target/arm/mve/intrinsics/vctp64q.c: Likewise. * gcc.target/arm/mve/intrinsics/vctp8q.c: Likewise. * gcc.target/arm/mve/intrinsics/vpnot.c: Likewise. 
### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 21b213d8e1bc99a3946f15e97161e01d73832799..cd82aa159089c288607e240de02a85dcbb134a14 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -387,6 +387,7 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define oi_UP E_OImode #define hf_UP E_HFmode #define si_UP E_SImode +#define hi_UP E_HImode #define void_UP E_VOIDmode #define UP(X) X##_UP diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 485db72f05f16ca389227289a35c232dc982bf9d..95ec7963a57a1a5652a0a9dc30391a0ce6348242 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -955,6 +955,9 @@ extern int arm_arch_cmse; #define IS_IWMMXT_GR_REGNUM(REGNUM) \ (((REGNUM) >= FIRST_IWMMXT_GR_REGNUM) && ((REGNUM) <= LAST_IWMMXT_GR_REGNUM)) +#define IS_VPR_REGNUM(REGNUM) \ + ((REGNUM) == VPR_REGNUM) + /* Base register for access to local variables of the function. */ #define FRAME_POINTER_REGNUM 102 @@ -999,7 +1002,7 @@ extern int arm_arch_cmse; && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1)) /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP - + 1 APSRQ + 1 APSRGE + 1 VPR. */ + +1 VPR + 1 APSRQ + 1 APSRGE. */ /* Intel Wireless MMX Technology registers add 16 + 4 more. */ /* VFP (VFP3) adds 32 (64) + 1 VFPCC. */ #define FIRST_PSEUDO_REGISTER 107 @@ -1101,13 +1104,10 @@ extern int arm_regs_in_sequence[]; /* Registers not for general use. */ \ CC_REGNUM, VFPCC_REGNUM, \ FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM, \ - SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM, \ - VPR_REGNUM \ + SP_REGNUM, PC_REGNUM, VPR_REGNUM, APSRQ_REGNUM,\ + APSRGE_REGNUM \ } -#define IS_VPR_REGNUM(REGNUM) \ - ((REGNUM) == VPR_REGNUM) - /* Use different register alloc ordering for Thumb. 
*/ #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc () diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 689baa0b0ff63ef90f47d2fd844cb98c9a1457a0..2a90482a873f8250a3b2b1dec141669f55e0c58b 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -39,9 +39,9 @@ (LAST_ARM_REGNUM 15) ; (CC_REGNUM 100) ; Condition code pseudo register (VFPCC_REGNUM 101) ; VFP Condition code pseudo register - (APSRQ_REGNUM 104) ; Q bit pseudo register - (APSRGE_REGNUM 105) ; GE bits pseudo register - (VPR_REGNUM 106) ; Vector Predication Register - MVE register. + (VPR_REGNUM 104) ; Vector Predication Register - MVE register. + (APSRQ_REGNUM 105) ; Q bit pseudo register + (APSRGE_REGNUM 106) ; GE bits pseudo register ] ) ;; 3rd operand to select_dominance_cc_mode diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 1d357180ba9ddb26347b55cde625903bdb09eef6..c8d9b6471634725cea9bab3f9fa14
Re: [PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.
Hi Srinath, On 11/14/19 7:13 PM, Srinath Parvathaneni wrote: Hello, This patch supports the following MVE ACLE intrinsics with a unary operand. vmvnq_n_s16, vmvnq_n_s32, vrev64q_s8, vrev64q_s16, vrev64q_s32, vcvtq_s16_f16, vcvtq_s32_f32, vrev64q_u8, vrev64q_u16, vrev64q_u32, vmvnq_n_u16, vmvnq_n_u32, vcvtq_u16_f16, vcvtq_u32_f32, vrev64q. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (UNOP_SNONE_SNONE_QUALIFIERS): Define. (UNOP_SNONE_NONE_QUALIFIERS): Likewise. (UNOP_SNONE_IMM_QUALIFIERS): Likewise. (UNOP_UNONE_NONE_QUALIFIERS): Likewise. (UNOP_UNONE_UNONE_QUALIFIERS): Likewise. (UNOP_UNONE_IMM_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vmvnq_n_s16): Define macro. (vmvnq_n_s32): Likewise. (vrev64q_s8): Likewise. (vrev64q_s16): Likewise. (vrev64q_s32): Likewise. (vcvtq_s16_f16): Likewise. (vcvtq_s32_f32): Likewise. (vrev64q_u8): Likewise. (vrev64q_u16): Likewise. (vrev64q_u32): Likewise. (vmvnq_n_u16): Likewise. (vmvnq_n_u32): Likewise. (vcvtq_u16_f16): Likewise. (vcvtq_u32_f32): Likewise. (__arm_vmvnq_n_s16): Define intrinsic. (__arm_vmvnq_n_s32): Likewise. (__arm_vrev64q_s8): Likewise. (__arm_vrev64q_s16): Likewise. (__arm_vrev64q_s32): Likewise. (__arm_vrev64q_u8): Likewise. (__arm_vrev64q_u16): Likewise. (__arm_vrev64q_u32): Likewise. (__arm_vmvnq_n_u16): Likewise. (__arm_vmvnq_n_u32): Likewise. (__arm_vcvtq_s16_f16): Likewise. (__arm_vcvtq_s32_f32): Likewise. (__arm_vcvtq_u16_f16): Likewise. (__arm_vcvtq_u32_f32): Likewise. (vrev64q): Define polymorphic variant. * config/arm/arm_mve_builtins.def (UNOP_SNONE_SNONE): Use it. (UNOP_SNONE_NONE): Likewise. (UNOP_SNONE_IMM): Likewise. (UNOP_UNONE_UNONE): Likewise. (UNOP_UNONE_NONE): Likewise. 
(UNOP_UNONE_IMM): Likewise. * config/arm/mve.md (mve_vrev64q_): Define RTL pattern. (mve_vcvtq_from_f_): Likewise. (mve_vmvnq_n_): Likewise. gcc/testsuite/ChangeLog: 2019-10-21 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c: New test. * gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q_u8.c: Likewise. 
### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 2fee417fe6585f457edd4cf96655366b1d6bd1a0..21b213d8e1bc99a3946f15e97161e01d73832799 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -313,6 +313,42 @@ arm_unop_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define UNOP_NONE_UNONE_QUALIFIERS \ (arm_unop_none_unone_qualifiers) +static enum arm_type_qualifiers +arm_unop_snone_snone_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none }; +#define UNOP_SNONE_SNONE_QUALIFIERS \ + (arm_unop_snone_snone_qualifiers) + +static enum arm_type_qualifiers +arm_unop_snone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none }; +#define UNOP_SNONE_NONE_QUALIFIERS \ + (arm_unop_snone_none_qualifiers) + +static enum arm_type_qualifiers +arm_unop_snone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_immediate }; +#define UNOP_SNONE_IMM_QUALIFIERS \ + (arm_unop_snone_imm_qualifiers) + +static enum arm_type_qualifiers +arm_unop_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_none }; +#define UNOP_UNONE_NONE_QUALIFIERS \ + (arm_unop_unone_none_qualifiers) + +static enum arm_type_qualifiers +arm_unop_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qual
Re: [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand.
Hi Srinath, On 11/14/19 7:12 PM, Srinath Parvathaneni wrote: Hello, This patch supports MVE ACLE intrinsics vcvtq_f16_s16, vcvtq_f32_s32, vcvtq_f16_u16, vcvtq_f32_u32, vrndxq_f16, vrndxq_f32, vrndq_f16, vrndq_f32, vrndpq_f16, vrndpq_f32, vrndnq_f16, vrndnq_f32, vrndmq_f16, vrndmq_f32, vrndaq_f16, vrndaq_f32, vrev64q_f16, vrev64q_f32, vnegq_f16, vnegq_f32, vdupq_n_f16, vdupq_n_f32, vabsq_f16, vabsq_f32, vrev32q_f16, vcvttq_f32_f16, vcvtbq_f32_f16. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-10-17 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (UNOP_NONE_NONE_QUALIFIERS): Define macro. (UNOP_NONE_SNONE_QUALIFIERS): Likewise. (UNOP_NONE_UNONE_QUALIFIERS): Likewise. * config/arm/arm_mve.h (vrndxq_f16): Define macro. (vrndxq_f32): Likewise. (vrndq_f16): Likewise. (vrndq_f32): Likewise. (vrndpq_f16): Likewise. (vrndpq_f32): Likewise. (vrndnq_f16): Likewise. (vrndnq_f32): Likewise. (vrndmq_f16): Likewise. (vrndmq_f32): Likewise. (vrndaq_f16): Likewise. (vrndaq_f32): Likewise. (vrev64q_f16): Likewise. (vrev64q_f32): Likewise. (vnegq_f16): Likewise. (vnegq_f32): Likewise. (vdupq_n_f16): Likewise. (vdupq_n_f32): Likewise. (vabsq_f16): Likewise. (vabsq_f32): Likewise. (vrev32q_f16): Likewise. (vcvttq_f32_f16): Likewise. (vcvtbq_f32_f16): Likewise. (vcvtq_f16_s16): Likewise. (vcvtq_f32_s32): Likewise. (vcvtq_f16_u16): Likewise. (vcvtq_f32_u32): Likewise. (__arm_vrndxq_f16): Define intrinsic. (__arm_vrndxq_f32): Likewise. (__arm_vrndq_f16): Likewise. (__arm_vrndq_f32): Likewise. (__arm_vrndpq_f16): Likewise. (__arm_vrndpq_f32): Likewise. (__arm_vrndnq_f16): Likewise. (__arm_vrndnq_f32): Likewise. (__arm_vrndmq_f16): Likewise. (__arm_vrndmq_f32): Likewise. (__arm_vrndaq_f16): Likewise. 
(__arm_vrndaq_f32): Likewise. (__arm_vrev64q_f16): Likewise. (__arm_vrev64q_f32): Likewise. (__arm_vnegq_f16): Likewise. (__arm_vnegq_f32): Likewise. (__arm_vdupq_n_f16): Likewise. (__arm_vdupq_n_f32): Likewise. (__arm_vabsq_f16): Likewise. (__arm_vabsq_f32): Likewise. (__arm_vrev32q_f16): Likewise. (__arm_vcvttq_f32_f16): Likewise. (__arm_vcvtbq_f32_f16): Likewise. (__arm_vcvtq_f16_s16): Likewise. (__arm_vcvtq_f32_s32): Likewise. (__arm_vcvtq_f16_u16): Likewise. (__arm_vcvtq_f32_u32): Likewise. (vrndxq): Define polymorphic variants. (vrndq): Likewise. (vrndpq): Likewise. (vrndnq): Likewise. (vrndmq): Likewise. (vrndaq): Likewise. (vrev64q): Likewise. (vnegq): Likewise. (vabsq): Likewise. (vrev32q): Likewise. (vcvtbq_f32): Likewise. (vcvttq_f32): Likewise. (vcvtq): Likewise. * config/arm/arm_mve_builtins.def (VAR2): Define. (VAR1): Define. * config/arm/mve.md (mve_vrndxq_f): Add RTL pattern. (mve_vrndq_f): Likewise. (mve_vrndpq_f): Likewise. (mve_vrndnq_f): Likewise. (mve_vrndmq_f): Likewise. (mve_vrndaq_f): Likewise. (mve_vrev64q_f): Likewise. (mve_vnegq_f): Likewise. (mve_vdupq_n_f): Likewise. (mve_vabsq_f): Likewise. (mve_vrev32q_fv8hf): Likewise. (mve_vcvttq_f32_f16v4sf): Likewise. (mve_vcvtbq_f32_f16v4sf): Likewise. (mve_vcvtq_to_f_): Likewise. gcc/testsuite/ChangeLog: 2019-10-17 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vabsq_f16.c: New test. * gcc.target/arm/mve/intrinsics/vabsq_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vdupq_n_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vdupq_n_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vnegq_f16.c: Likewise. 
* gcc.target/arm/mve/intrinsics/vnegq_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev32q_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vrev64q
Re: [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
Hi Srinath, On 11/14/19 7:12 PM, Srinath Parvathaneni wrote: Hello, This patch is part of MVE ACLE intrinsics framework. The patch supports the use of emulation for the double-precision arithmetic operations for MVE. These changes support the MVE ACLE intrinsics which operate on vector floating-point arithmetic operations. Please refer to Arm reference manual [1] for more details. [1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914 Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath. gcc/ChangeLog: 2019-11-11 Andre Vieira Srinath Parvathaneni * config/arm/arm.c (arm_libcall_uses_aapcs_base): Modify function to add emulator calls for double precision arithmetic operations for MVE. I'm a bit confused by the changelog and the comment in the patch ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 6faed76206b93c1a9dea048e2f693dc16ee58072..358b2638b65a2007d1c7e8062844b67682597f45 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -5658,9 +5658,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall) /* Values from double-precision helper functions are returned in core registers if the selected core only supports single-precision arithmetic, even if we are using the hard-float ABI. The same is - true for single-precision helpers, but we will never be using the - hard-float ABI on a CPU which doesn't support single-precision - operations in hardware. */ + true for single-precision helpers except in case of MVE, because in + MVE we will be using the hard-float ABI on a CPU which doesn't support + single-precision operations in hardware. In MVE the following check + enables use of emulation for the double-precision arithmetic + operations. 
*/ + if (TARGET_HAVE_MVE) + { + add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode)); + add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode)); + } ... this adds emulation for SFmode but you say you want double-precision emulation? Can you demonstrate what this patch wants to achieve with a testcase? Thanks, Kyrill add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode)); add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode)); add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode));
Re: [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.
On 11/14/19 7:12 PM, Srinath Parvathaneni wrote: Hello, This patch supports MVE ACLE intrinsics vst4q_s8, vst4q_s16, vst4q_s32, vst4q_u8, vst4q_u16, vst4q_u32, vst4q_f16 and vst4q_f32. In this patch arm_mve_builtins.def file is added to the source code in which the builtins for MVE ACLE intrinsics are defined using builtin qualifiers. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Ok. Thanks, Kyrill Thanks, Srinath. gcc/ChangeLog: 2019-11-12 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/arm-builtins.c (CF): Define mve_builtin_data. (VAR1): Define. (ARM_BUILTIN_MVE_PATTERN_START): Define. (arm_init_mve_builtins): Define function. (arm_init_builtins): Add TARGET_HAVE_MVE check. (arm_expand_builtin_1): Check the range of fcode. (arm_expand_mve_builtin): Define function to expand MVE builtins. (arm_expand_builtin): Check the range of fcode. * config/arm/arm_mve.h (__ARM_FEATURE_MVE): Define MVE floating point types. (__ARM_MVE_PRESERVE_USER_NAMESPACE): Define to protect user namespace. (vst4q_s8): Define macro. (vst4q_s16): Likewise. (vst4q_s32): Likewise. (vst4q_u8): Likewise. (vst4q_u16): Likewise. (vst4q_u32): Likewise. (vst4q_f16): Likewise. (vst4q_f32): Likewise. (__arm_vst4q_s8): Define inline builtin. (__arm_vst4q_s16): Likewise. (__arm_vst4q_s32): Likewise. (__arm_vst4q_u8): Likewise. (__arm_vst4q_u16): Likewise. (__arm_vst4q_u32): Likewise. (__arm_vst4q_f16): Likewise. (__arm_vst4q_f32): Likewise. (__ARM_mve_typeid): Define macro with MVE types. (__ARM_mve_coerce): Define macro with _Generic feature. (vst4q): Define polymorphic variant for different vst4q builtins. * config/arm/arm_mve_builtins.def: New file. * config/arm/mve.md (MVE_VLD_ST): Define iterator. (unspec): Define unspec. (mve_vst4q): Define RTL pattern. 
* config/arm/t-arm (arm.o): Add entry for arm_mve_builtins.def. (arm-builtins.o): Likewise. gcc/testsuite/ChangeLog: 2019-11-12 Andre Vieira Mihail Ionescu Srinath Parvathaneni * gcc.target/arm/mve/intrinsics/vst4q_f16.c: New test. * gcc.target/arm/mve/intrinsics/vst4q_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst4q_u8.c: Likewise. ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index d4cb0ea3deb49b10266d1620c85e243ed34aee4d..a9f76971ef310118bf7edea6a8dd3de1da46b46b 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -401,6 +401,13 @@ static arm_builtin_datum neon_builtin_data[] = }; #undef CF +#define CF(N,X) CODE_FOR_mve_##N##X +static arm_builtin_datum mve_builtin_data[] = +{ +#include "arm_mve_builtins.def" +}; + +#undef CF #undef VAR1 #define VAR1(T, N, A) \ {#N, UP (A), CODE_FOR_arm_##N, 0, T##_QUALIFIERS}, @@ -705,6 +712,13 @@ enum arm_builtins #include "arm_acle_builtins.def" + ARM_BUILTIN_MVE_BASE, + +#undef VAR1 +#define VAR1(T, N, X) \ + ARM_BUILTIN_MVE_##N##X, +#include "arm_mve_builtins.def" + ARM_BUILTIN_MAX }; @@ -714,6 +728,9 @@ enum arm_builtins #define ARM_BUILTIN_NEON_PATTERN_START \ (ARM_BUILTIN_NEON_BASE + 1) +#define ARM_BUILTIN_MVE_PATTERN_START \ + (ARM_BUILTIN_MVE_BASE + 1) + #define ARM_BUILTIN_ACLE_PATTERN_START \ (ARM_BUILTIN_ACLE_BASE + 1) @@ -1219,6 +1236,22 @@ arm_init_acle_builtins (void) } } +/* Set up all the MVE builtins mentioned in arm_mve_builtins.def file. 
*/ +static void +arm_init_mve_builtins (void) +{ + volatile unsigned int i, fcode = ARM_BUILTIN_MVE_PATTERN_START; + + arm_init_simd_builtin_scalar_types (); + arm_init_simd_builtin_types (); + + for (i = 0; i < ARRAY_SIZE (mve_builtin_data); i++, fcode++) + { + arm_builtin_datum *d = &mve_builtin_data[i]; + arm_init_builtin (fcode, d, "__builtin_mve"); + } +} + /* Set up all the NEON builtins, even builtins for instructions that are not in the current target ISA to allow the user to compile particular modules with different target specific options that differ from the command line @@ -1961,8 +1994,10 @@ arm_init_builtins (void) = add_builtin_functio
Re: [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
Hi Srinath, On 11/14/19 7:12 PM, Srinath Parvathaneni wrote: Hello, This patch is part of MVE ACLE intrinsics framework. This patch adds support to update (read/write) the APSR (Application Program Status Register) register and FPSCR (Floating-point Status and Control Register) register for MVE. This patch also enables thumb2 mov RTL patterns for MVE. Please refer to Arm reference manual [1] for more details. [1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914 Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Thanks, Srinath gcc/ChangeLog: 2019-11-11 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config/arm/thumb2.md (thumb2_movsfcc_soft_insn): Add check to not allow TARGET_HAVE_MVE for this pattern. (thumb2_cmse_entry_return): Add TARGET_HAVE_MVE check to update APSR register. * config/arm/unspecs.md (UNSPEC_GET_FPSCR): Define. (VUNSPEC_GET_FPSCR): Remove. * config/arm/vfp.md (thumb2_movhi_vfp): Add TARGET_HAVE_MVE check. (thumb2_movhi_fp16): Add TARGET_HAVE_MVE check. (thumb2_movsi_vfp): Add TARGET_HAVE_MVE check. (movdi_vfp): Add TARGET_HAVE_MVE check. (thumb2_movdf_vfp): Add TARGET_HAVE_MVE check. (thumb2_movsfcc_vfp): Add TARGET_HAVE_MVE check. (thumb2_movdfcc_vfp): Add TARGET_HAVE_MVE check. (push_multi_vfp): Add TARGET_HAVE_MVE check. (set_fpscr): Add TARGET_HAVE_MVE check. (get_fpscr): Add TARGET_HAVE_MVE check. These pattern changes do more than add a TARGET_HAVE_MVE check. Some add new alternatives, some even change the RTL pattern. I'd like to see them reflected in the ChangeLog so that I know they're deliberate. 
### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 809461a25da5a8058a8afce972dea0d3131effc0..81afd8fcdc1b0a82493dc0758bce16fa9e5fde20 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -435,10 +435,10 @@ (define_insn "*cmovsi_insn" [(set (match_operand:SI 0 "arm_general_register_operand" "=r,r,r,r,r,r,r") (if_then_else:SI - (match_operator 1 "arm_comparison_operator" - [(match_operand 2 "cc_register" "") (const_int 0)]) - (match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1") - (match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, r,UM,U1")))] + (match_operator 1 "arm_comparison_operator" + [(match_operand 2 "cc_register" "") (const_int 0)]) + (match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1") + (match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, r,UM,U1")))] "TARGET_THUMB2 && TARGET_COND_ARITH && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx) || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))" @@ -540,7 +540,7 @@ [(match_operand 4 "cc_register" "") (const_int 0)]) (match_operand:SF 1 "s_register_operand" "0,r") (match_operand:SF 2 "s_register_operand" "r,0")))] - "TARGET_THUMB2 && TARGET_SOFT_FLOAT" + "TARGET_THUMB2 && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE" "@ it\\t%D3\;mov%D3\\t%0, %2 it\\t%d3\;mov%d3\\t%0, %1" @@ -1226,7 +1226,7 @@ ; added to clear the APSR and potentially the FPSCR if VFP is available, so ; we adapt the length accordingly. 
(set (attr "length") - (if_then_else (match_test "TARGET_HARD_FLOAT") + (if_then_else (match_test "TARGET_HARD_FLOAT || TARGET_HAVE_MVE") (const_int 34) (const_int 8))) ; We do not support predicate execution of returns from cmse_nonsecure_entry diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md index b3b4f8ee3e2d1bdad968a9dd8ccbc72ded274f48..ac7fe7d0af19f1965356d47d8327e24d410b99bd 100644 --- a/gcc/config/arm/unspecs.md +++ b/gcc/config/arm/unspecs.md @@ -170,6 +170,7 @@ UNSPEC_TORC ; Used by the intrinsic form of the iWMMXt TORC instruction. UNSPEC_TORVSC ; Used by the intrinsic form of the iWMMXt TORVSC instruction. UNSPEC_TEXTRC ; Used by the intrinsic form of the iWMMXt TEXTRC instruction. + UNSPEC_GET_FPSCR ; Represent fetch of FPSCR content. ]) @@ -216,7 +217,6 @@ VUNSPEC_SLX ; Represent a store-register-release-exclusive. VUNSPEC_LDA ; Represent a store-register-acquire. VUNSPEC_STL ; Represent a store-register-release. - VUNSPEC_GET_FPSCR ; Represent fetch of FPSCR content. VUNSPEC_SET_FPSCR ; Represent assign of FPSCR content. VUNSPEC_PROBE_STACK_RANGE ; Represent stack range probing. VUNSPEC_CDP ; Represent the coprocessor cdp instruction. diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index 6349c0570540ec25a
Re: [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
On 11/14/19 7:12 PM, Srinath Parvathaneni wrote: Hello, This patch creates the required framework for MVE ACLE intrinsics. The following changes are done in this patch to support MVE ACLE intrinsics. Header file arm_mve.h is added to source code, which contains the definitions of MVE ACLE intrinsics and different data types used in MVE. Machine description file mve.md is also added which contains the RTL patterns defined for MVE. A new register "p0" is added which is used by MVE predicated patterns. A new register class "VPR_REG" is added and its contents are defined in REG_CLASS_CONTENTS. The vec-common.md file is modified to support the standard move patterns. The prefix of neon functions which are also used by MVE is changed from "neon_" to "simd_". e.g.: neon_immediate_valid_for_move changed to simd_immediate_valid_for_move. In the patch standard patterns mve_move, mve_store and mve_load for MVE are added and neon.md and vfp.md files are modified to support these common patterns. Please refer to Arm reference manual [1] for more details. [1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914 Regression tested on arm-none-eabi and found no regressions. Ok for trunk? Ok. Thanks, Kyrill Thanks, Srinath gcc/ChangeLog: 2019-11-11 Andre Vieira Mihail Ionescu Srinath Parvathaneni * config.gcc (arm_mve.h): Add header file. * config/arm/aout.h (p0): Add new register name. * config/arm/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define. (ARM_BUILTIN_NEON_LANE_CHECK): Remove. (arm_init_simd_builtin_types): Add TARGET_HAVE_MVE check. (arm_init_neon_builtins): Move a check to arm_init_builtins function. (arm_init_builtins): Move a check from arm_init_neon_builtins function. (mve_dereference_pointer): Add new function. (arm_expand_builtin_args): Add TARGET_HAVE_MVE check. (arm_expand_neon_builtin): Move a check to arm_expand_builtin function. 
(arm_expand_builtin): Move a check from arm_expand_neon_builtin function. * config/arm/arm-c.c (arm_cpu_builtins): Define macros for MVE. * config/arm/arm-modes.def (INT_MODE): Add three new integer modes. * config/arm/arm-protos.h (neon_immediate_valid_for_move): Rename function. (simd_immediate_valid_for_move): Rename neon_immediate_valid_for_move function. * config/arm/arm.c (arm_options_perform_arch_sanity_checks):Enable mve isa bit. (use_return_insn): Add TARGET_HAVE_MVE check. (aapcs_vfp_allocate): Add TARGET_HAVE_MVE check. (aapcs_vfp_allocate_return_reg): Add TARGET_HAVE_MVE check. (thumb2_legitimate_address_p): Add TARGET_HAVE_MVE check. (arm_rtx_costs_internal): Add TARGET_HAVE_MVE check. (neon_valid_immediate): Rename to simd_valid_immediate. (simd_valid_immediate): Rename from neon_valid_immediate. (neon_immediate_valid_for_move): Rename to simd_immediate_valid_for_move. (simd_immediate_valid_for_move): Rename from neon_immediate_valid_for_move. (neon_immediate_valid_for_logic): Modify call to neon_valid_immediate function. (neon_make_constant): Modify call to neon_valid_immediate function. (neon_vector_mem_operand): Add TARGET_HAVE_MVE check. (output_move_neon): Add TARGET_HAVE_MVE check. (arm_compute_frame_layout): Add TARGET_HAVE_MVE check. (arm_save_coproc_regs): Add TARGET_HAVE_MVE check. (arm_print_operand): Add case 'E' to print memory operands. (arm_print_operand_address): Add TARGET_HAVE_MVE check. (arm_hard_regno_mode_ok): Add TARGET_HAVE_MVE check. (arm_modes_tieable_p): Add TARGET_HAVE_MVE check. (arm_regno_class): Add VPR_REGNUM check. (arm_expand_epilogue_apcs_frame): Add TARGET_HAVE_MVE check. (arm_expand_epilogue): Add TARGET_HAVE_MVE check. (arm_vector_mode_supported_p): Add TARGET_HAVE_MVE check for MVE vector modes. (arm_array_mode_supported_p): Add TARGET_HAVE_MVE check. (arm_conditional_register_usage): For TARGET_HAVE_MVE enable VPR register. * config/arm/arm.h (IS_VPR_REGNUM): Macro to check for VPR register. 
(FIRST_PSEUDO_REGISTER): Modify. (VALID_MVE_MODE): Define. (VALID_MVE_SI_MODE): Define. (VALID_MVE_SF_MODE): Define. (VALID_MVE_STRUCT_MODE): Define. (REG_ALLOC_ORDER): Add VPR_REGNUM entry. (enum reg_class): Add VPR_REG entry. (REG_CLASS_NAMES): Add VPR_REG entry. * config/arm/arm.md (VPR_REGNUM): Define. (arm_movsf_soft_insn): Add TARGET_HAVE_MVE check to not allow MVE. (vfp_pop_multiple_with_writeback): Add TARGET_HAVE_MVE check to allow writeback. (include "mve.md"): Include mve.md file. * config/arm/arm_mve.h: New file. * config/arm/c
Re: [PATCH][GCC][arm] Add CLI and multilib support for Armv8.1-M Mainline MVE extensions
On 12/18/19 5:00 PM, Mihail Ionescu wrote: Hi Kyrill, On 12/18/2019 02:13 PM, Kyrill Tkachov wrote: > Hi Mihail, > > On 11/8/19 4:52 PM, Mihail Ionescu wrote: >> Hi, >> >> This patch adds CLI and multilib support for Armv8.1-M MVE to the Arm >> backend. >> Two new option added for v8.1-m.main: "+mve" for integer MVE >> instructions only >> and "+mve.fp" for both integer and single-precision/half-precision >> floating-point MVE. >> The patch also maps the Armv8.1-M multilib variants to the >> corresponding v8-M ones. >> >> >> >> gcc/ChangeLog: >> >> 2019-11-08 Mihail Ionescu >> 2019-11-08 Andre Vieira >> >> * config/arm/arm-cpus.in (mve, mve_float): New features. >> (dsp, mve, mve.fp): New options. >> * config/arm/arm.h (TARGET_HAVE_MVE, TARGET_HAVE_MVE_FLOAT): >> Define. >> * config/arm/t-rmprofile: Map v8.1-M multilibs to v8-M. >> >> >> gcc/testsuite/ChangeLog: >> >> 2019-11-08 Mihail Ionescu >> 2019-11-08 Andre Vieira >> >> * testsuite/gcc.target/arm/multilib.exp: Add v8.1-M entries. >> >> >> Is this ok for trunk? > > > This is ok, but please document the new options in invoke.texi. > Here it is with the updated invoke.texi and ChangeLog. Thanks, looks great to me. Kyrill gcc/ChangeLog: 2019-12-18 Mihail Ionescu 2019-12-18 Andre Vieira * config/arm/arm-cpus.in (mve, mve_float): New features. (dsp, mve, mve.fp): New options. * config/arm/arm.h (TARGET_HAVE_MVE, TARGET_HAVE_MVE_FLOAT): Define. * config/arm/t-rmprofile: Map v8.1-M multilibs to v8-M. * doc/invoke.texi: Document the armv8.1-m mve and dsp options. gcc/testsuite/ChangeLog: 2019-12-18 Mihail Ionescu 2019-12-18 Andre Vieira * testsuite/gcc.target/arm/multilib.exp: Add v8.1-M entries. 
Thanks, Mihail > Thanks, > > Kyrill > > >> >> Best regards, >> >> Mihail >> >> >> ### Attachment also inlined for ease of reply >> ### >> >> >> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in >> index >> 59aad8f62ee5186cc87d3cefaf40ba2ce049012d..c2f016c75e2d8dd06890295321232bef61cbd234 >> 100644 >> --- a/gcc/config/arm/arm-cpus.in >> +++ b/gcc/config/arm/arm-cpus.in >> @@ -194,6 +194,10 @@ define feature sb >> # v8-A architectures, added by default from v8.5-A >> define feature predres >> >> +# M-profile Vector Extension feature bits >> +define feature mve >> +define feature mve_float >> + >> # Feature groups. Conventionally all (or mostly) upper case. >> # ALL_FPU lists all the feature bits associated with the floating-point >> # unit; these will all be removed if the floating-point unit is disabled >> @@ -654,9 +658,12 @@ begin arch armv8.1-m.main >> base 8M_MAIN >> isa ARMv8_1m_main >> # fp => FPv5-sp-d16; fp.dp => FPv5-d16 >> + option dsp add armv7em >> option fp add FPv5 fp16 >> option fp.dp add FPv5 FP_DBL fp16 >> option nofp remove ALL_FP >> + option mve add mve armv7em >> + option mve.fp add mve FPv5 fp16 mve_float armv7em >> end arch armv8.1-m.main >> >> begin arch iwmmxt >> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h >> index >> 64c292f2862514fb600a4faeaddfeacb2b69180b..9ec38c6af1b84fc92e20e30e8f07ce5360a966c1 >> 100644 >> --- a/gcc/config/arm/arm.h >> +++ b/gcc/config/arm/arm.h >> @@ -310,6 +310,12 @@ emission of floating point pcs attributes. */ >> instructions (most are floating-point related). */ >> #define TARGET_HAVE_FPCXT_CMSE (arm_arch8_1m_main) >> >> +#define TARGET_HAVE_MVE (bitmap_bit_p (arm_active_target.isa, \ >> + isa_bit_mve)) >> + >> +#define TARGET_HAVE_MVE_FLOAT (bitmap_bit_p (arm_active_target.isa, \ >> + isa_bit_mve_float)) >> + >> /* Nonzero if integer division instructions supported. 
*/ >> #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \ >> || (TARGET_THUMB && arm_arch_thumb_hwdiv)) >> diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile >> index >> 807e69eaf78625f422e2d7ef5936c5c80c5b9073..62e27fd284b21524896430176d64ff5b08c6e0ef >> 100644 >> --- a/gcc/config/arm/t-rmprofile >> +++ b/gcc/config/arm/t-rmprofile >> @@ -54,7 +54,7 @@ MULTILIB_REQUIRED += >> mthumb/march=armv8-m.main+fp.dp/mfloat-abi=softfp >> # Arch Matches >> MULTILIB_MATCHES += mar
Re: [PATCH, GCC/ARM, 9/10] Call nscall function with blxns
On 12/18/19 1:38 PM, Mihail Ionescu wrote: Hi, On 11/12/2019 10:23 AM, Kyrill Tkachov wrote: On 10/23/19 10:26 AM, Mihail Ionescu wrote: [PATCH, GCC/ARM, 9/10] Call nscall function with blxns Hi, === Context === This patch is part of a patch series to add support for Armv8.1-M Mainline Security Extensions architecture. Its purpose is to call functions with the cmse_nonsecure_call attribute directly using blxns with no undue restriction on the register used for that. === Patch description === This change to use BLXNS to call a nonsecure function from secure directly (not using a libcall) is made in 2 steps: - change nonsecure_call patterns to use blxns instead of calling __gnu_cmse_nonsecure_call - loosen the requirement for the function address to allow any register when doing BLXNS. The former is a straightforward check over whether instructions added in Armv8.1-M Mainline are available while the latter consists in making the nonsecure call pattern accept any register by using match_operand and changing the nonsecure_call_internal expander to not force r4 when targeting Armv8.1-M Mainline. The tricky bit is actually in the test update, specifically how to check that register lists for CLRM have all registers except for the one holding parameters (already done) and the one holding the address used by BLXNS. This is achieved with 3 scan-assembler directives. 1) The first one lists all registers that can appear in CLRM but makes each of them optional. Property guaranteed: no wrong register is cleared and none appears twice in the register list. 2) The second directive checks that the CLRM is made of a fixed number of the right registers to be cleared. The number used is the number of registers that could contain a secret minus one (used to hold the address of the function to call). 
Property guaranteed: the register list has the right number of registers. Cumulated property guaranteed: only registers with a potential secret are cleared and they are all listed but one. 3) The last directive checks that we cannot find a CLRM with a register in it that also appears in BLXNS. This is checked via the use of a back-reference on any of the allowed registers in CLRM, the back-reference enforcing that whatever register matches in CLRM must be the same in the BLXNS. Property guaranteed: the register used for BLXNS is different from the registers cleared in CLRM. Some more care needs to happen for the gcc.target/arm/cmse/cmse-1.c testcase due to two CLRMs being generated. To ensure the third directive matches the right CLRM to the BLXNS, a negative lookahead is used between the CLRM register list and the BLXNS. The way a negative lookahead works is by matching the *position* where a given regular expression does not match. In this case, since it comes after the CLRM register list, it requires that what comes after the register list does not have a CLRM again followed by BLXNS. This guarantees that the .*blxns after only matches a blxns without another CLRM before it. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * config/arm/arm.md (nonsecure_call_internal): Do not force memory address in r4 when targeting Armv8.1-M Mainline. (nonsecure_call_value_internal): Likewise. * config/arm/thumb2.md (nonsecure_call_reg_thumb2): Make memory address a register match_operand again. Emit BLXNS when targeting Armv8.1-M Mainline. (nonsecure_call_value_reg_thumb2): Likewise. *** gcc/testsuite/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * gcc.target/arm/cmse/cmse-1.c: Add check for BLXNS when instructions introduced in Armv8.1-M Mainline Security Extensions are available and restrict checks for libcall to __gnu_cmse_nonsecure_call to Armv8-M targets only. 
Adapt CLRM check to verify register used for BLXNS is not in the CLRM register list. * gcc.target/arm/cmse/cmse-14.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise and adapt check for LSB clearing bit to be using the same register as BLXNS when targeting Armv8.1-M Mainline. * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard
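The back-reference and negative-lookahead techniques described in the patch description above can be illustrated outside of DejaGnu. The following is a minimal Python sketch over toy assembly text; the strings and patterns are simplified stand-ins for the actual scan-assembler directives in the testsuite, not copies of them:

```python
import re

# Toy assembly fragments; register names mirror the CLRM/BLXNS tests,
# but these are simplified illustrations of the regex techniques only.
asm_ok = "clrm {r1, r2, r3}\nblxns r0"   # BLXNS register not cleared
asm_bad = "clrm {r0, r2, r3}\nblxns r0"  # BLXNS register is in the CLRM list

# Back-reference: \1 only matches if the register used by BLXNS also
# appears in the CLRM register list, i.e. a match flags a bad clearing.
backref = re.compile(r"clrm\s*{[^}]*\b(r[0-9]+)\b[^}]*}.*blxns\s+\1\b", re.S)
assert backref.search(asm_bad) is not None
assert backref.search(asm_ok) is None

# Negative lookahead: (?!.*clrm) matches the position after a register
# list only when no further CLRM occurs before the BLXNS, so the
# back-reference is checked against the CLRM nearest to the call.
two_clrms = "clrm {r1}\nnop\nclrm {r0, r2}\nblxns r0"
nearest = re.compile(
    r"clrm\s*{[^}]*\b(r[0-9]+)\b[^}]*}(?!.*clrm).*blxns\s+\1\b", re.S)
assert nearest.search(two_clrms).group(1) == "r0"
```

In the `two_clrms` case the engine backtracks past the first CLRM (the lookahead fails there because another CLRM follows) and only the second, nearest CLRM can satisfy the pattern, which is exactly the property the third directive relies on.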
Re: [PATCH][GCC][arm] Add CLI and multilib support for Armv8.1-M Mainline MVE extensions
Hi Mihail, On 11/8/19 4:52 PM, Mihail Ionescu wrote: Hi, This patch adds CLI and multilib support for Armv8.1-M MVE to the Arm backend. Two new options are added for v8.1-m.main: "+mve" for integer MVE instructions only and "+mve.fp" for both integer and single-precision/half-precision floating-point MVE. The patch also maps the Armv8.1-M multilib variants to the corresponding v8-M ones. gcc/ChangeLog: 2019-11-08 Mihail Ionescu 2019-11-08 Andre Vieira * config/arm/arm-cpus.in (mve, mve_float): New features. (dsp, mve, mve.fp): New options. * config/arm/arm.h (TARGET_HAVE_MVE, TARGET_HAVE_MVE_FLOAT): Define. * config/arm/t-rmprofile: Map v8.1-M multilibs to v8-M. gcc/testsuite/ChangeLog: 2019-11-08 Mihail Ionescu 2019-11-08 Andre Vieira * testsuite/gcc.target/arm/multilib.exp: Add v8.1-M entries. Is this ok for trunk? This is ok, but please document the new options in invoke.texi. Thanks, Kyrill Best regards, Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index 59aad8f62ee5186cc87d3cefaf40ba2ce049012d..c2f016c75e2d8dd06890295321232bef61cbd234 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -194,6 +194,10 @@ define feature sb # v8-A architectures, added by default from v8.5-A define feature predres +# M-profile Vector Extension feature bits +define feature mve +define feature mve_float + # Feature groups. Conventionally all (or mostly) upper case. 
# ALL_FPU lists all the feature bits associated with the floating-point # unit; these will all be removed if the floating-point unit is disabled @@ -654,9 +658,12 @@ begin arch armv8.1-m.main base 8M_MAIN isa ARMv8_1m_main # fp => FPv5-sp-d16; fp.dp => FPv5-d16 + option dsp add armv7em option fp add FPv5 fp16 option fp.dp add FPv5 FP_DBL fp16 option nofp remove ALL_FP + option mve add mve armv7em + option mve.fp add mve FPv5 fp16 mve_float armv7em end arch armv8.1-m.main begin arch iwmmxt diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 64c292f2862514fb600a4faeaddfeacb2b69180b..9ec38c6af1b84fc92e20e30e8f07ce5360a966c1 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -310,6 +310,12 @@ emission of floating point pcs attributes. */ instructions (most are floating-point related). */ #define TARGET_HAVE_FPCXT_CMSE (arm_arch8_1m_main) +#define TARGET_HAVE_MVE (bitmap_bit_p (arm_active_target.isa, \ + isa_bit_mve)) + +#define TARGET_HAVE_MVE_FLOAT (bitmap_bit_p (arm_active_target.isa, \ + isa_bit_mve_float)) + /* Nonzero if integer division instructions supported. */ #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \ || (TARGET_THUMB && arm_arch_thumb_hwdiv)) diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile index 807e69eaf78625f422e2d7ef5936c5c80c5b9073..62e27fd284b21524896430176d64ff5b08c6e0ef 100644 --- a/gcc/config/arm/t-rmprofile +++ b/gcc/config/arm/t-rmprofile @@ -54,7 +54,7 @@ MULTILIB_REQUIRED += mthumb/march=armv8-m.main+fp.dp/mfloat-abi=softfp # Arch Matches MULTILIB_MATCHES += march?armv6s-m=march?armv6-m -# Map all v8-m.main+dsp FP variants down the the variant without DSP. +# Map all v8-m.main+dsp FP variants down to the variant without DSP. 
MULTILIB_MATCHES += march?armv8-m.main=march?armv8-m.main+dsp \ $(foreach FP, +fp +fp.dp, \ march?armv8-m.main$(FP)=march?armv8-m.main+dsp$(FP)) @@ -66,3 +66,18 @@ MULTILIB_MATCHES += march?armv7e-m+fp=march?armv7e-m+fpv5 MULTILIB_REUSE += $(foreach ARCH, armv6s-m armv7-m armv7e-m armv8-m\.base armv8-m\.main, \ mthumb/march.$(ARCH)/mfloat-abi.soft=mthumb/march.$(ARCH)/mfloat-abi.softfp) +# Map v8.1-M to v8-M. +MULTILIB_MATCHES += march?armv8-m.main=march?armv8.1-m.main +MULTILIB_MATCHES += march?armv8-m.main=march?armv8.1-m.main+dsp +MULTILIB_MATCHES += march?armv8-m.main=march?armv8.1-m.main+mve + +v8_1m_sp_variants = +fp +dsp+fp +mve.fp +v8_1m_dp_variants = +fp.dp +dsp+fp.dp +fp.dp+mve +fp.dp+mve.fp + +# Map all v8.1-m.main FP sp variants down to v8-m. +MULTILIB_MATCHES += $(foreach FP, $(v8_1m_sp_variants), \ + march?armv8-m.main+fp=march?armv8.1-m.main$(FP)) + +# Map all v8.1-m.main FP dp variants down to v8-m. +MULTILIB_MATCHES += $(foreach FP, $(v8_1m_dp_variants), \ + march?armv8-m.main+fp.dp=march?armv8.1-m.main$(FP)) diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp b/gcc/testsuite/gcc.target/arm/multilib.exp index dcea829965eb15e372401e6389df5a1403393ecb..63cca118da2578253740fcd95421eae9ddf219bd 100644 --- a/gcc/testsuite/gcc.target/arm/multilib.exp +++ b/gcc/testsuite/gcc.target/arm/multilib.exp @@ -775,6 +775,27 @@ if {[multilib_config "rmprofile"] } { {-march=armv8-r+fp.sp -mfpu=auto -mfloat-abi=hard} "thumb/v7-r+fp.sp/ha
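For readers unfamiliar with the Make `$(foreach ...)` idiom used in t-rmprofile above, the mappings it generates for the single-precision variants can be expanded by hand. This standalone Python sketch reproduces the expansion for inspection; the real expansion, of course, happens inside Make during the GCC build:

```python
# Expand the v8.1-M single-precision FP variants into the
# MULTILIB_MATCHES pairs that the $(foreach FP, $(v8_1m_sp_variants), ...)
# construct in t-rmprofile produces, mapping each one down to the
# v8-M +fp multilib.
v8_1m_sp_variants = ["+fp", "+dsp+fp", "+mve.fp"]
matches = ["march?armv8-m.main+fp=march?armv8.1-m.main" + fp
           for fp in v8_1m_sp_variants]
for m in matches:
    print(m)
```

Each printed pair says "treat this armv8.1-m.main variant as armv8-m.main+fp when selecting a multilib", which is how the v8.1-M options reuse the existing v8-M libraries.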
Re: [PATCH][AArch64] Fixup core tunings
Hi Wilco, On 12/17/19 4:03 PM, Wilco Dijkstra wrote: Hi Richard, > This changelog entry is inadequate. It's also not in the correct style. > > It should say what has changed, not just that it has changed. Sure, but there is often no useful space for that. We should auto generate changelogs if they are deemed useful. I find the commit message a lot more useful in general. Here is the updated version: Several tuning settings in cores.def are not consistent. Set the tuning for Cortex-A76AE and Cortex-A77 to neoversen1 so it is the same as for Cortex-A76 and Neoverse N1. Set the tuning for Neoverse E1 to cortexa73 so it's the same as for Cortex-A65. Set the scheduler for Cortex-A65 and Cortex-A65AE to cortexa53. Bootstrap OK, OK for commit? Ok. Thanks, Kyrill ChangeLog: 2019-12-17 Wilco Dijkstra * config/aarch64/aarch64-cores.def: ("cortex-a76ae"): Use neoversen1 tuning. ("cortex-a77"): Likewise. ("cortex-a65"): Use cortexa53 scheduler. ("cortex-a65ae"): Likewise. ("neoverse-e1"): Use cortexa73 tuning. 
-- diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 053c6390e747cb9c818fe29a9b22990143b260ad..d170253c6eddca87f8b9f4f7fcc4692695ef83fb 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -101,13 +101,13 @@ AARCH64_CORE("thunderx2t99", thunderx2t99, thunderx2t99, 8_1A, AARCH64_FL_FOR AARCH64_CORE("cortex-a55", cortexa55, cortexa53, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1) AARCH64_CORE("cortex-a75", cortexa75, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1) AARCH64_CORE("cortex-a76", cortexa76, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, neoversen1, 0x41, 0xd0b, -1) -AARCH64_CORE("cortex-a76ae", cortexa76ae, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa72, 0x41, 0xd0e, -1) -AARCH64_CORE("cortex-a77", cortexa77, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa72, 0x41, 0xd0d, -1) -AARCH64_CORE("cortex-a65", cortexa65, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd06, -1) -AARCH64_CORE("cortex-a65ae", cortexa65ae, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd43, -1) +AARCH64_CORE("cortex-a76ae", cortexa76ae, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, neoversen1, 0x41, 0xd0e, -1) +AARCH64_CORE("cortex-a77", cortexa77, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, neoversen1, 0x41, 0xd0d, -1) +AARCH64_CORE("cortex-a65", 
cortexa65, cortexa53, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd06, -1) +AARCH64_CORE("cortex-a65ae", cortexa65ae, cortexa53, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd43, -1) AARCH64_CORE("ares", ares, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, neoversen1, 0x41, 0xd0c, -1) AARCH64_CORE("neoverse-n1", neoversen1, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, neoversen1, 0x41, 0xd0c, -1) -AARCH64_CORE("neoverse-e1", neoversee1, cortexa53, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1) +AARCH64_CORE("neoverse-e1", neoversee1, cortexa53, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd4a, -1) /* HiSilicon ('H') cores. */ AARCH64_CORE("tsv110", tsv110, tsv110, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110, 0x48, 0xd01, -1) @@ -127,6 +127,6 @@ AARCH64_CORE("cortex-a73.cortex-a53", cortexa73cortexa53, cortexa53, 8A, AARCH /* ARM DynamIQ big.LITTLE configurations. */ AARCH64_CORE("cortex-a75.cortex-a55", cortexa75cortexa55, cortexa53, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 0xd05), -1) -AARCH64_CORE("cortex-a76.cortex-a55", cortexa76cortexa55, cortexa53, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd0b, 0xd05), -1) +AARCH64_CORE("cortex-a76.cortex-a55", c
Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)
On 12/17/19 2:33 PM, Christophe Lyon wrote: On Tue, 17 Dec 2019 at 11:34, Kyrill Tkachov wrote: Hi Christophe, On 11/18/19 9:00 AM, Christophe Lyon wrote: On Wed, 13 Nov 2019 at 15:46, Christophe Lyon wrote: On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists) wrote: On 18/10/2019 14:18, Christophe Lyon wrote: + bool not_supported = arm_arch_notm || flag_pic || TARGET_NEON; This is a poor name in the context of the function as a whole. What's not supported? Please think of a better name so that I have some idea what the intention is. That's to keep most of the code common when checking if -mpure-code and -mslow-flash-data are supported. These 3 cases are common to the two compilation flags, and -mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition. Would "common_unsupported_modes" work better for you? Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in the two tests. Hi, Here is an updated version, using "common_unsupported_modes" instead of "not_supported", and fixing the typo reported by Kyrill. The ChangeLog is still the same. OK? The name looks ok to me. Richard had a concern about Armv8-M Baseline, but I do see it being supported as you pointed out. So I believe all the concerns are addressed. OK, thanks! Thus the code is ok. However, please also update the documentation for -mpure-code in invoke.texi (it currently states that a MOVT instruction is needed). I didn't think about this :( It currently says: "This option is only available when generating non-pic code for M-profile targets with the MOVT instruction." I suggest to remove the "with the MOVT instruction" part. Is that OK if I commit my patch and this doc change? Yes, I think that is the simplest correct change to make. Thanks, Kyrill Christophe Thanks, Kyrill Thanks, Christophe R.
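For reference, the agreed doc change would leave the @option{-mpure-code} entry in invoke.texi reading roughly as follows once the MOVT clause is dropped; the surrounding wording here is a paraphrase from memory, not a quote of the actual patch:

```texi
@item -mpure-code
@opindex mpure-code
Do not allow constant data to be placed in code sections.
This option is only available when generating non-pic code
for M-profile targets.
```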
Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)
Hi Christophe, On 11/18/19 9:00 AM, Christophe Lyon wrote: On Wed, 13 Nov 2019 at 15:46, Christophe Lyon wrote: > > On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists) > wrote: > > > > On 18/10/2019 14:18, Christophe Lyon wrote: > > > + bool not_supported = arm_arch_notm || flag_pic || TARGET_NEON; > > > > > > > This is a poor name in the context of the function as a whole. What's > > not supported? Please think of a better name so that I have some idea > > what the intention is. > > That's to keep most of the code common when checking if -mpure-code > and -mslow-flash-data are supported. > These 3 cases are common to the two compilation flags, and > -mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition. > > Would "common_unsupported_modes" work better for you? > Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in > the two tests. > Hi, Here is an updated version, using "common_unsupported_modes" instead of "not_supported", and fixing the typo reported by Kyrill. The ChangeLog is still the same. OK? The name looks ok to me. Richard had a concern about Armv8-M Baseline, but I do see it being supported as you pointed out. So I believe all the concerns are addressed. Thus the code is ok. However, please also update the documentation for -mpure-code in invoke.texi (it currently states that a MOVT instruction is needed). Thanks, Kyrill Thanks, Christophe > Thanks, > > Christophe > > > > > R.
Re: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM
Hi Mihail, On 12/16/19 6:29 PM, Mihail Ionescu wrote: Hi Kyrill, On 11/12/2019 09:55 AM, Kyrill Tkachov wrote: Hi Mihail, On 10/23/19 10:26 AM, Mihail Ionescu wrote: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM Hi, === Context === This patch is part of a patch series to add support for Armv8.1-M Mainline Security Extensions architecture. Its purpose is to improve code density of functions with the cmse_nonsecure_entry attribute and when calling functions with the cmse_nonsecure_call attribute by using CLRM to do all the general-purpose register clearing as well as clearing the APSR register. === Patch description === This patch adds a new pattern for the CLRM instruction and guards the current clearing code in output_return_instruction() and thumb_exit() on Armv8.1-M Mainline instructions not being present. cmse_clear_registers () is then modified to use the new CLRM instruction when targeting Armv8.1-M Mainline while keeping Armv8-M register clearing code for VFP registers. For the CLRM instruction, which does not mandate APSR in the register list, checking whether it is the right volatile unspec or a clearing register is done in clear_operation_p. Note that load/store multiple were deemed sufficiently different in terms of RTX structure compared to the CLRM pattern for a different function to be used to validate the match_parallel. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * config/arm/arm-protos.h (clear_operation_p): Declare. * config/arm/arm.c (clear_operation_p): New function. (cmse_clear_registers): Generate clear_multiple instruction pattern if targeting Armv8.1-M Mainline or successor. (output_return_instruction): Only output APSR register clearing if Armv8.1-M Mainline instructions not available. (thumb_exit): Likewise. * config/arm/predicates.md (clear_multiple_operation): New predicate. * config/arm/thumb2.md (clear_apsr): New define_insn. (clear_multiple): Likewise. 
* config/arm/unspecs.md (VUNSPEC_CLRM_APSR): New volatile unspec. *** gcc/testsuite/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * gcc.target/arm/cmse/bitfield-1.c: Add check for CLRM. * gcc.target/arm/cmse/bitfield-2.c: Likewise. * gcc.target/arm/cmse/bitfield-3.c: Likewise. * gcc.target/arm/cmse/struct-1.c: Likewise. * gcc.target/arm/cmse/cmse-14.c: Likewise. * gcc.target/arm/cmse/cmse-1.c: Likewise. Restrict checks for Armv8-M GPR clearing when CLRM is not available. * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: likewise. * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise. 
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/union-1.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise. Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no regression. Is this ok for trunk? Best regards, Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index f995974f9bb89ab3c7ff0888c394b0dfaf7da60c..1a948d2c97526ad7e67e8d4a610ac74cfdb13882 100644 --- a/gcc/config/arm/arm-protos.h
Re: [PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions
Hi Mihail, On 12/16/19 6:29 PM, Mihail Ionescu wrote: Hi Kyrill, On 11/06/2019 04:12 PM, Kyrill Tkachov wrote: Hi Mihail, On 10/23/19 10:26 AM, Mihail Ionescu wrote: [PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions Hi, === Context === This patch is part of a patch series to add support for Armv8.1-M Mainline Security Extensions architecture. Its purpose is to enable saving/restoring of the nonsecure FP context in functions with the cmse_nonsecure_entry attribute. === Motivation === In Armv8-M Baseline and Mainline, the FP context is cleared on return from nonsecure entry functions. This means the FP context might change when calling a nonsecure entry function. This patch uses the new VLDR and VSTR instructions available in Armv8.1-M Mainline to save/restore the FP context when calling a nonsecure entry function from nonsecure code. === Patch description === This patch consists mainly of creating 2 new instruction patterns to push and pop special FP registers via vldm and vstr and using them in prologue and epilogue. The patterns are defined as push/pop with an unspecified operation on the memory accessed, with an unspecified constant indicating what special FP register is being saved/restored. Other aspects of the patch include: * defining the set of special registers that can be saved/restored and their name * reserving space in the stack frames for these push/pop * preventing return via pop * guarding the clearing of FPSCR to target architectures not having Armv8.1-M Mainline instructions. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * config/arm/arm.c (fp_sysreg_names): Declare and define. (use_return_insn): Also return false for Armv8.1-M Mainline. (output_return_instruction): Skip FPSCR clearing if Armv8.1-M Mainline instructions are available. (arm_compute_frame_layout): Allocate space in frame for FPCXTNS when targeting Armv8.1-M Mainline Security Extensions. 
(arm_expand_prologue): Save FPCXTNS if this is an Armv8.1-M Mainline entry function. (cmse_nonsecure_entry_clear_before_return): Clear IP and r4 if targeting Armv8.1-M Mainline or successor. (arm_expand_epilogue): Fix indentation of caller-saved register clearing. Restore FPCXTNS if this is an Armv8.1-M Mainline entry function. * config/arm/arm.h (TARGET_HAVE_FP_CMSE): New macro. (FP_SYSREGS): Likewise. (enum vfp_sysregs_encoding): Define enum. (fp_sysreg_names): Declare. * config/arm/unspecs.md (VUNSPEC_VSTR_VLDR): New volatile unspec. * config/arm/vfp.md (push_fpsysreg_insn): New define_insn. (pop_fpsysreg_insn): Likewise. *** gcc/testsuite/Changelog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * gcc.target/arm/cmse/bitfield-1.c: add checks for VSTR and VLDR. * gcc.target/arm/cmse/bitfield-2.c: Likewise. * gcc.target/arm/cmse/bitfield-3.c: Likewise. * gcc.target/arm/cmse/cmse-1.c: Likewise. * gcc.target/arm/cmse/struct-1.c: Likewise. * gcc.target/arm/cmse/cmse.exp: Run existing Armv8-M Mainline tests from mainline/8m subdirectory and new Armv8.1-M Mainline tests from mainline/8_1m subdirectory. * gcc.target/arm/cmse/mainline/bitfield-4.c: Move into ... * gcc.target/arm/cmse/mainline/8m/bitfield-4.c: This. * gcc.target/arm/cmse/mainline/bitfield-5.c: Move into ... * gcc.target/arm/cmse/mainline/8m/bitfield-5.c: This. * gcc.target/arm/cmse/mainline/bitfield-6.c: Move into ... * gcc.target/arm/cmse/mainline/8m/bitfield-6.c: This. * gcc.target/arm/cmse/mainline/bitfield-7.c: Move into ... * gcc.target/arm/cmse/mainline/8m/bitfield-7.c: This. * gcc.target/arm/cmse/mainline/bitfield-8.c: Move into ... * gcc.target/arm/cmse/mainline/8m/bitfield-8.c: This. * gcc.target/arm/cmse/mainline/bitfield-9.c: Move into ... * gcc.target/arm/cmse/mainline/8m/bitfield-9.c: This. * gcc.target/arm/cmse/mainline/bitfield-and-union-1.c: Move and rename into ... * gcc.target/arm/cmse/mainline/8m/bitfield-and-union.c: This. 
* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Move into ... * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-13.c: This. Clean up dg-skip-if directive for float ABI. * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Move into ... * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-5.c: This. Clean up dg-skip-if directive for float ABI. * gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Move into ... * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-7.c: This. Clean up dg-skip-if directive for float ABI. * gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c:
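To make the discussion above concrete, here is a minimal sketch of the kind of function the patch affects: a CMSE nonsecure entry function whose prologue/epilogue would save and restore FPCXTNS via the new VSTR/VLDR forms when built with -mcmse for an Armv8.1-M Mainline target. The NS_ENTRY guard is my own addition so the snippet also compiles on non-CMSE targets; it is an illustration, not code from the patch.

```c
/* Hypothetical illustration only: on a CMSE-capable Armv8.1-M target
   built with -mcmse, cmse_nonsecure_entry marks a secure function
   callable from nonsecure code, and the patch makes GCC preserve the
   nonsecure FP context (FPCXTNS) around it.  On any other target this
   degrades to a plain function so the example stays compilable.  */
#if defined(__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE & 2)
#define NS_ENTRY __attribute__((cmse_nonsecure_entry))
#else
#define NS_ENTRY /* plain function on non-CMSE targets */
#endif

NS_ENTRY float scale(float x)
{
  /* Uses the FP context, so on a CMSE target the caller's nonsecure
     FP state must survive this call.  */
  return x * 2.0f;
}
```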
Re: [PATCH, GCC/ARM, 2/10] Add command line support for Armv8.1-M Mainline
Hi Mihail, On 12/16/19 6:28 PM, Mihail Ionescu wrote: Hi Kyrill On 11/06/2019 03:59 PM, Kyrill Tkachov wrote: Hi Mihail, On 11/4/19 4:49 PM, Kyrill Tkachov wrote: Hi Mihail, On 10/23/19 10:26 AM, Mihail Ionescu wrote: > [PATCH, GCC/ARM, 2/10] Add command line support > > Hi, > > === Context === > > This patch is part of a patch series to add support for Armv8.1-M > Mainline Security Extensions architecture. Its purpose is to add > command-line support for that new architecture. > > === Patch description === > > Besides the expected enabling of the new value for the -march > command-line option (-march=armv8.1-m.main) and its extensions (see > below), this patch disables support of the Security Extensions for this > newly added architecture. This is done both by not including the cmse > bit in the architecture description and by throwing an error message > when user request Armv8.1-M Mainline Security Extensions. Note that > Armv8-M Baseline and Mainline Security Extensions are still enabled. > > Only extensions for already supported instructions are implemented in > this patch. Other extensions (MVE integer and float) will be added in > separate patches. The following configurations are allowed for Armv8.1-M > Mainline with regards to FPU and implemented in this patch: > + no FPU (+nofp) > + single precision VFPv5 with FP16 (+fp) > + double precision VFPv5 with FP16 (+fp.dp) > > ChangeLog entry are as follow: > > *** gcc/ChangeLog *** > > 2019-10-23 Mihail-Calin Ionescu > 2019-10-23 Thomas Preud'homme > > * config/arm/arm-cpus.in (armv8_1m_main): New feature. > (ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, ARMv6k, > ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, ARMv7ve, > ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, ARMv8_3a, > ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): Reindent. > (ARMv8_1m_main): New feature group. > (armv8.1-m.main): New architecture. > * config/arm/arm-tables.opt: Regenerate. 
> * config/arm/arm.c (arm_arch8_1m_main): Define and default > initialize. > (arm_option_reconfigure_globals): Initialize arm_arch8_1m_main. > (arm_options_perform_arch_sanity_checks): Error out when targeting > Armv8.1-M Mainline Security Extensions. > * config/arm/arm.h (arm_arch8_1m_main): Declare. > > *** gcc/testsuite/ChangeLog *** > > 2019-10-23 Mihail-Calin Ionescu > 2019-10-23 Thomas Preud'homme > > * lib/target-supports.exp > (check_effective_target_arm_arch_v8_1m_main_ok): Define. > (add_options_for_arm_arch_v8_1m_main): Likewise. > (check_effective_target_arm_arch_v8_1m_main_multilib): Likewise. > > Testing: bootstrapped on arm-linux-gnueabihf and arm-none-eabi; testsuite > shows no regression. > > Is this ok for trunk? > Ok. Something that I remembered last night upon reflection... New command-line options (or arguments to them) need documentation in invoke.texi. Please add some either as part of this patch or as a separate patch if you prefer. I've added the missing cli options in invoke.texi. Here's the updated ChangeLog: 2019-12-06 Mihail-Calin Ionescu 2019-12-16 Thomas Preud'homme * config/arm/arm-cpus.in (armv8_1m_main): New feature. (ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, ARMv6k, ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, ARMv7ve, ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, ARMv8_3a, ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): Reindent. (ARMv8_1m_main): New feature group. (armv8.1-m.main): New architecture. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.c (arm_arch8_1m_main): Define and default initialize. (arm_option_reconfigure_globals): Initialize arm_arch8_1m_main. (arm_options_perform_arch_sanity_checks): Error out when targeting Armv8.1-M Mainline Security Extensions. * config/arm/arm.h (arm_arch8_1m_main): Declare. * doc/invoke.texi: Document armv8.1-m.main. 
*** gcc/testsuite/ChangeLog *** 2019-12-16 Mihail-Calin Ionescu 2019-12-16 Thomas Preud'homme * lib/target-supports.exp (check_effective_target_arm_arch_v8_1m_main_ok): Define. (add_options_for_arm_arch_v8_1m_main): Likewise. (check_effective_target_arm_arch_v8_1m_main_multilib): Likewise. Thanks, this is ok. Kyrill Regards, Mihail Thanks, Kyrill Thanks, Kyrill > Best regards, > > Mihail > > > ### Attachment also inlined for ease of reply > ### > > > diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in > index > f8a3b3db67a537163bfe787d78c8f2edc4253ab3..652f2a
Re: [PATCH] [AARCH64] Improve vector generation cost model
Hi Andrew, On 3/15/19 1:18 AM, apin...@marvell.com wrote: From: Andrew Pinski Hi, On OcteonTX2, ld1r and ld1 (with a single lane) are split into two different micro-ops, unlike most other targets. This adds three extra costs to the cost table: ld1_dup: used for "ld1r {v0.4s}, [x0]" merge_dup: used for "dup v0.4s, v0.4s[0]" and "ins v0.4s[0], v0.4s[0]" ld1_merge: used for "ld1 {v0.4s}[0], [x0]" OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. Sorry for the slow reply, missed it on gcc-patches :( Thanks, Andrew Pinski ChangeLog: * config/arm/aarch-common-protos.h (vector_cost_table): Add merge_dup, ld1_merge, and ld1_dup. * config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs): Update for the new fields. (thunderx_extra_costs): Likewise. (thunderx2t99_extra_costs): Likewise. (tsv110_extra_costs): Likewise. * config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise. (cortexa53_extra_costs): Likewise. (cortexa57_extra_costs): Likewise. (exynosm1_extra_costs): Likewise. (xgene1_extra_costs): Likewise. * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle vec_dup of a memory. Handle vec_merge of a memory. Signed-off-by: Andrew Pinski --- gcc/config/aarch64/aarch64-cost-tables.h | 20 +++ gcc/config/aarch64/aarch64.c | 22 + gcc/config/arm/aarch-common-protos.h | 3 +++ gcc/config/arm/aarch-cost-tables.h | 25 +++- 4 files changed, 61 insertions(+), 9 deletions(-) diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h index 5c9442e1b89..9a7c70ba595 100644 --- a/gcc/config/aarch64/aarch64-cost-tables.h +++ b/gcc/config/aarch64/aarch64-cost-tables.h @@ -123,7 +123,10 @@ const struct cpu_cost_table qdf24xx_extra_costs = }, /* Vector */ { - COSTS_N_INSNS (1) /* alu. */ + COSTS_N_INSNS (1), /* Alu. */ + COSTS_N_INSNS (1), /* dup_merge. */ + COSTS_N_INSNS (1), /* ld1_merge. */ + COSTS_N_INSNS (1) /* ld1_dup.
*/ } }; @@ -227,7 +230,10 @@ const struct cpu_cost_table thunderx_extra_costs = }, /* Vector */ { - COSTS_N_INSNS (1) /* Alu. */ + COSTS_N_INSNS (1), /* Alu. */ + COSTS_N_INSNS (1), /* dup_merge. */ + COSTS_N_INSNS (1), /* ld1_merge. */ + COSTS_N_INSNS (1) /* ld1_dup. */ } }; @@ -330,7 +336,10 @@ const struct cpu_cost_table thunderx2t99_extra_costs = }, /* Vector */ { - COSTS_N_INSNS (1) /* Alu. */ + COSTS_N_INSNS (1), /* Alu. */ + COSTS_N_INSNS (1), /* dup_merge. */ + COSTS_N_INSNS (1), /* ld1_merge. */ + COSTS_N_INSNS (1) /* ld1_dup. */ } }; @@ -434,7 +443,10 @@ const struct cpu_cost_table tsv110_extra_costs = }, /* Vector */ { - COSTS_N_INSNS (1) /* alu. */ + COSTS_N_INSNS (1), /* Alu. */ + COSTS_N_INSNS (1), /* dup_merge. */ + COSTS_N_INSNS (1), /* ld1_merge. */ + COSTS_N_INSNS (1) /* ld1_dup. */ } }; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b38505b0872..dc4d3d39af8 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -10568,6 +10568,28 @@ cost_plus: } break; + case VEC_DUPLICATE: + if (!speed) + return false; If I read the code right, before this patch we would be returning true for !speed i.e. not recursing. Do we want to trigger a recursion now? + + if (GET_CODE (XEXP (x, 0)) == MEM) + *cost += extra_cost->vect.ld1_dup; Please use MEM_P here. + else + *cost += extra_cost->vect.merge_dup; + return true; + + case VEC_MERGE: + if (speed && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE) + { + if (GET_CODE (XEXP (XEXP (x, 0), 0)) == MEM) And here. Thanks, Kyrill + *cost += extra_cost->vect.ld1_merge; + else + *cost += extra_cost->vect.merge_dup; + return true; + } + break; + + case TRUNCATE: /* Decompose muldi3_highpart. 
*/ diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index 11cd5145bbc..dbc1282402a 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -131,6 +131,9 @@ struct fp_cost_table struct vector_cost_table { const int alu; + const int merge_dup; + const int ld1_merge; + const int ld1_dup; }; struct cpu_cost_table diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h index bc33efadc6c..a51bc668f56 100644 --- a/gcc/config/arm/aarch-cost-tables.h +++ b/gcc/config/arm/aarch-cost-tables.h @@ -121,7 +121,10 @@ const struct cpu_cost_table generic_extra_costs = }, /* Vector */ { - COSTS_N_INSNS (1) /* alu. */ + COSTS_N_INSNS (1), /* alu. */ + COSTS_N_INSNS (1), /* dup_merge. */ + COSTS_N_INSNS (1), /* ld1_merge. */ + COSTS_N_INSNS (1) /* ld1_dup. */ } }; @@ -224,7 +227,10 @@ const struct cpu_cost_table cortexa53_
Re: [PATCH 3/X] [libsanitizer] Add option to bootstrap using HWASAN
Hi Matthew, Martin is the authority on this but I have a small comment inline... On 12/12/19 3:19 PM, Matthew Malcomson wrote: This is an analogous option to --bootstrap-asan to configure. It allows bootstrapping GCC using HWASAN. For the same reasons as for ASAN we have to avoid using the HWASAN sanitizer when compiling libiberty and the lto-plugin. Also add a function to query whether -fsanitize=hwaddress has been passed. ChangeLog: 2019-08-29 Matthew Malcomson * configure: Regenerate. * configure.ac: Add --bootstrap-hwasan option. config/ChangeLog: 2019-12-12 Matthew Malcomson * bootstrap-hwasan.mk: New file. libiberty/ChangeLog: 2019-12-12 Matthew Malcomson * configure: Regenerate. * configure.ac: Avoid using sanitizer. lto-plugin/ChangeLog: 2019-12-12 Matthew Malcomson * Makefile.am: Avoid using sanitizer. * Makefile.in: Regenerate. ### Attachment also inlined for ease of reply ### diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk new file mode 100644 index ..4f60bed3fd6e98b47a3a38aea6eba2a7c320da25 --- /dev/null +++ b/config/bootstrap-hwasan.mk @@ -0,0 +1,8 @@ +# This option enables -fsanitize=hwaddress for stage2 and stage3. + +STAGE2_CFLAGS += -fsanitize=hwaddress +STAGE3_CFLAGS += -fsanitize=hwaddress +POSTSTAGE1_LDFLAGS += -fsanitize=hwaddress -static-libhwasan \ + -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/ \ + -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/ \ + -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/.libs diff --git a/configure b/configure index aec9186b2b0123d3088b69eb1ee541567654953e..6f71b111bd18ec053180beecf83dd4549e83c2b9 100755 --- a/configure +++ b/configure @@ -7270,7 +7270,7 @@ fi # or bootstrap-ubsan, bootstrap it. 
if echo " ${target_configdirs} " | grep " libsanitizer " > /dev/null 2>&1; then case "$BUILD_CONFIG" in - *bootstrap-asan* | *bootstrap-ubsan* ) + *bootstrap-hwasan* | *bootstrap-asan* | *bootstrap-ubsan* ) bootstrap_target_libs=${bootstrap_target_libs}target-libsanitizer, bootstrap_fixincludes=yes ;; diff --git a/configure.ac b/configure.ac index b8ce2ad20b9d03e42731252a9ec2a8417c13e566..16bfdf164555dad94c789f17b6a63ba1a2e3e9f4 100644 --- a/configure.ac +++ b/configure.ac @@ -2775,7 +2775,7 @@ fi # or bootstrap-ubsan, bootstrap it. if echo " ${target_configdirs} " | grep " libsanitizer " > /dev/null 2>&1; then case "$BUILD_CONFIG" in - *bootstrap-asan* | *bootstrap-ubsan* ) + *bootstrap-hwasan* | *bootstrap-asan* | *bootstrap-ubsan* ) bootstrap_target_libs=${bootstrap_target_libs}target-libsanitizer, bootstrap_fixincludes=yes ;; diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 6c9579bfaff955eb43875b404fb7db1a667bf522..da9a8809c3440827ac22ef6936e080820197f4e7 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -2645,6 +2645,13 @@ Some examples of build configurations designed for developers of GCC are: Compiles GCC itself using Address Sanitization in order to catch invalid memory accesses within the GCC code. +@item @samp{bootstrap-hwasan} +Compiles GCC itself using HWAddress Sanitization in order to catch invalid +memory accesses within the GCC code. This option is only available on AArch64 +targets with a very recent linux kernel (5.4 or later). Using terms like "very recent" in documentation is discouraged. It won't be very recent in a couple of years time and I doubt any of us will remember to come update this snippet :) I suggest something like "this option requires a Linux kernel support that supports the right ABI () (5.4 or later)". 
Thanks, Kyrill + +@end table + @section Building a cross compiler When building a cross compiler, it is not generally possible to do a diff --git a/libiberty/configure b/libiberty/configure index 7a34dabec32b0b383bd33f07811757335f4dd39c..cb2dd4ff5295598343cc18b3a79a86a778f2261d 100755 --- a/libiberty/configure +++ b/libiberty/configure @@ -5261,6 +5261,7 @@ fi NOASANFLAG= case " ${CFLAGS} " in *\ -fsanitize=address\ *) NOASANFLAG=-fno-sanitize=address ;; + *\ -fsanitize=hwaddress\ *) NOASANFLAG=-fno-sanitize=hwaddress ;; esac diff --git a/libiberty/configure.ac b/libiberty/configure.ac index f1ce76010c9acde79c5dc46686a78b2e2f19244e..043237628b79cbf37d07359b59c5ffe17a7a22ef 100644 --- a/libiberty/configure.ac +++ b/libiberty/configure.ac @@ -240,6 +240,7 @@ AC_SUBST(PICFLAG) NOASANFLAG= case " ${CFLAGS} " in *\ -fsanitize=address\ *) NOASANFLAG=-fno-sanitize=address ;; + *\ -fsanitize=hwaddress\ *) NOASANFLAG=-fno-sanitize=hwaddress ;; esac AC_SUBST(NOASANFLAG) diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am index 28dc21014b2e86988fa88adabd63ce6092e18e02..34aa397d785e3cc9b6975de460d065900364c3ff 100644 --- a/lto-plugin/Makefile.am +++ b/lto-plugin/Makefile.am @@ -11,8 +11,8 @@ AM_CPPFLAGS = -I$(
Re: [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics.
Hi Srinath, On 11/14/19 7:12 PM, Srinath Parvathaneni wrote: Hello, This patches series is to support Arm MVE ACLE intrinsics. Please refer to Arm reference manual [1] and MVE intrinsics [2] for more details. Please refer to Chapter 13 MVE ACLE [3] for MVE intrinsics concepts. This patch series depends on upstream patches "Armv8.1-M Mainline Security Extension" [4], "CLI and multilib support for Armv8.1-M Mainline MVE extensions" [5] and "support for Armv8.1-M Mainline scalar shifts" [6]. [1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914 [2] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics [3] https://static.docs.arm.com/101028/0009/Q3-ACLE_2019Q3_release-0009.pdf?_ga=2.239684871.588348166.1573726994-1501600630.1548848914 [4] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01654.html [5] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00641.html [6] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01194.html Srinath Parvathaneni(38): [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch. [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch. [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch. [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics. [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand. [PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand. [PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand. [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand. [PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands. [PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands. [PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands. [PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands. [PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands. [PATCH][ARM][GCC][1/3x]: MVE intrinsics with ternary operands. 
[PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands. [PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands. [PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands. [PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands. [PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands. [PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands. [PATCH][ARM][GCC][1/5x]: MVE store intrinsics. [PATCH][ARM][GCC][2/5x]: MVE load intrinsics. [PATCH][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix. [PATCH][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix. [PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, halfword, or word from memory. [PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which loads half word and word or double word from memory. [PATCH][ARM][GCC][7/5x]: MVE store intrinsics which stores byte,half word or word to memory. [PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which stores an half word, word and double word to memory. [PATCH][ARM][GCC][6x]:MVE ACLE vaddq intrinsics using arithmetic plus operator. [PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics. [PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup intrinsics with writeback. [PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store intrinsics with writeback. [PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (dont-care) variant. [PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across beats" and "beat-wise substract". [PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and deinterleaving load intrinsics and also aliases to vstr and vldr intrinsics. [PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane. [PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics. [PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry intrinsics. Thank you for working on these. 
I will reply to individual patches with more targeted comments. As this is a fairly large amount of code, here's my high-level view: The MVE intrinsics spec has more complexities than the Neon intrinsics one: * It needs support for both the user-namespace versions, and the __arm_* ones. * There are also overloaded forms that in C are implemented using _Generic. The above two facts make for a rather bulky and messy arm_mve.h implementation. In the case of the _Generic usage we hit the performance problems reported in PR c/91937. Ideally, I'd like to see the frontend parts of these intrinsics implemented in a similar way to the SVE ACLE (https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00413.html) i.e. have the compiler inject the right functions into the language and do overload resolution through the appropriate hooks, thus keeping the (unavoidable) complexity in the backend rather than arm_mve.h That being said, this is a major feature that I would very much like to see in GCC 10 and the current implementation, outside of the new .md f
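The "overloaded forms implemented using _Generic" concern above is easiest to see with a small sketch. This is a hypothetical miniature of the header pattern (all my_vaddq_* names are invented for illustration, not the real arm_mve.h contents): one user-facing macro that dispatches on argument type via C11 _Generic.

```c
#include <stdint.h>

/* Hypothetical sketch of header-based overloading: arm_mve.h uses this
   pattern with MVE vector types; scalar types stand in for them here so
   the example is self-contained and portable.  */
static inline int32_t my_vaddq_s32 (int32_t a, int32_t b) { return a + b; }
static inline int16_t my_vaddq_s16 (int16_t a, int16_t b)
{
  return (int16_t) (a + b);
}

/* _Generic selects the right implementation from the static type of the
   first argument, entirely at compile time.  */
#define my_vaddq(a, b)             \
  _Generic ((a),                   \
            int32_t: my_vaddq_s32, \
            int16_t: my_vaddq_s16) ((a), (b))
```

Each selection is resolved during compilation with no runtime cost, but a header containing thousands of such macros is expensive for the front end to parse on every inclusion, which is the PR c/91937 performance problem mentioned above and part of the motivation for the SVE-style approach of injecting the functions from the compiler instead.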
Re: [PATCH, GCC/ARM, 2/2] Add support for ASRL(imm), LSLL(imm) and LSRL(imm) instructions for Armv8.1-M Mainline
Hi Mihail, On 11/14/19 1:54 PM, Mihail Ionescu wrote: Hi, This is part of a series of patches where I am trying to add new instructions for Armv8.1-M Mainline to the arm backend. This patch is adding the following instructions: ASRL (imm) LSLL (imm) LSRL (imm) ChangeLog entry are as follow: *** gcc/ChangeLog *** 2019-11-14 Mihail-Calin Ionescu 2019-11-14 Sudakshina Das * config/arm/arm.md (ashldi3): Generate thumb2_lsll for both reg and valid immediate. (ashrdi3): Generate thumb2_asrl for both reg and valid immediate. (lshrdi3): Generate thumb2_lsrl for valid immediates. * config/arm/constraints.md (Pg): New. * config/arm/predicates.md (long_shift_imm): New. (arm_reg_or_long_shift_imm): Likewise. * config/arm/thumb2.md (thumb2_asrl): New immediate alternative. (thumb2_lsll): Likewise. (thumb2_lsrl): New. *** gcc/testsuite/ChangeLog *** 2019-11-14 Mihail-Calin Ionescu 2019-11-14 Sudakshina Das * gcc.target/arm/armv8_1m-shift-imm_1.c: New test. Testsuite shows no regression when run for arm-none-eabi targets. Is this ok for trunk? This is ok once the prerequisites are in. Thanks, Kyrill Thanks Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index b735f858a6a5c94d02a6765c1b349cdcb5e77ee3..82f4a5573d43925fb7638b9078a06699df38f88c 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -3509,8 +3509,8 @@ operands[2] = force_reg (SImode, operands[2]); /* Armv8.1-M Mainline double shifts are not expanded. */ - if (REG_P (operands[2])) - { + if (arm_reg_or_long_shift_imm (operands[2], GET_MODE (operands[2]))) + { if (!reg_overlap_mentioned_p(operands[0], operands[1])) emit_insn (gen_movdi (operands[0], operands[1])); @@ -3547,7 +3547,8 @@ "TARGET_32BIT" " /* Armv8.1-M Mainline double shifts are not expanded. 
*/ - if (TARGET_HAVE_MVE && REG_P (operands[2])) + if (TARGET_HAVE_MVE + && arm_reg_or_long_shift_imm (operands[2], GET_MODE (operands[2]))) { if (!reg_overlap_mentioned_p(operands[0], operands[1])) emit_insn (gen_movdi (operands[0], operands[1])); @@ -3580,6 +3581,17 @@ (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " + /* Armv8.1-M Mainline double shifts are not expanded. */ + if (TARGET_HAVE_MVE + && long_shift_imm (operands[2], GET_MODE (operands[2]))) + { + if (!reg_overlap_mentioned_p(operands[0], operands[1])) + emit_insn (gen_movdi (operands[0], operands[1])); + + emit_insn (gen_thumb2_lsrl (operands[0], operands[2])); + DONE; + } + arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1], operands[2], gen_reg_rtx (SImode), gen_reg_rtx (SImode)); diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md index b76de81b85c8ce7a2ca484a750b908b7ca64600a..d807818c8499a6a65837f1ed0487e45947f68199 100644 --- a/gcc/config/arm/constraints.md +++ b/gcc/config/arm/constraints.md @@ -35,7 +35,7 @@ ;; Dt, Dp, Dz, Tu ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz -;; in all states: Pf +;; in all states: Pf, Pg ;; The following memory constraints have been used: ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us @@ -187,6 +187,11 @@ && !is_mm_consume (memmodel_from_int (ival)) && !is_mm_release (memmodel_from_int (ival))"))) +(define_constraint "Pg" + "@internal In Thumb-2 state a constant in range 1 to 32" + (and (match_code "const_int") + (match_test "TARGET_THUMB2 && ival >= 1 && ival <= 32"))) + (define_constraint "Ps" "@internal In Thumb-2 state a constant in the range -255 to +255" (and (match_code "const_int") diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md index 69c10c06ff405e19efa172217a08a512c66cb902..ef5b0303d4424981347287865efb3cca85e56f36 100644 --- a/gcc/config/arm/predicates.md +++ b/gcc/config/arm/predicates.md @@ -322,6 +322,15 @@ && 
(UINTVAL (XEXP (op, 1)) < 32)"))) (match_test "mode == GET_MODE (op)"))) +;; True for Armv8.1-M Mainline long shift instructions. +(define_predicate "long_shift_imm" + (match_test "satisfies_constraint_Pg (op)")) + +(define_predicate "arm_reg_or_long_shift_imm" + (ior (match_test "TARGET_THUMB2 + && arm_general_register_operand (op, GET_MODE (op))") + (match_test "satisfies_constraint_Pg (op)"))) + ;; True for MULT, to identify which variant of shift_operator is in use. (define_special_predicate "mult_operator" (match_code "mult")) diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 3a716ea954ac55b20811
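For readers unfamiliar with what the ashldi3/ashrdi3/lshrdi3 changes above buy: they cover plain 64-bit C shifts by an immediate in the 1..32 range of the new "Pg" constraint, which on Armv8.1-M Mainline with MVE can then be emitted as single LSLL/ASRL/LSRL instructions instead of a multi-instruction core-register sequence. A minimal, target-independent sketch of the affected source shapes (function names are mine):

```c
#include <stdint.h>

/* 64-bit shifts by a constant in the 1..32 range matched by the Pg
   constraint.  The C semantics are the same on every target; only the
   generated Armv8.1-M code changes with the patch.  */
int64_t shift_left (int64_t x)    { return x << 5; }   /* LSLL candidate */
int64_t shift_aright (int64_t x)  { return x >> 5; }   /* ASRL candidate */
uint64_t shift_lright (uint64_t x) { return x >> 5; }  /* LSRL candidate */
```

Shift amounts outside that range, or non-constant amounts without the register forms, would still go through the generic arm_emit_coreregs_64bit_shift expansion shown in the diff.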
Re: [PATCH, GCC/ARM, 1/2] Add support for ASRL(reg) and LSLL(reg) instructions for Armv8.1-M Mainline
Hi Mihail, On 11/14/19 1:54 PM, Mihail Ionescu wrote: Hi, This patch adds the new scalar shift instructions for Armv8.1-M Mainline to the arm backend. This patch is adding the following instructions: ASRL (reg) LSLL (reg) Sorry for the delay, very busy time for GCC development :( ChangeLog entry are as follow: *** gcc/ChangeLog *** 2019-11-14 Mihail-Calin Ionescu 2019-11-14 Sudakshina Das * config/arm/arm.h (TARGET_MVE): New macro for MVE support. I don't see this hunk in the patch... There's a lot of v8.1-M-related patches in flight. Is it defined elsewhere? * config/arm/arm.md (ashldi3): Generate thumb2_lsll for TARGET_MVE. (ashrdi3): Generate thumb2_asrl for TARGET_MVE. * config/arm/arm.c (arm_hard_regno_mode_ok): Allocate even odd register pairs for doubleword quantities for ARMv8.1M-Mainline. * config/arm/thumb2.md (thumb2_asrl): New. (thumb2_lsll): Likewise. *** gcc/testsuite/ChangeLog *** 2019-11-14 Mihail-Calin Ionescu 2019-11-14 Sudakshina Das * gcc.target/arm/armv8_1m-shift-reg_1.c: New test. Testsuite shows no regression when run for arm-none-eabi targets. Is this ok for trunk? Thanks Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index be51df7d14738bc1addeab8ac5a3806778106bce..bf788087a30343269b30cf7054ec29212ad9c572 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -24454,14 +24454,15 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode) /* We allow almost any value to be stored in the general registers. Restrict doubleword quantities to even register pairs in ARM state - so that we can use ldrd. Do not allow very large Neon structure - opaque modes in general registers; they would use too many. */ + so that we can use ldrd and Armv8.1-M Mainline instructions. + Do not allow very large Neon structure opaque modes in general + registers; they would use too many. 
*/ This comment now reads: "Restrict doubleword quantities to even register pairs in ARM state so that we can use ldrd and Armv8.1-M Mainline instructions." Armv8.1-M Mainline is not ARM mode though, so please clarify this comment further. Looks ok to me otherwise (I may even have merged this with the second patch, but I'm not complaining about keeping it simple :) ) Thanks, Kyrill if (regno <= LAST_ARM_REGNUM) { if (ARM_NUM_REGS (mode) > 4) return false; - if (TARGET_THUMB2) + if (TARGET_THUMB2 && !TARGET_HAVE_MVE) return true; return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0); diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index a91a4b941c3f9d2c3d443f9f4639069ae953fb3b..b735f858a6a5c94d02a6765c1b349cdcb5e77ee3 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -3503,6 +3503,22 @@ (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " + if (TARGET_HAVE_MVE) + { + if (!reg_or_int_operand (operands[2], SImode)) + operands[2] = force_reg (SImode, operands[2]); + + /* Armv8.1-M Mainline double shifts are not expanded. */ + if (REG_P (operands[2])) + { + if (!reg_overlap_mentioned_p(operands[0], operands[1])) + emit_insn (gen_movdi (operands[0], operands[1])); + + emit_insn (gen_thumb2_lsll (operands[0], operands[2])); + DONE; + } + } + arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1], operands[2], gen_reg_rtx (SImode), gen_reg_rtx (SImode)); @@ -3530,6 +3546,16 @@ (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " + /* Armv8.1-M Mainline double shifts are not expanded. 
*/ + if (TARGET_HAVE_MVE && REG_P (operands[2])) + { + if (!reg_overlap_mentioned_p(operands[0], operands[1])) + emit_insn (gen_movdi (operands[0], operands[1])); + + emit_insn (gen_thumb2_asrl (operands[0], operands[2])); + DONE; + } + arm_emit_coreregs_64bit_shift (ASHIFTRT, operands[0], operands[1], operands[2], gen_reg_rtx (SImode), gen_reg_rtx (SImode)); diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index c08dab233784bd1cbaae147ece795058d2ef234f..3a716ea954ac55b2081121248b930b7f11520ffa 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1645,3 +1645,19 @@ } [(set_attr "predicable" "yes")] ) + +(define_insn "thumb2_asrl" + [(set (match_operand:DI 0 "arm_general_register_operand" "+r") + (ashiftrt:DI (match_dup 0) + (match_operand:SI 1 "arm_general_register_operand" "r")))] + "TARGET_HAVE_MVE" + "asrl%?\\t%Q0, %R0, %1" + [(set_attr "predicable" "yes")]) + +(define_insn "thumb2
Re: Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end
Hi all, On 12/11/19 9:41 AM, Stam Markianos-Wright wrote: On 12/11/19 3:48 AM, Jeff Law wrote: > On Mon, 2019-12-09 at 13:40 +, Stam Markianos-Wright wrote: >> >> On 12/3/19 10:31 AM, Stam Markianos-Wright wrote: >>> >>> On 12/2/19 9:27 PM, Joseph Myers wrote: On Mon, 2 Dec 2019, Jeff Law wrote: >> 2019-11-13 Stam Markianos-Wright < >> stam.markianos-wri...@arm.com> >> >> * real.c (struct arm_bfloat_half_format, >> encode_arm_bfloat_half, decode_arm_bfloat_half): New. >> * real.h (arm_bfloat_half_format): New. >> >> > Generally OK. Please consider using "arm_bfloat_half" instead > of > "bfloat_half" for the name field in the arm_bfloat_half_format > structure. I'm not sure if that's really visible externally, > but it >>> Hi both! Agreed that we want to be conservative. See latest diff >>> attached with the name field change (also pasted below). >> >> .Ping :) > Sorry if I wasn't clear. WIth the name change I considered this OK for > the trunk. Please install on the trunk. > > If you don't have commit privs let me know. Ahh ok gotcha! Sorry I'm new here, and yes, I don't have commit privileges, yet! I've committed this on Stams' behalf with r279216. Thanks, Kyrill Cheers, Stam > > > Jeff >
Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
Hi Stam, On 11/15/19 5:26 PM, Stam Markianos-Wright wrote: Pinging with more correct maintainers this time :) Also would need to backport to gcc7,8,9, but need to get this approved first! Sorry for the delay. Thank you, Stam Forwarded Message Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816) Date: Mon, 21 Oct 2019 10:37:09 +0100 From: Stam Markianos-Wright To: Ramana Radhakrishnan CC: gcc-patches@gcc.gnu.org , nd , James Greenhalgh , Richard Earnshaw On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote: >> >> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf, >> however, on my native Aarch32 setup the test times out when run as part >> of a big "make check-gcc" regression, but not when run individually. >> >> 2019-10-11 Stamatis Markianos-Wright >> >> * config/arm/arm.md: Update b for Thumb2 range checks. >> * config/arm/arm.c: New function arm_gen_far_branch. >> * config/arm/arm-protos.h: New function arm_gen_far_branch >> prototype. >> >> gcc/testsuite/ChangeLog: >> >> 2019-10-11 Stamatis Markianos-Wright >> >> * testsuite/gcc.target/arm/pr91816.c: New test. > >> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h >> index f995974f9bb..1dce333d1c3 100644 >> --- a/gcc/config/arm/arm-protos.h >> +++ b/gcc/config/arm/arm-protos.h >> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const cpu_arch_option *, >> >> void arm_initialize_isa (sbitmap, const enum isa_feature *); >> >> +const char * arm_gen_far_branch (rtx *, int,const char * , const char *); >> + >> + > > Lets get the nits out of the way. > > Unnecessary extra new line, need a space between int and const above. > > .Fixed! >> #endif /* ! GCC_ARM_PROTOS_H */ >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> index 39e1a1ef9a2..1a693d2ddca 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -32139,6 +32139,31 @@ arm_run_selftests (void) >> } >> } /* Namespace selftest. 
*/ >> >> + >> +/* Generate code to enable conditional branches in functions over 1 MiB. */ >> +const char * >> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, >> + const char * branch_format) > > Not sure if this is some munging from the attachment but check > vertical alignment of parameters. > .Fixed! >> +{ >> + rtx_code_label * tmp_label = gen_label_rtx (); >> + char label_buf[256]; >> + char buffer[128]; >> + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ >> + CODE_LABEL_NUMBER (tmp_label)); >> + const char *label_ptr = arm_strip_name_encoding (label_buf); >> + rtx dest_label = operands[pos_label]; >> + operands[pos_label] = tmp_label; >> + >> + snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr); >> + output_asm_insn (buffer, operands); >> + >> + snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr); >> + operands[pos_label] = dest_label; >> + output_asm_insn (buffer, operands); >> + return ""; >> +} >> + >> + > > Unnecessary extra newline. > .Fixed! >> #undef TARGET_RUN_TARGET_SELFTESTS >> #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests >> #endif /* CHECKING_P */ >> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md >> index f861c72ccfc..634fd0a59da 100644 >> --- a/gcc/config/arm/arm.md >> +++ b/gcc/config/arm/arm.md >> @@ -6686,9 +6686,16 @@ >> ;; And for backward branches we have >> ;; (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4). >> ;; >> +;; In 16-bit Thumb these ranges are: >> ;; For a 'b' pos_range = 2046, neg_range = -2048 giving (-2040->2048). >> ;; For a 'b' pos_range = 254, neg_range = -256 giving (-250 ->256). >> >> +;; In 32-bit Thumb these ranges are: >> +;; For a 'b' +/- 16MB is not checked for. >> +;; For a 'b' pos_range = 1048574, neg_range = -1048576 giving >> +;; (-1048568 -> 1048576). >> + >> + > > Unnecessary extra newline. > .Fixed! 
>> (define_expand "cbranchsi4" >> [(set (pc) (if_then_else >> (match_operator 0 "expandable_comparison_operator" >> @@ -6947,22 +6954,42 @@ >> (pc)))] >> "TARGET_32BIT" >> "* >> - if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2) >> - { >> - arm_ccfsm_state += 2; >> - return \"\"; >> - } >> - return \"b%d1\\t%l0\"; >> + if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2) >> + { >> + arm_ccfsm_state += 2; >> + return \"\"; >> + } >> + switch (get_attr_length (insn)) >> + { >> + // Thumb2 16-bit b{cond} >> + case 2: >> + >> + // Thumb2 32-bit b{cond} >> + case 4: return \"b%d1\\t%l0\";break; >> + >> + // Thumb2 b{cond} out of range. Use unconditional branch. >> + case 8: return arm_gen_far_branch \ >> + (operands, 0, \"Lbcond\", \"b%D1\t\"); >> + break; >> + >> + // A32 b{cond} >> + defau
Re: [PATCH][gas] Implement .cfi_negate_ra_state directive
Sorry, wrong list address from my side, please ignore. Kyrill On 12/5/19 10:59 AM, Kyrill Tkachov wrote: Hi all, This patch implements the .cfi_negate_ra_state directive to be consistent with LLVM (https://reviews.llvm.org/D50136). The relevant DWARF code DW_CFA_AARCH64_negate_ra_state is multiplexed on top of DW_CFA_GNU_window_save, as per https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00753.html I believe this is the simplest patch implementing this and is needed to allow users to build, for example, the Linux kernel with Armv8.3-A pointer authentication support with Clang while using gas as the assembler, which is a common use case. Tested gas aarch64-none-elf. Ok for master and the release branches? Thanks, Kyrill gas/ 2019-12-05 Kyrylo Tkachov * dw2gencfi.c (cfi_pseudo_table): Add cfi_negate_ra_state. * testsuite/gas/aarch64/pac_negate_ra_state.s: New file. * testsuite/gas/aarch64/pac_negate_ra_state.d: Likewise.
[PATCH][gas] Implement .cfi_negate_ra_state directive
Hi all, This patch implements the .cfi_negate_ra_state directive to be consistent with LLVM (https://reviews.llvm.org/D50136). The relevant DWARF code DW_CFA_AARCH64_negate_ra_state is multiplexed on top of DW_CFA_GNU_window_save, as per https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00753.html I believe this is the simplest patch implementing this and is needed to allow users to build, for example, the Linux kernel with Armv8.3-A pointer authentication support with Clang while using gas as the assembler, which is a common use case. Tested gas aarch64-none-elf. Ok for master and the release branches? Thanks, Kyrill gas/ 2019-12-05 Kyrylo Tkachov * dw2gencfi.c (cfi_pseudo_table): Add cfi_negate_ra_state. * testsuite/gas/aarch64/pac_negate_ra_state.s: New file. * testsuite/gas/aarch64/pac_negate_ra_state.d: Likewise. diff --git a/gas/dw2gencfi.c b/gas/dw2gencfi.c index 6c0478a72063801f1f91441a11350daa94605843..707830cbe82f860d21c3b9b8f7cbe1999568398b 100644 --- a/gas/dw2gencfi.c +++ b/gas/dw2gencfi.c @@ -726,6 +726,7 @@ const pseudo_typeS cfi_pseudo_table[] = { "cfi_remember_state", dot_cfi, DW_CFA_remember_state }, { "cfi_restore_state", dot_cfi, DW_CFA_restore_state }, { "cfi_window_save", dot_cfi, DW_CFA_GNU_window_save }, +{ "cfi_negate_ra_state", dot_cfi, DW_CFA_AARCH64_negate_ra_state }, { "cfi_escape", dot_cfi_escape, 0 }, { "cfi_signal_frame", dot_cfi, CFI_signal_frame }, { "cfi_personality", dot_cfi_personality, 0 }, diff --git a/gas/testsuite/gas/aarch64/pac_negate_ra_state.d b/gas/testsuite/gas/aarch64/pac_negate_ra_state.d new file mode 100644 index ..7ab0f2369dece1a71fc064ae38f6e273128bf074 --- /dev/null +++ b/gas/testsuite/gas/aarch64/pac_negate_ra_state.d @@ -0,0 +1,26 @@ +#objdump: --dwarf=frames + +.+: file .+ + +Contents of the .eh_frame section: + + 0010 CIE + Version: 1 + Augmentation: "zR" + Code alignment factor: 4 + Data alignment factor: -8 + Return address column: 30 + Augmentation data: 1b + DW_CFA_def_cfa: r31 \(sp\) ofs 0 + +0014 0018 0018 FDE cie= 
pc=..0008 + DW_CFA_advance_loc: 4 to 0004 + DW_CFA_GNU_window_save + DW_CFA_advance_loc: 4 to 0008 + DW_CFA_def_cfa_offset: 16 + DW_CFA_offset: r29 \(x29\) at cfa-16 + DW_CFA_offset: r30 \(x30\) at cfa-8 + DW_CFA_nop + DW_CFA_nop + + diff --git a/gas/testsuite/gas/aarch64/pac_negate_ra_state.s b/gas/testsuite/gas/aarch64/pac_negate_ra_state.s new file mode 100644 index ..36ddbeb43b7002a68eb6787a21599eb20d2b965e --- /dev/null +++ b/gas/testsuite/gas/aarch64/pac_negate_ra_state.s @@ -0,0 +1,20 @@ + .arch armv8-a + .text + .align 2 + .global _Z5foo_av + .type _Z5foo_av, %function +_Z5foo_av: +.LFB0: + .cfi_startproc + hint 25 // paciasp + .cfi_negate_ra_state + stp x29, x30, [sp, -16]! + .cfi_def_cfa_offset 16 + .cfi_offset 29, -16 + .cfi_offset 30, -8 + .cfi_endproc +.LFE0: + .size _Z5foo_av, .-_Z5foo_av + .align 2 + .global _Z5foo_bv + .type _Z5foo_bv, %function
Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores
On 12/3/19 1:45 PM, Wilco Dijkstra wrote: Hi, Part 2, split off from https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00399.html To enable cores to use the correct max_cond_insns setting, use the core-specific tuning when a CPU/tune is selected unless -mrestrict-it is explicitly set. On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well as a 0.4% codesize reduction. Bootstrapped on armhf. OK for commit? Ok. Thanks, Kyrill ChangeLog: 2019-12-03 Wilco Dijkstra * config/arm/arm.c (arm_option_override_internal): Use max_cond_insns from CPU tuning unless -mrestrict-it is used. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index daebe76352d62ad94556762b4e3bc3d0532ad411..5ed9046988996e56f754c5588e4d25d5ecdd6b03 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3041,6 +3041,11 @@ arm_option_override_internal (struct gcc_options *opts, if (!TARGET_THUMB2_P (opts->x_target_flags) || !arm_arch_notm) opts->x_arm_restrict_it = 0; + /* Use the IT size from CPU specific tuning unless -mrestrict-it is used. */ + if (!opts_set->x_arm_restrict_it + && (opts_set->x_arm_cpu_string || opts_set->x_arm_tune_string)) + opts->x_arm_restrict_it = 0; + /* Enable -munaligned-access by default for - all ARMv6 architecture-based processors when compiling for a 32-bit ISA i.e. Thumb2 and ARM state only.
Re: [PATCH][GCC8][AArch64] Backport Cortex-A76, Ares and Neoverse N1 cpu names
On 12/2/19 12:14 PM, Wilco Dijkstra wrote: Add support for Cortex-A76, Ares and Neoverse N1 cpu names in GCC8 branch. 2019-11-29 Wilco Dijkstra * config/aarch64/aarch64-cores.def (ares): Define. (cortex-a76): Likewise. (neoverse-n1): Likewise. * config/aarch64/aarch64-tune.md: Regenerate. * doc/invoke.texi (AArch64 Options): Document ares, cortex-a76 and neoverse-n1. Ok as it's very non-invasive and provides a convenience to users of that branch. Thanks, Kyrill -- diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 33b96ca2861dce506a854cff19cfcaa32f0db23a..f48b7c22b2d261203ac25c010a054e47c291ddfc 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -85,6 +85,9 @@ AARCH64_CORE("thunderx2t99", thunderx2t99, thunderx2t99, 8_1A, AARCH64_FL_FOR /* ARM ('A') cores. */ AARCH64_CORE("cortex-a55", cortexa55, cortexa53, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1) AARCH64_CORE("cortex-a75", cortexa75, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1) +AARCH64_CORE("cortex-a76", cortexa76, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0b, -1) +AARCH64_CORE("ares", ares, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0c, -1) +AARCH64_CORE("neoverse-n1", neoversen1,cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0c, -1) /* ARMv8.3-A Architecture Processors. 
*/ diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index 7b3a7460561ee87e13799f726919c3f870781f6d..f08b7e44b27beeb41df928cf3aa09e59e734b5d2 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr "tune" -"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55" +"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,neoversen1,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55" (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index c63f5611afb52b2358207a458dd6c275403a5a45..57340cea31df315ce37cfd57e084844da78df9fe 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14747,6 +14747,7 @@ Specify the name of the target processor for which GCC should tune the performance of the code. Permissible values for this option are: @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55}, @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75}, +@samp{cortex-a76}, @samp{ares}, @samp{neoverse-n1} @samp{exynos-m1}, @samp{falkor}, @samp{qdf24xx}, @samp{saphira}, @samp{xgene1}, @samp{vulcan}, @samp{thunderx}, @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81},
Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores
Hi Wilco, On 11/19/19 3:11 PM, Wilco Dijkstra wrote: ping Various CPUs have max_cond_insns set to 5 due to historical reasons. Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A cores, so change it to 2 for all Cortex-A cores. Hmm, I'm not too confident on that. I'd support such a change for the generic arm_cortex_tune, definitely, and the Armv8-a based ones, but I don't think the argument is as strong for Cortex-A7, Cortex-A8, Cortex-A9. So let's make the change for the Armv8-A-based cores now. If you get benchmarking data for the older ones (such systems may or may not be easy to get a hold of) we can update those separately. Set max_cond_insns to 4 on Thumb-2 architectures given it's already limited to that by MAX_INSN_PER_IT_BLOCK. Also use the CPU tuning setting when a CPU/tune is selected if -mrestrict-it is not explicitly set. This can go in as a separate patch from the rest, thanks. On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well as a 0.4% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-08-19 Wilco Dijkstra * gcc/config/arm/arm.c (arm_option_override_internal): Use max_cond_insns from CPU tuning unless -mrestrict-it is used. (arm_v6t2_tune): set max_cond_insns to 4. (arm_cortex_tune): set max_cond_insns to 2. (arm_cortex_a8_tune): Likewise. (arm_cortex_a7_tune): Likewise. (arm_cortex_a35_tune): Likewise. (arm_cortex_a53_tune): Likewise. (arm_cortex_a5_tune): Likewise. (arm_cortex_a9_tune): Likewise. (arm_v6m_tune): set max_cond_insns to 4. No "gcc/" in the ChangeLog path. Thanks, Kyrill --- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 628cf02f23fb29392a63d87f561c3ee2fb73a515..38ac16ad1def91ca78ccfa98fd1679b2b5114851 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1943,7 +1943,7 @@ const struct tune_params arm_v6t2_tune = arm_default_branch_cost, &arm_default_vec_cost, 1, /* Constant limit. */ - 5, /* Max cond insns. */ + 4, /* Max cond insns. 
*/ 8, /* Memset max inline. */ 1, /* Issue rate. */ ARM_PREFETCH_NOT_BENEFICIAL, @@ -1968,7 +1968,7 @@ const struct tune_params arm_cortex_tune = arm_default_branch_cost, &arm_default_vec_cost, 1, /* Constant limit. */ - 5, /* Max cond insns. */ + 2, /* Max cond insns. */ 8, /* Memset max inline. */ 2, /* Issue rate. */ ARM_PREFETCH_NOT_BENEFICIAL, @@ -1991,7 +1991,7 @@ const struct tune_params arm_cortex_a8_tune = arm_default_branch_cost, &arm_default_vec_cost, 1, /* Constant limit. */ - 5, /* Max cond insns. */ + 2, /* Max cond insns. */ 8, /* Memset max inline. */ 2, /* Issue rate. */ ARM_PREFETCH_NOT_BENEFICIAL, @@ -2014,7 +2014,7 @@ const struct tune_params arm_cortex_a7_tune = arm_default_branch_cost, &arm_default_vec_cost, 1, /* Constant limit. */ - 5, /* Max cond insns. */ + 2, /* Max cond insns. */ 8, /* Memset max inline. */ 2, /* Issue rate. */ ARM_PREFETCH_NOT_BENEFICIAL, @@ -2060,7 +2060,7 @@ const struct tune_params arm_cortex_a35_tune = arm_default_branch_cost, &arm_default_vec_cost, 1, /* Constant limit. */ - 5, /* Max cond insns. */ + 2, /* Max cond insns. */ 8, /* Memset max inline. */ 1, /* Issue rate. */ ARM_PREFETCH_NOT_BENEFICIAL, @@ -2083,7 +2083,7 @@ const struct tune_params arm_cortex_a53_tune = arm_default_branch_cost, &arm_default_vec_cost, 1, /* Constant limit. */ - 5, /* Max cond insns. */ + 2, /* Max cond insns. */ 8, /* Memset max inline. */ 2, /* Issue rate. */ ARM_PREFETCH_NOT_BENEFICIAL, @@ -2167,9 +2167,6 @@ const
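For context, the Thumb-2 restriction mentioned above comes from the IT instruction itself: a single IT block predicates at most four subsequent instructions (MAX_INSN_PER_IT_BLOCK), which is why raising max_cond_insns beyond 4 cannot help in Thumb-2. An illustrative sketch (register choices arbitrary):

```asm
        cmp     r0, #0
        itttt   eq              @ one IT block covers at most 4 insns
        moveq   r1, #1
        moveq   r2, #2
        moveq   r3, #3
        moveq   r4, #4
@ A fifth conditional instruction would need a second IT block,
@ so a max_cond_insns setting above 4 buys nothing here.
```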
Re: [GCC][ARM]: Fix the failing ACLE testcase with correct test directive.
Hi Srinath, On 11/21/19 4:32 PM, Srinath Parvathaneni wrote: Hello, This patch fixes arm acle testcase crc_hf_1.c by modifying the compiler options directive. Regression tested on arm-none-eabi and found no regressions. Ok for trunk? If ok, please commit on my behalf, I don't have the commit rights. This is ok. I see Matthew has already committed it, which is fine. It's an obvious patch. Thanks, Kyrill Thanks, Srinath. gcc/testsuite/ChangeLog: 2019-11-21 Srinath Parvathaneni * gcc.target/arm/acle/crc_hf_1.c: Modify the compiler options directive from dg-options to dg-additional-options. ### Attachment also inlined for ease of reply ### diff --git a/gcc/testsuite/gcc.target/arm/acle/crc_hf_1.c b/gcc/testsuite/gcc.target/arm/acle/crc_hf_1.c index e6cbfc0b33e56e4275b96978ca1823d7682792fb..f1de2bdffee41a0f3259e2bf00296e9c3218f548 100644 --- a/gcc/testsuite/gcc.target/arm/acle/crc_hf_1.c +++ b/gcc/testsuite/gcc.target/arm/acle/crc_hf_1.c @@ -3,7 +3,7 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_hard_vfp_ok } */ -/* { dg-options "-mfloat-abi=hard -march=armv8-a+simd+crc" } */ +/* { dg-additional-options "-mfloat-abi=hard -march=armv8-a+simd+crc" } */ #include
Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics
On 11/19/19 1:41 PM, Dennis Zhang wrote: Hi Kyrill, On 19/11/2019 11:21, Kyrill Tkachov wrote: Hi Dennis, On 11/12/19 5:32 PM, Dennis Zhang wrote: Hi Kyrill, On 12/11/2019 15:57, Kyrill Tkachov wrote: On 11/12/19 3:50 PM, Dennis Zhang wrote: Hi Kyrill, On 12/11/2019 09:40, Kyrill Tkachov wrote: Hi Dennis, On 11/7/19 1:48 PM, Dennis Zhang wrote: Hi Kyrill, I have rebased the patch on top of current trunk. For resolve_overloaded, I redefined my memtag overloading function to fit the latest resolve_overloaded_builtin interface. Regression tested again and survived for aarch64-none-linux-gnu. Please reply inline rather than top-posting on gcc-patches. Cheers Dennis The ChangeLog is updated as follows: gcc/ChangeLog: 2019-11-07 Dennis Zhang * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG, AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP, AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG, AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END. (aarch64_init_memtag_builtins): New. (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro. (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins. (aarch64_expand_builtin_memtag): New. (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag. (AARCH64_BUILTIN_SUBCODE): New macro. (aarch64_resolve_overloaded_memtag): New. (aarch64_resolve_overloaded_builtin_general): New hook. Call aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins. * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define __ARM_FEATURE_MEMORY_TAGGING when enabled. (aarch64_resolve_overloaded_builtin): Call aarch64_resolve_overloaded_builtin_general. * config/aarch64/aarch64-protos.h (aarch64_resolve_overloaded_builtin_general): New declaration. * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro. (TARGET_MEMTAG): Likewise. 
* config/aarch64/aarch64.md (define_c_enum "unspec"): Add UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE. (irg, gmi, subp, addg, ldg, stg): New instructions. * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro. (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise. (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise. * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New. (aarch64_granule16_uimm6, aarch64_granule16_simm9): New. * config/arm/types.md (memtag): New. * doc/invoke.texi (-memtag): Update description. gcc/testsuite/ChangeLog: 2019-11-07 Dennis Zhang * gcc.target/aarch64/acle/memtag_1.c: New test. * gcc.target/aarch64/acle/memtag_2.c: New test. * gcc.target/aarch64/acle/memtag_3.c: New test. On 04/11/2019 16:40, Kyrill Tkachov wrote: Hi Dennis, On 10/17/19 11:03 AM, Dennis Zhang wrote: Hi, Arm Memory Tagging Extension (MTE) is published with Armv8.5-A. It can be used for spatial and temporal memory safety detection and lightweight lock and key system. This patch enables new intrinsics leveraging MTE instructions to implement functionalities of creating tags, setting tags, reading tags, and manipulating tags. The intrinsics are part of Arm ACLE extension: https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics The MTE ISA specification can be found at https://developer.arm.com/docs/ddi0487/latest chapter D6. Bootstraped and regtested for aarch64-none-linux-gnu. Please help to check if it's OK for trunk. This looks mostly ok to me but for further review this needs to be rebased on top of current trunk as there are some conflicts with the SVE ACLE changes that recently went in. Most conflicts looks trivial to resolve but one that needs more attention is the definition of the TARGET_RESOLVE_OVERLOADED_BUILTIN hook. 
Thanks, Kyrill Many Thanks Dennis gcc/ChangeLog: 2019-10-16 Dennis Zhang * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG, AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP, AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG, AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END. (aarch64_init_memtag_builtins): New. (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro. (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins. (aarch64_expand_builtin_memtag): New. (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag. (AARCH64_BUILTIN_SUBCODE): New macro. (aarch64_resolve_overloaded_memtag): New. (aarch64_resolve_overloaded_builtin):
Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics
Hi Dennis, On 11/12/19 5:32 PM, Dennis Zhang wrote: Hi Kyrill, On 12/11/2019 15:57, Kyrill Tkachov wrote: On 11/12/19 3:50 PM, Dennis Zhang wrote: Hi Kyrill, On 12/11/2019 09:40, Kyrill Tkachov wrote: Hi Dennis, On 11/7/19 1:48 PM, Dennis Zhang wrote: Hi Kyrill, I have rebased the patch on top of current trunk. For resolve_overloaded, I redefined my memtag overloading function to fit the latest resolve_overloaded_builtin interface. Regression tested again and survived for aarch64-none-linux-gnu. Please reply inline rather than top-posting on gcc-patches. Cheers Dennis The ChangeLog is updated as follows: gcc/ChangeLog: 2019-11-07 Dennis Zhang * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG, AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP, AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG, AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END. (aarch64_init_memtag_builtins): New. (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro. (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins. (aarch64_expand_builtin_memtag): New. (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag. (AARCH64_BUILTIN_SUBCODE): New macro. (aarch64_resolve_overloaded_memtag): New. (aarch64_resolve_overloaded_builtin_general): New hook. Call aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins. * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define __ARM_FEATURE_MEMORY_TAGGING when enabled. (aarch64_resolve_overloaded_builtin): Call aarch64_resolve_overloaded_builtin_general. * config/aarch64/aarch64-protos.h (aarch64_resolve_overloaded_builtin_general): New declaration. * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro. (TARGET_MEMTAG): Likewise. * config/aarch64/aarch64.md (define_c_enum "unspec"): Add UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE. (irg, gmi, subp, addg, ldg, stg): New instructions. 
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro. (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise. (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise. * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New. (aarch64_granule16_uimm6, aarch64_granule16_simm9): New. * config/arm/types.md (memtag): New. * doc/invoke.texi (-memtag): Update description. gcc/testsuite/ChangeLog: 2019-11-07 Dennis Zhang * gcc.target/aarch64/acle/memtag_1.c: New test. * gcc.target/aarch64/acle/memtag_2.c: New test. * gcc.target/aarch64/acle/memtag_3.c: New test. On 04/11/2019 16:40, Kyrill Tkachov wrote: Hi Dennis, On 10/17/19 11:03 AM, Dennis Zhang wrote: Hi, Arm Memory Tagging Extension (MTE) is published with Armv8.5-A. It can be used for spatial and temporal memory safety detection and lightweight lock and key system. This patch enables new intrinsics leveraging MTE instructions to implement functionalities of creating tags, setting tags, reading tags, and manipulating tags. The intrinsics are part of Arm ACLE extension: https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics The MTE ISA specification can be found at https://developer.arm.com/docs/ddi0487/latest chapter D6. Bootstraped and regtested for aarch64-none-linux-gnu. Please help to check if it's OK for trunk. This looks mostly ok to me but for further review this needs to be rebased on top of current trunk as there are some conflicts with the SVE ACLE changes that recently went in. Most conflicts looks trivial to resolve but one that needs more attention is the definition of the TARGET_RESOLVE_OVERLOADED_BUILTIN hook. 
Thanks, Kyrill Many Thanks Dennis gcc/ChangeLog: 2019-10-16 Dennis Zhang * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG, AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP, AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG, AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END. (aarch64_init_memtag_builtins): New. (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro. (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins. (aarch64_expand_builtin_memtag): New. (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag. (AARCH64_BUILTIN_SUBCODE): New macro. (aarch64_resolve_overloaded_memtag): New. (aarch64_resolve_overloaded_builtin): New hook. Call aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins. * config/aarch64/aarch64-c.c (aarch64_update_cpp_bu
Re: [GCC][PATCH][AArch64] Update hwcap string for fp16fml in aarch64-option-extensions.def
On 11/18/19 12:54 PM, Tamar Christina wrote: OK to backport to GCC 9? Yes. Thanks, Kyrill Thanks, Tamar -Original Message- From: gcc-patches-ow...@gcc.gnu.org On Behalf Of Kyrill Tkachov Sent: Tuesday, September 24, 2019 14:32 To: Stam Markianos-Wright ; gcc- patc...@gcc.gnu.org Cc: nd ; Richard Earnshaw ; James Greenhalgh ; Marcus Shawcroft Subject: Re: [GCC][PATCH][AArch64] Update hwcap string for fp16fml in aarch64-option-extensions.def Hi all, On 9/10/19 1:34 PM, Stam Markianos-Wright wrote: Hi all, This is a minor patch that fixes the entry for the fp16fml feature in GCC's aarch64-option-extensions.def. As can be seen in the Linux sources here https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/cpuinf o.c#L69 the correct string is "asimdfhm", not "asimdfml". Cross-compiled and tested on aarch64-none-linux-gnu. Is this ok for trunk? Also, I don't have commit rights, so could someone commit it on my behalf? James approved it offline so I've committed it on Stam's behalf as r276097 with a slightly adjusted ChangeLog: 2019-09-24 Stamatis Markianos-Wright * config/aarch64/aarch64-option-extensions.def (fp16fml): Update hwcap string for fp16fml. Thanks, Kyrill Thanks, Stam Markianos-Wright The diff is: diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index 9919edd43d0..60e8f28fff5 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -135,7 +135,7 @@ AARCH64_OPT_EXTENSION("sm4", AARCH64_FL_SM4, AARCH64_FL_SIMD, \ /* Enabling "fp16fml" also enables "fp" and "fp16". Disabling "fp16fml" just disables "fp16fml". */ AARCH64_OPT_EXTENSION("fp16fml", AARCH64_FL_F16FML, \ - AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfml") + AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfhm") /* Enabling "sve" also enables "fp16", "fp" and "simd". 
Disabling "sve" disables "sve", "sve2", "sve2-aes", "sve2-sha3", "sve2-sm4" gcc/ChangeLog: 2019-09-09 Stamatis Markianos-Wright * config/aarch64/aarch64-option-extensions.def: Updated hwcap string for fp16fml.
Re: [SVE] PR89007 - Implement generic vector average expansion
Hi Prathamesh, On 11/14/19 6:47 PM, Prathamesh Kulkarni wrote: Hi, As suggested in the PR, the attached patch falls back to distributing rshift over plus_expr instead of the fallback widening -> arithmetic -> narrowing sequence, if target support is not available. Bootstrap+tested on x86_64-unknown-linux-gnu and aarch64-linux-gnu. OK to commit? Thanks, Prathamesh diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c b/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c new file mode 100644 index 000..b682f3f3b74 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ + +#define N 1024 +unsigned char dst[N]; +unsigned char in1[N]; +unsigned char in2[N]; + +void +foo () +{ + for( int x = 0; x < N; x++ ) +dst[x] = (in1[x] + in2[x] + 1) >> 1; +} + +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */ +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */ I think you'll want to make the test a bit stronger to test the actual instructions expected here. You'll also want to test the IFN_AVG_FLOOR case, as your patch adds support for it too. Thanks, Kyrill
*/ tree new_var = vect_recog_temp_ssa_var (new_type, NULL); tree new_ops[2]; vect_convert_inputs (last_stmt_info, 2, new_ops, new_type, unprom, new_vectype); + + if (!ifn_supported) +{ + /* If there is no target support available, generate code +to distribute rshift over plus and add one depending +upon floor or ceil rounding. */ + + tree one_cst = build_one_cst (new_type); + + tree tmp1 = vect_recog_temp_ssa_var (new_type, NULL); + gassign *g1 = gimple_build_assign (tmp1, RSHIFT_EXPR, new_ops[0], one_cst); + + tree tmp2 = vect_recog_temp_ssa_var (new_type, NULL); + gassign *g2 = gimple_build_assign (tmp2, RSHIFT_EXPR, new_ops[1], one_cst); + + tree tmp3 = vect_recog_temp_ssa_var (new_type, NULL); + gassign *g3 = gimple_build_assign (tmp3, PLUS_EXPR, tmp1, tmp2); + + tree tmp4 = vect_recog_temp_ssa_var (new_type, NULL); + tree_code c = (ifn == IFN_AVG_CEIL) ? BIT_IOR_EXPR : BIT_AND_EXPR; + gassign *g4 = gimple_build_assign (tmp4, c, new_ops[0], new_ops[1]); + + tree tmp5 = vect_recog_temp_ssa_var (new_type, NULL); + gassign *g5 = gimple_build_assign (tmp5, BIT_AND_EXPR, tmp4, one_cst); + + gassign *g6 = gimple_build_assign (new_var, PLUS_EXPR, tmp3, tmp5); + + append_pattern_def_seq (last_stmt_info, g1, new_vectype); + append_pattern_def_seq (last_stmt_info, g2, new_vectype); + append_pattern_def_seq (last_stmt_info, g3, new_vectype); + append_pattern_def_seq (last_stmt_info, g4, new_vectype); + append_pattern_def_seq (last_stmt_info, g5, new_vectype); + return vect_convert_output (last_stmt_info, type, g6, new_vectype); +} + + /* Generate the IFN_AVG* call. */ gcall *average_stmt = gimple_build_call_internal (ifn, 2, new_ops[0], new_ops[1]); gimple_call_set_lhs (average_stmt, new_var);
Re: [PATCH v2 0/6] Implement asm flag outputs for arm + aarch64
Hi Richard, On 11/14/19 10:07 AM, Richard Henderson wrote: I've put the implementation into config/arm/aarch-common.c, so that it can be shared between the two targets. This required a little bit of cleanup to the CC modes and constraints to get the two targets to match up. Changes for v2: * Document overflow flags. * Add "hs" and "lo" as aliases of "cs" and "cc". * Add unsigned cmp tests to asm-flag-6.c. Richard Sandiford has given his ack for the aarch64 side. I'm still looking for an ack for the arm side. The arm parts look good to me, there's not too much arm-specific stuff that's not shared with aarch64 thankfully. Thanks, Kyrill r~ Richard Henderson (6): aarch64: Add "c" constraint arm: Fix the "c" constraint arm: Rename CC_NOOVmode to CC_NZmode arm, aarch64: Add support for __GCC_ASM_FLAG_OUTPUTS__ arm: Add testsuite checks for asm-flag aarch64: Add testsuite checks for asm-flag gcc/config/arm/aarch-common-protos.h | 6 + gcc/config/aarch64/aarch64-c.c | 2 + gcc/config/aarch64/aarch64.c | 3 + gcc/config/arm/aarch-common.c | 136 + gcc/config/arm/arm-c.c | 1 + gcc/config/arm/arm.c | 15 +- gcc/testsuite/gcc.target/aarch64/asm-flag-1.c | 35 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c | 38 gcc/testsuite/gcc.target/aarch64/asm-flag-5.c | 30 +++ gcc/testsuite/gcc.target/aarch64/asm-flag-6.c | 62 ++ gcc/testsuite/gcc.target/arm/asm-flag-1.c | 36 gcc/testsuite/gcc.target/arm/asm-flag-3.c | 38 gcc/testsuite/gcc.target/arm/asm-flag-5.c | 30 +++ gcc/testsuite/gcc.target/arm/asm-flag-6.c | 62 ++ gcc/config/aarch64/constraints.md | 4 + gcc/config/arm/arm-modes.def | 4 +- gcc/config/arm/arm.md | 186 +- gcc/config/arm/constraints.md | 5 +- gcc/config/arm/predicates.md | 2 +- gcc/config/arm/thumb1.md | 8 +- gcc/config/arm/thumb2.md | 34 ++-- gcc/doc/extend.texi | 39 22 files changed, 651 insertions(+), 125 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-3.c create mode 100644 
gcc/testsuite/gcc.target/aarch64/asm-flag-5.c create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-6.c create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-1.c create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-3.c create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-5.c create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-6.c -- 2.17.1
Re: [PATCH v2 2/6] arm: Fix the "c" constraint
On 11/14/19 10:07 AM, Richard Henderson wrote: The existing definition using register class CC_REG does not work because CC_REGNUM does not support normal modes, and so fails to match register_operand. Use a non-register constraint and the cc_register predicate instead. * config/arm/constraints.md (c): Use cc_register predicate. Ok. Does this need a backport to the branches? Thanks, Kyrill --- gcc/config/arm/constraints.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md index b76de81b85c..e02b678d26d 100644 --- a/gcc/config/arm/constraints.md +++ b/gcc/config/arm/constraints.md @@ -94,8 +94,9 @@ "@internal Thumb only. The union of the low registers and the stack register.") -(define_register_constraint "c" "CC_REG" - "@internal The condition code register.") +(define_constraint "c" + "@internal The condition code register." + (match_operand 0 "cc_register")) (define_register_constraint "Cs" "CALLER_SAVE_REGS" "@internal The caller save registers. Useful for sibcalls.") -- 2.17.1
Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics
On 11/12/19 3:50 PM, Dennis Zhang wrote: Hi Kyrill, On 12/11/2019 09:40, Kyrill Tkachov wrote: Hi Dennis, On 11/7/19 1:48 PM, Dennis Zhang wrote: Hi Kyrill, I have rebased the patch on top of current trunk. For resolve_overloaded, I redefined my memtag overloading function to fit the latest resolve_overloaded_builtin interface. Regression tested again and survived for aarch64-none-linux-gnu. Please reply inline rather than top-posting on gcc-patches. Cheers Dennis The ChangeLog is updated as follows: gcc/ChangeLog: 2019-11-07 Dennis Zhang * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG, AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP, AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG, AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END. (aarch64_init_memtag_builtins): New. (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro. (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins. (aarch64_expand_builtin_memtag): New. (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag. (AARCH64_BUILTIN_SUBCODE): New macro. (aarch64_resolve_overloaded_memtag): New. (aarch64_resolve_overloaded_builtin_general): New hook. Call aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins. * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define __ARM_FEATURE_MEMORY_TAGGING when enabled. (aarch64_resolve_overloaded_builtin): Call aarch64_resolve_overloaded_builtin_general. * config/aarch64/aarch64-protos.h (aarch64_resolve_overloaded_builtin_general): New declaration. * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro. (TARGET_MEMTAG): Likewise. * config/aarch64/aarch64.md (define_c_enum "unspec"): Add UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE. (irg, gmi, subp, addg, ldg, stg): New instructions. * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro. 
(__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise. (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise. * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New. (aarch64_granule16_uimm6, aarch64_granule16_simm9): New. * config/arm/types.md (memtag): New. * doc/invoke.texi (-memtag): Update description. gcc/testsuite/ChangeLog: 2019-11-07 Dennis Zhang * gcc.target/aarch64/acle/memtag_1.c: New test. * gcc.target/aarch64/acle/memtag_2.c: New test. * gcc.target/aarch64/acle/memtag_3.c: New test. On 04/11/2019 16:40, Kyrill Tkachov wrote: Hi Dennis, On 10/17/19 11:03 AM, Dennis Zhang wrote: Hi, Arm Memory Tagging Extension (MTE) is published with Armv8.5-A. It can be used for spatial and temporal memory safety detection and a lightweight lock-and-key system. This patch enables new intrinsics leveraging MTE instructions to implement functionalities of creating tags, setting tags, reading tags, and manipulating tags. The intrinsics are part of Arm ACLE extension: https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics The MTE ISA specification can be found at https://developer.arm.com/docs/ddi0487/latest chapter D6. Bootstrapped and regtested for aarch64-none-linux-gnu. Please help to check if it's OK for trunk. This looks mostly ok to me but for further review this needs to be rebased on top of current trunk as there are some conflicts with the SVE ACLE changes that recently went in. Most conflicts look trivial to resolve but one that needs more attention is the definition of the TARGET_RESOLVE_OVERLOADED_BUILTIN hook. Thanks, Kyrill Many Thanks Dennis gcc/ChangeLog: 2019-10-16 Dennis Zhang * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG, AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP, AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG, AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END. 
(aarch64_init_memtag_builtins): New. (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro. (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins. (aarch64_expand_builtin_memtag): New. (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag. (AARCH64_BUILTIN_SUBCODE): New macro. (aarch64_resolve_overloaded_memtag): New. (aarch64_resolve_overloaded_builtin): New hook. Call aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins. * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define __ARM_FEATURE_MEMORY_TAGGING when enabled. * config/aarch64/aarch64-protos.h (aarch64_resolve_overloaded_builtin): Add declaration. *
Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics
Hi Christophe, On 11/12/19 10:29 AM, Christophe Lyon wrote: On Thu, 7 Nov 2019 at 11:26, Kyrill Tkachov wrote: Hi all, This patch adds the plumbing for and an implementation of the saturation intrinsics from ACLE [1], in particular the __ssat, __usat intrinsics. These intrinsics set the Q sticky bit in APSR if an overflow occurred. ACLE allows the user to read that bit (within the same function, it's not defined across function boundaries) using the __saturation_occurred intrinsic and reset it using __set_saturation_occurred. Thus, if the user cares about the Q bit they would be using a flow such as: __set_saturation_occurred (0); // reset the Q bit ... __ssat (...) // Do some calculations involving __ssat ... if (__saturation_occurred ()) // if Q bit set handle overflow ... For the implementation this has a few implications: * We must track the Q-setting side-effects of these instructions to make sure saturation reading/writing intrinsics are ordered properly. This is done by introducing a new "apsrq" register (and associated APSRQ_REGNUM) in a similar way to the "fake" cc register. * The RTL patterns coming out of these intrinsics can have two forms: one where they set the APSRQ_REGNUM and one where they don't. Which one is used depends on whether the function cares about reading the Q flag. This is detected using the TARGET_CHECK_BUILTIN_CALL hook on the __saturation_occurred, __set_saturation_occurred occurrences. If no Q-flag read is present in the function we'll use the simpler non-Q-setting form to allow for more aggressive scheduling and such. If a Q-bit read is present then the Q-setting form is emitted. To avoid adding two patterns for each intrinsic to the MD file we make use of define_subst to auto-generate the Q-setting forms. * Some existing patterns already produce instructions that may clobber the Q bit, but they don't model it (as we didn't care about that bit up till now). 
Since these patterns can be generated from straight-line C code they can affect the Q-bit reads from intrinsics. Therefore they have to be disabled when a Q-bit read is present. These are mostly patterns in arm-fixed.md that are not very common anyway, but there are also a couple of widening multiply-accumulate patterns in arm.md that can set the Q-bit during accumulation. There are more Q-setting intrinsics in ACLE, but these will be implemented in a more mechanical fashion once the infrastructure in this patch goes in. Bootstrapped and tested on arm-none-linux-gnueabihf. Committing to trunk. Thanks, Kyrill 2019-11-07 Kyrylo Tkachov * config/arm/aout.h (REGISTER_NAMES): Add apsrq. * config/arm/arm.md (APSRQ_REGNUM): Define. (add_setq): New define_subst. (add_clobber_q_name): New define_subst_attr. (add_clobber_q_pred): Likewise. (maddhisi4): Change to define_expand. Split into mult and add if ARM_Q_BIT_READ. (arm_maddhisi4): New define_insn. (*maddhisi4tb): Disable for ARM_Q_BIT_READ. (*maddhisi4tt): Likewise. (arm_ssat): New define_expand. (arm_usat): Likewise. (arm_get_apsr): New define_insn. (arm_set_apsr): Likewise. (arm_saturation_occurred): New define_expand. (arm_set_saturation): Likewise. (*satsi_): Rename to... (satsi_): ... This. (*satsi__shift): Disable for ARM_Q_BIT_READ. * config/arm/arm.h (FIXED_REGISTERS): Mark apsrq as fixed. (CALL_USED_REGISTERS): Mark apsrq. (FIRST_PSEUDO_REGISTER): Update value. (REG_ALLOC_ORDER): Add APSRQ_REGNUM. (machine_function): Add q_bit_access. (ARM_Q_BIT_READ): Define. * config/arm/arm.c (TARGET_CHECK_BUILTIN_CALL): Define. (arm_conditional_register_usage): Clear APSRQ_REGNUM from operand_reg_set. (arm_q_bit_access): Define. * config/arm/arm-builtins.c: Include stringpool.h. (arm_sat_binop_imm_qualifiers, arm_unsigned_sat_binop_unsigned_imm_qualifiers, arm_sat_occurred_qualifiers, arm_set_sat_qualifiers): Define. 
(SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, SAT_OCCURRED_QUALIFIERS, SET_SAT_QUALIFIERS): Likewise. (arm_builtins): Define ARM_BUILTIN_SAT_IMM_CHECK. (arm_init_acle_builtins): Initialize __builtin_sat_imm_check. Handle 0 argument expander. (arm_expand_acle_builtin): Handle ARM_BUILTIN_SAT_IMM_CHECK. (arm_check_builtin_call): Define. * config/arm/arm.md (ssmulsa3, usmulusa3, usmuluha3, arm_ssatsihi_shift, arm_usatsihi): Disable when ARM_Q_BIT_READ. * config/arm/arm-protos.h (arm_check_builtin_call): Declare prototype. (arm_q_bit_access): Likewise. * config/arm/arm_acle.h (__ssat, __usat, __ignore_saturation, __saturation_occurred, __set_saturation_occurred): Define. * config/arm/arm_acle_builtins.def: Define builtins for ssat, usat, saturation_occurred, set_
Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)
Hi Christophe, On 10/18/19 2:18 PM, Christophe Lyon wrote: Hi, This patch extends support for -mpure-code to all thumb-1 processors, by removing the need for MOVT. Symbol addresses are built using upper8_15, upper0_7, lower8_15 and lower0_7 relocations, and constants are built using sequences of movs/adds and lsls instructions. The extension of the *thumb1_movhf pattern always uses the same size (6) although it can emit a shorter sequence when possible. This is similar to what *arm32_movhf already does. CASE_VECTOR_PC_RELATIVE is now false with -mpure-code, to avoid generating invalid assembly code with differences between symbols from two different sections (the difference cannot be computed by the assembler). Tests pr45701-[12].c needed a small adjustment to avoid matching upper8_15 when looking for the r8 register. Test no-literal-pool.c is augmented with __fp16, so it now uses -mfp16-format=ieee. Test thumb1-Os-mult.c generates an inline code sequence with -mpure-code and computes the multiplication by using a sequence of add/shift rather than using the multiply instruction, so we skip it in the presence of -mpure-code. 
With -mcpu=cortex-m0, the pure-code/no-literal-pool.c fails because code like: static char *p = "Hello World"; char * testchar () { return p + 4; } generates 2 indirections (I removed non-essential directives/code) .section .rodata .LC0: .ascii "Hello World\000" .data p: .word .LC0 .section .rodata .LC2: .word p .section .text,"0x2006",%progbits testchar: push {r7, lr} add r7, sp, #0 movs r3, #:upper8_15:#.LC2 lsls r3, #8 adds r3, #:upper0_7:#.LC2 lsls r3, #8 adds r3, #:lower8_15:#.LC2 lsls r3, #8 adds r3, #:lower0_7:#.LC2 ldr r3, [r3] ldr r3, [r3] adds r3, r3, #4 movs r0, r3 mov sp, r7 @ sp needed pop {r7, pc} By contrast, when using -mcpu=cortex-m4, the code looks like: .section .rodata .LC0: .ascii "Hello World\000" .data p: .word .LC0 testchar: push {r7} add r7, sp, #0 movw r3, #:lower16:p movt r3, #:upper16:p ldr r3, [r3] adds r3, r3, #4 mov r0, r3 mov sp, r7 pop {r7} bx lr I haven't found yet how to make code for cortex-m0 apply upper/lower relocations to "p" instead of .LC2. The current code looks functional, but could be improved. OK as-is? Thanks, Christophe diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index f995974..beb8411 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -66,6 +66,7 @@ extern bool arm_small_register_classes_for_mode_p (machine_mode); extern int const_ok_for_arm (HOST_WIDE_INT); extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code); extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code); +extern void thumb1_gen_const_int (rtx, HOST_WIDE_INT); extern int arm_split_constant (RTX_CODE, machine_mode, rtx, HOST_WIDE_INT, rtx, rtx, int); extern int legitimate_pic_operand_p (rtx); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 9f0975d..836f147 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2882,13 +2882,19 @@ arm_option_check_internal (struct gcc_options *opts) { const char *flag = (target_pure_code ? 
"-mpure-code" : "-mslow-flash-data"); + bool not_supported = arm_arch_notm || flag_pic || TARGET_NEON; - /* We only support -mpure-code and -mslow-flash-data on M-profile targets -with MOVT. */ - if (!TARGET_HAVE_MOVT || arm_arch_notm || flag_pic || TARGET_NEON) + /* We only support -mslow-flash-data on M-profile targets with +MOVT. */ + if (target_slow_flash_data && (!TARGET_HAVE_MOVT || not_supported)) error ("%s only supports non-pic code on M-profile targets with the " "MOVT instruction", flag); + /* We only support -mpure-code-flash-data on M-profile +targets. */ Typo in the option name. + if (target_pure_code && not_supported) + error ("%s only supports non-pic code on M-profile targets", flag); + /* Cannot load addresses: -mslow-flash-data forbids literal pool and -mword-relocations forbids relocation of MOVT/MOVW. */ if (target_word_relocations) @@ -4400,6 +4406,38 @@ const_ok_for_dimode_op (HOST_WIDE_INT i, enum rtx_code code) } } +/* Emit a sequence of movs/adds/shift to produce a 32-bit constant. + Avoid generating useless code when one of the bytes is zero. */ +void +thumb1_gen_const_int (rtx op0, HOST_WIDE_INT op1) +{ + bool mov_done_p = false; + int i; + + /* Emit upper 3 bytes if needed. */ + for (i = 0; i < 3; i++) +{ + int byte = (op1 >> (8 * (3 - i))) & 0xff; + + if (byte) + { + emit_set_insn (op0, mov_done_p +? gen_rtx_PLUS (SImode,op0, GEN_INT (byte)) +: GEN_INT (byte)); + mov_done_p = tr
Re: [PATCH, GCC/ARM, 10/10] Enable -mcmse
On 10/23/19 10:26 AM, Mihail Ionescu wrote: [PATCH, GCC/ARM, 10/10] Enable -mcmse Hi, === Context === This patch is part of a patch series to add support for Armv8.1-M Mainline Security Extensions architecture. Its purpose is to enable the -mcmse option now that support for Armv8.1-M Security Extension is complete. === Patch description === The patch is straightforward: it redefines ARMv8_1m_main as having the same features as ARMv8m_main (and thus as having the cmse feature) with the extra features represented by armv8_1m_main. It also removes the error for using -mcmse on Armv8.1-M Mainline. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * config/arm/arm-cpus.in (ARMv8_1m_main): Redefine as an extension to Armv8-M Mainline. * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Remove error for using -mcmse when targeting Armv8.1-M Mainline. Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no regression. Is this ok for trunk? Ok once the rest of the series is in. Does this need some documentation though? Thanks, Kyrill Best regards, Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index 652f2a4be9388fd7a74f0ec4615a292fd1cfcd36..a845dd2f83a38519a1387515a2d4646761fb405f 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -259,10 +259,7 @@ define fgroup ARMv8_5a ARMv8_4a armv8_5 sb predres define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv define fgroup ARMv8m_main ARMv7m armv8 cmse define fgroup ARMv8r ARMv8a -# Feature cmse is omitted to disable Security Extensions support while secure -# code compiled by GCC does not preserve FP context as allowed by Armv8.1-M -# Mainline. -define fgroup ARMv8_1m_main ARMv7m armv8 armv8_1m_main +define fgroup ARMv8_1m_main ARMv8m_main armv8_1m_main # Useful combinations. 
define fgroup VFPv2 vfpv2 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index cabcce8c8bd11c5ff3516c3102c0305b865b00cb..0f19b4eb4ec4fcca2df10e1b8e0b79d1a1e0a93d 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3742,9 +3742,6 @@ arm_options_perform_arch_sanity_checks (void) if (!arm_arch4 && arm_fp16_format != ARM_FP16_FORMAT_NONE) sorry ("__fp16 and no ldrh"); - if (use_cmse && arm_arch8_1m_main) - error ("ARMv8.1-M Mainline Security Extensions is unsupported"); - if (use_cmse && !arm_arch_cmse) error ("target CPU does not support ARMv8-M Security Extensions");
Re: [PATCH, GCC/ARM, 9/10] Call nscall function with blxns
On 10/23/19 10:26 AM, Mihail Ionescu wrote: [PATCH, GCC/ARM, 9/10] Call nscall function with blxns Hi, === Context === This patch is part of a patch series to add support for Armv8.1-M Mainline Security Extensions architecture. Its purpose is to call functions with the cmse_nonsecure_call attribute directly using blxns with no undue restriction on the register used for that. === Patch description === This change to use BLXNS to call a nonsecure function from secure directly (not using a libcall) is made in 2 steps: - change nonsecure_call patterns to use blxns instead of calling __gnu_cmse_nonsecure_call - loosen requirement for function address to allow any register when doing BLXNS. The former is a straightforward check over whether instructions added in Armv8.1-M Mainline are available while the latter consists in making the nonsecure call pattern accept any register by using match_operand and changing the nonsecure_call_internal expander to not force r4 when targeting Armv8.1-M Mainline. The tricky bit is actually in the test update, specifically how to check that register lists for CLRM have all registers except for the one holding parameters (already done) and the one holding the address used by BLXNS. This is achieved with 3 scan-assembler directives. 1) The first one lists all registers that can appear in CLRM but makes each of them optional. Property guaranteed: no wrong register is cleared and none appears twice in the register list. 2) The second directive checks that the CLRM is made of a fixed number of the right registers to be cleared. The number used is the number of registers that could contain a secret minus one (used to hold the address of the function to call). Property guaranteed: register list has the right number of registers. Cumulated property guaranteed: only registers with a potential secret are cleared and they are all listed but one. 3) The last directive checks that we cannot find a CLRM with a register in it that also appears in BLXNS. 
This is checked via the use of a back-reference on any of the allowed registers in CLRM, the back-reference enforcing that whatever register matches in CLRM must be the same in the BLXNS. Property guaranteed: register used for BLXNS is different from registers cleared in CLRM. Some more care needs to happen for the gcc.target/arm/cmse/cmse-1.c testcase due to there being two CLRMs generated. To ensure the third directive matches the right CLRM to the BLXNS, a negative lookahead is used between the CLRM register list and the BLXNS. The way a negative lookahead works is by matching the *position* where a given regular expression does not match. In this case, since it comes after the CLRM register list it is requesting that what comes after the register list does not have a CLRM again followed by BLXNS. This guarantees that the .*blxns that follows only matches a blxns without another CLRM before it. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * config/arm/arm.md (nonsecure_call_internal): Do not force memory address in r4 when targeting Armv8.1-M Mainline. (nonsecure_call_value_internal): Likewise. * config/arm/thumb2.md (nonsecure_call_reg_thumb2): Make memory address a register match_operand again. Emit BLXNS when targeting Armv8.1-M Mainline. (nonsecure_call_value_reg_thumb2): Likewise. *** gcc/testsuite/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * gcc.target/arm/cmse/cmse-1.c: Add check for BLXNS when instructions introduced in Armv8.1-M Mainline Security Extensions are available and restrict checks for libcall to __gnu_cmse_nonsecure_call to Armv8-M targets only. Adapt CLRM check to verify register used for BLXNS is not in the CLRM register list. * gcc.target/arm/cmse/cmse-14.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise and adapt check for LSB clearing bit to be using the same register as BLXNS when targeting Armv8.1-M Mainline. 
* gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainlin
Re: [PATCH, GCC/ARM, 8/10] Do lazy store & load inline when calling nscall function
Hi Mihail, On 10/23/19 3:24 PM, Mihail Ionescu wrote: [PATCH, GCC/ARM, 8/10] Do lazy store & load inline when calling nscall function Hi, === Context === This patch is part of a patch series to add support for Armv8.1-M Mainline Security Extensions architecture. Its purpose is to generate lazy store and load instructions inline when calling a function with the cmse_nonsecure_call attribute with the soft or softfp floating-point ABI. === Patch description === This patch adds two new patterns for the VLSTM and VLLDM instructions. cmse_nonsecure_call_inline_register_clear is then modified to generate VLSTM and VLLDM respectively before and after calls to functions with the cmse_nonsecure_call attribute in order to have lazy saving, clearing and restoring of VFP registers. Since these instructions do not do writeback of the base register, the stack is adjusted prior to the lazy store and after the lazy load with appropriate frame debug notes to describe the effect on the CFA register. As with CLRM, VSCCLRM and VSTR/VLDR, the instruction is modeled as an unspecified operation to the memory pointed to by the base register. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * config/arm/arm.c (arm_add_cfa_adjust_cfa_note): Declare early. (cmse_nonsecure_call_inline_register_clear): Define new lazy_fpclear variable as true when floating-point ABI is not hard. Replace check against TARGET_HARD_FLOAT_ABI by checks against lazy_fpclear. Generate VLSTM and VLLDM instruction respectively before and after a function call to cmse_nonsecure_call function. * config/arm/unspecs.md (VUNSPEC_VLSTM): Define unspec. (VUNSPEC_VLLDM): Likewise. * config/arm/vfp.md (lazy_store_multiple_insn): New define_insn. (lazy_load_multiple_insn): Likewise. *** gcc/testsuite/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Add check for VLSTM and VLLDM. 
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise. Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no regression. Is this ok for trunk? Best regards, Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index bcc86d50a10f11d9672258442089a0aa5c450b2f..b10f996c023e830ca24ff83fcbab335caf85d4cb 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -186,6 +186,7 @@ static int arm_register_move_cost (machine_mode, reg_class_t, reg_class_t); static int arm_memory_move_cost (machine_mode, reg_class_t, bool); static void emit_constant_insn (rtx cond, rtx pattern); static rtx_insn *emit_set_insn (rtx, rtx); +static void arm_add_cfa_adjust_cfa_note (rtx, int, rtx, rtx); static rtx emit_multi_reg_push (unsigned long, unsigned long); static void arm_emit_multi_reg_pop (unsigned long); static int vfp_emit_fstmd (int, int); @@ -17830,6 +17831,9 @@ cmse_nonsecure_call_inline_register_clear (void) FOR_BB_INSNS (bb, insn) { bool clear_callee_saved = TARGET_HAVE_FPCTX_CMSE; + /* frame = VFP regs + FPSCR + VPR. 
*/ + unsigned lazy_store_stack_frame_size = + (LAST_VFP_REGNUM - FIRST_VFP_REGNUM + 1 + 2) * UNITS_PER_WORD; unsigned long callee_saved_mask = ((1 << (LAST_HI_REGNUM + 1)) - 1) & ~((1 << (LAST_ARG_REGNUM + 1)) - 1); @@ -17847,7 +17851,7 @@ cmse_nonsecure_call_inline_register_clear (void) CUMULATIVE_ARGS args_so_far_v; cumulative_args_t args_so_far; tree arg_type, fntype; - bool first_param = true; + bool first_param = true, lazy_fpclear = !TARGET_HARD_FLOAT_ABI; function_args_iterator args_iter; uint32_t padding_bits_to_clear[4] = {0U, 0U, 0U, 0U}; @@ -17881,7 +17885,7 @@ cmse_nonsecure_call_inline_register_clear (void) -mfloat-abi=hard. For -mfloat-abi=softfp we will be using the lazy store and loads which clear both caller- and callee-saved registers. */ - if (TARGET_HARD_FLOAT_ABI) + if (!lazy_fpclear) { auto_sbitmap float_bitmap (maxregno + 1); @@ -17965,8 +17969,23 @@ cmse_nonsecure_call_inline_register_clear (void) disabled for pop (see below). */ RTX_FRAME_RELATED_P (push
Re: [PATCH, GCC/ARM, 7/10] Clear all VFP regs inline in hardfloat nscall functions
On 10/23/19 10:26 AM, Mihail Ionescu wrote: [PATCH, GCC/ARM, 7/10] Clear all VFP regs inline in hardfloat nscall functions Hi, === Context === This patch is part of a patch series to add support for Armv8.1-M Mainline Security Extensions architecture. Its purpose is to generate inline instructions to save, clear and restore callee-saved VFP registers before doing a call to a function with the cmse_nonsecure_call attribute. === Patch description === The patch is fairly straightforward in its approach and consists of the following 3 logical changes: - abstract the number of floating-point registers to clear in max_fp_regno - use max_fp_regno to decide how many registers to clear so that the same code works for Armv8-M and Armv8.1-M Mainline - emit vpush and vpop instructions respectively before and after a nonsecure call Note that as in the patch to clear GPRs inline, debug information has to be disabled for VPUSH and VPOP due to VPOP adding CFA adjustment note for SP when R7 is sometimes used as CFA. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * config/arm/arm.c (vfp_emit_fstmd): Declare early. (arm_emit_vfp_multi_reg_pop): Likewise. (cmse_nonsecure_call_inline_register_clear): Abstract number of VFP registers to clear in max_fp_regno. Emit VPUSH and VPOP to save and restore callee-saved VFP registers. *** gcc/testsuite/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Add check for VPUSH and VPOP and update expectation for VSCCLRM. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise. Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no regression. 
Is this ok for trunk? Ok. Thanks, Kyrill Best regards, Mihail ### Attachment also inlined for ease of reply ### diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index c24996897eb21c641914326f7064a26bbb363411..bcc86d50a10f11d9672258442089a0aa5c450b2f 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -188,6 +188,8 @@ static void emit_constant_insn (rtx cond, rtx pattern); static rtx_insn *emit_set_insn (rtx, rtx); static rtx emit_multi_reg_push (unsigned long, unsigned long); static void arm_emit_multi_reg_pop (unsigned long); +static int vfp_emit_fstmd (int, int); +static void arm_emit_vfp_multi_reg_pop (int, int, rtx); static int arm_arg_partial_bytes (cumulative_args_t, const function_arg_info &); static rtx arm_function_arg (cumulative_args_t, const function_arg_info &); @@ -17834,8 +17836,10 @@ cmse_nonsecure_call_inline_register_clear (void) unsigned address_regnum, regno; unsigned max_int_regno = clear_callee_saved ? IP_REGNUM : LAST_ARG_REGNUM; + unsigned max_fp_regno = + TARGET_HAVE_FPCTX_CMSE ? LAST_VFP_REGNUM : D7_VFP_REGNUM; unsigned maxregno = - TARGET_HARD_FLOAT_ABI ? D7_VFP_REGNUM : max_int_regno; + TARGET_HARD_FLOAT_ABI ? max_fp_regno : max_int_regno; auto_sbitmap to_clear_bitmap (maxregno + 1); rtx_insn *seq; rtx pat, call, unspec, clearing_reg, ip_reg, shift; @@ -17883,7 +17887,7 @@ cmse_nonsecure_call_inline_register_clear (void) bitmap_clear (float_bitmap); bitmap_set_range (float_bitmap, FIRST_VFP_REGNUM, - D7_VFP_REGNUM - FIRST_VFP_REGNUM + 1); + max_fp_regno - FIRST_VFP_REGNUM + 1); bitmap_ior (to_clear_bitmap, to_clear_bitmap, float_bitmap); } @@ -17960,6 +17964,16 @@ cmse_nonsecure_call_inline_register_clear (void) /* Disable frame debug info in push because it needs to be disabled for pop (see below). */ RTX_FRAME_RELATED_P (push_insn) = 0; + + /* Save VFP callee-saved registers. 
*/ + if (TARGET_HARD_FLOAT_ABI) + { + vfp_emit_fstmd (D7_VFP_REGNUM + 1, + (max_fp_regno - D7_VFP_REGNUM) / 2); + /* Disable frame debug info in push because it needs to be + disabled for vpop (see below). */ + RTX_FRAME_RELATED_P (get_last_insn ()) = 0; + } } /* Clear caller-saved registers that leak before doing a non-secure @@ -17974,9 +17988,25 @@ cmse_nonsecure_call_inline_register_clear (void) if (TARGET_HAVE_FPCTX_CMSE)
Re: [PATCH, GCC/ARM, 6/10] Clear GPRs inline when calling nscall function
Hi Mihail, On 10/23/19 10:26 AM, Mihail Ionescu wrote: [PATCH, GCC/ARM, 6/10] Clear GPRs inline when calling nscall function Hi, === Context === This patch is part of a patch series to add support for Armv8.1-M Mainline Security Extensions architecture. Its purpose is to generate inline callee-saved register clearing when calling a function with the cmse_nonsecure_call attribute with the ultimate goal of having the whole call sequence inline. === Patch description === Besides changing the set of registers that needs to be cleared inline, this patch also generates the push and pop to save and restore callee-saved registers without trusting the callee inline. To make the code more future-proof, this (currently) Armv8.1-M specific behavior is expressed in terms of clearing of callee-saved registers rather than directly based on the targets. The patch contains 1 subtlety: Debug information is disabled for push and pop because the REG_CFA_RESTORE notes used to describe popping of registers do not stack. Instead, they just reset the debug state for the register to the one at the beginning of the function, which is incorrect for a register that is pushed twice (in prologue and before nonsecure call) and then popped for the first time. In particular, this occasionally trips CFI note creation code when there are two codepaths to the epilogue, one of which does not go through the nonsecure call. Obviously this means that debugging between the push and pop is not reliable. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2019-10-23 Mihail-Calin Ionescu 2019-10-23 Thomas Preud'homme * config/arm/arm.c (arm_emit_multi_reg_pop): Declare early. (cmse_nonsecure_call_clear_caller_saved): Rename into ... (cmse_nonsecure_call_inline_register_clear): This. Save and clear callee-saved GPRs as well as clear ip register before doing a nonsecure call then restore callee-saved GPRs after it when targeting Armv8.1-M Mainline. (arm_reorg): Adapt to function rename. 
*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu
2019-10-23  Thomas Preud'homme

  * gcc.target/arm/cmse/cmse-1.c: Add check for PUSH and POP and update
  CLRM check.
  * gcc.target/arm/cmse/cmse-14.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/soft-sp/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/soft-sp/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp/union-1.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no regression.

Is this ok for trunk?

This is ok. I think you should get commit access to GCC by now. Please fill in the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi listing me as the approver (using my details from the MAINTAINERS file).
Of course, only commit this once the whole series is approved ;)

Thanks,
Kyrill

Best regards,
Mihail

### Attachment also inlined for ease of reply ###

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fca10801c87c5e635d573c0fbdc47a1ae229d0ef..12b4b42a66b0c5589690d9a2d8cf8e42712ca2c0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -187,6 +187,7 @@ static int arm_memory_move_cost (machine_mode, reg_class_t, bool);
 static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx_insn *emit_set_insn (rtx, rtx);
 static rtx emit_multi_reg_push (unsigned long, unsigned long);
+static void arm_emit_multi_reg_pop (unsigned long);
 static int arm_arg_partial_bytes (cum
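For readers unfamiliar with CMSE, the attribute discussed in this thread is applied to function pointer types; the compiler then wraps calls through such pointers in the save/clear/restore sequence the patch generates. A minimal sketch (illustrative only: it needs an Armv8-M or Armv8.1-M Mainline toolchain with -mcmse, and the type and function names below are made up, not from the patch):

```
/* A nonsecure function pointer type: calls through it get the inline
   callee-saved push, register clearing, BLXNS, and pop sequence.  */
typedef void (*ns_callback_t) (int) __attribute__ ((cmse_nonsecure_call));

void secure_invoke (ns_callback_t cb, int arg)
{
  /* Before the nonsecure branch the compiler pushes callee-saved GPRs,
     clears them (and ip), makes the call, then pops the saved values.  */
  cb (arg);
}
```

The push/pop around the call is exactly the part of the sequence whose debug information the patch description says must be disabled.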
Re: [PATCH, GCC/ARM, 5/10] Clear VFP registers with VSCCLRM
Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 5/10] Clear VFP registers with VSCCLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for the Armv8.1-M Mainline Security Extensions architecture. Its purpose is to improve code density of functions with the cmse_nonsecure_entry attribute, and when calling functions with the cmse_nonsecure_call attribute, by using VSCCLRM to do all the VFP register clearing as well as clearing VPR.

=== Patch description ===

This patch adds a new pattern for the VSCCLRM instruction. cmse_clear_registers () is then modified to use the new VSCCLRM instruction when targeting Armv8.1-M Mainline, thus making the existing register clearing code specific to Armv8-M. Since the VSCCLRM instruction mandates VPR in the register list, the pattern is encoded with a parallel which only requires an unspecified VUNSPEC_VSCCLRM_VPR constant modelling the VPR clearing. Other expressions in the parallel are expected to be set expressions for clearing the VFP registers.

I see we don't represent the VPR here as a register and use an UNSPEC to represent its clearing. That's okay for now, but when we do add support for it for MVE we'll need to adjust the RTL representation here to show its clobbering.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu
2019-10-23  Thomas Preud'homme

  * config/arm/arm-protos.h (clear_operation_p): Adapt prototype.
  * config/arm/arm.c (clear_operation_p): Extend to be able to check a
  clear_vfp_multiple pattern based on a new vfp parameter.
  (cmse_clear_registers): Generate VSCCLRM to clear VFP registers when
  targeting Armv8.1-M Mainline.
  (cmse_nonsecure_entry_clear_before_return): Clear VFP registers
  unconditionally when targeting the Armv8.1-M Mainline architecture.
  Check whether VFP registers are available before looking at
  call_used_regs for a VFP register.
  * config/arm/predicates.md (clear_multiple_operation): Adapt to change
  of prototype of clear_operation_p.
  (clear_vfp_multiple_operation): New predicate.
  * config/arm/unspecs.md (VUNSPEC_VSCCLRM_VPR): New volatile unspec.
  * config/arm/vfp.md (clear_vfp_multiple): New define_insn.

Ok.

Thanks,
Kyrill

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu
2019-10-23  Thomas Preud'homme

  * gcc.target/arm/cmse/bitfield-1.c: Add check for VSCCLRM.
  * gcc.target/arm/cmse/bitfield-2.c: Likewise.
  * gcc.target/arm/cmse/bitfield-3.c: Likewise.
  * gcc.target/arm/cmse/cmse-1.c: Likewise.
  * gcc.target/arm/cmse/struct-1.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.

Testing: Bootstrapped on arm-linux-gnueabihf and testsuite shows no regression.

Is this ok for trunk?
Best regards,
Mihail

### Attachment also inlined for ease of reply ###

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 1a948d2c97526ad7e67e8d4a610ac74cfdb13882..37a46982bbc1a8f17abe2fc76ba3cb7d65257c0d 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -77,7 +77,7 @@ extern int thumb_legitimate_offset_p (machine_mode, HOST_WIDE_INT);
 extern int thumb1_legitimate_address_p (machine_mode, rtx, int);
 extern bool ldm_stm_operation_p (rtx, bool, machine_mode mode, bool, bool);
-extern bool clear_operation_p (rtx);
+extern bool clear_operation_p (rtx, bool);
 extern int arm_const_double_rtx (rtx);
 extern int vfp3_const_double_rtx (rtx);
 extern int neon_immediate_valid_for_move (rtx, machine_mode, rtx *, int *);

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1f730cecff0fb3da7115ea1147dc8b9ab7076b7..5f3ce5c4605f609d1a0e31c0f697871266bdf835 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13499,8 +13499,9 @@ ldm_stm_operation_p (rtx op, bool load, machine_mode mode,
   return true;
 }

-/* Checks whether OP is a valid parallel pattern for a CLRM insn. To be a
-   valid CLRM pattern, OP must have the following form:
+/* Checks whether OP is a valid parallel pattern for a CLRM (if VFP is fa
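For context on where VSCCLRM ends up being emitted, entry functions also scrub register state on return. A sketch (illustrative only: requires -mcmse with an FPU on an Armv8.1-M Mainline target, and the function name is made up):

```
/* An entry function callable from the nonsecure world.  On return the
   compiler must clear registers that may hold secure state; on Armv8.1-M
   Mainline a single VSCCLRM can clear the VFP registers (and VPR) in
   place of the longer Armv8-M clearing sequence.  */
int __attribute__ ((cmse_nonsecure_entry)) secure_scale (float x)
{
  return (int) (x * 2.0f);
}
```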
Re: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM
Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for the Armv8.1-M Mainline Security Extensions architecture. Its purpose is to improve code density of functions with the cmse_nonsecure_entry attribute, and when calling functions with the cmse_nonsecure_call attribute, by using CLRM to do all the general-purpose register clearing as well as clearing the APSR register.

=== Patch description ===

This patch adds a new pattern for the CLRM instruction and guards the current clearing code in output_return_instruction() and thumb_exit() on Armv8.1-M Mainline instructions not being present. cmse_clear_registers () is then modified to use the new CLRM instruction when targeting Armv8.1-M Mainline, while keeping the Armv8-M register clearing code for VFP registers.

For the CLRM instruction, which does not mandate APSR in the register list, checking whether it is the right volatile unspec or a clearing register is done in clear_operation_p. Note that load/store multiple were deemed sufficiently different in terms of RTX structure compared to the CLRM pattern for a different function to be used to validate the match_parallel.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu
2019-10-23  Thomas Preud'homme

  * config/arm/arm-protos.h (clear_operation_p): Declare.
  * config/arm/arm.c (clear_operation_p): New function.
  (cmse_clear_registers): Generate clear_multiple instruction pattern if
  targeting Armv8.1-M Mainline or successor.
  (output_return_instruction): Only output APSR register clearing if
  Armv8.1-M Mainline instructions not available.
  (thumb_exit): Likewise.
  * config/arm/predicates.md (clear_multiple_operation): New predicate.
  * config/arm/thumb2.md (clear_apsr): New define_insn.
  (clear_multiple): Likewise.
  * config/arm/unspecs.md (VUNSPEC_CLRM_APSR): New volatile unspec.
*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu
2019-10-23  Thomas Preud'homme

  * gcc.target/arm/cmse/bitfield-1.c: Add check for CLRM.
  * gcc.target/arm/cmse/bitfield-2.c: Likewise.
  * gcc.target/arm/cmse/bitfield-3.c: Likewise.
  * gcc.target/arm/cmse/struct-1.c: Likewise.
  * gcc.target/arm/cmse/cmse-14.c: Likewise.
  * gcc.target/arm/cmse/cmse-1.c: Likewise. Restrict checks for Armv8-M
  GPR clearing when CLRM is not available.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/union-1.c: Likewise.
  * gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no regression.

Is this ok for trunk?

Best regards,
Mihail

### Attachment also inlined for ease of reply ###

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f995974f9bb89ab3c7ff0888c394b0dfaf7da60c..1a948d2c97526ad7e67e8d4a610ac74cfdb13882 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -77,6 +77,7 @@ extern int thumb_legitimate_offset_p (machine_mode, HOST_WIDE_INT);
 extern int t
Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics
Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current trunk. For resolve_overloaded, I redefined my memtag overloading function to fit the latest resolve_overloaded_builtin interface. Regression tested again and survived for aarch64-none-linux-gnu.

Please reply inline rather than top-posting on gcc-patches.

Cheers,
Dennis

Changelog is updated as following:

gcc/ChangeLog:

2019-11-07  Dennis Zhang

  * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
  AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
  AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
  AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
  AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
  (aarch64_init_memtag_builtins): New.
  (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
  (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
  (aarch64_expand_builtin_memtag): New.
  (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
  (AARCH64_BUILTIN_SUBCODE): New macro.
  (aarch64_resolve_overloaded_memtag): New.
  (aarch64_resolve_overloaded_builtin_general): New hook. Call
  aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
  __ARM_FEATURE_MEMORY_TAGGING when enabled.
  (aarch64_resolve_overloaded_builtin): Call
  aarch64_resolve_overloaded_builtin_general.
  * config/aarch64/aarch64-protos.h
  (aarch64_resolve_overloaded_builtin_general): New declaration.
  * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
  (TARGET_MEMTAG): Likewise.
  * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
  UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
  (irg, gmi, subp, addg, ldg, stg): New instructions.
  * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
  (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
  (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
  * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
  (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
  * config/arm/types.md (memtag): New.
  * doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang

  * gcc.target/aarch64/acle/memtag_1.c: New test.
  * gcc.target/aarch64/acle/memtag_2.c: New test.
  * gcc.target/aarch64/acle/memtag_3.c: New test.

On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A. It can be used for spatial and temporal memory safety detection and a lightweight lock and key system.

This patch enables new intrinsics leveraging MTE instructions to implement functionalities of creating tags, setting tags, reading tags, and manipulating tags. The intrinsics are part of the Arm ACLE extension: https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
The MTE ISA specification can be found at https://developer.arm.com/docs/ddi0487/latest chapter D6.

Bootstrapped and regtested for aarch64-none-linux-gnu. Please help to check if it's OK for trunk.

This looks mostly ok to me but for further review this needs to be rebased on top of current trunk as there are some conflicts with the SVE ACLE changes that recently went in. Most conflicts look trivial to resolve but one that needs more attention is the definition of the TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,
Kyrill

Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang

  * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
  AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
  AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
  AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
  AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
  (aarch64_init_memtag_builtins): New.
  (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
  (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
  (aarch64_expand_builtin_memtag): New.
  (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
  (AARCH64_BUILTIN_SUBCODE): New macro.
  (aarch64_resolve_overloaded_memtag): New.
  (aarch64_resolve_overloaded_builtin): New hook. Call
  aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
  __ARM_FEATURE_MEMORY_TAGGING when enabled.
  * config/aarch64/aarch64-protos.h (aarch64_resolve_overloaded_builtin):
  Add declaration.
  * config/aa
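As a usage illustration of the intrinsics this patch adds (the intrinsic names are from the ACLE memory tagging section; the helper function is hypothetical, and compiling requires an AArch64 compiler with -march=armv8.5-a+memtag, with execution needing MTE-capable hardware, so this is a sketch rather than something runnable here):

```
#include <arm_acle.h>
#include <stdlib.h>

/* Hypothetical allocator wrapper: colour an allocation with a random
   MTE tag so that accesses through stale or mismatched pointers fault.  */
void *tag_allocation (size_t n)
{
  void *p = malloc (n);
  if (!p)
    return NULL;
  /* Insert a random tag into the pointer (IRG); 0 excludes no tags.  */
  void *tp = __arm_mte_create_random_tag (p, 0);
  /* Store that allocation tag for each 16-byte granule (STG).  */
  for (size_t i = 0; i < n; i += 16)
    __arm_mte_set_tag ((char *) tp + i);
  return tp;
}
```

Note how each builtin in the ChangeLog (IRG, GMI, SUBP, INC_TAG, SET_TAG, GET_TAG) corresponds to one MTE instruction emitted during expansion.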
Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics
Hi Richard,

On 11/9/19 12:44 PM, Richard Henderson wrote:

On 11/7/19 11:26 AM, Kyrill Tkachov wrote:

-;; The code sequence emitted by this insn pattern uses the Q flag, which GCC
-;; doesn't generally know about, so we don't bother expanding to individual
-;; instructions.  It may be better to just use an out-of-line asm libcall for
-;; this.
+;; The code sequence emitted by this insn pattern uses the Q flag, so we need
+;; to bail out when ARM_Q_BIT_READ and resort to a library sequence instead.
+
+(define_expand "ssmulsa3"
+  [(parallel [(set (match_operand:SA 0 "s_register_operand")
+                   (ss_mult:SA (match_operand:SA 1 "s_register_operand")
+                               (match_operand:SA 2 "s_register_operand")))
+              (clobber (match_scratch:DI 3))
+              (clobber (match_scratch:SI 4))
+              (clobber (reg:CC CC_REGNUM))])]
+  "TARGET_32BIT && arm_arch6"
+  {
+    if (ARM_Q_BIT_READ)
+      FAIL;
+  }
+)

Coming back to this, why would you not just represent the update of the Q bit? This is not generated by generic pattern matching, but by the __ssmulsa3 builtin function. It seems easy to me to simply describe how this older builtin operates in conjunction with the new ACLE builtins.

I recognize that ssadd<mode>3 etc. are more difficult, because they can be generated by arithmetic operations on TYPE_SATURATING. Although again it seems weird to generate expensive out-of-line code for TYPE_SATURATING when used in conjunction with ACLE builtins. I think it would be better to merely expand the documentation. Even if only so far as to say "unsupported to mix these".

I'm tempted to agree, as this part of the patch is quite ugly.

Thank you for the comments on these patches, I wasn't aware of some of the mechanisms. I guess I should have posted the series as an RFC first... I'll send patches to fix up the issues.
Thanks,
Kyrill

+(define_expand "maddhisi4"
+  [(set (match_operand:SI 0 "s_register_operand")
+        (plus:SI (mult:SI (sign_extend:SI
+                           (match_operand:HI 1 "s_register_operand"))
+                          (sign_extend:SI
+                           (match_operand:HI 2 "s_register_operand")))
+                 (match_operand:SI 3 "s_register_operand")))]
+  "TARGET_DSP_MULTIPLY"
+  {
+    /* If this function reads the Q bit from ACLE intrinsics break up the
+       multiplication and accumulation as an overflow during accumulation will
+       clobber the Q flag.  */
+    if (ARM_Q_BIT_READ)
+      {
+        rtx tmp = gen_reg_rtx (SImode);
+        emit_insn (gen_mulhisi3 (tmp, operands[1], operands[2]));
+        emit_insn (gen_addsi3 (operands[0], tmp, operands[3]));
+        DONE;
+      }
+  }
+)
+
+(define_insn "*arm_maddhisi4"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
         (plus:SI (mult:SI (sign_extend:SI
                            (match_operand:HI 1 "s_register_operand" "r"))
                           (sign_extend:SI
                            (match_operand:HI 2 "s_register_operand" "r")))
                  (match_operand:SI 3 "s_register_operand" "r")))]
-  "TARGET_DSP_MULTIPLY"
+  "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ"
   "smlabb%?\\t%0, %1, %2, %3"
   [(set_attr "type" "smlaxy")
    (set_attr "predicable" "yes")]

I think this case would be better represented with a single define_insn_and_split and a peephole2. It is easy to notice during peep2 whether or not the Q bit is actually live at the exact place we want to expand this operation. If it is live, then use two insns; if it isn't, use one.

r~