[Patch Arm] Add neon_fcmla and neon_fcadd as neon_type instructions.

2022-11-20 Thread Ramana Radhakrishnan via Gcc-patches
[AArch64 folks CC'd fyi as this is common between both backends.]

Hi,

The design in the backend used to be that advanced simd types are
generally added to is_neon_type in the backend. It appears that
neon_fcmla and neon_fcadd aren't added in as  neon_type instructions.

Applying this to the tree later this week after having built armhf and
a bootstrap and test run on aarch64-linux-gnu.

Thanks,
Ramana
commit 7dd15fae0ac1455f5818a1fc0078e35d85e1e250
Author: Ramana Radhakrishnan 
Date:   Wed Nov 16 10:32:04 2022 +

[Patch Arm] Add neon_fcadd and neon_fcmla to is_neon_type.

Appears to have been an oversight.

gcc/
* config/arm/types.md: Update comment.
(is_neon_type): Add neon_fcmla, neon_fcadd.

Signed-off-by: Ramana Radhakrishnan 

diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 7d0504bdd94..d0d9997efd2 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -248,7 +248,8 @@ (define_attr "autodetect_type"
 ; wmmx_wunpckil
 ; wmmx_wxor
 ;
-; The classification below is for NEON instructions.
+; The classification below is for NEON instructions. If a new neon type is
+; added, please ensure this is added to the is_neon_type attribute below too.
 ;
 ; neon_add
 ; neon_add_q
@@ -1281,6 +1282,7 @@ (define_attr "is_neon_type" "yes,no"
   neon_fp_mla_d_q, neon_fp_mla_d_scalar_q, neon_fp_sqrt_s,\
   neon_fp_sqrt_s_q, neon_fp_sqrt_d, neon_fp_sqrt_d_q,\
   neon_fp_div_s, neon_fp_div_s_q, neon_fp_div_d, neon_fp_div_d_q, 
crypto_aese,\
+  neon_fcadd, neon_fcmla, \
   crypto_aesmc, crypto_sha1_xor, crypto_sha1_fast, crypto_sha1_slow,\
   crypto_sha256_fast, crypto_sha256_slow")
 (const_string "yes")


Re: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]

2022-11-20 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, Nov 18, 2022 at 4:59 PM Kyrylo Tkachov via Gcc-patches
 wrote:
>
>
>
> > -Original Message-
> > From: Andrea Corallo 
> > Sent: Thursday, November 17, 2022 4:38 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; Richard Earnshaw
> > ; Stam Markianos-Wright  > wri...@arm.com>
> > Subject: [PATCH 15/35] arm: Explicitly specify other float types for 
> > _Generic
> > overloading [PR107515]
> >
> > From: Stam Markianos-Wright 
> >
> > This patch adds explicit references to other float types
> > to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515
> >
> > gcc/ChangeLog:
> > PR 107515
> > * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.
>
> Argh, I'm looking forward to when we move away from this _Generic business, 
> but for now ok.
> The ChangeLog should say "PR target/107515" for the git hook to recognize it 
> IIRC.

and the PR is against 11.x - is there a plan to back port this and
dependent patches to relevant branches ?

Ramana

> Thanks,
> Kyrill
>
> > ---
> >  gcc/config/arm/arm_mve.h | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> > index fd1876b57a0..f6b42dc3fab 100644
> > --- a/gcc/config/arm/arm_mve.h
> > +++ b/gcc/config/arm/arm_mve.h
> > @@ -35582,6 +35582,9 @@ enum {
> >   short: __ARM_mve_type_int_n, \
> >   int: __ARM_mve_type_int_n, \
> >   long: __ARM_mve_type_int_n, \
> > + _Float16: __ARM_mve_type_fp_n, \
> > + __fp16: __ARM_mve_type_fp_n, \
> > + float: __ARM_mve_type_fp_n, \
> >   double: __ARM_mve_type_fp_n, \
> >   long long: __ARM_mve_type_int_n, \
> >   unsigned char: __ARM_mve_type_int_n, \
> > --
> > 2.25.1
>


Re: [PATCH][GCC] arm: Add support for new frame unwinding instruction "0xb5".

2022-11-20 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, Nov 18, 2022 at 9:33 AM Srinath Parvathaneni
 wrote:
>
> Hi,
>
> > -Original Message-
> > From: Ramana Radhakrishnan 
> > Sent: Thursday, November 17, 2022 8:27 PM
> > To: Srinath Parvathaneni 
> > Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> > ; Kyrylo Tkachov 
> > Subject: Re: [PATCH][GCC] arm: Add support for new frame unwinding
> > instruction "0xb5".
> >
> > On Thu, Nov 10, 2022 at 10:38 AM Srinath Parvathaneni via Gcc-patches  > patc...@gcc.gnu.org> wrote:
> > >
> > > Hi,
> > >
> > > This patch adds support for Arm frame unwinding instruction "0xb5"
> > > [1]. When an exception is taken and "0xb5" instruction is encounter
> > > during runtime stack-unwinding, we use effective vsp as modifier in 
> > > pointer
> > authentication.
> > > On completion of stack unwinding if "0xb5" instruction is not
> > > encountered then CFA will be used as modifier in pointer authentication.
> > >
> > > [1]
> > > https://github.com/ARM-software/abi-
> > aa/releases/download/2022Q3/ehabi3
> > > 2.pdf
> > >
> > > Regression tested on arm-none-eabi target and found no regressions.
> > >
> > > Ok for master?
> > >
> >
> > No, not yet.
> >
> > Presumably the logic to produce 0xb5 is in the source base and this was
> > tested with suitable options that produce said opcode ? I see no logic in 
> > place
> > to produce the said opcode in the backend in a quick read as the pacbti
> > patches still seem to be in review. ?
> >
> > So what was the test suite run actually testing ?
>
> Sorry for the late response, the patch supporting the said opcode (directive 
> ".pacspval)" is here:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605524.html (still 
> under upstream review)
>
> and the patch to encode ".pacspval" with the mentioned opcode "0xb5" in 
> binutils is here:
> https://sourceware.org/pipermail/binutils/2022-November/124328.html (approved 
> and committed to binutils).

Thanks for the answer but perhaps I should make my question more
explicit - are you saying that this patch was tested in combination
with those and other dependent patches on a suitable simulator with
suitable multilibs and C++ to test for this presumably for frame
unwinding ?

For the future , it would certainly be worth being explicit about this
in your patch submission :)

regards
Ramana

>
> Regards,
> Srinath.
>
> > regards
> > Ramana
> >
> >
> > > Regards,
> > > Srinath.
> > >
> > > gcc/ChangeLog:
> > >
> > > 2022-11-09  Srinath Parvathaneni  
> > >
> > > * libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode
> > opcode
> > > "0xb5".
> > >
> > >
> > > ### Attachment also inlined for ease of reply
> > ###
> > >
> > >
> > > diff --git a/libgcc/config/arm/pr-support.c
> > > b/libgcc/config/arm/pr-support.c index
> > >
> > e48854587c667a959aa66ccc4982231f6ecc..73e4942a39b34a83c2da85de
> > f6b1
> > > 3e82ec501552 100644
> > > --- a/libgcc/config/arm/pr-support.c
> > > +++ b/libgcc/config/arm/pr-support.c
> > > @@ -107,7 +107,9 @@ __gnu_unwind_execute (_Unwind_Context *
> > context, __gnu_unwind_state * uws)
> > >_uw op;
> > >int set_pc;
> > >int set_pac = 0;
> > > +  int set_pac_sp = 0;
> > >_uw reg;
> > > +  _uw sp;
> > >
> > >set_pc = 0;
> > >for (;;)
> > > @@ -124,10 +126,11 @@ __gnu_unwind_execute (_Unwind_Context *
> > context,
> > > __gnu_unwind_state * uws)  #if defined(TARGET_HAVE_PACBTI)
> > >   if (set_pac)
> > > {
> > > - _uw sp;
> > >   _uw lr;
> > >   _uw pac;
> > > - _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP,
> > _UVRSD_UINT32, );
> > > + if (!set_pac_sp)
> > > +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP,
> > _UVRSD_UINT32,
> > > +);
> > >   _Unwind_VRS_Get (context, _UVRSC_CORE, R_LR, _UVRSD_UINT32,
> > );
> > >   _Unwind_VRS_Get (context, _UVRSC_PAC, R_IP,
> > >_UVRSD_UINT32, ); @@ -259,7 +262,19
> > > @@ __gnu_unwind_execute (_Unwind_Context * context,
> > __gnu_unwind_state * uws)
> > >   continue;
> > > }
> > >
> > > - if ((op & 0xfc) == 0xb4)  /* Obsolete FPA.  */
> > > + /* Use current VSP as modifier in PAC validation.  */
> > > + if (op == 0xb5)
> > > +   {
> > > + if (set_pac)
> > > +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP,
> > _UVRSD_UINT32,
> > > +);
> > > + else
> > > +   return _URC_FAILURE;
> > > + set_pac_sp = 1;
> > > + continue;
> > > +   }
> > > +
> > > + if ((op & 0xfd) == 0xb6)  /* Obsolete FPA.  */
> > > return _URC_FAILURE;
> > >
> > >   /* op & 0xf8 == 0xb8.  */
> > >
> > >
> > >


Re: [PATCH] ARM: Make ARMv8-M attribute cmse_nonsecure_call work in Ada

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Mon, Oct 24, 2022 at 9:55 AM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> until most other machine attributes, this one does not work in Ada because,
> while it applies to pointer-to-function types, it is explicitly marked as
> requiring declarations in the implementation.
>
> Now, in Ada, machine attributes are specified like this:
>
>   type Non_Secure is access procedure;
>   pragma Machine_Attribute (Non_Secure, "cmse_nonsecure_call");
>
> i.e. not attached to the declaration of Non_Secure (testcase attached).
>
> So the attached patch extends the support to Ada by also accepting
> pointer-to-function types in the handler.
>
> Tested on arm-eabi, OK for the mainline?
>


Ok if no regressions, perhaps the test needs to be in the ada test suite ?

regards

Ramana


>
> 2022-10-24  Eric Botcazou  
>
> * config/arm/arm.cc (arm_attribute_table) : 
> Change
> decl_required field to false.
> (arm_handle_cmse_nonsecure_call): Deal with a TYPE node.
>
>
> --
> Eric Botcazou


Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 17, 2022 at 5:30 PM Richard Sandiford via Gcc-patches
 wrote:
>
> Wilco Dijkstra  writes:
> > Hi Richard,
> >
> >> Can you go into more detail about:
> >>
> >>Use :option:`-mdirect-extern-access` either in shared libraries or in
> >>executables, but not in both.  Protected symbols used both in a shared
> >>library and executable may cause linker errors or fail to work correctly
> >>
> >> If this is LLVM's default for PIC (and by assumption shared libraries),
> >> is it then invalid to use -mdirect-extern-access for any PIEs that
> >> are linked against those shared libraries and use protected symbols
> >> from those libraries?  How would a user know that one of the shared
> >> libraries they're linking against was built in this way?
> >
> > Yes, the usage model is that you'd either use it for static PIE or only on
> > data that is not shared. If you get it wrong them you'll get the copy
> > relocation error.
>
> Thanks.  I think I'm still missing something though.  If, for the
> non-executable case, people should only use the feature on data that
> is not shared, why do we need to relax the binds-local condition for
> protected symbols on -fPIC?  Oughtn't the symbol to be hidden rather
> than protected if the data isn't shared?
>
> I can understand the reasoning for the PIE changes but I'm still
> struggling with the PIC-but-not-PIE bits.

I think I'm with Richard S on hidden vs protected on first reading. I
can see why this works out of the box and can even be default for
static-pie.

Any reason why this is not on by default - it's early enough in the
stage3 cycle and we can always flip the defaults if there are more
problems found.

You probably need a rebase for the documentation bits,.

regards
Ramana


Ramana


Re: [PATCH][GCC] arm: Add support for new frame unwinding instruction "0xb5".

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 10, 2022 at 10:38 AM Srinath Parvathaneni via Gcc-patches
 wrote:
>
> Hi,
>
> This patch adds support for Arm frame unwinding instruction "0xb5" [1]. When
> an exception is taken and "0xb5" instruction is encounter during runtime
> stack-unwinding, we use effective vsp as modifier in pointer authentication.
> On completion of stack unwinding if "0xb5" instruction is not encountered
> then CFA will be used as modifier in pointer authentication.
>
> [1] 
> https://github.com/ARM-software/abi-aa/releases/download/2022Q3/ehabi32.pdf
>
> Regression tested on arm-none-eabi target and found no regressions.
>
> Ok for master?
>

No, not yet.

Presumably the logic to produce 0xb5 is in the source base and this
was tested with suitable options that produce said opcode ? I see no
logic in place to produce the said opcode in the backend in a quick
read as the pacbti patches still seem to be in review. ?

So what was the test suite run actually testing ?

regards
Ramana


> Regards,
> Srinath.
>
> gcc/ChangeLog:
>
> 2022-11-09  Srinath Parvathaneni  
>
> * libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode opcode
> "0xb5".
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/libgcc/config/arm/pr-support.c b/libgcc/config/arm/pr-support.c
> index 
> e48854587c667a959aa66ccc4982231f6ecc..73e4942a39b34a83c2da85def6b13e82ec501552
>  100644
> --- a/libgcc/config/arm/pr-support.c
> +++ b/libgcc/config/arm/pr-support.c
> @@ -107,7 +107,9 @@ __gnu_unwind_execute (_Unwind_Context * context, 
> __gnu_unwind_state * uws)
>_uw op;
>int set_pc;
>int set_pac = 0;
> +  int set_pac_sp = 0;
>_uw reg;
> +  _uw sp;
>
>set_pc = 0;
>for (;;)
> @@ -124,10 +126,11 @@ __gnu_unwind_execute (_Unwind_Context * context, 
> __gnu_unwind_state * uws)
>  #if defined(TARGET_HAVE_PACBTI)
>   if (set_pac)
> {
> - _uw sp;
>   _uw lr;
>   _uw pac;
> - _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32, 
> );
> + if (!set_pac_sp)
> +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32,
> +);
>   _Unwind_VRS_Get (context, _UVRSC_CORE, R_LR, _UVRSD_UINT32, 
> );
>   _Unwind_VRS_Get (context, _UVRSC_PAC, R_IP,
>_UVRSD_UINT32, );
> @@ -259,7 +262,19 @@ __gnu_unwind_execute (_Unwind_Context * context, 
> __gnu_unwind_state * uws)
>   continue;
> }
>
> - if ((op & 0xfc) == 0xb4)  /* Obsolete FPA.  */
> + /* Use current VSP as modifier in PAC validation.  */
> + if (op == 0xb5)
> +   {
> + if (set_pac)
> +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32,
> +);
> + else
> +   return _URC_FAILURE;
> + set_pac_sp = 1;
> + continue;
> +   }
> +
> + if ((op & 0xfd) == 0xb6)  /* Obsolete FPA.  */
> return _URC_FAILURE;
>
>   /* op & 0xf8 == 0xb8.  */
>
>
>


Re: [Patch Arm] Fix PR 92999

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, Nov 11, 2022 at 9:50 PM Ramana Radhakrishnan
 wrote:
>
> On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
>  wrote:
> >
> > On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
> >  wrote:
> > >
> > >
> > >
> > > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> > > >
> > > >
> > > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> > > >> PR92999 is a case where the VFP calling convention does not allocate
> > > >> enough FP registers for a homogenous aggregate containing FP16 values.
> > > >> I believe this is the complete fix but would appreciate another set of
> > > >> eyes on this.
> > > >>
> > > >> Could I get a hand with a regression test run on an armhf environment
> > > >> while I fix my environment ?
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >> PR target/92999
> > > >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> > > >> aggregates with elements smaller than SFmode.
> > > >>
> > > >> gcc/testsuite/ChangeLog:
> > > >>
> > > >> * gcc.target/arm/pr92999.c: New test.
> > > >>
> > > >>
> > > >> Thanks,
> > > >> Ramana
> > > >>
> > > >> Signed-off-by: Ramana Radhakrishnan 
> > > >
> > > > I'm not sure about this.  The AAPCS does not mention a base type of a
> > > > half-precision FP type as an appropriate homogeneous aggregate for using
> > > > VFP registers for either calling or returning.
> >
> > Ooh interesting, thanks for taking a look and poking at the AAPCS and
> > that's a good catch. BF16 should also have the same behaviour as FP16
> > , I suspect ?
>
> I suspect I got caught out by the definition of the Homogenous
> aggregate from Section 5.3.5
> ((https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates)
> which simply suggests it's an aggregate of fundamental types which
> lists half precision floating point .
>
> FTR, ideally I should have read 7.1.2.1
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling)
> :)
>
>
>
> >
> > > >
> > > > So perhaps the bug is that we try to treat this as a homogeneous
> > > > aggregate at all.
> >
> > Yep I agree - I'll take a look again tomorrow and see if I can get a fix.
> >
> > (And thanks Alex for the test run, I might trouble you again while I
> > still (slowly) get some of my boards back up)
>
>
> and as promised take 2. I'd really prefer another review on this one
> to see if I've not missed anything in the cases below.

Ping  ?

Ramana

>
> regards
> Ramana
>
>
> >
> > regards,
> > Ramana
> >
> >
> > >
> > > R.


Re: [PATCH 3/9][GCC][AArch64] Add autovectorization support for Complex instructions

2022-11-16 Thread Ramana Radhakrishnan via Gcc-patches


>
> OK with the comment typos fixed.
>
> > > > gcc/ChangeLog:
> > > >
> > > > 2018-11-11  Tamar Christina  
> > > >
> > > > * config/aarch64/aarch64-simd.md (aarch64_fcadd,
> > > > fcadd3, aarch64_fcmla,
> > > > fcmla4): New.
> > > > * config/aarch64/aarch64.h (TARGET_COMPLEX): New.
> > > > * config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
> > > > UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, 
> > > > UNSPEC_FCMLA270): New.
> > > > (FCADD, FCMLA): New.
> > > > (rot, rotsplit1, rotsplit2): New.
> > > > * config/arm/types.md (neon_fcadd, neon_fcmla): New.
>
> Should we push these to an existing class for now, and split them later when
> someone provides a scheduling model which makes use of them?

Sorry about a blast from the past :P . Perhaps these should just have
been added to is_neon_type that allows for the belts and braces
handling in cortex-a57.md which is the basis for many a scheduler
description in the backend :)

Ramana


Re: [Patch Arm] Fix PR 92999

2022-11-11 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
 wrote:
>
> On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
>  wrote:
> >
> >
> >
> > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> > >
> > >
> > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> > >> PR92999 is a case where the VFP calling convention does not allocate
> > >> enough FP registers for a homogenous aggregate containing FP16 values.
> > >> I believe this is the complete fix but would appreciate another set of
> > >> eyes on this.
> > >>
> > >> Could I get a hand with a regression test run on an armhf environment
> > >> while I fix my environment ?
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >> PR target/92999
> > >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> > >> aggregates with elements smaller than SFmode.
> > >>
> > >> gcc/testsuite/ChangeLog:
> > >>
> > >> * gcc.target/arm/pr92999.c: New test.
> > >>
> > >>
> > >> Thanks,
> > >> Ramana
> > >>
> > >> Signed-off-by: Ramana Radhakrishnan 
> > >
> > > I'm not sure about this.  The AAPCS does not mention a base type of a
> > > half-precision FP type as an appropriate homogeneous aggregate for using
> > > VFP registers for either calling or returning.
>
> Ooh interesting, thanks for taking a look and poking at the AAPCS and
> that's a good catch. BF16 should also have the same behaviour as FP16
> , I suspect ?

I suspect I got caught out by the definition of the Homogenous
aggregate from Section 5.3.5
((https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates)
which simply suggests it's an aggregate of fundamental types which
lists half precision floating point .

FTR, ideally I should have read 7.1.2.1
https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling)
:)



>
> > >
> > > So perhaps the bug is that we try to treat this as a homogeneous
> > > aggregate at all.
>
> Yep I agree - I'll take a look again tomorrow and see if I can get a fix.
>
> (And thanks Alex for the test run, I might trouble you again while I
> still (slowly) get some of my boards back up)


and as promised take 2. I'd really prefer another review on this one
to see if I've not missed anything in the cases below.

regards
Ramana


>
> regards,
> Ramana
>
>
> >
> > R.
commit c2ed018d10328c5cf93aa56b00ba4caf5dace539
Author: Ramana Radhakrishnan 
Date:   Fri Nov 11 21:39:22 2022 +

[Patch Arm] Fix PR92999

PR target/92999 is a case where the VFP PCS implementation is
incorrectly considering homogenous floating point aggregates with FP16
and BF16 values.

Can someone help me with a bootstrap and regression test on an armhf
environment ?

Signed-off-by: Ramana Radhakrishnan 
Tested-by: Alex Coplan  
Reviewed-by: Richard Earnshaw  

PR target/92999

gcc/ChangeLog:

* config/arm/arm.cc (aapcs_vfp_is_invalid_scalar_in_ha): New
(aapcs_vfp_sub_candidate): Adjust.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr92999.c: New test.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 2eb4d51e4a3..cd3e1ffe777 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -6281,6 +6281,31 @@ const unsigned int WARN_PSABI_EMPTY_CXX17_BASE = 1U << 0;
 const unsigned int WARN_PSABI_NO_UNIQUE_ADDRESS = 1U << 1;
 const unsigned int WARN_PSABI_ZERO_WIDTH_BITFIELD = 1U << 2;
 
+
+/* The AAPCS VFP ABI allows homogenous aggregates with scalar
+   FP32 and FP64 members.
+   Return
+ true if this is a scalar that is not a proper candidate
+ false if this is a scalar that is an acceptable scalar data
+ type in a homogenous aggregate or if this is not a scalar allowing
+ the tree walk in aapcs_vfp_sub_candidate to continue.
+ */
+static bool
+aapcs_vfp_is_invalid_scalar_in_ha (const_tree inner_type)
+{
+
+  machine_mode mode = TYPE_MODE (inner_type);
+  if (TREE_CODE (inner_type) == REAL_TYPE)
+{
+  if (mode == DFmode && mode == SFmode)
+   return false;
+  else
+   return true;
+}
+  else
+return false;
+}
+
 /* Walk down the type tree of TYPE counting consecutive base elements.
If *MODEP is VOIDmode, then set it to the first valid floating point
type.  If a non-floating point type is found, or if a floating point
@@ -6372,6 +6397,10 @@ aapcs_vfp_sub_candidate (const_tree ty

Re: [PATCH][GCC] arm: Add support for Cortex-X1C CPU.

2022-11-10 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 10, 2022 at 10:24 AM Srinath Parvathaneni via Gcc-patches
 wrote:
>
> Hi,
>
> This patch adds the -mcpu support for the Arm Cortex-X1C CPU.
>
> Regression tested on arm-none-eabi and bootstrapped on 
> arm-none-linux-gnueabihf.
>
> Ok for GCC master?


Ok
Ramana
>
> Regards,
> Srinath.
>
> gcc/ChangeLog:
>
> 2022-11-09  Srinath Parvathaneni  
>
>* config/arm/arm-cpus.in (cortex-x1c): Define new CPU.
>* config/arm/arm-tables.opt: Regenerate.
>* config/arm/arm-tune.md: Likewise.
>* 
> doc/gcc/gcc-command-options/machine-dependent-options/arm-options.rst:
>Document Cortex-X1C CPU.
>
>gcc/testsuite/ChangeLog:
>
> 2022-11-09  Srinath Parvathaneni  
>
>* gcc.target/arm/multilib.exp: Add tests for Cortex-X1C.
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index 
> 5a63bc548e54dbfdce5d1df425bd615d81895d80..5ed4db340bc5d7c9a41e6d1a3f660bf2a97b058b
>  100644
> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -1542,6 +1542,17 @@ begin cpu cortex-x1
>   part d44
>  end cpu cortex-x1
>
> +begin cpu cortex-x1c
> + cname cortexx1c
> + tune for cortex-a57
> + tune flags LDSCHED
> + architecture armv8.2-a+fp16+dotprod
> + option crypto add FP_ARMv8 CRYPTO
> + costs cortex_a57
> + vendor 41
> + part d4c
> +end cpu cortex-x1c
> +
>  begin cpu neoverse-n1
>   cname neoversen1
>   alias !ares
> diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
> index 
> e6461abcc57cd485025f3e18535267c454662cbe..a10a09e36cd004165b6f1efddeb3bfc29d8337ac
>  100644
> --- a/gcc/config/arm/arm-tables.opt
> +++ b/gcc/config/arm/arm-tables.opt
> @@ -255,6 +255,9 @@ Enum(processor_type) String(cortex-a710) Value( 
> TARGET_CPU_cortexa710)
>  EnumValue
>  Enum(processor_type) String(cortex-x1) Value( TARGET_CPU_cortexx1)
>
> +EnumValue
> +Enum(processor_type) String(cortex-x1c) Value( TARGET_CPU_cortexx1c)
> +
>  EnumValue
>  Enum(processor_type) String(neoverse-n1) Value( TARGET_CPU_neoversen1)
>
> diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
> index 
> abc290edd094179379f3856a3f8f64781e0c33f2..8af8c936abe31fb60e3de2fd713f4c6946c2a752
>  100644
> --- a/gcc/config/arm/arm-tune.md
> +++ b/gcc/config/arm/arm-tune.md
> @@ -46,7 +46,7 @@
> cortexa73cortexa53,cortexa55,cortexa75,
> cortexa76,cortexa76ae,cortexa77,
> cortexa78,cortexa78ae,cortexa78c,
> -   cortexa710,cortexx1,neoversen1,
> +   cortexa710,cortexx1,cortexx1c,neoversen1,
> cortexa75cortexa55,cortexa76cortexa55,neoversev1,
> neoversen2,cortexm23,cortexm33,
> cortexm35p,cortexm55,starmc1,
> diff --git 
> a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/arm-options.rst 
> b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/arm-options.rst
> index 
> 3315114969381995d47162b53abeb9bfc442fd28..d531eced20cbb583ecaba2ab3927937faf69b9de
>  100644
> --- 
> a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/arm-options.rst
> +++ 
> b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/arm-options.rst
> @@ -594,7 +594,7 @@ These :samp:`-m` options are defined for the ARM port:
>:samp:`cortex-r7`, :samp:`cortex-r8`, :samp:`cortex-r52`, 
> :samp:`cortex-r52plus`,
>:samp:`cortex-m0`, :samp:`cortex-m0plus`, :samp:`cortex-m1`, 
> :samp:`cortex-m3`,
>:samp:`cortex-m4`, :samp:`cortex-m7`, :samp:`cortex-m23`, 
> :samp:`cortex-m33`,
> -  :samp:`cortex-m35p`, :samp:`cortex-m55`, :samp:`cortex-x1`,
> +  :samp:`cortex-m35p`, :samp:`cortex-m55`, :samp:`cortex-x1`, 
> :samp:`cortex-x1c`,
>:samp:`cortex-m1.small-multiply`, :samp:`cortex-m0.small-multiply`,
>:samp:`cortex-m0plus.small-multiply`, :samp:`exynos-m1`, 
> :samp:`marvell-pj4`,
>:samp:`neoverse-n1`, :samp:`neoverse-n2`, :samp:`neoverse-v1`, 
> :samp:`xscale`,
> diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp 
> b/gcc/testsuite/gcc.target/arm/multilib.exp
> index 
> 2fa648c61dafebb663969198bf7849400a7547f6..f903f028a83f884bdc1521f810f7e70e4130a715
>  100644
> --- a/gcc/testsuite/gcc.target/arm/multilib.exp
> +++ b/gcc/testsuite/gcc.target/arm/multilib.exp
> @@ -450,6 +450,9 @@ if {[multilib_config "aprofile"] } {
> {-march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -mthumb} 
> "thumb/v8-a+simd/hard"
> {-march=armv7-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp 
> -mthumb} "thumb/v7-a+simd/softfp"
> {-march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp 
> -mthumb} "thumb/v8-a+simd/softfp"
> +   {-mcpu=cortex-x1c -mfpu=auto -mfloat-abi=softfp -mthumb} 
> "thumb/v8-a+simd/softfp"
> +   {-mcpu=cortex-x1c -mfpu=auto -mfloat-abi=hard -mthumb} 
> "thumb/v8-a+simd/hard"
> +   {-mcpu=cortex-x1c -mfpu=auto -mfloat-abi=soft -mthumb} 
> "thumb/v8-a/nofp"
>  } {
> check_multi_dir $opts $dir
>  }
>
>
>


Re: [Patch Arm] Fix PR 92999

2022-11-10 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
 wrote:
>
>
>
> On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> >
> >
> > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> >> PR92999 is a case where the VFP calling convention does not allocate
> >> enough FP registers for a homogenous aggregate containing FP16 values.
> >> I believe this is the complete fix but would appreciate another set of
> >> eyes on this.
> >>
> >> Could I get a hand with a regression test run on an armhf environment
> >> while I fix my environment ?
> >>
> >> gcc/ChangeLog:
> >>
> >> PR target/92999
> >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> >> aggregates with elements smaller than SFmode.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/arm/pr92999.c: New test.
> >>
> >>
> >> Thanks,
> >> Ramana
> >>
> >> Signed-off-by: Ramana Radhakrishnan 
> >
> > I'm not sure about this.  The AAPCS does not mention a base type of a
> > half-precision FP type as an appropriate homogeneous aggregate for using
> > VFP registers for either calling or returning.

Ooh interesting, thanks for taking a look and poking at the AAPCS and
that's a good catch. BF16 should also have the same behaviour as FP16
, I suspect ?

> >
> > So perhaps the bug is that we try to treat this as a homogeneous
> > aggregate at all.

Yep I agree - I'll take a look again tomorrow and see if I can get a fix.

(And thanks Alex for the test run, I might trouble you again while I
still (slowly) get some of my boards back up)

regards,
Ramana


>
> R.


[Patch Arm] Fix PR 92999

2022-11-08 Thread Ramana Radhakrishnan via Gcc-patches
PR92999 is a case where the VFP calling convention does not allocate
enough FP registers for a homogenous aggregate containing FP16 values.
I believe this is the complete fix but would appreciate another set of
eyes on this.

Could I get a hand with a regression test run on an armhf environment
while I fix my environment ?

gcc/ChangeLog:

PR target/92999
*  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
aggregates with elements smaller than SFmode.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr92999.c: New test.


Thanks,
Ramana

Signed-off-by: Ramana Radhakrishnan 
---
 gcc/config/arm/arm.cc  |  6 -
 gcc/testsuite/gcc.target/arm/pr92999.c | 31 ++
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr92999.c

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 2eb4d51e4a3..03f4057f717 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -6740,7 +6740,11 @@ aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant 
ATTRIBUTE_UNUSED,
  count *= 2;
}
}
-  shift = GET_MODE_SIZE(ag_mode) / GET_MODE_SIZE(SFmode);
+
+  /* Aggregates can contain FP16 or BF16 values which would need to
+be passed in via FP registers.  */
+  shift = (MAX(GET_MODE_SIZE(ag_mode), GET_MODE_SIZE(SFmode))
+  / GET_MODE_SIZE(SFmode));
   par = gen_rtx_PARALLEL (mode, rtvec_alloc (count));
   for (i = 0; i < count; i++)
{
diff --git a/gcc/testsuite/gcc.target/arm/pr92999.c 
b/gcc/testsuite/gcc.target/arm/pr92999.c
new file mode 100644
index 000..faa21fdb7d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr92999.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-mfp16-format=ieee" } */
+
+//
+// Compile with gcc -mfp16-format=ieee
+// Any optimization level is fine.
+//
+// Correct output should be
+// "y.first = 1, y.second = -99"
+//
+// Buggy output is
+// "y.first = -99, y.second = -99"
+//
+#include 
+struct phalf {
+__fp16 first;
+__fp16 second;
+};
+
+struct phalf phalf_copy(struct phalf* src) __attribute__((noinline));
+struct phalf phalf_copy(struct phalf* src) {
+return *src;
+}
+
+int main() {
+struct phalf x = { 1.0, -99.0};
+struct phalf y = phalf_copy();
+if (y.first != 1.0 && y.second != -99.0)
+   abort();
+return 0;
+}
-- 
2.34.1


Update Affiliation.

2022-11-04 Thread Ramana Radhakrishnan via Gcc-patches
Update affiliation as I'm moving on from Arm. More from me in a month or so.

Pushed to trunk

regards
Ramana
commit 4e5f373406ed5da42c4a7c4b7f650d92195f2984
Author: Ramana Radhakrishnan 
Date:   Fri Nov 4 09:30:00 2022 +

Update affiliation

diff --git a/htdocs/steering.html b/htdocs/steering.html
index 3a38346d..95d6a4a8 100644
--- a/htdocs/steering.html
+++ b/htdocs/steering.html
@@ -38,7 +38,7 @@ place to reach them is the gcc mailing 
list.
 Toon Moene (Koninklijk Nederlands Meteorologisch Instituut)
 Joseph Myers (CodeSourcery / Mentor Graphics) [co-Release Manager]
 Gerald Pfeifer (SUSE)
-Ramana Radhakrishnan (ARM)
+Ramana Radhakrishnan 
 Joel Sherrill (OAR Corporation)
 Ian Lance Taylor (Google)
 Jim Wilson (SiFive)


Update email address

2022-10-31 Thread Ramana Radhakrishnan via Gcc-patches
As $subject. Pushed to trunk.

Regards,
Ramana


diff --git a/MAINTAINERS b/MAINTAINERS
index e4e7349a6d9..55c5ef95806 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -60,7 +60,7 @@ arc port  Joern Rennecke  

 arc port   Claudiu Zissulescu  
 arm port   Nick Clifton
 arm port   Richard Earnshaw
-arm port   Ramana Radhakrishnan
+arm port   Ramana Radhakrishnan
 arm port   Kyrylo Tkachov  
 avr port   Denis Chertykov 
 bfin port  Jie Zhang   


Re: [PATCH V2] arm: add -static-pie support

2022-08-10 Thread Ramana Radhakrishnan via Gcc-patches
Hi Lance,

Thanks for your contribution - looks like your first one to GCC ?

The patch looks good to me, though it should probably go through a
full test suite run on arm-linux-gnueabihf and get a ChangeLog - see
here for more https://gcc.gnu.org/contribute.html#patches.

This is probably small enough to go under the 10 line rule but since
you've used Signed-off-by in your patch, is that indicating you are
contributing under DCO rules -
https://gcc.gnu.org/contribute.html#legal ?

regards
Ramana


On Thu, Aug 4, 2022 at 5:48 PM Lance Fredrickson via Gcc-patches
 wrote:
>
> Just a follow up trying to get some eyes on my static-pie patch
> submission for arm.
> Feedback welcome.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598610.html
>
> thanks,
> Lance Fredrickson


Re: [PATCH v4 1/1] [ARM] Add support for TLS register based stack protector canary access

2021-11-17 Thread Ramana Radhakrishnan via Gcc-patches
Thanks Ard and Qing.

I have been busy with other things in the last few weeks and I don’t work on 
GCC as part of my day job : however I’ll try to find some time to review this 
patch set in the coming days.

Sorry about the delay.

Regards,
Ramana

From: Ard Biesheuvel 
Date: Tuesday, 9 November 2021 at 22:03
To: Qing Zhao 
Cc: Ramana Radhakrishnan , 
linux-harden...@vger.kernel.org , kees Cook 
, Keith Packard , 
thomas.preudho...@celest.fr , 
adhemerval.zane...@linaro.org , Richard 
Sandiford , gcc-patches@gcc.gnu.org 

Subject: Re: [PATCH v4 1/1] [ARM] Add support for TLS register based stack 
protector canary access
On Tue, 9 Nov 2021 at 21:45, Qing Zhao  wrote:
>
> Hi, Ard,
>
> Sorry for the late reply (since I don’t have the right to approve a patch, I 
> has been waiting for any arm port maintainer to review this patch).
> The following is the arm port maintainer information I got from MAINTAINERS 
> file (you might want to explicitly cc’ing one of them for a review)
>
> arm portNick Clifton
> arm portRichard Earnshaw
> arm portRamana Radhakrishnan
> arm portKyrylo Tkachov  
>
> I see that Ramana implemented the similar patch for aarch64 (commit 
> cd0b2d361df82c848dc7e1c3078651bb0624c3c6), So, I am CCing him with this 
> email. Hopefully he will review this patch.
>

Thank you Qing. But I know Ramana well, and I know he no longer works
on GCC. I collaborated with him on the AArch64 implementation at the
time (but he wrote all the code)

> Anyway, I briefly read your patch (version 4), and have the following 
> questions and comments:
>
> 1.  When the option -mstack-protector-guard=tls presents,  should the option 
> mstack-protector-guard-offset=.. be required to present?
>  If it’s required to present, you might want to add such requirement to 
> the documentation, and also issue errors when it’s not present.
>  It’s not clear right now from the current implementation, so, you might 
> need to update both "arm_option_override_internal “ in arm.c
>  and doc/invoke.texi to make this clear.
>

An  offset of 0x0 is a reasonable default, so I don't think it is
necessary to require the offset param to be passed in that case.

> 2. For arm, is there only one system register can be used for this purpose?
>

There are other registers that might be used in the same way, but the
TLS register is the obvious choice. On AArch64, we decided to use
'sysreg' and permit the user to specify the register because the Linux
kernel uses the user space stack pointer (SP_EL0), which is kind of
odd so we did not want to hard code that.

> 3. For the functionality you added, I didn’t see any testing cases added, I 
> think testing cases are needed.
>

Yes, I am aware of that. I'm just not sure I know how to proceed here:
any pointers?

> More comments are embedded below:
>
> > On Oct 28, 2021, at 6:27 AM, Ard Biesheuvel  wrote:
> >
> > Add support for accessing the stack canary value via the TLS register,
> > so that multiple threads running in the same address space can use
> > distinct canary values. This is intended for the Linux kernel running in
> > SMP mode, where processes entering the kernel are essentially threads
> > running the same program concurrently: using a global variable for the
> > canary in that context is problematic because it can never be rotated,
> > and so the OS is forced to use the same value as long as it remains up.
> >
> > Using the TLS register to index the stack canary helps with this, as it
> > allows each CPU to context switch the TLS register along with the rest
> > of the process, permitting each process to use its own value for the
> > stack canary.
> >
> > 2021-10-28 Ard Biesheuvel 
> >
> >   * config/arm/arm-opts.h (enum stack_protector_guard): New
> >   * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem):
> >   New
> >   * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define
> >   (arm_option_override_internal): Handle and put in error checks
> >   for stack protector guard options.
> >   (arm_option_reconfigure_globals): Likewise
> >   (arm_stack_protect_tls_canary_mem): New
> >   (arm_stack_protect_guard): New
> >   * config/arm/arm.md (stack_protect_set): New
> >   (stack_protect_set_tls): Likewise
> >   (stack_protect_test): Likewise
> >   (stack_protect_test_tls): Likewise
> >   (reload_tp_hard): Likewise
> >   * config/arm/arm.opt (-mstack-protector-guard): New
> >   (-mstack-protector-guard-offset): New.
> >   * doc/invoke.texi: Document new options
> >
> > Signed-off-by: Ard Biesheuvel 
> > ---
> > gcc/config/arm/arm-opts.h   |  6 ++
> > gcc/config/arm/arm-protos.h |  2 +
> > gcc/config/arm/arm.c| 55 +++
> > gcc/config/arm/arm.md   | 71 +++-
> > gcc/config/arm/arm.opt  | 22 ++
> > gcc/doc/invoke.texi |  9 +++
> > 6 files changed, 163 

Re: [PATCH][GCC] arm: add armv9-a architecture to -march

2021-11-16 Thread Ramana Radhakrishnan via Gcc-patches
Hi There,

I think for AArch32 mapping it back to armv8-a sounds sufficient.  Unless we 
have string or math routines in newlib that make use of any ACLE guards that 
are beyond armv8-a …

Ramana


From: Richard Earnshaw 
Date: Tuesday, 16 November 2021 at 11:48
To: Christophe Lyon , Przemyslaw Wirkus 

Cc: Ramana Radhakrishnan , 
gcc-patches@gcc.gnu.org , Richard Earnshaw 

Subject: Re: [PATCH][GCC] arm: add armv9-a architecture to -march
You can't make an omelette without breaking eggs, as they say.  New
architectures need new assemblers.

However, I wonder if there's anything in v9-a that significantly affects
the quality of the base multilib code needed for building the libraries.
  It might be that we can deal with v9-a by just mapping it to the v8-a
equivalents.  That would then avoid the need for an updated assembler,
and reduce the build time and install footprint.

R.


On 16/11/2021 08:03, Christophe Lyon via Gcc-patches wrote:
> Hi,
>
>
> On Tue, Nov 9, 2021 at 12:36 PM Przemyslaw Wirkus via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
> -Original Message-
> From: Przemyslaw Wirkus
> Sent: 18 October 2021 10:37
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Ramana
> Radhakrishnan ; Kyrylo Tkachov
> ; ni...@redhat.com
> Subject: [PATCH][GCC] arm: add armv9-a architecture to -march
>
> Hi,
>
> This patch is adding `armv9-a` to -march in Arm GCC.
>
> In this patch:
>+ Add `armv9-a` to -march.
>+ Update multilib with armv9-a and armv9-a+simd.
>
> After this patch three additional multilib directories are available:
>
> $ arm-none-eabi-gcc --print-multi-lib .; [...vanilla multi-lib
> dirs...] thumb/v9-a/nofp;@mthumb@march=armv9-a@mfloat-abi=soft
> thumb/v9-a+simd/softfp;@mthumb@march=armv9-a+simd@mfloat-
> abi=softfp
> thumb/v9-a+simd/hard;@mthumb@march=armv9-a+simd@mfloat-
> abi=hard
>
>>
>
> This is causing a GCC build failure when using "old" binutils (I'm using
> 2.36.1),
> because the new -march=armv9-a option is not supported. This breaks the
> multilib support.
>
> I don't remember how we handled similar cases in the past? Is that just
> "expected", and
> "current" GCC needs "current" binutils, or should we have a multilib list
> dependent on
> the actual binutils support? (I think this is not the case, and it sounds
> like an undesirable
> extra complication in an already overcrowded mutilib-Makefile)
>
> Christophe
>
 New multi-lib directories under
> $GCC_INSTALL_DIE/lib/gcc/arm-none-eabi/12.0.0/thumb are created:
>
> thumb/
> +--- v9-a
> ||--- nofp
> |
> +--- v9-a+simd
>   |--- hard
>   |--- softfp
>
> Regtested on arm-none-eabi cross and no issues.
>
> OK for master?
>>
>> Thanks.
>>
>> commit 32ba7860ccaddd5219e6dae94a3d0653e124c9dd
>>
>>> Ok.
>>> Thanks,
>>> Kyrill
>>>
>>>
>
> gcc/ChangeLog:
>
>* config/arm/arm-cpus.in (armv9): New define.
>(ARMv9a): New group.
>(armv9-a): New arch definition.
>* config/arm/arm-tables.opt: Regenerate.
>* config/arm/arm.h (BASE_ARCH_9A): New arch enum value.
>* config/arm/t-aprofile: Added armv9-a and armv9+simd.
>* config/arm/t-arm-elf: Added arm9-a, v9_fps and all_v9_archs
>to MULTILIB_MATCHES.
>* config/arm/t-multilib: Added v9_a_nosimd_variants and
>v9_a_simd_variants to MULTILIB_MATCHES.
>* doc/invoke.texi: Update docs.
>
> gcc/testsuite/ChangeLog:
>
>* gcc.target/arm/multilib.exp: Update test with armv9-a entries.
>* lib/target-supports.exp (v9a): Add new armflag.
>(__ARM_ARCH_9A__): Add new armdef.
>
> --
> kind regards,
> Przemyslaw Wirkus
>>
>>


Re: [PATCH][GCC] arm: Enable Cortex-R52+ CPU

2021-09-22 Thread Ramana Radhakrishnan via Gcc-patches
This is OK

Ramana

On 22/09/2021, 09:45, "Przemyslaw Wirkus"  wrote:

Patch is adding Cortex-R52+ as 'cortex-r52plus' command line
flag for -mcpu option.

See: https://www.arm.com/products/silicon-ip-cpu/cortex-r/cortex-r52-plus

OK for master?

gcc/ChangeLog:

2021-09-22  Przemyslaw Wirkus  

* config/arm/arm-cpus.in: Add Cortex-R52+ CPU.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Update docs.



Re: [PATCH] arm: Fix ICE on glibc compilation after my DIVMOD optimization [PR97322]

2020-10-08 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Oct 8, 2020 at 10:22 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> The arm target hook for divmod wasn't prepared to handle constants passed to
> the function.
>
> Fixed thusly, bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk?
>
> 2020-10-08  Jakub Jelinek  
>
> PR target/97322
> * config/arm/arm.c (arm_expand_divmod_libfunc): Pass mode instead of
> GET_MODE (op0) or GET_MODE (op1) to emit_library_call_value.
>
> * gcc.dg/pr97322.c: New test.

Ok.

Ramana

>
> --- gcc/config/arm/arm.c.jj 2020-10-07 10:47:46.892985596 +0200
> +++ gcc/config/arm/arm.c2020-10-07 20:19:25.524367665 +0200
> @@ -33275,9 +33275,7 @@ arm_expand_divmod_libfunc (rtx libfunc,
>  = smallest_int_mode_for_size (2 * GET_MODE_BITSIZE (mode));
>
>rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
> -   libval_mode,
> -   op0, GET_MODE (op0),
> -   op1, GET_MODE (op1));
> +   libval_mode, op0, mode, op1, mode);
>
>rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0);
>rtx remainder = simplify_gen_subreg (mode, libval, libval_mode,
> --- gcc/testsuite/gcc.dg/pr97322.c.jj   2020-10-07 20:19:54.071961807 +0200
> +++ gcc/testsuite/gcc.dg/pr97322.c  2020-10-07 20:19:16.897490309 +0200
> @@ -0,0 +1,17 @@
> +/* PR target/97322 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +void
> +foo (unsigned long long x, unsigned long long *y)
> +{
> +  y[0] = x / 10;
> +  y[1] = x % 10;
> +}
> +
> +void
> +bar (unsigned int x, unsigned int *y)
> +{
> +  y[0] = x / 10;
> +  y[1] = x % 10;
> +}
>
> Jakub
>


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-03 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Sep 3, 2020 at 6:13 PM Kees Cook via Gcc-patches
 wrote:
>
> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
> > On average, all the options starting with “used_…”  (i.e, only the 
> > registers that are used in the routine will be zeroed) have very low 
> > runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP 
> > benchmarks.
> > If all the registers will be zeroed, the runtime overhead is bigger, 
> > all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks 
> > on average.
> > Looks like the overhead of zeroing vector registers is much bigger.
> >
> > For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, 
> > the runtime overhead with this is very small.
>
> That looks great; thanks for doing those tests!
>
> (And it seems like these benchmarks are kind of a "worst case" scenario
> with regard to performance, yes? As in it's mostly tight call loops?)


That's true of some of them but definitely not all - the GCC benchmark
springs to mind in SPEC as having quite a flat profile, so I'd take a
look there and probe a bit more in that one to see what happens. Don't
ask me what else , that's all I have in my cache this evening :)

I'd also query the "average" slowdown metric in those numbers as
something that's being measured in a different way here. IIRC the SPEC
scores for int and FP are computed with a geometric mean of the
individual ratios of each of the benchmark. Thus I don't think the
average of the slowdowns is enough to talk about slowdowns for the
benchmark suite. A quick calculation of the arithmetic mean of column
B in my head suggests that it's the arithmetic mean of all the
slowdowns ?

i.e. Slowdown (Geometric Mean (x, y, z, ))  != Arithmetic mean (
Slowdown (x), Slowdown (y) .)

So another metric to look at would be to look at the Slowdown of your
estimated (probably non-reportable) SPEC scores as well to get a more
"spec like" metric.

regards
Ramana
>
> --
> Kees Cook


Re: [PATCH v3] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-28 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Aug 26, 2020 at 12:08 PM Richard Biener via Gcc-patches
 wrote:
>
> On Tue, Aug 25, 2020 at 6:32 PM Maciej W. Rozycki  wrote:
> >
> > Hi Kito,
> >
> > > I just found the mail thread about div mod with -fnon-call-exceptions,
> > > I think keeping the default LIB2_DIVMOD_EXCEPTION_FLAGS unchanged
> > > should be the best way to go.
> > >
> > > Non-call exceptions and libcalls
> > > https://gcc.gnu.org/legacy-ml/gcc/2001-06/msg01108.html
> > >
> > > Non-call exceptions and libcalls Part 2
> > > https://gcc.gnu.org/legacy-ml/gcc/2001-07/msg00402.html
> >
> >  Thank you for your input.  I believe I had a look at these commits before
> > I posted my original proposal.  Please note however that they both predate
> > the addition of `-fasynchronous-unwind-tables', so clearly the option
> > could not have been considered at the time the changes were accepted into
> > GCC.
> >
> >  Please note that, as observed by Andreas and Richard here:
> >  in no case we
> > want to have full exception handling here, so we clearly need no
> > `-fexceptions'; this libcall code won't itself ever call `throw'.
> >
> >  Now it might be a bit unclear from documentation as to whether we want
> > `-fnon-call-exceptions' or `-fasynchronous-unwind-tables', as it says that
> > the former option makes GCC:
> >
> > "Generate code that allows trapping instructions to throw
> >  exceptions.  Note that this requires platform-specific runtime
> >  support that does not exist everywhere.  Moreover, it only allows
> >  _trapping_ instructions to throw exceptions, i.e. memory references
> >  or floating-point instructions.  It does not allow exceptions to be
> >  thrown from arbitrary signal handlers such as 'SIGALRM'."
> >
> > Note the observation that arbitrary signal handlers (invoked at more inner
> > a frame level, and necessarily built with `-fexceptions') are still not
> > allowed to throw exceptions.  For that, as far as I understand it, you
> > actually need `-fasynchronous-unwind-tables', which makes GCC:
> >
> > "Generate unwind table in DWARF format, if supported by target
> >  machine.  The table is exact at each instruction boundary, so it
> >  can be used for stack unwinding from asynchronous events (such as
> >  debugger or garbage collector)."
> >
> > and therefore allows arbitrary signal handlers to throw exceptions,
> > effectively making the option a superset of `-fexceptions'.  As libcall
> > code can generally be implicitly invoked everywhere, we want people not to
> > be restrained by it and let a exception thrown by e.g. a user-supplied
> > SIGALRM handler propagate through the relevant libcall's stack frame,
> > rather than just those exceptions the libcall itself might indirectly
> > cause.
> >
> >  Maybe I am missing something here, especially as `-fexceptions' mentions
> > code generation, while `-fasynchronous-unwind-tables' only refers to
> > unwind table generation, but then what would be the option to allow
> > exceptions to be thrown from arbitrary signal handlers rather than those
> > for memory references or floating-point instructions (where by a special
> > provision integer division falls as well)?
> >
> >  My understanding has been it is `-fasynchronous-unwind-tables', but I'll
> > be gladly straightened out otherwise.  If I am indeed right, then perhaps
> > the documentation could be clarified and expanded a bit.
> >
> >  Barring evidence to the contrary I maintain the change I have proposed is
> > correct, and not only removes the RISC-V `ld.so' build issue, but it fixes
> > the handling of asynchronous events arriving in the middle of the relevant
> > libcalls for all platforms as well.
> >
> >  Please let me know if you have any further questions, comments or
> > concerns.
>
> You only need -fexceptions for that, then you can throw; from a signal handler
> for example.  If you want to be able to catch the exception somewhere up
> the call chain all intermediate code needs to be compiled so that unwinding
> from asynchronous events is possible - -fasynchronous-unwind-tables.
>
> So -fasynchronous-unwind-tables is about unwinding.  -f[non-call]-exceptions
> is about throw/catch.  Clearly libgcc does neither throw nor catch but with
> async events we might need to unwind from inside it.
>
> Now I don't know about the arm situation but if arm cannot do async unwinding
> then even -fexceptions won't help it here - libgcc still does not throw.

On Arm as in the AArch32 port,  async unwinding will not work as those
can't be expressed in the EH format tables.

regards
Ramana

>
> Richard.
>
> >
> >   Maciej


Re: [PATCH] arm: Fix -mpure-code support/-mslow-flash-data for armv8-m.base [PR94538]

2020-08-27 Thread Ramana Radhakrishnan via Gcc-patches
On Mon, Aug 24, 2020 at 4:35 PM Christophe Lyon
 wrote:
>
> On Mon, 24 Aug 2020 at 11:09, Christophe Lyon
>  wrote:
> >
> > On Sat, 22 Aug 2020 at 00:44, Ramana Radhakrishnan
> >  wrote:
> > >
> > > On Wed, Aug 19, 2020 at 10:32 AM Christophe Lyon via Gcc-patches
> > >  wrote:
> > > >
> > > > armv8-m.base (cortex-m23) has the movt instruction, so we need to
> > > > disable the define_split to generate a constant in this case,
> > > > otherwise we get incorrect insn constraints as described in PR94538.
> > > >
> > > > We also need to fix the pure-code alternative for thumb1_movsi_insn
> > > > because the assembler complains with instructions like
> > > > movs r0, #:upper8_15:1234
> > > > (Internal error in md_apply_fix)
> > > > We now generate movs r0, 4 instead.
> > > >
> > > > 2020-08-19  Christophe Lyon  
> > > > gcc/ChangeLog:
> > > >
> > > > * config/arm/thumb1.md: Disable set-constant splitter when
> > > > TARGET_HAVE_MOVT.
> > > > (thumb1_movsi_insn): Fix -mpure-code
> > > > alternative.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/arm/pure-code/pr94538-1.c: New test.
> > > > * gcc.target/arm/pure-code/pr94538-2.c: New test.
> > >
> >
> > Hi Ramana,
> >
> > >  I take it that this fixes the ICE rather than addressing the code
> > > generation / performance bits that Wilco was referring to ? Really the
> > > other code quality / performance issues listed in the PR should really
> > > be broken down into separate PRs while we take this as a fix for
> > > fixing the ICE,
> > >
> > > Under that assumption OK.
> > >
> >
> > Yes, that's correct, this patch just fixes the ICE, as it is cleaner
> > to handle perf issues in a different patch.
> >
>
> The patch applies cleanly to gcc-9 and gcc-10: OK to backport there?

Yes, all good.

Thanks,
Ramana

>
> (I tested that the offending testcase passes on the branches with the
> backport, and fails without).
>
> Thanks,
>
> Christophe
>
> > I'll open a new PR to track the perf issue.
> >
> > Thanks
> >
> >
> > > Ramana
> > >
> > > > ---
> > > >  gcc/config/arm/thumb1.md   | 66 
> > > > ++
> > > >  gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c | 13 +
> > > >  gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c | 12 
> > > >  3 files changed, 79 insertions(+), 12 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c
> > > >
> > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > > > index 0ff8190..f0129db 100644
> > > > --- a/gcc/config/arm/thumb1.md
> > > > +++ b/gcc/config/arm/thumb1.md
> > > > @@ -70,6 +70,7 @@ (define_split
> > > >"TARGET_THUMB1
> > > > && arm_disable_literal_pool
> > > > && GET_CODE (operands[1]) == CONST_INT
> > > > +   && !TARGET_HAVE_MOVT
> > > > && !satisfies_constraint_I (operands[1])"
> > > >[(clobber (const_int 0))]
> > > >"
> > > > @@ -696,18 +697,59 @@ (define_insn "*thumb1_movsi_insn"
> > > >"TARGET_THUMB1
> > > > && (   register_operand (operands[0], SImode)
> > > > || register_operand (operands[1], SImode))"
> > > > -  "@
> > > > -   movs%0, %1
> > > > -   movs%0, %1
> > > > -   movw%0, %1
> > > > -   #
> > > > -   #
> > > > -   ldmia\\t%1, {%0}
> > > > -   stmia\\t%0, {%1}
> > > > -   movs\\t%0, #:upper8_15:%1; lsls\\t%0, #8; adds\\t%0, #:upper0_7:%1; 
> > > > lsls\\t%0, #8; adds\\t%0, #:lower8_15:%1; lsls\\t%0, #8; adds\\t%0, 
> > > > #:lower0_7:%1
> > > > -   ldr\\t%0, %1
> > > > -   str\\t%1, %0
> > > > -   mov\\t%0, %1"
> > > > +{
> > > > +  switch (which_alternative)
> > > > +{
> > > > +  default:
> > > > +  case 0: return "movs\t%0, %1";
> > > > +  case 1: return "movs\t%0, %1";
> > > > +  case 2: return "movw\t%0, %1";
> > > > +  case 3: return "#";
> > > > +  case 4: return "#";
> > > > +  case 5: return "ldmia\t%1, {%0}";
> > > > +  case 6: return "stmia\t%0, {%1}";
> > > > +  case 7:
> > > > +  /* pure-code alternative: build the constant byte by byte,
> > > > +instead of loading it from a constant pool.  */
> > > > +   {
> > > > + int i;
> > > > + HOST_WIDE_INT op1 = INTVAL (operands[1]);
> > > > + bool mov_done_p = false;
> > > > + rtx ops[2];
> > > > + ops[0] = operands[0];
> > > > +
> > > > + /* Emit upper 3 bytes if needed.  */
> > > > + for (i = 0; i < 3; i++)
> > > > +   {
> > > > +  int byte = (op1 >> (8 * (3 - i))) & 0xff;
> > > > +
> > > > + if (byte)
> > > > +   {
> > > > + ops[1] = GEN_INT (byte);
> > > > + if (mov_done_p)
> > > > +   output_asm_insn ("adds\t%0, %1", ops);
> > > > + else
> > > > +   output_asm_insn ("movs\t%0, %1", ops);
> > > > +   

Re: [PATCH][GCC][GCC-10 backport] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-21 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, Aug 21, 2020 at 2:28 PM Joe Ramsay  wrote:
>
> From: Joe Ramsay 
>
> Hi,
>
> Previously, the machine description patterns for vst1q accepted a generic 
> memory
> operand for the destination, which could lead to an unrecognised builtin when
> expanding vst1q* intrinsics. This change fixes the pattern to only accept MVE
> memory operands.
>
> Tested on arm-none-eabi, clean w.r.t. gcc and CMSIS-DSP testsuites. Backports
> cleanly onto gcc-10 branch. OK for backport?
>

OK.

Ramana



> Thanks,
> Joe
>
> gcc/ChangeLog:
>
> PR target/96683
> * config/arm/mve.md (mve_vst1q_f): Require MVE memory operand 
> for
> destination.
> (mve_vst1q_): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR target/96683
> * gcc.target/arm/mve/intrinsics/vst1q_f16.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_s16.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_s8.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_u16.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_u8.c: New test.
>
> (cherry picked from commit 91d206adfe39ce063f6a5731b92a03c05e82e94a)
> ---
>  gcc/config/arm/mve.md   |  4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
>  6 files changed, 37 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 9758862..465b39a 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -9330,7 +9330,7 @@
>[(set_attr "length" "4")])
>
>  (define_expand "mve_vst1q_f"
> -  [(match_operand: 0 "memory_operand")
> +  [(match_operand: 0 "mve_memory_operand")
> (unspec: [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
>]
>"TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
> @@ -9340,7 +9340,7 @@
>  })
>
>  (define_expand "mve_vst1q_"
> -  [(match_operand:MVE_2 0 "memory_operand")
> +  [(match_operand:MVE_2 0 "mve_memory_operand")
> (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
>]
>"TARGET_HAVE_MVE"
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> index 363b4ca..312b746 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> @@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
>vst1q_f16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (float16_t * addr, float16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (float16_t a, float16x8_t x)
> +{
> +  vst1q (, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> index 37c4713..cd14e2c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> @@ -10,12 +10,16 @@ foo (int16_t * addr, int16x8_t value)
>vst1q_s16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (int16_t * addr, int16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (int16_t a, int16x8_t x)
> +{
> +  vst1q (, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> index fe5edea..0004c80 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> @@ -10,12 +10,16 @@ foo (int8_t * addr, int8x16_t value)
>vst1q_s8 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> -
>  void
>  foo1 (int8_t * addr, int8x16_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> +/* { dg-final { scan-assembler-times "vstrb.8" 2 }  } */
> +
> +void
> +foo2 (int8_t a, int8x16_t x)
> +{
> +  vst1q (, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> index a4c8c1a..248e7ce 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> @@ -10,12 +10,16 @@ foo (uint16_t * addr, uint16x8_t value)
>vst1q_u16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  

Re: [PATCH] arm: Fix -mpure-code support/-mslow-flash-data for armv8-m.base [PR94538]

2020-08-21 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Aug 19, 2020 at 10:32 AM Christophe Lyon via Gcc-patches
 wrote:
>
> armv8-m.base (cortex-m23) has the movt instruction, so we need to
> disable the define_split to generate a constant in this case,
> otherwise we get incorrect insn constraints as described in PR94538.
>
> We also need to fix the pure-code alternative for thumb1_movsi_insn
> because the assembler complains with instructions like
> movs r0, #:upper8_15:1234
> (Internal error in md_apply_fix)
> We now generate movs r0, 4 instead.
>
> 2020-08-19  Christophe Lyon  
> gcc/ChangeLog:
>
> * config/arm/thumb1.md: Disable set-constant splitter when
> TARGET_HAVE_MOVT.
> (thumb1_movsi_insn): Fix -mpure-code
> alternative.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/pure-code/pr94538-1.c: New test.
> * gcc.target/arm/pure-code/pr94538-2.c: New test.

 I take it that this fixes the ICE rather than addressing the code
generation / performance bits that Wilco was referring to ? Really the
other code quality / performance issues listed in the PR should really
be broken down into separate PRs while we take this as a fix for
fixing the ICE,

Under that assumption OK.

Ramana

> ---
>  gcc/config/arm/thumb1.md   | 66 
> ++
>  gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c | 13 +
>  gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c | 12 
>  3 files changed, 79 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c
>
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index 0ff8190..f0129db 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -70,6 +70,7 @@ (define_split
>"TARGET_THUMB1
> && arm_disable_literal_pool
> && GET_CODE (operands[1]) == CONST_INT
> +   && !TARGET_HAVE_MOVT
> && !satisfies_constraint_I (operands[1])"
>[(clobber (const_int 0))]
>"
> @@ -696,18 +697,59 @@ (define_insn "*thumb1_movsi_insn"
>"TARGET_THUMB1
> && (   register_operand (operands[0], SImode)
> || register_operand (operands[1], SImode))"
> -  "@
> -   movs%0, %1
> -   movs%0, %1
> -   movw%0, %1
> -   #
> -   #
> -   ldmia\\t%1, {%0}
> -   stmia\\t%0, {%1}
> -   movs\\t%0, #:upper8_15:%1; lsls\\t%0, #8; adds\\t%0, #:upper0_7:%1; 
> lsls\\t%0, #8; adds\\t%0, #:lower8_15:%1; lsls\\t%0, #8; adds\\t%0, 
> #:lower0_7:%1
> -   ldr\\t%0, %1
> -   str\\t%1, %0
> -   mov\\t%0, %1"
> +{
> +  switch (which_alternative)
> +{
> +  default:
> +  case 0: return "movs\t%0, %1";
> +  case 1: return "movs\t%0, %1";
> +  case 2: return "movw\t%0, %1";
> +  case 3: return "#";
> +  case 4: return "#";
> +  case 5: return "ldmia\t%1, {%0}";
> +  case 6: return "stmia\t%0, {%1}";
> +  case 7:
> +  /* pure-code alternative: build the constant byte by byte,
> +instead of loading it from a constant pool.  */
> +   {
> + int i;
> + HOST_WIDE_INT op1 = INTVAL (operands[1]);
> + bool mov_done_p = false;
> + rtx ops[2];
> + ops[0] = operands[0];
> +
> + /* Emit upper 3 bytes if needed.  */
> + for (i = 0; i < 3; i++)
> +   {
> +  int byte = (op1 >> (8 * (3 - i))) & 0xff;
> +
> + if (byte)
> +   {
> + ops[1] = GEN_INT (byte);
> + if (mov_done_p)
> +   output_asm_insn ("adds\t%0, %1", ops);
> + else
> +   output_asm_insn ("movs\t%0, %1", ops);
> + mov_done_p = true;
> +   }
> +
> + if (mov_done_p)
> +   output_asm_insn ("lsls\t%0, #8", ops);
> +   }
> +
> + /* Emit lower byte if needed.  */
> + ops[1] = GEN_INT (op1 & 0xff);
> + if (!mov_done_p)
> +   output_asm_insn ("movs\t%0, %1", ops);
> + else if (op1 & 0xff)
> +   output_asm_insn ("adds\t%0, %1", ops);
> + return "";
> +   }
> +  case 8: return "ldr\t%0, %1";
> +  case 9: return "str\t%1, %0";
> +  case 10: return "mov\t%0, %1";
> +}
> +}
>[(set_attr "length" "2,2,4,4,4,2,2,14,2,2,2")
> (set_attr "type" 
> "mov_reg,mov_imm,mov_imm,multiple,multiple,load_4,store_4,alu_sreg,load_4,store_4,mov_reg")
> (set_attr "pool_range" "*,*,*,*,*,*,*, *,1018,*,*")
> diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c 
> b/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
> new file mode 100644
> index 000..31061d5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "skip override" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
> +/* { dg-options "-mpure-code -mcpu=cortex-m23 -march=armv8-m.base -mthumb 
> -mfloat-abi=soft" } */
> +
> +typedef int 

Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-08-21 Thread Ramana Radhakrishnan via Gcc-patches
On Mon, Aug 17, 2020 at 7:42 PM Dennis Zhang  wrote:
>
>
> Hi all,
>
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub3 optab. The sub3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
>
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
>
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>

Hi Dennis,

Thanks for this patch . However a quick read suggests  at first glance
that it could do with some refactoring or indeed further breaking
down.

1. The refactor for TARGET_NEON_IWWMMXT and friends which I don't get
the motivation for obviously on a quick read. I'll try and read that
again. Please document why these complex TARGET_ macros exist and how
they are expected to be used in the machine description and what they
are indicated to do.
2. It seems odd that we would have
 "&& ((mode != V2SFmode && mode != V4SFmode)
+|| flag_unsafe_math_optimizations))" apply to TARGET_NEON but not
apply this to TARGET_MVE_FLOAT in the sub3 expander. The point
is that if it isn't safe to vectorize a subtract for Neon, why is it
safe to do the same for MVE ? This was done in 2010 by Julian to fix
PR target/43703 - isn't this applicable on MVE as well ?
3. I'm also going to quibble a bit about the use of VSEL as the name
of an iterator as that conflates it with the instruction vsel and it's
not obvious what's going on here.


> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
>

I'm a bit confused as to why this got exposed because of the new MVE
vector modes exposed by this patch.

> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
>
Bootstrapped and regression tested for arm-none-linux-gnueabihf with a
--with-fpu=neon in the configuration ?


> Is it OK for trunk please?



Ramana

>
> Thanks
> Dennis
>
> gcc/ChangeLog:
>
> 2020-08-10  Dennis Zhang  
>
> * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
> * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
> (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
> (TARGET_NEON_MVE_HFP): Likewise.
> * config/arm/iterators.md (VSEL): New mode iterator to select modes
> for corresponding targets.
> * config/arm/mve.md (mve_vsubq): New entry for vsub instruction
> using expression 'minus'.
> (mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
> * config/arm/neon.md (sub3): Removed here. Integrated in the
> sub3 in vec-common.md
> * config/arm/vec-common.md (sub3): Enable MVE target. Use VSEL
> to select available modes. Exclude TARGET_NEON_FP16INST from
> TARGET_NEON statement. Intergrate TARGET_NEON_FP16INST which is
> originally in neon.md.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-10  Dennis Zhang  
>
> * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> option -fno-ipa-icf and change the instruction count from 8 to 16.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
> * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
> * gcc.target/arm/mve/vect/vect_sub_1.c: New test.


Re: [PATCH] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-20 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Aug 20, 2020 at 3:31 PM Joe Ramsay  wrote:
>
> Hi Ramana,
>
> Thanks for the review.
>
> On 18/08/2020, 18:37, "Ramana Radhakrishnan"  
> wrote:
>
> On Thu, Aug 13, 2020 at 2:18 PM Joe Ramsay  wrote:
> >
> > From: Joe Ramsay 
> >
> > Hi,
> >
> > Previously, the machine description patterns for vst1q accepted a 
> generic memory
> > operand for the destination, which could lead to an unrecognised 
> builtin when
> > expanding vst1q* intrinsics. This change fixes the patterns to only 
> accept MVE
> > memory operands.
>
> This is OK though I suspect this needs a PR and a backport request for 
> GCC 10.
>
> There's now a PR for this, 96683. I've attached an updated patch file, the 
> only change is
> that I've included the PR number in the changelog. Please let me know if this 
> is OK for
> trunk.

Yep absolutely fine - FTR such fixes to changelogs with PR numbers and
administrativia count as obvious and can just be applied.

Ramana

>
> Thanks,
> Joe
>
> regards
> Ramana
>
> >
> > Thanks,
> > Joe
> >
> > gcc/ChangeLog:
> >
> > 2020-08-13  Joe Ramsay 
> >
> > * config/arm/mve.md (mve_vst1q_f): Require MVE memory 
> operand for
> > destination.
> > (mve_vst1q_): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-08-13  Joe Ramsay 
> >
> > * gcc.target/arm/mve/intrinsics/vst1q_f16.c: Add test that only 
> MVE
> > memory operand is accepted.
> > * gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
> > ---
> >  gcc/config/arm/mve.md   |  4 ++--
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
> >  6 files changed, 37 insertions(+), 17 deletions(-)
> >
> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> > index 9758862..465b39a 100644
> > --- a/gcc/config/arm/mve.md
> > +++ b/gcc/config/arm/mve.md
> > @@ -9330,7 +9330,7 @@
> >[(set_attr "length" "4")])
> >
> >  (define_expand "mve_vst1q_f"
> > -  [(match_operand: 0 "memory_operand")
> > +  [(match_operand: 0 "mve_memory_operand")
> > (unspec: [(match_operand:MVE_0 1 "s_register_operand")] 
> VST1Q_F)
> >]
> >"TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
> > @@ -9340,7 +9340,7 @@
> >  })
> >
> >  (define_expand "mve_vst1q_"
> > -  [(match_operand:MVE_2 0 "memory_operand")
> > +  [(match_operand:MVE_2 0 "mve_memory_operand")
> > (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
> >]
> >"TARGET_HAVE_MVE"
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > index 363b4ca..312b746 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > @@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
> >vst1q_f16 (addr, value);
> >  }
> >
> > -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> > -
> >  void
> >  foo1 (float16_t * addr, float16x8_t value)
> >  {
> >vst1q (addr, value);
> >  }
> >
> > -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> > +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> > +
> > +void
> > +foo2 (float16_t a, float16x8_t x)
> > +{
> > +  vst1q (, x);
> > +}
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > index 37c4713..cd14e2c 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> > @@ -10,12 +10,16 @@ foo (int16_t * addr, int16x8_t value)
> >vst1q_s16 (addr, value);
> >  }
> >
> > -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> > -
> >  void
> >  foo1 (int16_t * addr, int16x8_t value)
> >  {
> >vst1q (addr, value);
> >  }
> >
> > -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> > +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> > +
> > +void
> > +foo2 (int16_t a, int16x8_t x)
> > +{
> > +  vst1q (, x);
> > +}
> > diff 

Re: [PATCH] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-18 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Aug 13, 2020 at 2:18 PM Joe Ramsay  wrote:
>
> From: Joe Ramsay 
>
> Hi,
>
> Previously, the machine description patterns for vst1q accepted a generic 
> memory
> operand for the destination, which could lead to an unrecognised builtin when
> expanding vst1q* intrinsics. This change fixes the patterns to only accept MVE
> memory operands.

This is OK though I suspect this needs a PR and a backport request for GCC 10.


regards
Ramana

>
> Thanks,
> Joe
>
> gcc/ChangeLog:
>
> 2020-08-13  Joe Ramsay 
>
> * config/arm/mve.md (mve_vst1q_f): Require MVE memory operand 
> for
> destination.
> (mve_vst1q_): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-13  Joe Ramsay 
>
> * gcc.target/arm/mve/intrinsics/vst1q_f16.c: Add test that only MVE
> memory operand is accepted.
> * gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
> ---
>  gcc/config/arm/mve.md   |  4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
>  6 files changed, 37 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 9758862..465b39a 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -9330,7 +9330,7 @@
>[(set_attr "length" "4")])
>
>  (define_expand "mve_vst1q_f"
> -  [(match_operand: 0 "memory_operand")
> +  [(match_operand: 0 "mve_memory_operand")
> (unspec: [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
>]
>"TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
> @@ -9340,7 +9340,7 @@
>  })
>
>  (define_expand "mve_vst1q_"
> -  [(match_operand:MVE_2 0 "memory_operand")
> +  [(match_operand:MVE_2 0 "mve_memory_operand")
> (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
>]
>"TARGET_HAVE_MVE"
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> index 363b4ca..312b746 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> @@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
>vst1q_f16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (float16_t * addr, float16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (float16_t a, float16x8_t x)
> +{
> +  vst1q (, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> index 37c4713..cd14e2c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> @@ -10,12 +10,16 @@ foo (int16_t * addr, int16x8_t value)
>vst1q_s16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (int16_t * addr, int16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (int16_t a, int16x8_t x)
> +{
> +  vst1q (, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> index fe5edea..0004c80 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> @@ -10,12 +10,16 @@ foo (int8_t * addr, int8x16_t value)
>vst1q_s8 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> -
>  void
>  foo1 (int8_t * addr, int8x16_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> +/* { dg-final { scan-assembler-times "vstrb.8" 2 }  } */
> +
> +void
> +foo2 (int8_t a, int8x16_t x)
> +{
> +  vst1q (, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> index a4c8c1a..248e7ce 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> @@ -10,12 +10,16 @@ foo (uint16_t * addr, uint16x8_t value)
>vst1q_u16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (uint16_t * addr, uint16x8_t value)
>  {
>vst1q 

Re: [PATCH][GCC][Arm]: Fix bootstrap failure with rtl-checking

2020-05-19 Thread Ramana Radhakrishnan via Gcc-patches
On Mon, Apr 27, 2020 at 2:22 PM Andre Vieira (lists)
 wrote:
>
> Hi,
>
> The code change that caused this regression was not meant to affect neon
> code-gen, however I missed the REG fall through.  This patch makes sure
> we only get the left-hand of the PLUS if it is indeed a PLUS expr.
>
> I suggest that in gcc-11 this code is cleaned up, as I do not think we
> even need the overlap checks, NEON only loads from or stores to FP
> registers and these can't be used in its addressing modes.
>
> Bootstrapped arm-linux-gnueabihf with '--enable-checking=yes,rtl' for
> armv7-a and amrv8-a.
>
> Is this OK for trunk?
>

Also for GCC-10  ?

Ramana

> gcc/ChangeLog:
> 2020-04-27  Andre Vieira  
>
>  * config/arm/arm.c (output_move_neon): Only get the first operand,
>  if addr is PLUS.
>


Re: [ARM][wwwdocs]: Document Armv8.1-M Mainline Security Extensions changes.

2020-05-15 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, May 15, 2020 at 12:31 PM Srinath Parvathaneni
 wrote:
>
> Armv8.1-M Mainline Security Extensions related changes in GCC-10.
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
> index 
> 57ca749da72ed64da37b3eb5404cf5cde8be44dd..10bf3b78c7769b73c808bd2c2fe60ebbfc9c887e
>  100644
> --- a/htdocs/gcc-10/changes.html
> +++ b/htdocs/gcc-10/changes.html
> @@ -747,6 +747,9 @@ typedef svbool_t pred512 
> __attribute__((arm_sve_vector_bits(512)));
>Support for the Custom Datapath Extension beta ACLE
> href="https://developer.arm.com/docs/101028/0010/custom-datapath-extension;>
>intrinsics has been added.
> +  Support for Armv8.1-M Mainline Security Extensions architecture has 
> been added. The -mcmse option,
> +  when used in combination with an Armv8.1-M Mainline architecture (for 
> example, -march=armv8.1.m.main -mcmse),
> +  now leads to the generation of improved code sequences when changing 
> security states.
>  
>

Ok.

Ramana

>
>


Re: [PATCH 2/2] arm: Add support for interrupt routines to reg_needs_saving_p

2020-05-15 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, May 15, 2020 at 7:36 AM Christophe Lyon
 wrote:
>
> On Thu, 14 May 2020 at 17:58, Ramana Radhakrishnan
>  wrote:
> >
> > >  static bool reg_needs_saving_p (unsigned reg)
> > >  {
> > >unsigned long func_type = arm_current_func_type ();
> >
> > Ah ok , you needed it here.
>
> Yes sorry.
> Is this patch (2/2) OK?
>

This looks ok to me as long as there are no regressions and you rejig
the hunks between 1/2 and 2/2

regards
Ramana

> Thanks,
>
> Christophe
>
> >
> > Ramana


Re: [PATCH 2/2] arm: Add support for interrupt routines to reg_needs_saving_p

2020-05-14 Thread Ramana Radhakrishnan via Gcc-patches
>  static bool reg_needs_saving_p (unsigned reg)
>  {
>unsigned long func_type = arm_current_func_type ();

Ah ok , you needed it here.

Ramana


Re: [PATCH 1/2] arm: Factorize several occurrences of the same code into reg_needs_saving_p

2020-05-14 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, May 14, 2020 at 3:58 PM Christophe Lyon via Gcc-patches
 wrote:
>
> The same code pattern occurs in several functions, so it seems cleaner
> to move it into a dedicated function.
>
> 2020-05-14  Christophe Lyon  
>
> gcc/
> * config/arm/arm.c (reg_needs_saving_p): New function.
> (use_return_insn): Use reg_needs_saving_p.
> (arm_get_vfp_saved_size): Likewise.
> (arm_compute_frame_layout): Likewise.
> (arm_save_coproc_regs): Likewise.
> (thumb1_expand_epilogue): Likewise.
> (arm_expand_epilogue_apcs_frame): Likewise.
> (arm_expand_epilogue): Likewise.
> ---
>  gcc/config/arm/arm.c | 46 --
>  1 file changed, 24 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index c88de3e..694c1bb 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -4188,6 +4188,18 @@ arm_trampoline_adjust_address (rtx addr)
>return addr;
>  }
>
> +/* Return 1 if REG needs to be saved.   */
> +static bool reg_needs_saving_p (unsigned reg)

static inline ?

> +{
> +  unsigned long func_type = arm_current_func_type ();

Why is this needed here when it's not used further ?

> +
> +  if (!df_regs_ever_live_p (reg)
> +  || call_used_or_fixed_reg_p (reg))
> +return false;
> +  else
> +return true;
> +}
> +
>  /* Return 1 if it is possible to return using a single instruction.
> If SIBLING is non-null, this is a test for a return before a sibling
> call.  SIBLING is the call insn, so we can examine its register usage.  */
> @@ -4317,12 +4329,12 @@ use_return_insn (int iscond, rtx sibling)
>   since this also requires an insn.  */
>if (TARGET_VFP_BASE)
>  for (regno = FIRST_VFP_REGNUM; regno <= LAST_VFP_REGNUM; regno++)
> -  if (df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p (regno))
> +  if (reg_needs_saving_p (regno))
> return 0;
>
>if (TARGET_REALLY_IWMMXT)
>  for (regno = FIRST_IWMMXT_REGNUM; regno <= LAST_IWMMXT_REGNUM; regno++)
> -  if (df_regs_ever_live_p (regno) && ! call_used_or_fixed_reg_p (regno))
> +  if (reg_needs_saving_p (regno))
> return 0;
>
>return 1;
> @@ -20943,7 +20955,6 @@ thumb1_compute_save_core_reg_mask (void)
>return mask;
>  }
>
> -
>  /* Return the number of bytes required to save VFP registers.  */
>  static int
>  arm_get_vfp_saved_size (void)
> @@ -20961,10 +20972,7 @@ arm_get_vfp_saved_size (void)
>regno < LAST_VFP_REGNUM;
>regno += 2)
> {
> - if ((!df_regs_ever_live_p (regno)
> -  || call_used_or_fixed_reg_p (regno))
> - && (!df_regs_ever_live_p (regno + 1)
> - || call_used_or_fixed_reg_p (regno + 1)))
> + if (!reg_needs_saving_p (regno) && !reg_needs_saving_p (regno + 1))
> {
>   if (count > 0)
> {
> @@ -22489,8 +22497,7 @@ arm_compute_frame_layout (void)
>   for (regno = FIRST_IWMMXT_REGNUM;
>regno <= LAST_IWMMXT_REGNUM;
>regno++)
> -   if (df_regs_ever_live_p (regno)
> -   && !call_used_or_fixed_reg_p (regno))
> +   if (reg_needs_saving_p (regno))
>   saved += 8;
> }
>
> @@ -22711,8 +22718,9 @@ arm_save_coproc_regs(void)
>unsigned start_reg;
>rtx insn;
>
> +  if (TARGET_REALLY_IWMMXT)
>for (reg = LAST_IWMMXT_REGNUM; reg >= FIRST_IWMMXT_REGNUM; reg--)
> -if (df_regs_ever_live_p (reg) && !call_used_or_fixed_reg_p (reg))
> +if (reg_needs_saving_p (reg))
>{
> insn = gen_rtx_PRE_DEC (Pmode, stack_pointer_rtx);
> insn = gen_rtx_MEM (V2SImode, insn);
> @@ -22727,9 +22735,7 @@ arm_save_coproc_regs(void)
>
>for (reg = FIRST_VFP_REGNUM; reg < LAST_VFP_REGNUM; reg += 2)
> {
> - if ((!df_regs_ever_live_p (reg) || call_used_or_fixed_reg_p (reg))
> - && (!df_regs_ever_live_p (reg + 1)
> - || call_used_or_fixed_reg_p (reg + 1)))
> + if (!reg_needs_saving_p (reg) && !reg_needs_saving_p (reg + 1))
> {
>   if (start_reg != reg)
> saved_size += vfp_emit_fstmd (start_reg,
> @@ -27024,7 +27030,7 @@ thumb1_expand_epilogue (void)
>/* Emit a clobber for each insn that will be restored in the epilogue,
>   so that flow2 will get register lifetimes correct.  */
>for (regno = 0; regno < 13; regno++)
> -if (df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p (regno))
> +if (reg_needs_saving_p (regno))
>emit_clobber (gen_rtx_REG (SImode, regno));
>
>if (! df_regs_ever_live_p (LR_REGNUM))
> @@ -27090,9 +27096,7 @@ arm_expand_epilogue_apcs_frame (bool really_return)
>
>for (i = FIRST_VFP_REGNUM; i < LAST_VFP_REGNUM; i += 2)
>  /* Look for a case where a reg does not need restoring.  */
> -if ((!df_regs_ever_live_p (i) || call_used_or_fixed_reg_p (i))
> - 

Re: [PATCH] arm.c: Clarify error message in thumb1_expand_prologue

2020-05-14 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, May 14, 2020 at 3:57 PM Christophe Lyon via Gcc-patches
 wrote:
>
> While running the tests with -march=armv5t -mthumb, I came across this
> error message which I think could be clearer.
>
> 2020-05-14  Christophe Lyon  
>
> gcc/
> * config/arm/arm.c (thumb1_expand_prologue): Update error message.
> ---
>  gcc/config/arm/arm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index d507819..dda8771 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -26502,7 +26502,7 @@ thumb1_expand_prologue (void)
>
>if (IS_INTERRUPT (func_type))
>  {
> -  error ("interrupt Service Routines cannot be coded in Thumb mode");
> +  error ("Interrupt Service Routines cannot be coded in Thumb-1 mode");
>return;
>  }
>
> --
> 2.7.4
>

OK.

Ramana


Re: [PATCH] arm: Remove duplicate entries in isr_attribute_args [PR target/57002]

2020-04-29 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Apr 29, 2020 at 4:19 PM Christophe Lyon via Gcc-patches
 wrote:
>
> Remove two duplicate entries in isr_attribute_args ("abort" and
> "ABORT").
>
> 2020-04-29  Christophe Lyon  
>
> PR target/57002
> gcc/
> * config/arm/arm.c (isr_attribute_args): Remove duplicate entries.


OK, this would count as obvious.

Ramana
> ---
>  gcc/config/arm/arm.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 30a2a3a..6a6e804 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3925,8 +3925,6 @@ static const isr_attribute_arg isr_attribute_args [] =
>{ "fiq",   ARM_FT_FIQ },
>{ "ABORT", ARM_FT_ISR },
>{ "abort", ARM_FT_ISR },
> -  { "ABORT", ARM_FT_ISR },
> -  { "abort", ARM_FT_ISR },
>{ "UNDEF", ARM_FT_EXCEPTION },
>{ "undef", ARM_FT_EXCEPTION },
>{ "SWI",   ARM_FT_EXCEPTION },
> --
> 2.7.4
>


Re: [PATCH] arm: Extend the PR94780 fix to arm

2020-04-29 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Apr 29, 2020 at 11:30 AM Richard Sandiford
 wrote:
>
> Essentially the same fix as for x86.
>
> Tested on arm-linux-gnueabihf and armeb-eabi.  Bordering on the obvious
> I guess, but OK to install?
>
> Richard
>

Ok.

Ramana

>
> 2020-04-29  Richard Sandiford  
>
> gcc/
> * config/arm/arm-builtins.c (arm_atomic_assign_expand_fenv): Use
> TARGET_EXPR instead of MODIFY_EXPR for the first assignments to
> fenv_var and new_fenv_var.
> ---
>  gcc/config/arm/arm-builtins.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index aee3fd6e2ff..f64742e6447 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -4167,8 +4167,9 @@ arm_atomic_assign_expand_fenv (tree *hold, tree *clear, 
> tree *update)
>mask = build_int_cst (unsigned_type_node,
> ~((ARM_FE_ALL_EXCEPT << ARM_FE_EXCEPT_SHIFT)
>   | ARM_FE_ALL_EXCEPT));
> -  ld_fenv = build2 (MODIFY_EXPR, unsigned_type_node,
> -   fenv_var, build_call_expr (get_fpscr, 0));
> +  ld_fenv = build4 (TARGET_EXPR, unsigned_type_node,
> +   fenv_var, build_call_expr (get_fpscr, 0),
> +   NULL_TREE, NULL_TREE);
>masked_fenv = build2 (BIT_AND_EXPR, unsigned_type_node, fenv_var, mask);
>hold_fnclex = build_call_expr (set_fpscr, 1, masked_fenv);
>*hold = build2 (COMPOUND_EXPR, void_type_node,
> @@ -4189,8 +4190,8 @@ arm_atomic_assign_expand_fenv (tree *hold, tree *clear, 
> tree *update)
> __atomic_feraiseexcept (new_fenv_var);  */
>
>new_fenv_var = create_tmp_var_raw (unsigned_type_node);
> -  reload_fenv = build2 (MODIFY_EXPR, unsigned_type_node, new_fenv_var,
> -   build_call_expr (get_fpscr, 0));
> +  reload_fenv = build4 (TARGET_EXPR, unsigned_type_node, new_fenv_var,
> +   build_call_expr (get_fpscr, 0), NULL_TREE, NULL_TREE);
>restore_fnenv = build_call_expr (set_fpscr, 1, fenv_var);
>atomic_feraiseexcept = builtin_decl_implicit 
> (BUILT_IN_ATOMIC_FERAISEEXCEPT);
>update_call = build_call_expr (atomic_feraiseexcept, 1,


Re: [PATCH v2] aarch64: Add TX3 machine model

2020-04-24 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Apr 22, 2020 at 8:25 PM Joel Jones  wrote:
>
> Yes, Bellsoft's contribution is to be covered under the Marvell copyright
>
> assignment, as this is a work for hire.


Thanks !

Ramana
>
>
>
> Joel
>
>
>
> >Yes, Bellsoft's contribution is to be covered under the Marvell copyright
>
> >assignment, as this is a work for hire.
>
> >
>
> >Joel
>
> >
>
> >>>Hi Anton,
>
> >>>
>
>  -Original Message-
>
>  From: Gcc-patches  On Behalf Of Anton
>
>  Youdkevitch
>
>  Sent: 20 April 2020 19:29
>
>  To: gcc-patches@gcc.gnu.org
>
>  Cc: jo...@marvell.com
>
>  Subject: [PATCH v2] aarch64: Add TX3 machine model
>
> 
>
>  Here is the patch introducing thunderxt311 maching model
>
>  for the scheduler. A name for the new chip was added to the
>
>  list of the names to be recognized as a valid parameter for mcpu
>
>  and mtune flags. The TX2 cost model was reused for TX3.
>
> 
>
>  The previously used "cryptic" name for the command line
>
>  parameter is replaced with the same "thunderxt311" name.
>
> 
>
>  Bootstrapped on AArch64.
>
> >>>
>
> >>>Thanks for the patch. I had meant to ask, do you have a copyright 
> >>>assignment in place?
>
> >>>We'd need one to accept a contribution of this size.
>
> >>>Thanks,
>
> >>>Kyrill
>
> >>>
>
> 
>
>  2020-04-20 Anton Youdkevitch 
>
> 
>
>  * config/aarch64/aarch64-cores.def: Add the chip name.
>
>  * config/aarch64/aarch64-tune.md: Regenerated.
>
>  * gcc/config/aarch64/aarch64.c: Add the cost tables for the chip.
>
>  * gcc/config/aarch64/thunderx3t11.md: New file: add the new
>
>  machine model for the scheduler
>
>  * gcc/config/aarch64/aarch64.md: Include the new model.
>
> 
>
>  ---
>
>   gcc/config/aarch64/aarch64-cores.def |   3 +
>
>   gcc/config/aarch64/aarch64-tune.md   |   2 +-
>
>   gcc/config/aarch64/aarch64.c |  27 +
>
>   gcc/config/aarch64/aarch64.md|   1 +
>
>   gcc/config/aarch64/thunderx3t11.md   | 686 +++
>
>   5 files changed, 718 insertions(+), 1 deletion(-)
>
> >>>
>
> >>>
>
> >>>
>
> >>>
>
> >


Re: [PATCH v2] aarch64: Add TX3 machine model

2020-04-22 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Apr 22, 2020 at 5:38 AM Joel Jones via Gcc-patches
 wrote:
>
> I just joined the gcc-patches list, so I hope the mail software can parse 
> this out with an "In-Reply-To" header.
>
> I work for Marvell, and Anton's work is approved for submittal. I wrote the 
> first version of the .md file. I'm certain we have a copyright assignment.in 
> place, as we've had employees in the past six months submit changes, for 
> example Steve Ellcey.
>
I can certainly see Marvell hold a copyright assignment from 2010 in
the copyright.list file.

For being clear, is that stating that Anton's work is also covered by
that copyright assignment ?

Ramana

> Joel Jones
>
> >Hi Anton,
> >
> >> -Original Message-
> >> From: Gcc-patches  On Behalf Of Anton
> >> Youdkevitch
> >> Sent: 20 April 2020 19:29
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: jo...@marvell.com
> >> Subject: [PATCH v2] aarch64: Add TX3 machine model
> >>
> >> Here is the patch introducing thunderxt311 maching model
> >> for the scheduler. A name for the new chip was added to the
> >> list of the names to be recognized as a valid parameter for mcpu
> >> and mtune flags. The TX2 cost model was reused for TX3.
> >>
> >> The previously used "cryptic" name for the command line
> >> parameter is replaced with the same "thunderxt311" name.
> >>
> >> Bootstrapped on AArch64.
> >
> >Thanks for the patch. I had meant to ask, do you have a copyright assignment 
> >in place?
> >We'd need one to accept a contribution of this size.
> >Thanks,
> >Kyrill
> >
> >>
> >> 2020-04-20 Anton Youdkevitch 
> >>
> >> * config/aarch64/aarch64-cores.def: Add the chip name.
> >> * config/aarch64/aarch64-tune.md: Regenerated.
> >> * gcc/config/aarch64/aarch64.c: Add the cost tables for the chip.
> >> * gcc/config/aarch64/thunderx3t11.md: New file: add the new
> >> machine model for the scheduler
> >> * gcc/config/aarch64/aarch64.md: Include the new model.
> >>
> >> ---
> >>  gcc/config/aarch64/aarch64-cores.def |   3 +
> >>  gcc/config/aarch64/aarch64-tune.md   |   2 +-
> >>  gcc/config/aarch64/aarch64.c |  27 +
> >>  gcc/config/aarch64/aarch64.md|   1 +
> >>  gcc/config/aarch64/thunderx3t11.md   | 686 +++
> >>  5 files changed, 718 insertions(+), 1 deletion(-)
>