from:"Richard Earnshaw"

Re: [PATCH v3] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores

2024-07-04 Thread Richard Earnshaw (lists)

On 04/07/2024 13:50, Siarhei Volkau wrote:
> чт, 4 июл. 2024 г. в 12:45, Richard Earnshaw (lists) 
> :
>>
>> On 20/06/2024 08:24, Siarhei Volkau wrote:
>>> If the address register is dead after load/store operation it looks
>>> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
>>> at least if optimizing for size.
>>>
>>> Changes v2 -> v3:
>>>  - switching to mixed approach (insn+peep2)
>>>  - keep memory attributes in peepholes
>>>  - handle stmia corner cases
>>>
>>> Changes v1 -> v2:
>>>  - switching to peephole2 approach
>>>  - added test case
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/arm/arm.cc (thumb_load_double_from_address): Emit ldmia
>>> when address reg rewritten by load.
>>>
>>> * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
>>> (peephole2 to rewrite DI/DF store): New.
>>>
>>>   * config/arm/iterators.md (DIDF): New.
>>>
>>> gcc/testsuite:
>>>
>>> * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
>>
>> I made a couple of cleanups and pushed this.  My testing of the cleanup also 
>> identified another corner case for the ldm instruciton: if the result of the 
>> load is not used (but it can't be eliminated because the address is marked 
>> volatile), then we could end up with
>> ldm r0!, {r0, r1}
>> Which of course is unpredictable.  So we need to test not only that r0 is 
>> dead but that it isn't written by the load either.
>>
> 
> Good catch.
> 
> Regarding thumb2, I investigated it a bit and found that it has little effort:
> 1. DI mode splitted into two insns during subreg1 pass, so it won't be
> easy to catch as it was for t1.

We need to enable the new pair fusion pass for thumb2 to address this.  But it 
will mostly only result in new ldrd/strd instructions I suspect.

> 2. DF on soft-float t2 targets should have its own rules to transform
> into LDM/STM.
> 3. LDM/STM are slower than LDRD/STRD on dual-issue cores. so,
> profitable only for -Os.

Speed tends to be micro-architecture (implementation) specific, but yes, it may 
well be the case that this will only be a win at -Os.

R.

Re: [PATCH v3] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores

2024-07-04 Thread Richard Earnshaw (lists)

On 20/06/2024 08:24, Siarhei Volkau wrote:
> If the address register is dead after load/store operation it looks
> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> at least if optimizing for size.
> 
> Changes v2 -> v3:
>  - switching to mixed approach (insn+peep2)
>  - keep memory attributes in peepholes
>  - handle stmia corner cases
> 
> Changes v1 -> v2:
>  - switching to peephole2 approach
>  - added test case
> 
> gcc/ChangeLog:
> 
> * config/arm/arm.cc (thumb_load_double_from_address): Emit ldmia
> when address reg rewritten by load.
> 
> * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
> (peephole2 to rewrite DI/DF store): New.
> 
>   * config/arm/iterators.md (DIDF): New.
> 
> gcc/testsuite:
> 
> * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.

I made a couple of cleanups and pushed this.  My testing of the cleanup also 
identified another corner case for the ldm instruciton: if the result of the 
load is not used (but it can't be eliminated because the address is marked 
volatile), then we could end up with
ldm r0!, {r0, r1}
Which of course is unpredictable.  So we need to test not only that r0 is dead 
but that it isn't written by the load either.

Anyway, thanks for the patch.

R.

> 
> Signed-off-by: Siarhei Volkau 
> ---
>  gcc/config/arm/arm.cc |  8 ++--
>  gcc/config/arm/iterators.md   |  3 ++
>  gcc/config/arm/thumb1.md  | 37 ++-
>  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 
>  4 files changed, 58 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> 
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index b8c32db0a1d..2734ab3bce1 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -28368,15 +28368,13 @@ thumb_load_double_from_address (rtx *operands)
>switch (GET_CODE (addr))
>  {
>  case REG:
> -  operands[2] = adjust_address (operands[1], SImode, 4);
> -
> -  if (REGNO (operands[0]) == REGNO (addr))
> +  if (reg_overlap_mentioned_p (addr, operands[0]))
>   {
> -   output_asm_insn ("ldr\t%H0, %2", operands);
> -   output_asm_insn ("ldr\t%0, %1", operands);
> +   output_asm_insn ("ldmia\t%m1, {%0, %H0}", operands);
>   }
>else
>   {
> +   operands[2] = adjust_address (operands[1], SImode, 4);
> output_asm_insn ("ldr\t%0, %1", operands);
> output_asm_insn ("ldr\t%H0, %2", operands);
>   }
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 8d066fcf05d..09046bff83b 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
>  ;; A list of the 32bit and 64bit integer modes
>  (define_mode_iterator SIDI [SI DI])
>  
> +;; A list of the 64bit modes for thumb1.
> +(define_mode_iterator DIDF [DI DF])
> +
>  ;; A list of atomic compare and swap success return modes
>  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
>  
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index d7074b43f60..c9a4201fcc9 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -2055,4 +2055,39 @@ (define_insn "thumb1_stack_protect_test_insn"
> (set_attr "conds" "clob")
> (set_attr "type" "multiple")]
>  )
> -
> +
> +;; match patterns usable by ldmia/stmia
> +(define_peephole2
> +  [(set (match_operand:DIDF 0 "low_register_operand" "")
> + (match_operand:DIDF 1 "memory_operand" ""))]
> +  "TARGET_THUMB1
> +   && REG_P (XEXP (operands[1], 0))
> +   && REGNO_REG_CLASS (REGNO (XEXP (operands[1], 0))) == LO_REGS
> +   && peep2_reg_dead_p (1, XEXP (operands[1], 0))"
> +  [(set (match_dup 0)
> + (match_dup 1))]
> +  "
> +  {
> +operands[1] = change_address (operands[1], VOIDmode,
> +   gen_rtx_POST_INC (SImode,
> + XEXP (operands[1], 0)));
> +  }"
> +)
> +
> +(define_peephole2
> +  [(set (match_operand:DIDF 1 "memory_operand" "")
> + (match_operand:DIDF 0 "low_register_operand" ""))]
> +  "TARGET_THUMB1
> +   && REG_P (XEXP (operands[1], 0))
> +   && REGNO_REG_CLASS (REGNO (XEXP (operands[1], 0))) == LO_REGS
> +   && peep2_reg_dead_p (1, XEXP (operands[1], 0))
> +   && REGNO (XEXP (operands[1], 0)) != (REGNO (operands[0]) + 1)"
> +  [(set (match_dup 1)
> + (match_dup 0))]
> +  "
> +  {
> +operands[1] = change_address (operands[1], VOIDmode,
> +   gen_rtx_POST_INC (SImode,
> + XEXP (operands[1], 0)));
> +  }"
> +)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c 
> b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> new file mode 100644
> index 000..167fa9ec876
> --- /dev/null
> +++

Re: [PATCH v3] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-06-28 Thread Richard Earnshaw (lists)

On 27/06/2024 17:16, Wilco Dijkstra wrote:
> Hi Richard,
> 
>> Doing just this will mean that the register allocator will have to undo a 
>> pre/post memory operand that was accepted by the predicate (memory_operand). 
>>  I think we really need a tighter predicate (lets call it noautoinc_mem_op) 
>> here to avoid that.  Note that the existing uses of Uw also had another 
>> alternative that did permit 'm', so this wasn't previously practical, but 
>> they had alternative ways of being reloaded.
>>
>> No, sorry that won't work; there's another 'm' alternative here as well.
>> The correct fix is to add alternatives for T1, I think, similar to the one 
>> in thumb1_movsi_insn.
>>
>> Also, by observation I think there's a similar problem in the load 
>> operations.
> 
> Just using 'Uw' works fine, but restricting the memory operand too is better 
> indeed.
> I added 'restricted_memory_operand' that only disallows Thumb-1 postincrement.
> 
> There were also a few more cases in unaligned accesses where 'm' was used 
> incorrectly when
> emitting Thumb-1 LDR/STR alternatives (and where no LDM/STMis allowed), so 
> those also use
> 'Uw' and 'restricted_memory_operand'.
> 
> Long term it seems like a better idea is to remove support this odd 
> post-increment
> in general memory operand and only emit it from a peephole pass.
> 
> Cheers,
> Wilco
> 
> 
> v3: Use 'Uw' in a few more cases. Add 'restricted_memory_operand'.
> 
> A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
> printed as LDR/STR with writeback in unified syntax, resulting in strange
> assembler errors if writeback is selected.  To work around this, use the 'Uw'
> constraint that blocks writeback.  Also use a new 'restricted_memory_operand'
> which is a general memory operand that disallows writeback in Thumb-1.
> A few other patterns were using 'm' for Thumb-1 in a similar way, update these
> to also use 'restricted_memory_operand' and 'Uw'.
> 
> Passes bootstrap & regress, OK for commit (and backport to GCC14.2)?

I'm not a major fan of the name restricted_memory_operand as it doesn't 
describe which restriction is being applied and something like 
t1_restricted_memory_operand would not be any clearer.  Perhaps 
mem_and_no_t1_wback_op would be better?

OK with that change.

R.

> 
> gcc:
> PR target/115188
> * config/arm/arm.md (unaligned_loadsi): Use 'Uw' constraint and
> 'restricted_memory_operand'.
> (unaligned_loadhiu): Likewise.
> (unaligned_storesi): Likewise.
> (unaligned_storehi): Likewise.
> * config/arm/predicates.md (restricted_memory_operand): Add new 
> predicate.
> * config/arm/sync.md (arm_atomic_load): Use 'Uw' constraint.
> (arm_atomic_store): Likewise.
> 
> gcc/testsuite:
> PR target/115188
> * gcc.target/arm/pr115188.c: Add new test.
> 
> ---
> 
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 
> f47e036a8034ed16c61bbd753c7a7cd3efb1ecbd..c962a9341779e4da38f4e1afb26d4a364fc5aee4
>  100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -5011,7 +5011,7 @@
>  
>  (define_insn "unaligned_loadsi"
>[(set (match_operand:SI 0 "s_register_operand" "=l,l,r")
> - (unspec:SI [(match_operand:SI 1 "memory_operand" "m,Uw,m")]
> + (unspec:SI [(match_operand:SI 1 "restricted_memory_operand" "Uw,Uw,m")]
>  UNSPEC_UNALIGNED_LOAD))]
>"unaligned_access"
>"@
> @@ -5041,7 +5041,7 @@
>  (define_insn "unaligned_loadhiu"
>[(set (match_operand:SI 0 "s_register_operand" "=l,l,r")
>   (zero_extend:SI
> -   (unspec:HI [(match_operand:HI 1 "memory_operand" "m,Uw,m")]
> +   (unspec:HI [(match_operand:HI 1 "restricted_memory_operand" 
> "Uw,Uw,m")]
>UNSPEC_UNALIGNED_LOAD)))]
>"unaligned_access"
>"@
> @@ -5066,7 +5066,7 @@
> (set_attr "type" "store_8")])
>  
>  (define_insn "unaligned_storesi"
> -  [(set (match_operand:SI 0 "memory_operand" "=m,Uw,m")
> +  [(set (match_operand:SI 0 "restricted_memory_operand" "=Uw,Uw,m")
>   (unspec:SI [(match_operand:SI 1 "s_register_operand" "l,l,r")]
>  UNSPEC_UNALIGNED_STORE))]
>"unaligned_access"
> @@ -5081,7 +5081,7 @@
> (set_attr "type" "store_4")])
>  
>  (define_insn "unaligned_storehi"
> -  [(set (match_operand:HI 0 "memory_operand" "=m,Uw,m")
> +  [(set (match_operand:HI 0 "restricted_memory_operand" "=Uw,Uw,m")
>   (unspec:HI [(match_operand:HI 1 "s_register_operand" "l,l,r")]
>  UNSPEC_UNALIGNED_STORE))]
>"unaligned_access"
> diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
> index 
> 4994c0c57d6431117c16f7a05e800821dee93408..3dfe381c098c06517dca6026f8dafe87b46135ae
>  100644
> --- a/gcc/config/arm/predicates.md
> +++ b/gcc/config/arm/predicates.md
> @@ -907,3 +907,8 @@
>  ;; A special predicate that doesn't match a particular mode.
>  (define_special_predicate "arm_any_register_operand"
>(match_code

Re: [PATCH] arm: make arm_predict_doloop_p reject loops with calls

2024-06-25 Thread Richard Earnshaw (lists)

On 25/06/2024 12:53, Andre Vieira (lists) wrote:
> Hi,
> 
> With the introduction of low overhead loops in 
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3dfc28dbbd21b1d708aa40064380ef4c42c994d7
>  we defined arm_predict_doloop_p, this is meant to be a low-weight check to 
> rule out loops we are not considering for doloop optimization and it is used 
> by other passes to prevent optimizations that may hurt the doloop 
> optimization later on. The reason these are meant to be lightweight is 
> because it's used by pre-RTL optimizations, meaning we can't do the same 
> checks that doloop does.
> 
> After the definition of arm_predict_doloop_p, when testing for 
> armv8.1-m.main, tree-ssa/ivopts-3.c failed the scan-dump check as the dump 
> now matched an extra '!= 0' introduced by:
> Doloop cmp iv use: if (ivtmp_1 != 0)
> Predict loop 1 can perform doloop optimization later.
> 
> where previously we had:
> Predict doloop failure due to target specific checks.
> 
> and after this patch:
> Predict doloop failure due to call in loop.
> Predict doloop failure due to target specific checks.
> 
> Added a copy of the original tree-ssa/ivopts-3.c as a target specifc test to 
> check for the new dump message.
> 
> Ran a regression test for arm-none-eabi with 
> -march=armv8.1-m.main+mve/-mfpu=auto/-mthumb/-mfloat-abi=hard.
> 
> OK for trunk?
> 
> gcc/ChangeLog:
> 
>     * confir/arm/arm.cc (arm_predict_doloop_p): Reject loops with 
> function calls that are not builtins.
> 
> gcc/testsuite/ChangeLog:
> 
>     * gcc.target/arm/mve/ivopts-3.c: New test.

OK.

R.

Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]

2024-06-24 Thread Richard Earnshaw (lists)

On 24/06/2024 12:35, Alexandre Oliva wrote:
> On Jun 21, 2024, Christophe Lyon  wrote:
> 
>>> How about mentioning Christophe's simplification in the commit log?
> 
>> For the avoidance of doubt: it's OK for me (but you don't need to
>> mention my name in fact ;-)
> 
> Needing or not, I added it ;-)
> 
>>>> be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
>>>> before the fix).
> 
>>> This is OK, but you might wish to revisit this statement before
>>> committing.
> 
> Oh, right, sorry, I messed it up.  uint16_t was what I should have put
> in there.  int16_t would have overflown and invoked undefined behavior
> to begin with, and I see it misled you far down the wrong path.
> 
>>> I think the original bug was that we were losing the cast to short
> 
> The problem was that the shift count saturated at 15.  AFAIK sign
> extension was not relevant.  Hopefully the rewritten opening paragraph
> below makes that clearer.  I will put it in later this week barring
> objections or further suggestions of improvement.  Thanks,

A signed shift right on a 16-bit vector element by 15 would still yield -1; but 
...

> 
> 
> [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
> 
> The test was too optimistic, alas.  We used to vectorize shifts by
> clamping the shift counts below the bit width of the types (e.g. at 15
> for 16-bit vector elements), but (uint16_t)32768 >> (uint16_t)16 is
> well defined (because of promotion to 32-bit int) and must yield 0,
> not 1 (as before the fix).

That make more sense now.  Thanks.

> 
> Unfortunately, in the gimple model of vector units, such large shift
> counts wouldn't be well-defined, so we won't vectorize such shifts any
> more, unless we can tell they're in range or undefined.
> 
> So the test that expected the vectorization we no longer performed
> needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
> suggested annotating the test with the expected ranges so as to enable
> the optimization, and Christophe Lyon suggested a further
> simplification.
> 
> 
> Co-Authored-By: Richard Earnshaw 
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR tree-optimization/113281
>   * gcc.target/arm/simd/mve-vshr.c: Add expected ranges.

I think this is OK now.

R.

> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> index 8c7adef9ed8f1..03078de49c65e 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> @@ -9,6 +9,8 @@
>void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * 
> __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
>  int i;   \
>  for (i=0; i +  if ((unsigned)b[i] >= (unsigned)(BITS))
> \
> + __builtin_unreachable();\
>dest[i] = a[i] OP b[i];
> \
>  }
> \
>  }
> 
>

Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]

2024-06-21 Thread Richard Earnshaw (lists)

On 21/06/2024 08:57, Alexandre Oliva wrote:
> On Jun 20, 2024, Christophe Lyon  wrote:
> 
>> Maybe using
>> if ((unsigned)b[i] >= BITS) \
>> would be clearer?
> 
> Heh.  Why make it simpler if we can make it unreadable, right? :-D
> 
> Thanks, here's another version I've just retested on x-arm-eabi.  Ok?
> 
> I'm not sure how to credit your suggestion.  It's not like you pretty
> much wrote the entire patch, as in Richard's case, but it's still a
> sizable chunk of this two-liner.  Any preferences?

How about mentioning Christophe's simplification in the commit log?
> 
> 
> The test was too optimistic, alas.  We used to vectorize shifts
> involving 8-bit and 16-bit integral types by clamping the shift count
> at the highest in-range shift count, but that was not correct: such
> narrow shifts expect integral promotion, so larger shift counts should
> be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
> before the fix).

This is OK, but you might wish to revisit this statement before committing.  I 
think the above is a mis-summary of the original bug report which had a test to 
pick between 0 and 1 as the result of a shift operation.

If I've understood what's going on here correctly, then we have 

(int16_t)32768 >> (int16_t) 16

but shift is always done at int precision, so this is (due to default 
promotions)

(int)(int16_t)32768 >> 16  // size/type of the shift amount does not matter.

which then simplifies to

-32768 >> 16;  // 0x8000 >> 16

= -1;

I think the original bug was that we were losing the cast to short (and hence 
the sign extension of the intermediate value), so effectively we simplified 
this to 

32768 >> 16; // 0x8000 >> 16

= 0;

And the other part of the observation was that it had to be done this way (and 
couldn't be narrowed for vectorization) because 16 is larger than the maximum 
shift for a short (actually you say that just below).

R.

> 
> Unfortunately, in the gimple model of vector units, such large shift
> counts wouldn't be well-defined, so we won't vectorize such shifts any
> more, unless we can tell they're in range or undefined.
> 
> So the test that expected the incorrect clamping we no longer perform
> needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
> suggested annotating the test with the expected ranges so as to enable
> the optimization.
> 
> 
> Co-Authored-By: Richard Earnshaw 
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR tree-optimization/113281
>   * gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> index 8c7adef9ed8f1..03078de49c65e 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> @@ -9,6 +9,8 @@
>void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * 
> __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
>  int i;   \
>  for (i=0; i +  if ((unsigned)b[i] >= (unsigned)(BITS))
> \
> + __builtin_unreachable();\
>dest[i] = a[i] OP b[i];
> \
>  }
> \
>  }
> 
>

Re: [PATCH v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores

2024-06-19 Thread Richard Earnshaw (lists)

On 19/06/2024 16:11, Siarhei Volkau wrote:
> ср, 19 июн. 2024 г. в 15:19, Richard Earnshaw (lists)
> :
>>
>> On 18/06/2024 19:14, Siarhei Volkau wrote:
>>> If the address register is dead after load/store operation it looks
>>> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
>>> at least if optimizing for size.
>>>
>>> Changes v1 -> v2:
>>>  - switching to peephole2 approach
>>>  - added test case
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
>>> (peephole2 to rewrite DI/DF store): New.
>>> (thumb1_movdi_insn): Handle overlapped regs ldmia case.
>>> (thumb_movdf_insn): Likewise.
>>>
>>>   * config/arm/iterators.md (DIDF): New.
>>>
>>> gcc/testsuite:
>>>
>>> * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
>>>
>>> Signed-off-by: Siarhei Volkau 
>>> ---
>>>  gcc/config/arm/iterators.md   |  3 +++
>>>  gcc/config/arm/thumb1.md  | 27 ++-
>>>  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++
>>>  3 files changed, 45 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
>>>
>>> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
>>> index 8d066fcf05d..09046bff83b 100644
>>> --- a/gcc/config/arm/iterators.md
>>> +++ b/gcc/config/arm/iterators.md
>>> @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
>>>  ;; A list of the 32bit and 64bit integer modes
>>>  (define_mode_iterator SIDI [SI DI])
>>>
>>> +;; A list of the 64bit modes for thumb1.
>>> +(define_mode_iterator DIDF [DI DF])
>>> +
>>>  ;; A list of atomic compare and swap success return modes
>>>  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
>>>
>>> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
>>> index d7074b43f60..ed4b706773a 100644
>>> --- a/gcc/config/arm/thumb1.md
>>> +++ b/gcc/config/arm/thumb1.md
>>> @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
>>>gcc_assert (TARGET_HAVE_MOVT);
>>>return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
>>>  case 4:
>>> +  if (reg_overlap_mentioned_p (operands[0], operands[1]))
>>> + return \"ldmia\\t%m1, {%0, %H0}\";
>>
>> See below for why I don't think this is a case we need to consider here.
>>
>>>return \"ldmia\\t%1, {%0, %H0}\";
>>>  case 5:
>>>return \"stmia\\t%0, {%1, %H1}\";
>>> @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
>>>   return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
>>>return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
>>>  case 1:
>>> +  if (reg_overlap_mentioned_p (operands[0], operands[1]))
>>> + return \"ldmia\\t%m1, {%0, %H0}\";
>>>return \"ldmia\\t%1, {%0, %H0}\";
>>>  case 2:
>>>return \"stmia\\t%0, {%1, %H1}\";
>>> @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
>>> (set_attr "conds" "clob")
>>> (set_attr "type" "multiple")]
>>>  )
>>> -
>>> +
>>> +;; match patterns usable by ldmia/stmia
>>> +(define_peephole2
>>> +  [(set (match_operand:DIDF 0 "low_register_operand" "")
>>> + (mem:DIDF (match_operand:SI 1 "low_register_operand")))]
>>> +  "TARGET_THUMB1
>>> +   && (peep2_reg_dead_p (1, operands[1])
>>> +   || REGNO (operands[0]) + 1 == REGNO (operands[1]))"
>>
>> I don't understand this second condition (partial overlap of the base 
>> address with the value loaded), what are you guarding against here?  The 
>> instruction specification says that there is no writeback if the base 
>> register has any overlap with any of the loaded registers, so really we 
>> should reject the peephole if that's true (we'd have invalid RTL as well as 
>> there would end up being two writes to the same register).
>>
> 
> The first condition here is for ldmia cases with writeback, the second
> condition is for cases without writeback. I falsely thought

Re: [PATCH] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]

2024-06-19 Thread Richard Earnshaw (lists)

On 13/06/2024 10:23, Alexandre Oliva wrote:
> 
> The test was too optimistic, alas.  We used to vectorize shifts
> involving 8-bit and 16-bit integral types by clamping the shift count
> at the highest in-range shift count, but that was not correct: such
> narrow shifts expect integral promotion, so larger shift counts should
> be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
> before the fix).
> 
> Unfortunately, in the gimple model of vector units, such large shift
> counts wouldn't be well-defined, so we won't vectorize such shifts any
> more, unless we can tell they're in range or undefined.
> 
> So the test that expected the incorrect clamping we no longer perform
> needs to be adjusted.
> 
> Tested on x86_64-linux-gnu-x-arm-eabi.  Also tested with gcc-13
> x-arm-vx7r2.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR tree-optimization/113281
>   * gcc.target/arm/simd/mve-vshr.c: Adjust expectations.
> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> index 8c7adef9ed8f1..8253427db6ef6 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> @@ -56,9 +56,9 @@ FUNC_IMM(u, uint, 8, 16, >>, vshrimm)
>  /* MVE has only 128-bit vectors, so we can vectorize only half of the
> functions above.  */
>  /* Vector right shifts use vneg and left shifts.  */
> -/* { dg-final { scan-assembler-times {vshl.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } 
> */
> -/* { dg-final { scan-assembler-times {vshl.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } 
> */
> -/* { dg-final { scan-assembler-times {vneg.s[0-9]+\tq[0-9]+, q[0-9]+} 6 } } 
> */
> +/* { dg-final { scan-assembler-times {vshl.s[0-9]+\tq[0-9]+, q[0-9]+} 1 } } 
> */
> +/* { dg-final { scan-assembler-times {vshl.u[0-9]+\tq[0-9]+, q[0-9]+} 1 } } 
> */
> +/* { dg-final { scan-assembler-times {vneg.s[0-9]+\tq[0-9]+, q[0-9]+} 2 } } 
> */
>  
>  
>  /* Shift by immediate.  */
> 
> 

We know the range of the LHS of the shift operand (it comes from a T* array, so 
that isn't the issue.  The problem comes from the RHS of the shift where the 
range can legitimately come from 0..31 (since the shift is conceptually 
performed at int precision).  That can't be handled correctly as gimple only 
supports shifts on the number of bits in the type, itself.  But we can inform 
the compiler that it needn't care about the larger range with a 
__builtin_unreachable().

It looks like adding

  if ((unsigned)b[i] >= 8*sizeof (TYPE##BITS##_t)) \
__builtin_unreachable();   \
 
to the shift-by-variable case is enough to tell the vectorizer that it's safe 
to vectorize this code without needing to handle any additional clamping.

Since this test is primarily about testing the MVE vector operations, I think 
I'd rather go with a solution along those lines rather than nobbling the test.

Thoughts?

R.

Re: [PATCH v2] Arm: Fix ldrd offset range [PR115153]

2024-06-19 Thread Richard Earnshaw (lists)

On 11/06/2024 17:42, Wilco Dijkstra wrote:
> v2: use a new arm_arch_v7ve_neon, fix use of DImode in output_move_neon
> 
> The valid offset range of LDRD in arm_legitimate_index_p is increased to
> -1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode.
> Fix this by moving the LDRD check earlier.
> 
> Passes bootstrap & regress, OK for commit?
> 
> gcc:
> PR target/115153
> * config/arm/arm.cc (arm_legitimate_index_p): Move LDRD case before 
> NEON.
> (thumb2_legitimate_index_p): Update comments.
> (output_move_neon): Use DFmode for vldr/vstr.
> * lib/target-supports.exp: Add arm_arch_v7ve_neon target support.
> 
> gcc/testsuite:
> PR target/11515> * gcc.target/arm/pr115153.c: Add new test.

The Linaro CI is reporting an ICE while building libgfortran with this change.

# 00:14:58 
/home/tcwg-build/workspace/tcwg_gnu_3/abe/snapshots/gcc.git~master/libgfortran/generated/matmul_i1.c:3006:1:
 internal compiler error: in change_address_1, at emit-rtl.cc:2299
# 00:14:58 make[3]: *** [Makefile:4262: generated/matmul_i1.lo] Error 1
# 00:14:58 make[2]: *** [Makefile:1861: all] Error 2
# 00:14:58 make[1]: *** [Makefile:15767: all-target-libgfortran] Error 2
# 00:14:58 make: *** [Makefile:1065: all] Error 2

Could you investigate please?

R.

> 
> ---
> 
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 
> ea0c963a4d67ecd70e1571624e84dfe46d757df9..7dec0254f5a953050c9c52aa297fad7f3dfb6c74
>  100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -8852,6 +8852,28 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
> RTX_CODE outer,
>   && INTVAL (index) > -1024
>   && (INTVAL (index) & 3) == 0);
>  
> +  if (arm_address_register_rtx_p (index, strict_p)
> +  && (GET_MODE_SIZE (mode) <= 4))
> +return 1;
> +
> +  /* This handles DFmode only if !TARGET_HARD_FLOAT.  */
> +  if (mode == DImode || mode == DFmode)
> +{
> +  if (code == CONST_INT)
> + {
> +   HOST_WIDE_INT val = INTVAL (index);
> +
> +   /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
> +  If vldr is selected it uses arm_coproc_mem_operand.  */
> +   if (TARGET_LDRD)
> + return val > -256 && val < 256;
> +   else
> + return val > -4096 && val < 4092;
> + }
> +
> +  return TARGET_LDRD && arm_address_register_rtx_p (index, strict_p);
> +}
> +
>/* For quad modes, we restrict the constant offset to be slightly less
>   than what the instruction format permits.  We do this because for
>   quad mode moves, we will actually decompose them into two separate
> @@ -8864,7 +8886,7 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
> RTX_CODE outer,
>   && (INTVAL (index) & 3) == 0);
>  
>/* We have no such constraint on double mode offsets, so we permit the
> - full range of the instruction format.  */
> + full range of the instruction format.  Note DImode is included here.  */
>if (TARGET_NEON && VALID_NEON_DREG_MODE (mode))
>  return (code == CONST_INT
>   && INTVAL (index) < 1024
> @@ -8877,27 +8899,6 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
> RTX_CODE outer,
>   && INTVAL (index) > -1024
>   && (INTVAL (index) & 3) == 0);
>  
> -  if (arm_address_register_rtx_p (index, strict_p)
> -  && (GET_MODE_SIZE (mode) <= 4))
> -return 1;
> -
> -  if (mode == DImode || mode == DFmode)
> -{
> -  if (code == CONST_INT)
> - {
> -   HOST_WIDE_INT val = INTVAL (index);
> -
> -   /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
> -  If vldr is selected it uses arm_coproc_mem_operand.  */
> -   if (TARGET_LDRD)
> - return val > -256 && val < 256;
> -   else
> - return val > -4096 && val < 4092;
> - }
> -
> -  return TARGET_LDRD && arm_address_register_rtx_p (index, strict_p);
> -}
> -
>if (GET_MODE_SIZE (mode) <= 4
>&& ! (arm_arch4
>   && (mode == HImode
> @@ -9000,7 +9001,7 @@ thumb2_legitimate_index_p (machine_mode mode, rtx 
> index, int strict_p)
>   && (INTVAL (index) & 3) == 0);
>  
>/* We have no such constraint on double mode offsets, so we permit the
> - full range of the instruction format.  */
> + full range of the instruction format.  Note DImode is included here.  */
>if (TARGET_NEON && VALID_NEON_DREG_MODE (mode))
>  return (code == CONST_INT
>   && INTVAL (index) < 1024
> @@ -9011,6 +9012,7 @@ thumb2_legitimate_index_p (machine_mode mode, rtx 
> index, int strict_p)
>&& (GET_MODE_SIZE (mode) <= 4))
>  return 1;
>  
> +  /* This handles DImode if !TARGET_NEON, and DFmode if !TARGET_VFP_BASE.  */
>if (mode == DImode || mode == DFmode)
>  {
>if (code == CONST_INT)
> @@ -20854,7 +20856,7 @@ output_move_neon (rtx *operands)
>   /* We're only using DImode here because it's a convenient
>  size.  */
>

Re: [PATCH 0/2] arm, doloop: Add support for MVE Tail-Predicated Low Overhead Loops

2024-06-19 Thread Richard Earnshaw (lists)

On 23/05/2024 15:37, Andre Vieira wrote:
> 
> Hi,
> 
>   We held these two patches back in stage 4 because they touched 
> target-agnostic code, though I am quite confident they will not affect other 
> targets. Given stage one has reopened, I am reposting them, I rebased them 
> but they seem to apply cleanly on trunk.
>   No changes from previously reviewed patches.
> 
>   OK for trunk?
> 
> Andre Vieira (2):
>   doloop: Add support for predicated vectorized loops
>   arm: Add support for MVE Tail-Predicated Low Overhead Loops
> 
>  gcc/config/arm/arm-protos.h   |4 +-
>  gcc/config/arm/arm.cc | 1249 -
>  gcc/config/arm/arm.opt|3 +
>  gcc/config/arm/iterators.md   |   15 +
>  gcc/config/arm/mve.md |   50 +
>  gcc/config/arm/thumb2.md  |  138 +-
>  gcc/config/arm/types.md   |6 +-
>  gcc/config/arm/unspecs.md |   14 +-
>  gcc/df-core.cc|   15 +
>  gcc/df.h  |1 +
>  gcc/loop-doloop.cc|  164 ++-
>  gcc/testsuite/gcc.target/arm/lob.h|  128 +-
>  gcc/testsuite/gcc.target/arm/lob1.c   |   23 +-
>  gcc/testsuite/gcc.target/arm/lob6.c   |8 +-
>  .../gcc.target/arm/mve/dlstp-compile-asm-1.c  |  146 ++
>  .../gcc.target/arm/mve/dlstp-compile-asm-2.c  |  749 ++
>  .../gcc.target/arm/mve/dlstp-compile-asm-3.c  |   46 +
>  .../gcc.target/arm/mve/dlstp-int16x8-run.c|   44 +
>  .../gcc.target/arm/mve/dlstp-int16x8.c|   31 +
>  .../gcc.target/arm/mve/dlstp-int32x4-run.c|   45 +
>  .../gcc.target/arm/mve/dlstp-int32x4.c|   31 +
>  .../gcc.target/arm/mve/dlstp-int64x2-run.c|   48 +
>  .../gcc.target/arm/mve/dlstp-int64x2.c|   28 +
>  .../gcc.target/arm/mve/dlstp-int8x16-run.c|   44 +
>  .../gcc.target/arm/mve/dlstp-int8x16.c|   32 +
>  .../gcc.target/arm/mve/dlstp-invalid-asm.c|  521 +++
>  26 files changed, 3434 insertions(+), 149 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-3.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-int16x8-run.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-int16x8.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-int32x4-run.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-int32x4.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-int64x2-run.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-int64x2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16-run.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
> 

These are OK.  Please watch out for any fallout on other architectures.

R.

Re: [PATCH v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores

2024-06-19 Thread Richard Earnshaw (lists)

On 18/06/2024 19:14, Siarhei Volkau wrote:
> If the address register is dead after load/store operation it looks
> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> at least if optimizing for size.
> 
> Changes v1 -> v2:
>  - switching to peephole2 approach
>  - added test case
> 
> gcc/ChangeLog:
> 
> * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
> (peephole2 to rewrite DI/DF store): New.
> (thumb1_movdi_insn): Handle overlapped regs ldmia case.
> (thumb_movdf_insn): Likewise.
> 
>   * config/arm/iterators.md (DIDF): New.
> 
> gcc/testsuite:
> 
> * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
> 
> Signed-off-by: Siarhei Volkau 
> ---
>  gcc/config/arm/iterators.md   |  3 +++
>  gcc/config/arm/thumb1.md  | 27 ++-
>  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++
>  3 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 8d066fcf05d..09046bff83b 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
>  ;; A list of the 32bit and 64bit integer modes
>  (define_mode_iterator SIDI [SI DI])
>  
> +;; A list of the 64bit modes for thumb1.
> +(define_mode_iterator DIDF [DI DF])
> +
>  ;; A list of atomic compare and swap success return modes
>  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
>  
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index d7074b43f60..ed4b706773a 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
>gcc_assert (TARGET_HAVE_MOVT);
>return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
>  case 4:
> +  if (reg_overlap_mentioned_p (operands[0], operands[1]))
> + return \"ldmia\\t%m1, {%0, %H0}\";

See below for why I don't think this is a case we need to consider here.

>return \"ldmia\\t%1, {%0, %H0}\";
>  case 5:
>return \"stmia\\t%0, {%1, %H1}\";
> @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
>   return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
>return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
>  case 1:
> +  if (reg_overlap_mentioned_p (operands[0], operands[1]))
> + return \"ldmia\\t%m1, {%0, %H0}\";
>return \"ldmia\\t%1, {%0, %H0}\";
>  case 2:
>return \"stmia\\t%0, {%1, %H1}\";
> @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
> (set_attr "conds" "clob")
> (set_attr "type" "multiple")]
>  )
> -
> +
> +;; match patterns usable by ldmia/stmia
> +(define_peephole2
> +  [(set (match_operand:DIDF 0 "low_register_operand" "")
> + (mem:DIDF (match_operand:SI 1 "low_register_operand")))]
> +  "TARGET_THUMB1
> +   && (peep2_reg_dead_p (1, operands[1])
> +   || REGNO (operands[0]) + 1 == REGNO (operands[1]))"

I don't understand this second condition (partial overlap of the base address 
with the value loaded), what are you guarding against here?  The instruction 
specification says that there is no writeback if the base register has any 
overlap with any of the loaded registers, so really we should reject the 
peephole if that's true (we'd have invalid RTL as well as there would end up 
being two writes to the same register).

> +  [(set (match_dup 0)
> + (mem:DIDF (post_inc:SI (match_dup 1]
> +  ""

This is not enough, unfortunately.  MEM() objects carry attributes about the 
memory accessed (alias sets, known alignment, etc) and these will not be 
propagated correctly if you rewrite the pattern this way.  The correct solution 
is to match the entire mem as operand1, then use change_address to rewrite 
that.  Something like:

 operands[1] = change_address (operands[1], VOIDmode,
   gen_rtx_POST_INC (SImode,
 XEXP (operands[1], 0)));

> +)
> +
> +(define_peephole2
> +  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
> + (match_operand:DIDF 0 "low_register_operand" ""))]
> +  "TARGET_THUMB1
> +   && peep2_reg_dead_p (1, operands[1])"
> +  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
> + (match_dup 0))]

There's a similar address rewriting issue here as well, but we also have to 
guard against the (unlikely) case where the base register is part of the value 
stored if it isn't the lowest numbered register in the list.  In that case the 
value stored would be undefined.  That is:

  stmia r0!, {r0, r1}

is ok, but

  stmia r1!, {r0, r1}

is not.

> +  ""
> +)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c 
> b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> new file mode 100644
> index

Re: [RFC PATCH] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores

2024-06-17 Thread Richard Earnshaw (lists)

Hi Siarahei,

On 16/06/2024 09:51, Siarhei Volkau wrote:
> If the address register is dead after load/store operation it looks
> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> at least if optimizing for size.
> 
> E.g.
>  ldr r0, [r3, #0]
>  ldr r1, [r3, #4]  @ r3 is dead after
> will be replaced by
>  ldmia r3!, {r0, r1}
> 
> also for reused reg is legal to:
>  ldr r2, [r3, #0]
>  ldr r3, [r3, #4] @ r3 reused
> will be replaced by
>  ldmia r3, {r2, r3}
> 
> However, I know little about other thumb CPUs except Cortex M0/M0+.
> 1. Is there any drawbacks if optimizing speed?
> 2. Might it be profitable for thumb2?

I like the idea behind this patch, but I think I'd try first doing this as a 
peephole2 rule to rewrite the address in this case.  That has the additional 
advantage that we then estimate the size of the instruction more accurately.  

I think it would then be easy to extend this to thumb2 as well if it looks like 
a win (perhaps only for -Os in the thumb2 case).


> 
> Regarding code size with the patch gives for v6-m/nofp:
>libgcc:  -52 bytes / -0.10%
> Newlib's libc:  -68 bytes / -0.03%
>  libm:  -96 bytes / -0.10%
> libstdc++: -140 bytes / -0.02%
> 
> Also I have questions regarding testing the patch.
> It's obscure how to do it properly, for now I compile
> for arm-none-eabi target and make check seems failing
> on any compilable test due to missing symbols from libnosys.
> I guess that arm-gnu-elf is the correct triple but it still
> advisable for proper commands to make & run the testsuite.

For testing, I'd start with something like 
gcc/testsuite/gcc.target/arm/thumb-andsi.c as a template and adapt that for 
your specific case.  Matching something like "ldmia\tr[0-7]!," should be enough.

R.

> 
> Signed-off-by: Siarhei Volkau 
> ---
>  gcc/config/arm/arm-protos.h |  2 +-
>  gcc/config/arm/arm.cc   |  7 ++-
>  gcc/config/arm/thumb1.md| 10 --
>  3 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 2cd560c9925..548bfbaccdc 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -254,7 +254,7 @@ extern int thumb_shiftable_const (unsigned HOST_WIDE_INT);
>  extern enum arm_cond_code maybe_get_arm_condition_code (rtx);
>  extern void thumb1_final_prescan_insn (rtx_insn *);
>  extern void thumb2_final_prescan_insn (rtx_insn *);
> -extern const char *thumb_load_double_from_address (rtx *);
> +extern const char *thumb_load_double_from_address (rtx *, rtx_insn *);
>  extern const char *thumb_output_move_mem_multiple (int, rtx *);
>  extern const char *thumb_call_via_reg (rtx);
>  extern void thumb_expand_cpymemqi (rtx *);
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index b8c32db0a1d..73c2478ed77 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -28350,7 +28350,7 @@ thumb1_output_interwork (void)
> a computed memory address.  The computed address may involve a
> register which is overwritten by the load.  */
>  const char *
> -thumb_load_double_from_address (rtx *operands)
> +thumb_load_double_from_address (rtx *operands, rtx_insn *insn)
>  {
>rtx addr;
>rtx base;
> @@ -28368,6 +28368,11 @@ thumb_load_double_from_address (rtx *operands)
>switch (GET_CODE (addr))
>  {
>  case REG:
> +  if (find_reg_note (insn, REG_DEAD, addr))
> +return "ldmia\t%m1!, {%0, %H0}";
> +  else if (REGNO (addr) == REGNO (operands[0]) + 1)
> +return "ldmia\t%m1, {%0, %H0}";
> +
>operands[2] = adjust_address (operands[1], SImode, 4);
>  
>if (REGNO (operands[0]) == REGNO (addr))
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index d7074b43f60..8da6887b560 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -637,8 +637,11 @@
>  case 5:
>return \"stmia\\t%0, {%1, %H1}\";
>  case 6:
> -  return thumb_load_double_from_address (operands);
> +  return thumb_load_double_from_address (operands, insn);
>  case 7:
> +  if (MEM_P (operands[0]) && REG_P (XEXP (operands[0], 0))
> +  && find_reg_note (insn, REG_DEAD, XEXP (operands[0], 0)))
> +return \"stmia\\t%m0!, {%1, %H1}\";
>operands[2] = gen_rtx_MEM (SImode,
>plus_constant (Pmode, XEXP (operands[0], 0), 4));
>output_asm_insn (\"str\\t%1, %0\;str\\t%H1, %2\", operands);
> @@ -970,8 +973,11 @@
>  case 2:
>return \"stmia\\t%0, {%1, %H1}\";
>  case 3:
> -  return thumb_load_double_from_address (operands);
> +  return thumb_load_double_from_address (operands, insn);
>  case 4:
> +  if (MEM_P (operands[0]) && REG_P (XEXP (operands[0], 0))
> +  && find_reg_note (insn, REG_DEAD, XEXP (operands[0], 0)))
> +return \"stmia\\t%m0!, {%1, %H1}\";
>operands[2] = gen_rtx_MEM (SImode,
>

Re: arm: Add .type and .size to __gnu_cmse_nonsecure_call [PR115360]

2024-06-12 Thread Richard Earnshaw (lists)

On 12/06/2024 09:53, Andre Vieira (lists) wrote:
> 
> 
> On 06/06/2024 12:53, Richard Earnshaw (lists) wrote:
>> On 05/06/2024 17:07, Andre Vieira (lists) wrote:
>>> Hi,
>>>
>>> This patch adds missing assembly directives to the CMSE library wrapper to 
>>> call functions with attribute cmse_nonsecure_call.  Without the .type 
>>> directive the linker will fail to produce the correct veneer if a call to 
>>> this wrapper function is to far from the wrapper itself.  The .size was 
>>> added for completeness, though we don't necessarily have a usecase for it.
>>>
>>> I did not add a testcase as I couldn't get dejagnu to disassemble the 
>>> linked binary to check we used an appropriate branch instruction, I did 
>>> however test it locally and with this change the GNU linker now generates 
>>> an appropriate veneer and call to that veneer when 
>>> __gnu_cmse_nonsecure_call is too far.
>>>
>>> OK for trunk and backport to any release branches still in support (after 
>>> waiting a week or so)?
>>>
>>> libgcc/ChangeLog:
>>>
>>>  PR target/115360
>>>  * config/arm/cmse_nonsecure_call.S: Add .type and .size directives.
>>
>> OK.
>>
>> R.
> 
> OK to backport? I was thinking backporting it as far as gcc-11 (we haven't 
> done a 11.5 yet).
> 
> Kind Regards,
> Andre

Yes.

R.

Re: [PATCH v2] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-06-12 Thread Richard Earnshaw




On 12/06/2024 11:35, Richard Earnshaw (lists) wrote:
> On 11/06/2024 17:35, Wilco Dijkstra wrote:
>> Hi Christophe,
>>
>>>  PR target/115153
>> I guess this is typo (should be 115188) ?
>>
>> Correct.
>>
>>> +/* { dg-options "-O2 -mthumb" } */-mthumb is included in arm_arch_v6m, so 
>>> I think you don't need to add it
>> here?
>>
>> Indeed, it's not strictly necessary. Fixed in v2:
>>
>> A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
>> printed as LDR/STR with writeback in unified syntax, resulting in strange
>> assembler errors if writeback is selected.  To work around this, use the 'Uw'
>> constraint that blocks writeback.
> 
> Doing just this will mean that the register allocator will have to undo a 
> pre/post memory operand that was accepted by the predicate (memory_operand).  
> I think we really need a tighter predicate (lets call it noautoinc_mem_op) 
> here to avoid that.  Note that the existing uses of Uw also had another 
> alternative that did permit 'm', so this wasn't previously practical, but 
> they had alternative ways of being reloaded.

No, sorry that won't work; there's another 'm' alternative here as well.
The correct fix is to add alternatives for T1, I think, similar to the one in 
thumb1_movsi_insn.

Also, by observation I think there's a similar problem in the load operations.

R.

> 
> R.
> 
>>
>> Passes bootstrap & regress, OK for commit and backport?
>>
>> gcc:
>> PR target/115188
>> * config/arm/sync.md (arm_atomic_load): Use 'Uw' constraint.
>> (arm_atomic_store): Likewise.
>>
>> gcc/testsuite:
>> PR target/115188
>> * gcc.target/arm/pr115188.c: Add new test.
>>
>> ---
>>
>> diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
>> index 
>> df8dbe170cacb6b60d56a6f19aadd5a6c9c51f7a..e856ee51d9ae7b945c4d1e9d1f08afeedc95707a
>>  100644
>> --- a/gcc/config/arm/sync.md
>> +++ b/gcc/config/arm/sync.md
>> @@ -65,7 +65,7 @@
>>  (define_insn "arm_atomic_load"
>>[(set (match_operand:QHSI 0 "register_operand" "=r,l")
>>  (unspec_volatile:QHSI
>> -  [(match_operand:QHSI 1 "memory_operand" "m,m")]
>> +  [(match_operand:QHSI 1 "memory_operand" "m,Uw")]
>>VUNSPEC_LDR))]
>>""
>>"ldr\t%0, %1"
>> @@ -81,7 +81,7 @@
>>  )
>>
>>  (define_insn "arm_atomic_store"
>> -  [(set (match_operand:QHSI 0 "memory_operand" "=m,m")
>> +  [(set (match_operand:QHSI 0 "memory_operand" "=m,Uw")
>>  (unspec_volatile:QHSI
>>[(match_operand:QHSI 1 "register_operand" "r,l")]
>>VUNSPEC_STR))]
>> diff --git a/gcc/testsuite/gcc.target/arm/pr115188.c 
>> b/gcc/testsuite/gcc.target/arm/pr115188.c
>> new file mode 100644
>> index 
>> ..9a4022b56796d6962bb3f22e40bac4b81eb78ccf
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/pr115188.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do assemble } */
>> +/* { dg-require-effective-target arm_arch_v6m_ok }
>> +/* { dg-options "-O2" } */
>> +/* { dg-add-options arm_arch_v6m } */
>> +
>> +void init (int *p, int n)
>> +{
>> +  for (int i = 0; i < n; i++)
>> +__atomic_store_4 (p + i, 0, __ATOMIC_RELAXED);
>> +}
>>
>

Re: [PATCH v2] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-06-12 Thread Richard Earnshaw (lists)

On 11/06/2024 17:35, Wilco Dijkstra wrote:
> Hi Christophe,
> 
>>  PR target/115153
> I guess this is typo (should be 115188) ?
> 
> Correct.
> 
>> +/* { dg-options "-O2 -mthumb" } */-mthumb is included in arm_arch_v6m, so I 
>> think you don't need to add it
> here?
> 
> Indeed, it's not strictly necessary. Fixed in v2:
> 
> A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
> printed as LDR/STR with writeback in unified syntax, resulting in strange
> assembler errors if writeback is selected.  To work around this, use the 'Uw'
> constraint that blocks writeback.

Doing just this will mean that the register allocator will have to undo a 
pre/post memory operand that was accepted by the predicate (memory_operand).  I 
think we really need a tighter predicate (lets call it noautoinc_mem_op) here 
to avoid that.  Note that the existing uses of Uw also had another alternative 
that did permit 'm', so this wasn't previously practical, but they had 
alternative ways of being reloaded.

R.

> 
> Passes bootstrap & regress, OK for commit and backport?
> 
> gcc:
> PR target/115188
> * config/arm/sync.md (arm_atomic_load): Use 'Uw' constraint.
> (arm_atomic_store): Likewise.
> 
> gcc/testsuite:
> PR target/115188
> * gcc.target/arm/pr115188.c: Add new test.
> 
> ---
> 
> diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
> index 
> df8dbe170cacb6b60d56a6f19aadd5a6c9c51f7a..e856ee51d9ae7b945c4d1e9d1f08afeedc95707a
>  100644
> --- a/gcc/config/arm/sync.md
> +++ b/gcc/config/arm/sync.md
> @@ -65,7 +65,7 @@
>  (define_insn "arm_atomic_load"
>[(set (match_operand:QHSI 0 "register_operand" "=r,l")
>  (unspec_volatile:QHSI
> -  [(match_operand:QHSI 1 "memory_operand" "m,m")]
> +  [(match_operand:QHSI 1 "memory_operand" "m,Uw")]
>VUNSPEC_LDR))]
>""
>"ldr\t%0, %1"
> @@ -81,7 +81,7 @@
>  )
> 
>  (define_insn "arm_atomic_store"
> -  [(set (match_operand:QHSI 0 "memory_operand" "=m,m")
> +  [(set (match_operand:QHSI 0 "memory_operand" "=m,Uw")
>  (unspec_volatile:QHSI
>[(match_operand:QHSI 1 "register_operand" "r,l")]
>VUNSPEC_STR))]
> diff --git a/gcc/testsuite/gcc.target/arm/pr115188.c 
> b/gcc/testsuite/gcc.target/arm/pr115188.c
> new file mode 100644
> index 
> ..9a4022b56796d6962bb3f22e40bac4b81eb78ccf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pr115188.c
> @@ -0,0 +1,10 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target arm_arch_v6m_ok }
> +/* { dg-options "-O2" } */
> +/* { dg-add-options arm_arch_v6m } */
> +
> +void init (int *p, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +__atomic_store_4 (p + i, 0, __ATOMIC_RELAXED);
> +}
>

Re: [PATCH] [testsuite] [arm] test board cflags in multilib.exp

2024-06-11 Thread Richard Earnshaw (lists)

On 07/06/2024 05:47, Alexandre Oliva wrote:
> 
> multilib.exp tests for multilib-altering flags in a board's
> multilib_flags and skips the test, but if such flags appear in the
> board's cflags, with the same distorting effects on tested multilibs,
> we fail to skip the test.
> 
> Extend the skipping logic to board's cflags as well.
> 
> Regstrapping on x86_64-linux-gnu.  Already tested on arm-eabi (gcc-13
> and trunk).  Ok to install?
> 

OK, thanks.

R.

> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.target/arm/multilib.exp: Skip based on board cflags too.
> ---
>  gcc/testsuite/gcc.target/arm/multilib.exp |8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp 
> b/gcc/testsuite/gcc.target/arm/multilib.exp
> index 4442d5d754bd6..12c93bc89d222 100644
> --- a/gcc/testsuite/gcc.target/arm/multilib.exp
> +++ b/gcc/testsuite/gcc.target/arm/multilib.exp
> @@ -18,13 +18,15 @@ load_lib gcc-dg.exp
>  
>  dg-init
>  
> -if { [board_info [target_info name] exists multilib_flags] 
> - && [regexp 
> {(-marm|-mthumb|-march=.*|-mcpu=.*|-mfpu=.*|-mfloat=abi=.*)\y} [board_info 
> [target_info name] multilib_flags]] } {
> +foreach flagsvar {multilib_flags cflags} {
> +  if { [board_info [target_info name] exists $flagsvar] 
> + && [regexp 
> {(-marm|-mthumb|-march=.*|-mcpu=.*|-mfpu=.*|-mfloat=abi=.*)\y} [board_info 
> [target_info name] $flagsvar]] } {
>   
>  # Multilib flags override anything we can apply to a test, so
>  # skip if any of the above options are set there.
> -verbose "skipping multilib tests due to multilib_flags setting" 1
> +verbose "skipping multilib tests due to $flagsvar setting" 1
>  return
> +  }
>  }
>  
>  # We don't want to run this test multiple times in a parallel make check.
>

Re: [PATCH v3 2/2] testsuite: Fix expand-return CMSE test for Armv8.1-M [PR115253]

2024-06-11 Thread Richard Earnshaw (lists)

On 10/06/2024 15:04, Torbjörn SVENSSON wrote:
> For Armv8.1-M, the clearing of the registers is handled differently than
> for Armv8-M, so update the test case accordingly.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/115253
>   * gcc.target/arm/cmse/extend-return.c: Update test case
>   condition for Armv8.1-M.
> 
> Signed-off-by: Torbjörn SVENSSON 
> Co-authored-by: Yvan ROUX 
> ---
>  .../gcc.target/arm/cmse/extend-return.c   | 62 +--
>  1 file changed, 56 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c 
> b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
> index 081de0d699f..2288d166bd3 100644
> --- a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
> +++ b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
> @@ -1,5 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-mcmse -fshort-enums" } */
> +/* ARMv8-M expectation with target { ! arm_cmse_clear_ok }.  */
> +/* ARMv8.1-M expectation with target arm_cmse_clear_ok.  */
>  /* { dg-final { check-function-bodies "**" "" "" } } */
>  
>  #include 
> @@ -20,7 +22,15 @@ typedef enum offset __attribute__ ((cmse_nonsecure_call)) 
> ns_enum_foo_t (void);
>  typedef bool __attribute__ ((cmse_nonsecure_call)) ns_bool_foo_t (void);
>  
>  /*
> -**unsignNonsecure0:
> +**unsignNonsecure0:  { target arm_cmse_clear_ok }
> +**   ...
> +**   blxns   r[0-3]
> +**   ...
> +**   uxtbr0, r0
> +**   ...
> +*/
> +/*
> +**unsignNonsecure0: { target { ! arm_cmse_clear_ok } }
>  **   ...
>  **   bl  __gnu_cmse_nonsecure_call
>  **   uxtbr0, r0
> @@ -32,7 +42,15 @@ unsigned char unsignNonsecure0 (ns_unsign_foo_t * ns_foo_p)
>  }
>  
>  /*
> -**signNonsecure0:
> +**signNonsecure0:  { target arm_cmse_clear_ok }
> +**   ...
> +**   blxns   r[0-3]
> +**   ...
> +**   sxtbr0, r0
> +**   ...
> +*/
> +/*
> +**signNonsecure0: { target { ! arm_cmse_clear_ok } }
>  **   ...
>  **   bl  __gnu_cmse_nonsecure_call
>  **   sxtbr0, r0
> @@ -44,7 +62,15 @@ signed char signNonsecure0 (ns_sign_foo_t * ns_foo_p)
>  }
>  
>  /*
> -**shortUnsignNonsecure0:
> +**shortUnsignNonsecure0:  { target arm_cmse_clear_ok }
> +**   ...
> +**   blxns   r[0-3]
> +**   ...
> +**   uxthr0, r0
> +**   ...
> +*/
> +/*
> +**shortUnsignNonsecure0: { target { ! arm_cmse_clear_ok } }
>  **   ...
>  **   bl  __gnu_cmse_nonsecure_call
>  **   uxthr0, r0
> @@ -56,7 +82,15 @@ unsigned short shortUnsignNonsecure0 
> (ns_short_unsign_foo_t * ns_foo_p)
>  }
>  
>  /*
> -**shortSignNonsecure0:
> +**shortSignNonsecure0:  { target arm_cmse_clear_ok }
> +**   ...
> +**   blxns   r[0-3]
> +**   ...
> +**   sxthr0, r0
> +**   ...
> +*/
> +/*
> +**shortSignNonsecure0: { target { ! arm_cmse_clear_ok } }
>  **   ...
>  **   bl  __gnu_cmse_nonsecure_call
>  **   sxthr0, r0
> @@ -68,7 +102,15 @@ signed short shortSignNonsecure0 (ns_short_sign_foo_t * 
> ns_foo_p)
>  }
>  
>  /*
> -**enumNonsecure0:
> +**enumNonsecure0:  { target arm_cmse_clear_ok }
> +**   ...
> +**   blxns   r[0-3]
> +**   ...
> +**   uxtbr0, r0
> +**   ...
> +*/
> +/*
> +**enumNonsecure0: { target { ! arm_cmse_clear_ok } }
>  **   ...
>  **   bl  __gnu_cmse_nonsecure_call
>  **   uxtbr0, r0
> @@ -80,7 +122,15 @@ unsigned char __attribute__((noipa)) enumNonsecure0 
> (ns_enum_foo_t * ns_foo_p)
>  }
>  
>  /*
> -**boolNonsecure0:
> +**boolNonsecure0:  { target arm_cmse_clear_ok }
> +**   ...
> +**   blxns   r[0-3]
> +**   ...
> +**   uxtbr0, r0
> +**   ...
> +*/
> +/*
> +**boolNonsecure0: { target { ! arm_cmse_clear_ok } }
>  **   ...
>  **   bl  __gnu_cmse_nonsecure_call
>  **   uxtbr0, r0

OK when the nits in the first patch are sorted.

R.

Re: [PATCH v3 1/2] arm: Zero/Sign extends for CMSE security on Armv8-M.baseline [PR115253]

2024-06-11 Thread Richard Earnshaw (lists)

On 10/06/2024 15:04, Torbjörn SVENSSON wrote:
> Properly handle zero and sign extension for Armv8-M.baseline as
> Cortex-M23 can have the security extension active.
> Currently, there is an internal compiler error on Cortex-M23 for the
> epilog processing of sign extension.
> 
> This patch addresses the following CVE-2024-0151 for Armv8-M.baseline.
> 
> gcc/ChangeLog:
> 
>   PR target/115253
>   * config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear):
>   Sign extend for Thumb1.
>   (thumb1_expand_prologue): Add zero/sign extend.
> 
> Signed-off-by: Torbjörn SVENSSON 
> Co-authored-by: Yvan ROUX 
> ---
>  gcc/config/arm/arm.cc | 71 ++-
>  1 file changed, 63 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index ea0c963a4d6..e7b4caf1083 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -19220,17 +19220,22 @@ cmse_nonsecure_call_inline_register_clear (void)
> || TREE_CODE (ret_type) == BOOLEAN_TYPE)
> && known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 4))
>   {
> -   machine_mode ret_mode = TYPE_MODE (ret_type);
> +   rtx ret_reg = gen_rtx_REG (TYPE_MODE (ret_type), R0_REGNUM);
> +   rtx si_reg = gen_rtx_REG (SImode, R0_REGNUM);
> rtx extend;
> if (TYPE_UNSIGNED (ret_type))
> - extend = gen_rtx_ZERO_EXTEND (SImode,
> -   gen_rtx_REG (ret_mode, 
> R0_REGNUM));
> + extend = gen_rtx_SET (si_reg, gen_rtx_ZERO_EXTEND (SImode,
> +ret_reg));
> else
> - extend = gen_rtx_SIGN_EXTEND (SImode,
> -   gen_rtx_REG (ret_mode, 
> R0_REGNUM));
> -   emit_insn_after (gen_rtx_SET (gen_rtx_REG (SImode, R0_REGNUM),
> -  extend), insn);
> -
> + /* Signed-extension is a special case because of
> +thumb1_extendhisi2.  */
> + if (TARGET_THUMB1

You effectively have an 'else if' split across a comment here, and the 
indentation looks weird.  Either write 'else if' on one line (and re-indent 
accordingly) or put this entire block inside braces.

> + && known_ge (GET_MODE_SIZE (TYPE_MODE (ret_type)), 2))

You can use known_eq here.  We'll never have any value other than 2, given the 
known_le (4) above and anyway it doesn't make sense to call extendhisi with any 
other size.

> +   extend = gen_thumb1_extendhisi2 (si_reg, ret_reg);
> + else
> +   extend = gen_rtx_SET (si_reg, gen_rtx_SIGN_EXTEND (SImode,
> +  ret_reg));
> +   emit_insn_after (extend, insn);
>   }
>  
>  
> @@ -27250,6 +27255,56 @@ thumb1_expand_prologue (void)
>live_regs_mask = offsets->saved_regs_mask;
>lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
>  

Similar comments to above apply to the hunk below.

> +  /* The AAPCS requires the callee to widen integral types narrower
> + than 32 bits to the full width of the register; but when handling
> + calls to non-secure space, we cannot trust the callee to have
> + correctly done so.  So forcibly re-widen the result here.  */
> +  if (IS_CMSE_ENTRY (func_type))
> +{
> +  function_args_iterator args_iter;
> +  CUMULATIVE_ARGS args_so_far_v;
> +  cumulative_args_t args_so_far;
> +  bool first_param = true;
> +  tree arg_type;
> +  tree fndecl = current_function_decl;
> +  tree fntype = TREE_TYPE (fndecl);
> +  arm_init_cumulative_args (_so_far_v, fntype, NULL_RTX, fndecl);
> +  args_so_far = pack_cumulative_args (_so_far_v);
> +  FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter)
> + {
> +   rtx arg_rtx;
> +
> +   if (VOID_TYPE_P (arg_type))
> + break;
> +
> +   function_arg_info arg (arg_type, /*named=*/true);
> +   if (!first_param)
> + /* We should advance after processing the argument and pass
> +the argument we're advancing past.  */
> + arm_function_arg_advance (args_so_far, arg);
> +   first_param = false;
> +   arg_rtx = arm_function_arg (args_so_far, arg);
> +   gcc_assert (REG_P (arg_rtx));
> +   if ((TREE_CODE (arg_type) == INTEGER_TYPE
> +   || TREE_CODE (arg_type) == ENUMERAL_TYPE
> +   || TREE_CODE (arg_type) == BOOLEAN_TYPE)
> +   && known_lt (GET_MODE_SIZE (GET_MODE (arg_rtx)), 4))
> + {
> +   rtx res_reg = gen_rtx_REG (SImode, REGNO (arg_rtx));
> +   if (TYPE_UNSIGNED (arg_type))
> + emit_set_insn (res_reg, gen_rtx_ZERO_EXTEND (SImode, arg_rtx));
> +   else
> + /* Signed-extension is a special case because of
> +thumb1_extendhisi2.  */
> + if

Re: [PATCH] arm: Fix CASE_VECTOR_SHORTEN_MODE for thumb2.

2024-06-06 Thread Richard Earnshaw (lists)

On 06/06/2024 15:40, Richard Ball wrote:
> The CASE_VECTOR_SHORTEN_MODE query is missing some equals signs
> which causes suboptimal codegen due to missed optimisation
> opportunities. This patch also adds a test for thumb2
> switch statements as none exist currently.
> 
> gcc/ChangeLog:
>   PR target/115353
>   * config/arm/arm.h (enum arm_auto_incmodes):
>   Correct CASE_VECTOR_SHORTEN_MODE query.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/thumb2-switchstatement.c: New test.

OK.

R.

Re: arm: Add .type and .size to __gnu_cmse_nonsecure_call [PR115360]

2024-06-06 Thread Richard Earnshaw (lists)

On 05/06/2024 17:07, Andre Vieira (lists) wrote:
> Hi,
> 
> This patch adds missing assembly directives to the CMSE library wrapper to 
> call functions with attribute cmse_nonsecure_call.  Without the .type 
> directive the linker will fail to produce the correct veneer if a call to 
> this wrapper function is to far from the wrapper itself.  The .size was added 
> for completeness, though we don't necessarily have a usecase for it.
> 
> I did not add a testcase as I couldn't get dejagnu to disassemble the linked 
> binary to check we used an appropriate branch instruction, I did however test 
> it locally and with this change the GNU linker now generates an appropriate 
> veneer and call to that veneer when __gnu_cmse_nonsecure_call is too far.
> 
> OK for trunk and backport to any release branches still in support (after 
> waiting a week or so)?
> 
> libgcc/ChangeLog:
> 
> PR target/115360
> * config/arm/cmse_nonsecure_call.S: Add .type and .size directives.

OK.

R.

Re: [PATCH v2] testsuite: Verify r0-r3 are extended with CMSE

2024-05-22 Thread Richard Earnshaw (lists)

On 22/05/2024 12:14, Torbjorn SVENSSON wrote:
> Hello Richard,
> 
> Thanks for the reply.
> 
> From my point of view, at least the -fshort-enums part should be on all 
> branches. Just to be clean, maybe it's easier to backport the entire patch?

Yes, that's a fair point.  I was only thinking about the broadening of the test 
to the other argument registers when I said that.

So, just to be clear, OK all.

R.

> 
> Unless you have an objection, I would like to go ahead and just backport it 
> to all branches.
> 
> Kind regards,
> Torbjörn
> 
> On 2024-05-22 12:55, Richard Earnshaw (lists) wrote:
>> On 06/05/2024 12:50, Torbjorn SVENSSON wrote:
>>> Hi,
>>>
>>> Forgot to mention when I sent the patch that I would like to commit it to 
>>> the following branches:
>>>
>>> - releases/gcc-11
>>> - releases/gcc-12
>>> - releases/gcc-13
>>> - releases/gcc-14
>>> - trunk
>>>
>>
>> Well you can [commit it to the release branches], but I'm not sure it's 
>> essential.  It seems pretty unlikely to me that this would regress on a 
>> release branch without having first regressed on trunk.
>>
>> R.
>>
>>> Kind regards,
>>> Torbjörn
>>>
>>> On 2024-05-02 12:50, Torbjörn SVENSSON wrote:
>>>> Add regression test to the existing zero/sign extend tests for CMSE to
>>>> verify that r0, r1, r2 and r3 are properly extended, not just r0.
>>>>
>>>> boolCharShortEnumSecureFunc test is done using -O0 to ensure the
>>>> instructions are in a predictable order.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>  * gcc.target/arm/cmse/extend-param.c: Add regression test. Add
>>>>    -fshort-enums.
>>>>  * gcc.target/arm/cmse/extend-return.c: Add -fshort-enums option.
>>>>
>>>> Signed-off-by: Torbjörn SVENSSON 
>>>> ---
>>>>    .../gcc.target/arm/cmse/extend-param.c    | 21 +++
>>>>    .../gcc.target/arm/cmse/extend-return.c   |  4 ++--
>>>>    2 files changed, 19 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c 
>>>> b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>>>> index 01fac786238..d01ef87e0be 100644
>>>> --- a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>>>> +++ b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>>>> @@ -1,5 +1,5 @@
>>>>    /* { dg-do compile } */
>>>> -/* { dg-options "-mcmse" } */
>>>> +/* { dg-options "-mcmse -fshort-enums" } */
>>>>    /* { dg-final { check-function-bodies "**" "" "" } } */
>>>>      #include 
>>>> @@ -78,7 +78,6 @@ __attribute__((cmse_nonsecure_entry)) char 
>>>> enumSecureFunc (enum offset index) {
>>>>  if (index >= ARRAY_SIZE)
>>>>    return 0;
>>>>  return array[index];
>>>> -
>>>>    }
>>>>      /*
>>>> @@ -88,9 +87,23 @@ __attribute__((cmse_nonsecure_entry)) char 
>>>> enumSecureFunc (enum offset index) {
>>>>    **    ...
>>>>    */
>>>>    __attribute__((cmse_nonsecure_entry)) char boolSecureFunc (bool index) {
>>>> -
>>>>  if (index >= ARRAY_SIZE)
>>>>    return 0;
>>>>  return array[index];
>>>> +}
>>>>    -}
>>>> \ No newline at end of file
>>>> +/*
>>>> +**__acle_se_boolCharShortEnumSecureFunc:
>>>> +**    ...
>>>> +**    uxtb    r0, r0
>>>> +**    uxtb    r1, r1
>>>> +**    uxth    r2, r2
>>>> +**    uxtb    r3, r3
>>>> +**    ...
>>>> +*/
>>>> +__attribute__((cmse_nonsecure_entry,optimize(0))) char 
>>>> boolCharShortEnumSecureFunc (bool a, unsigned char b, unsigned short c, 
>>>> enum offset d) {
>>>> +  size_t index = a + b + c + d;
>>>> +  if (index >= ARRAY_SIZE)
>>>> +    return 0;
>>>> +  return array[index];
>>>> +}
>>>> diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c 
>>>> b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
>>>> index cf731ed33df..081de0d699f 100644
>>>> --- a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
>>>> +++ b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
>>>> @@ -1,5 +1,5 @@
>>>>    /* { dg-do compile } */
>>>> -/* { dg-options "-mcmse" } */
>>>> +/* { dg-options "-mcmse -fshort-enums" } */
>>>>    /* { dg-final { check-function-bodies "**" "" "" } } */
>>>>      #include 
>>>> @@ -89,4 +89,4 @@ unsigned char __attribute__((noipa)) enumNonsecure0 
>>>> (ns_enum_foo_t * ns_foo_p)
>>>>    unsigned char boolNonsecure0 (ns_bool_foo_t * ns_foo_p)
>>>>    {
>>>>  return ns_foo_p ();
>>>> -}
>>>> \ No newline at end of file
>>>> +}
>>

Re: [PATCH v2] testsuite: Verify r0-r3 are extended with CMSE

2024-05-22 Thread Richard Earnshaw (lists)

On 06/05/2024 12:50, Torbjorn SVENSSON wrote:
> Hi,
> 
> Forgot to mention when I sent the patch that I would like to commit it to the 
> following branches:
> 
> - releases/gcc-11
> - releases/gcc-12
> - releases/gcc-13
> - releases/gcc-14
> - trunk
> 

Well you can [commit it to the release branches], but I'm not sure it's 
essential.  It seems pretty unlikely to me that this would regress on a release 
branch without having first regressed on trunk.

R.

> Kind regards,
> Torbjörn
> 
> On 2024-05-02 12:50, Torbjörn SVENSSON wrote:
>> Add regression test to the existing zero/sign extend tests for CMSE to
>> verify that r0, r1, r2 and r3 are properly extended, not just r0.
>>
>> boolCharShortEnumSecureFunc test is done using -O0 to ensure the
>> instructions are in a predictable order.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/arm/cmse/extend-param.c: Add regression test. Add
>>   -fshort-enums.
>> * gcc.target/arm/cmse/extend-return.c: Add -fshort-enums option.
>>
>> Signed-off-by: Torbjörn SVENSSON 
>> ---
>>   .../gcc.target/arm/cmse/extend-param.c    | 21 +++
>>   .../gcc.target/arm/cmse/extend-return.c   |  4 ++--
>>   2 files changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c 
>> b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>> index 01fac786238..d01ef87e0be 100644
>> --- a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>> +++ b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>> @@ -1,5 +1,5 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-mcmse" } */
>> +/* { dg-options "-mcmse -fshort-enums" } */
>>   /* { dg-final { check-function-bodies "**" "" "" } } */
>>     #include 
>> @@ -78,7 +78,6 @@ __attribute__((cmse_nonsecure_entry)) char enumSecureFunc 
>> (enum offset index) {
>>     if (index >= ARRAY_SIZE)
>>   return 0;
>>     return array[index];
>> -
>>   }
>>     /*
>> @@ -88,9 +87,23 @@ __attribute__((cmse_nonsecure_entry)) char enumSecureFunc 
>> (enum offset index) {
>>   **    ...
>>   */
>>   __attribute__((cmse_nonsecure_entry)) char boolSecureFunc (bool index) {
>> -
>>     if (index >= ARRAY_SIZE)
>>   return 0;
>>     return array[index];
>> +}
>>   -}
>> \ No newline at end of file
>> +/*
>> +**__acle_se_boolCharShortEnumSecureFunc:
>> +**    ...
>> +**    uxtb    r0, r0
>> +**    uxtb    r1, r1
>> +**    uxth    r2, r2
>> +**    uxtb    r3, r3
>> +**    ...
>> +*/
>> +__attribute__((cmse_nonsecure_entry,optimize(0))) char 
>> boolCharShortEnumSecureFunc (bool a, unsigned char b, unsigned short c, enum 
>> offset d) {
>> +  size_t index = a + b + c + d;
>> +  if (index >= ARRAY_SIZE)
>> +    return 0;
>> +  return array[index];
>> +}
>> diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c 
>> b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
>> index cf731ed33df..081de0d699f 100644
>> --- a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
>> +++ b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
>> @@ -1,5 +1,5 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-mcmse" } */
>> +/* { dg-options "-mcmse -fshort-enums" } */
>>   /* { dg-final { check-function-bodies "**" "" "" } } */
>>     #include 
>> @@ -89,4 +89,4 @@ unsigned char __attribute__((noipa)) enumNonsecure0 
>> (ns_enum_foo_t * ns_foo_p)
>>   unsigned char boolNonsecure0 (ns_bool_foo_t * ns_foo_p)
>>   {
>>     return ns_foo_p ();
>> -}
>> \ No newline at end of file
>> +}

Re: [PATCH] aarch64: Fix typo in aarch64-ldp-fusion.cc:combine_reg_notes [PR114936]

2024-05-07 Thread Richard Earnshaw (lists)

On 03/05/2024 15:45, Alex Coplan wrote:
> This fixes a typo in combine_reg_notes in the load/store pair fusion
> pass.  As it stands, the calls to filter_notes store any
> REG_FRAME_RELATED_EXPR to fr_expr with the following association:
> 
>  - i2 -> fr_expr[0]
>  - i1 -> fr_expr[1]
> 
> but then the checks inside the following if statement expect the
> opposite (more natural) association, i.e.:
> 
>  - i2 -> fr_expr[1]
>  - i1 -> fr_expr[0]
> 
> this patch fixes the oversight by swapping the fr_expr indices in the
> calls to filter_notes.
> 
> In hindsight it would probably have been less confusing / error-prone to
> have combine_reg_notes take an array of two insns, then we wouldn't have
> to mix 1-based and 0-based indexing as well as remembering to call
> filter_notes in reverse program order.  This however is a minimal fix
> for backporting purposes.
> 
> Many thanks to Matthew for spotting this typo and pointing it out to me.
> 
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk and the 14
> branch after the 14.1 release?
> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR target/114936
>   * config/aarch64/aarch64-ldp-fusion.cc (combine_reg_notes):
>   Ensure insn iN has its REG_FRAME_RELATED_EXPR (if any) stored in
>   FR_EXPR[N-1], thus matching the correspondence expected by the
>   copy_rtx calls.


OK.

R.

Re: [PATCH] testsuite: Verify r0-r3 are extended with CMSE

2024-04-30 Thread Richard Earnshaw (lists)

On 30/04/2024 16:37, Torbjorn SVENSSON wrote:
> 
> 
> On 2024-04-30 17:11, Richard Earnshaw (lists) wrote:
>> On 27/04/2024 15:13, Torbjörn SVENSSON wrote:
>>> Add regression test to the existing zero/sign extend tests for CMSE to
>>> verify that r0, r1, r2 and r3 are properly extended, not just r0.
>>>
>>> Test is done using -O0 to ensure the instructions are in a predictable
>>> order.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/arm/cmse/extend-param.c: Add regression test.
>>>
>>> Signed-off-by: Torbjörn SVENSSON 
>>> ---
>>>   .../gcc.target/arm/cmse/extend-param.c    | 20 ++-
>>>   1 file changed, 19 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c 
>>> b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>>> index 01fac786238..b8b8ecbff56 100644
>>> --- a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>>> +++ b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
>>> @@ -93,4 +93,22 @@ __attribute__((cmse_nonsecure_entry)) char 
>>> boolSecureFunc (bool index) {
>>>   return 0;
>>>     return array[index];
>>>   -}
>>> \ No newline at end of file
>>> +}
>>> +
>>> +/*
>>> +**__acle_se_boolCharShortEnumSecureFunc:
>>> +**    ...
>>> +**    uxtb    r0, r0
>>> +**    uxtb    r1, r1
>>> +**    uxth    r2, r2
>>> +**    uxtb    r3, r3
>>> +**    ...
>>> +*/
>>> +__attribute__((cmse_nonsecure_entry,optimize(0))) char 
>>> boolCharShortEnumSecureFunc (bool a, unsigned char b, unsigned short c, 
>>> enum offset d) {
>>> +
>>> +  size_t index = a + b + c + d;
>>> +  if (index >= ARRAY_SIZE)
>>> +    return 0;
>>> +  return array[index];
>>> +
>>> +}
>>
>> Ok, but please can you add '-fshort-enums' to dg-options to ensure this test 
>> still behaves correctly if run with a different default (I missed that last 
>> time around).
> 
> Ok, I'll add that to extend-param.c. Do you want me to also add it to the 
> extend-return.c test case?
> 
> Kind regards,
> Torbjörn

Yes please, if it has the same issue.

R.

Re: [PATCH] testsuite: Verify r0-r3 are extended with CMSE

2024-04-30 Thread Richard Earnshaw (lists)

On 27/04/2024 15:13, Torbjörn SVENSSON wrote:
> Add regression test to the existing zero/sign extend tests for CMSE to
> verify that r0, r1, r2 and r3 are properly extended, not just r0.
> 
> Test is done using -O0 to ensure the instructions are in a predictable
> order.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/cmse/extend-param.c: Add regression test.
> 
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  .../gcc.target/arm/cmse/extend-param.c| 20 ++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c 
> b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
> index 01fac786238..b8b8ecbff56 100644
> --- a/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
> +++ b/gcc/testsuite/gcc.target/arm/cmse/extend-param.c
> @@ -93,4 +93,22 @@ __attribute__((cmse_nonsecure_entry)) char boolSecureFunc 
> (bool index) {
>  return 0;
>return array[index];
>  
> -}
> \ No newline at end of file
> +}
> +
> +/*
> +**__acle_se_boolCharShortEnumSecureFunc:
> +**   ...
> +**   uxtbr0, r0
> +**   uxtbr1, r1
> +**   uxthr2, r2
> +**   uxtbr3, r3
> +**   ...
> +*/
> +__attribute__((cmse_nonsecure_entry,optimize(0))) char 
> boolCharShortEnumSecureFunc (bool a, unsigned char b, unsigned short c, enum 
> offset d) {
> +
> +  size_t index = a + b + c + d;
> +  if (index >= ARRAY_SIZE)
> +return 0;
> +  return array[index];
> +
> +}

Ok, but please can you add '-fshort-enums' to dg-options to ensure this test 
still behaves correctly if run with a different default (I missed that last 
time around).

R.

Re: [PATCH][GCC] aarch64: Fix SCHEDULER_IDENT for Cortex-A510

2024-04-26 Thread Richard Earnshaw (lists)

On 25/04/2024 15:59, Richard Ball wrote:
> Hi Richard,
> 
> I committed this combined patch (with Cortex-A520) for trunk 
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=cab53aae43cf94171b01320c08302e47a5daa391
>  
> <https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=cab53aae43cf94171b01320c08302e47a5daa391>
> 
> Am I ok to commit just the Cortex-A510 half into gcc-12 and gcc-13.

Yes, if that's the correct thing to do there.

R.

> 
> Thanks,
> Richard Ball
> --
> *From:* Richard Ball
> *Sent:* 12 March 2024 14:08
> *To:* gcc-patches@gcc.gnu.org ; Richard Earnshaw 
> ; Richard Sandiford ; 
> Marcus Shawcroft 
> *Subject:* [PATCH][GCC] aarch64: Fix SCHEDULER_IDENT for Cortex-A510
>  
> The SCHEDULER_IDENT for this CPU was incorrectly
> set to cortexa55, which is incorrect. This can cause
> sub-optimal asm to be generated.
> 
> Ok for trunk?
> 
> Can I also backport this to gcc-12 and gcc-13?
> 
> gcc/ChangeLog:
>     PR target/114272
>     * config/aarch64/aarch64-cores.def (AARCH64_CORE):
>     Change SCHEDULER_IDENT from cortexa55 to cortexa53
>     for Cortex-A510.

Re: [PATCH] arm: Zero/Sign extends for CMSE security

2024-04-26 Thread Richard Earnshaw (lists)

On 26/04/2024 09:39, Torbjorn SVENSSON wrote:
> Hi,
> 
> On 2024-04-25 16:25, Richard Ball wrote:
>> Hi Torbjorn,
>>
>> Thanks very much for the comments.
>> I think given that the code that handles this, is within a 
>> FOREACH_FUNCTION_ARGS loop.
>> It seems a fairly safe assumption that if the code works for one that it 
>> will work for all.
>> To go back and add extra tests to me seems a little overkill.
> 
> For verifying that the implementation does the right thing now, no, but for 
> verifying against future regressions, then yes.
> 
> So, from a regression point of view, I think it makes sense to have the check 
> that more than the first argument is managed properly.
> 
> Kind regards,
> Torbjörn

Feel free to post some additional tests, Torbjorn.

R.

Re: [PATCH] arm: Zero/Sign extends for CMSE security

2024-04-25 Thread Richard Earnshaw (lists)

On 24/04/2024 16:55, Richard Ball wrote:
> This patch makes the following changes:
> 
> 1) When calling a secure function from non-secure code then any arguments
>smaller than 32-bits that are passed in registers are zero- or 
> sign-extended.
> 2) After a non-secure function returns into secure code then any return value
>smaller than 32-bits that is passed in a register is  zero- or 
> sign-extended.
> 
> This patch addresses the following CVE-2024-0151.
> 
> gcc/ChangeLog:
> PR target/114837
> * config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear):
>   Add zero/sign extend.
> (arm_expand_prologue): Add zero/sign extend.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/arm/cmse/extend-param.c: New test.
> * gcc.target/arm/cmse/extend-return.c: New test.

OK.  And OK to backport to active branches.

R.

Re: [PATCH] [testsuite] [arm] require arm_v8_1m_main for pacbti tests

2024-04-19 Thread Richard Earnshaw (lists)

On 19/04/2024 13:45, Alexandre Oliva wrote:
> On Apr 16, 2024, "Richard Earnshaw (lists)"  wrote:
> 
>> The require-effective-target flags test whether a specific set of
>> flags will make the compilation work, so they need to be used in
>> conjunction with the corresponding dg-add-options flags that then
>> apply those options.
> 
> *nod*, that's the theory.  Problem is the architectures suported by
> [add_options_for_]arm_arch_*[_ok] do not match exactly those expected by
> the tests, and I can't quite tell whether the subtle changes they would
> introduce would change what they intend to test, or even whether the
> differences are irrelevant, or would be sensible to add as variants to
> the dg machinery.  I think it would take someone more familiar than I am
> with all of the ARM variants to do this correctly.  I don't even know
> how these changes would need to be tested to be sure they remain
> correct.

It's ok to add additional variations to the table of variants in 
target-supports.exp, but we should avoid writing new specific run-time 
functions unless we really want an executable test.

I started doing some cleanup of the Arm tests infrastructure during phase 3, 
but stopped during phase 4 as I wanted to minimise the changes being made now.  
I plan to go back and work on it some more once stage 1 re-opens.

> 
> Would you be willing to take it from here, or would you accept the patch
> as an incremental yet imperfect improvement, or would you prefer to
> guide me in making it correct, and in verifying it (there are questions
> below)?  I don't have a lot of cycles to put into this (we've already
> worked around the testsuite bugs we ran into), but it would be desirable
> to get a fix into GCC as well, if we can converge on one without
> unreasonably burdening anyone.
> 
> 
>   v8_1m_main "-march=armv8.1-m.main+fp -mthumb" __ARM_ARCH_8M_MAIN__
>   v8_1m_main_pacbti "-march=armv8.1-m.main+pacbti+fp -mthumb"
>   "__ARM_ARCH_8M_MAIN__ && __ARM_FEATURE_BTI && 
> __ARM_FEATURE_PAUTH
> 
> Why do these have +fp in -march but not in the v8_1m* arch name?

It's ... complicated :)

The +fp is there because, with the move to having -mfpu=auto as the default, we 
need to avoid problems when the compiler has been configured with 
--with-float=hard, which requires the extension register set (fp or vector 
support) even if the test code itself doesn't care.  The best way to handle 
this in most cases is to give the architecture strings a default FPU 
specification (ie +fp). 

> 
> 
> gcc/testsuite/g++.target/arm/pac-1.C:
> /* { dg-options "-march=armv8.1-m.main+mve+pacbti -mbranch-protection=pac-ret 
> -mthumb -mfloat-abi=hard -g -O0" } */
> 
> v8_1m_main_pacbti plus +mve minus +fp.
> Do we need a dg arch for that?

I'd be inclined to drop +mve from this one; there's nothing I can see in the 
test that would generate mve instructions, so I think it's irrelevant.  We can 
use the existing v8_1m_main_pacbti operations.

> 
> 
> gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-7.c:
> /* { dg-additional-options "-march=armv8.1-m.main+pacbti+fp --save-temps 
> -mfloat-abi=hard" } */
> gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c:
> /* { dg-options "-march=armv8.1-m.main+fp+pacbti" } */
> 
> v8_1m_main_pacbti minus -mthumb.
> AFAICT the -mthumb is redundant.

Nearly, but not quite.  Although the gcc driver knows that m-profile 
architectures require thumb, that's not enough to override an explicit -marm 
from a testsuite configuration run, so if your site.exp file adds -marm in a 
test configuration we need to override that or the test will fail.  But the 
table based list of options will do that for you.

> 
> 
> gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-12.c:
> /* { dg-options "-march=armv8-m.main+fp -mfloat-abi=softfp" } */
> 
> v8_1m_main minus -mthumb.
> AFAICT the -mthumb is redundant.

As above

> 
> 
> gcc/testsuite/gcc.target/arm/bti-1.c:
> /* { dg-options "-march=armv8.1-m.main -mthumb -mfloat-abi=softfp 
> -mbranch-protection=bti --save-temps" } */
> gcc/testsuite/gcc.target/arm/bti-2.c:
> /* { dg-options "-march=armv8.1-m.main -mthumb -mfloat-abi=softfp 
> -mbranch-protection=bti --save-temps" } */
> 
> v8_1m_main minus +fp.> 
> Can these be bumped to +fp, or do we need an extra dg arch?
> 
> Are these missing +pacbti?

The tests themselves do not require fp, but if we use the effective-target 
rules (arm_arch_v8_1m_main), we can remove the -march, -mthumb and -mfloat-abi 
flags from these tests.

These tests for BTI should NOT have +pacbti: they're testing that the compiler 
generates the right nop-based implementation that is backw

Re: [PATCH]AArch64: remove reliance on register allocator for simd/gpreg costing. [PR114741]

2024-04-18 Thread Richard Earnshaw (lists)

On 18/04/2024 11:11, Tamar Christina wrote:
> Hi All,
> 
> In PR114741 we see that we have a regression in codegen when SVE is enable 
> where
> the simple testcase:
> 
> void foo(unsigned v, unsigned *p)
> {
> *p = v & 1;
> }
> 
> generates
> 
> foo:
> fmovs31, w0
> and z31.s, z31.s, #1
> str s31, [x1]
> ret
> 
> instead of:
> 
> foo:
> and w0, w0, 1
> str w0, [x1]
> ret
> 
> This causes an impact it not just codesize but also performance.  This is 
> caused
> by the use of the ^ constraint modifier in the pattern 3.
> 
> The documentation states that this modifier should only have an effect on the
> alternative costing in that a particular alternative is to be preferred unless
> a non-psuedo reload is needed.
> 
> The pattern was trying to convey that whenever both r and w are required, that
> it should prefer r unless a reload is needed.  This is because if a reload is
> needed then we can construct the constants more flexibly on the SIMD side.
> 
> We were using this so simplify the implementation and to get generic cases 
> such
> as:
> 
> double negabs (double x)
> {
>unsigned long long y;
>memcpy (, , sizeof(double));
>y = y | (1UL << 63);
>memcpy (, , sizeof(double));
>return x;
> }
> 
> which don't go through an expander.
> However the implementation of ^ in the register allocator is not according to
> the documentation in that it also has an effect during coloring.  During 
> initial
> register class selection it applies a penalty to a class, similar to how ? 
> does.
> 
> In this example the penalty makes the use of GP regs expensive enough that it 
> no
> longer considers them:
> 
> r106: preferred FP_REGS, alternative NO_REGS, allocno FP_REGS
> ;;3--> b  0: i   9 r106=r105&0x1
> :cortex_a53_slot_any:GENERAL_REGS+0(-1)FP_REGS+1(1)PR_LO_REGS+0(0)
>  PR_HI_REGS+0(0):model 4
> 
> which is not the expected behavior.  For GCC 14 this is a conservative fix.
> 
> 1. we remove the ^ modifier from the logical optabs.
> 
> 2. In order not to regress copysign we then move the copysign expansion to
>directly use the SIMD variant.  Since copysign only supports floating point
>modes this is fine and no longer relies on the register allocator to select
>the right alternative.
> 
> It once again regresses the general case, but this case wasn't optimized in
> earlier GCCs either so it's not a regression in GCC 14.  This change gives
> strict better codegen than earlier GCCs and still optimizes the important 
> cases.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 
>   PR target/114741
>   * config/aarch64/aarch64.md (3): Remove ^ from alt 2.
>   (copysign3): Use SIMD version of IOR directly.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/114741
>   * gcc.target/aarch64/fneg-abs_2.c: Update codegen.
>   * gcc.target/aarch64/fneg-abs_4.c: xfail for now.
>   * gcc.target/aarch64/pr114741.c: New test.
> 
> ---
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 385a669b9b3c31cc9108a660e881b9091c71fc7c..dbde066f7478bec51a8703b017ea553aa98be309
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4811,7 +4811,7 @@ (define_insn "3"
>""
>{@ [ cons: =0 , 1  , 2; attrs: type , arch  ]
>   [ r, %r , r; logic_reg   , * ] \t%0, 
> %1, %2
> - [ rk   , ^r ,  ; logic_imm   , * ] \t%0, 
> %1, %2
> + [ rk   , r  ,  ; logic_imm   , * ] \t%0, 
> %1, %2
>   [ w, 0  ,  ; *   , sve   ] \t%Z0., 
> %Z0., #%2
>   [ w, w  , w; neon_logic  , simd  ] 
> \t%0., %1., %2.
>}
> @@ -7192,22 +7192,29 @@ (define_expand "copysign3"
> (match_operand:GPF 2 "nonmemory_operand")]
>"TARGET_SIMD"
>  {
> -  machine_mode int_mode = mode;
> -  rtx bitmask = gen_reg_rtx (int_mode);
> -  emit_move_insn (bitmask, GEN_INT (HOST_WIDE_INT_M1U
> - << (GET_MODE_BITSIZE (mode) - 1)));
> +  rtx signbit_const = GEN_INT (HOST_WIDE_INT_M1U
> +<< (GET_MODE_BITSIZE (mode) - 1));
>/* copysign (x, -1) should instead be expanded as orr with the sign
>   bit.  */
>rtx op2_elt = unwrap_const_vec_duplicate (operands[2]);
>if (GET_CODE (op2_elt) == CONST_DOUBLE
>&& real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
>  {
> -  emit_insn (gen_ior3 (
> - lowpart_subreg (int_mode, operands[0], mode),
> - lowpart_subreg (int_mode, operands[1], mode), bitmask));
> +  rtx v_bitmask
> + = force_reg (V2mode,
> +  gen_const_vec_duplicate (V2mode,
> +   signbit_const));
> +
> +  emit_insn (gen_iorv23 (
> + lowpart_subreg (V2mode, operands[0], mode),
> + lowpart_subreg

Re: [PATCH] [testsuite] [arm] accept empty init for bfloat16

2024-04-16 Thread Richard Earnshaw (lists)

On 16/04/2024 04:50, Alexandre Oliva wrote:
> 
> Complete r13-2205, adjusting an arm-specific test that expects a
> no-longer-issued error at an empty initializer.
> 
> Regstrapped on x86_64-linux-gnu.  Also tested with gcc-13 on arm-,
> aarch64-, x86- and x86_64-vxworks7r2.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.target/arm/bfloat16_scalar_typecheck.c: Accept C23
> empty initializers.
> ---
>  .../gcc.target/arm/bfloat16_scalar_typecheck.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c 
> b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c
> index 8c80c55bc9f4c..04ede93bda152 100644
> --- a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c
> +++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c
> @@ -42,7 +42,7 @@ bfloat16_t footest (bfloat16_t scalar0)
>short initi_1_4 = glob_bfloat; /* { dg-error {invalid conversion from type 
> 'bfloat16_t'} } */
>double initi_1_5 = glob_bfloat; /* { dg-error {invalid conversion from 
> type 'bfloat16_t'} } */
>  
> -  bfloat16_t scalar2_1 = {}; /* { dg-error {empty scalar initializer} } */
> +  bfloat16_t scalar2_1 = {};
>bfloat16_t scalar2_2 = { glob_bfloat };
>bfloat16_t scalar2_3 = { 0 }; /* { dg-error {invalid conversion to type 
> 'bfloat16_t'} } */
>bfloat16_t scalar2_4 = { 0.1 }; /* { dg-error {invalid conversion to type 
> 'bfloat16_t'} } */
> @@ -94,7 +94,7 @@ bfloat16_t footest (bfloat16_t scalar0)
>  
>/* Compound literals.  */
>  
> -  (bfloat16_t) {}; /* { dg-error {empty scalar initializer} } */
> +  (bfloat16_t) {};
>(bfloat16_t) { glob_bfloat };
>(bfloat16_t) { 0 }; /* { dg-error {invalid conversion to type 
> 'bfloat16_t'} } */
>(bfloat16_t) { 0.1 }; /* { dg-error {invalid conversion to type 
> 'bfloat16_t'} } */
> 


This test is checking for errors.  Perhaps it would be better to select an 
older version of the standard and then set pedantic-error mode.

R.

Re: [testsuite] [aarch64] Require fpic effective target

2024-04-16 Thread Richard Earnshaw (lists)

On 16/04/2024 04:08, Alexandre Oliva wrote:
> Regstrapped on x86_64-linux-gnu.  Also tested with gcc-13 on arm-,
> aarch64-, x86- and x86_64-vxworks7r2.  Ok to install?
> 
> Co-authored-by: Olivier Hainque 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.target/aarch64/pr94201.c: Add missing
>   dg-require-effective-target fpic.
>   * gcc.target/aarch64/pr103085.c: Likewise.
> 
> ---
>  gcc/testsuite/gcc.target/aarch64/pr103085.c |1 +
>  gcc/testsuite/gcc.target/aarch64/pr94201.c  |1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr103085.c 
> b/gcc/testsuite/gcc.target/aarch64/pr103085.c
> index dbc9c15b71f22..347280ed42b2d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr103085.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr103085.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fstack-protector-strong -fPIC" } */
> +/* { dg-require-effective-target fpic } */
>  
>  void g(int*);
>  void
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr94201.c 
> b/gcc/testsuite/gcc.target/aarch64/pr94201.c
> index 691761691868a..3b9b79059e02b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr94201.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr94201.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-mcmodel=tiny -mabi=ilp32 -fPIC" } */
> +/* { dg-require-effective-target fpic } */
>  
>  extern int bar (void *);
>  extern long long a;
> 


OK

R.

Re: [PATCH] [testsuite] [arm] require arm_v8_1m_main for pacbti tests

2024-04-16 Thread Richard Earnshaw (lists)

On 16/04/2024 04:48, Alexandre Oliva wrote:
> 
> arm pac and bti tests that use -march=armv8.1-m.main get an implicit
> -mthumb, that is incompatible with vxworks kernel mode.  Declaring the
> requirement for a 8.1-m.main-compatible toolchain is enough to avoid
> those fails, because the toolchain feature test fails in kernel mode.
> 
> Regstrapped on x86_64-linux-gnu.  Also tested with gcc-13 on arm-,
> aarch64-, x86- and x86_64-vxworks7r2.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * g++.target/arm/pac-1.C: Require arm_arch_v8_1m_main.
>   * gcc.target/arm/acle/pacbti-m-predef-11.c: Likewise.
>   * gcc.target/arm/acle/pacbti-m-predef-12.c: Likewise.
>   * gcc.target/arm/acle/pacbti-m-predef-7.c: Likewise.
>   * gcc.target/arm/bti-1.c: Likewise.
>   * gcc.target/arm/bti-2.c: Likewise.
> ---
>  gcc/testsuite/g++.target/arm/pac-1.C   |1 +
>  .../gcc.target/arm/acle/pacbti-m-predef-11.c   |1 +
>  .../gcc.target/arm/acle/pacbti-m-predef-12.c   |1 +
>  .../gcc.target/arm/acle/pacbti-m-predef-7.c|1 +
>  gcc/testsuite/gcc.target/arm/bti-1.c   |1 +
>  gcc/testsuite/gcc.target/arm/bti-2.c   |1 +
>  6 files changed, 6 insertions(+)
> 
> diff --git a/gcc/testsuite/g++.target/arm/pac-1.C 
> b/gcc/testsuite/g++.target/arm/pac-1.C
> index f671a27b048c6..f48ad6cc5cb65 100644
> --- a/gcc/testsuite/g++.target/arm/pac-1.C
> +++ b/gcc/testsuite/g++.target/arm/pac-1.C
> @@ -2,6 +2,7 @@
>  /* { dg-do compile } */
>  /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
> "-mcpu=*" } } */
>  /* { dg-options "-march=armv8.1-m.main+mve+pacbti 
> -mbranch-protection=pac-ret -mthumb -mfloat-abi=hard -g -O0" } */
> +/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */

The require-effective-target flags test whether a specific set of flags will 
make the compilation work, so they need to be used in conjunction with the 
corresponding dg-add-options flags that then apply those options.  It isn't 
safe to just add a different architecture flag instead.  So if you're going to 
use this effective target, you should use it along with "dg-add-options 
arm_arch_v8_1m_main" (ie the effective-target name minus the trailing '_ok'), 
and then replace dg-options with dg-additional-options adding the remaining 
flags.  You can then remove the dg-skip-if as well because that's what the 
require-effective-target flag is doing.  So something like

dg-do compile
dg-require-effective-target arm_arch_v8_1m_main_ok
dg-add-options arm_arch_v8_1m_main
dg-additional-options "-mbranch-protection=pac-ret -g -O0"

But this test is also adding pacbti to the architecture flags, so it would 
probably be better to use v8_1m_main_pacbti_ok as the effective target.  It's 
not identical to the options above, but it's probably sufficient for this test. 
 Each test below will need checking for the exact flags that are needed for the 
test in question.


>  
>  __attribute__((noinline)) void
>  fn1 (int a, int b, int c)
> diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c 
> b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c
> index 6a5ae92c567f3..dba4f491cfea7 100644
> --- a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c
> +++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c
> @@ -1,6 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
> "-mcpu=*" "-mfloat-abi=*" } } */
>  /* { dg-options "-march=armv8.1-m.main+fp+pacbti" } */
> +/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */
>  
>  #if (__ARM_FEATURE_BTI != 1)
>  #error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be defined to 1."
> diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-12.c 
> b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-12.c
> index db40b17c3b030..308a41eb4ba4c 100644
> --- a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-12.c
> +++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-12.c
> @@ -1,6 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
> "-mcpu=*" } } */
>  /* { dg-options "-march=armv8-m.main+fp -mfloat-abi=softfp" } */
> +/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */
>  
>  #if defined (__ARM_FEATURE_BTI)
>  #error "Feature test macro __ARM_FEATURE_BTI should not be defined."
> diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-7.c 
> b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-7.c
> index 1b25907635e24..10836a84bde56 100644
> --- a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-7.c
> +++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-7.c
> @@ -1,6 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
> "-mcpu=*" } } */
>  /* { dg-additional-options "-march=armv8.1-m.main+pacbti+fp --save-temps 
> -mfloat-abi=hard" } */
> +/* {

Re: [PATCH 1/1] aarch64: Sync aarch64-sys-regs.def with Binutils

2024-03-20 Thread Richard Earnshaw (lists)

On 20/03/2024 11:21, Yury Khrustalev wrote:
> This patch updates `aarch64-sys-regs.def', bringing it into sync with
> the Binutils source.
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-sys-regs.def: Copy from Binutils.

Thanks, I've pushed this.  It's trivial enough and there's value of keeping it 
in sync with binutils.

One comment though, there should be one hard tab before "* config/..."; you 
seem to have some other random characters there that looked like white space.

R.

> ---
>  gcc/config/aarch64/aarch64-sys-regs.def | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/config/aarch64/aarch64-sys-regs.def 
> b/gcc/config/aarch64/aarch64-sys-regs.def
> index 6a948171d6e..8b65673a5d6 100644
> --- a/gcc/config/aarch64/aarch64-sys-regs.def
> +++ b/gcc/config/aarch64/aarch64-sys-regs.def
> @@ -521,6 +521,7 @@
>SYSREG ("id_aa64isar0_el1",CPENC (3,0,0,6,0),  F_REG_READ, 
> AARCH64_NO_FEATURES)
>SYSREG ("id_aa64isar1_el1",CPENC (3,0,0,6,1),  F_REG_READ, 
> AARCH64_NO_FEATURES)
>SYSREG ("id_aa64isar2_el1",CPENC (3,0,0,6,2),  F_REG_READ, 
> AARCH64_NO_FEATURES)
> +  SYSREG ("id_aa64isar3_el1",CPENC (3,0,0,6,3),  F_REG_READ, 
> AARCH64_NO_FEATURES)
>SYSREG ("id_aa64mmfr0_el1",CPENC (3,0,0,7,0),  F_REG_READ, 
> AARCH64_NO_FEATURES)
>SYSREG ("id_aa64mmfr1_el1",CPENC (3,0,0,7,1),  F_REG_READ, 
> AARCH64_NO_FEATURES)
>SYSREG ("id_aa64mmfr2_el1",CPENC (3,0,0,7,2),  F_REG_READ, 
> AARCH64_NO_FEATURES)

Re: [PATCH] arm: [MVE intrinsics] Fix support for loads [PR target/114323]

2024-03-18 Thread Richard Earnshaw (lists)





On 15/03/2024 20:08, Christophe Lyon wrote:

The testcase in this PR shows that we would load from an uninitialized
location, because the vld1 instrinsics are reported as "const". This
is because function_instance::reads_global_state_p() does not take
CP_READ_MEMORY into account.  Fixing this gives vld1 the "pure"
attribute instead, and solves the problem.

2024-03-15  Christophe Lyon  

PR target/114323
gcc/
* config/arm/arm-mve-builtins.cc
(function_instance::reads_global_state_p): Take CP_READ_MEMORY
into account.

gcc/testsuite/
* gcc.target/arm/mve/pr114323.c: New.


OK.

R.


---
  gcc/config/arm/arm-mve-builtins.cc  |  2 +-
  gcc/testsuite/gcc.target/arm/mve/pr114323.c | 22 +
  2 files changed, 23 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr114323.c

diff --git a/gcc/config/arm/arm-mve-builtins.cc 
b/gcc/config/arm/arm-mve-builtins.cc
index 2f2c0f4a02a..6a5775c67e5 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -746,7 +746,7 @@ function_instance::reads_global_state_p () const
if (flags & CP_READ_FPCR)
  return true;
  
-  return false;

+  return flags & CP_READ_MEMORY;
  }
  
  /* Return true if calls to the function could modify some form of

diff --git a/gcc/testsuite/gcc.target/arm/mve/pr114323.c 
b/gcc/testsuite/gcc.target/arm/mve/pr114323.c
new file mode 100644
index 000..bd9127b886a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/pr114323.c
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_mve_hw } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#include 
+
+__attribute__((noipa))
+uint32x4_t foo (void) {
+  uint32x4_t V0 = vld1q_u32(((const uint32_t[4]){1, 2, 3, 4}));
+  return V0;
+}
+
+int main(void)
+{
+  uint32_t buf[4];
+ vst1q_u32 (buf, foo());
+
+  for (int i = 0; i < 4; i++)
+if (buf[i] != i+1)
+  __builtin_abort ();
+}

Re: [PATCH] testsuite: Turn errors back into warnings in arm/acle/cde-mve-error-2.c

2024-03-18 Thread Richard Earnshaw (lists)





On 15/03/2024 15:13, Thiago Jung Bauermann wrote:


Hello,

"Richard Earnshaw (lists)"  writes:


On 13/01/2024 20:46, Thiago Jung Bauermann wrote:

diff --git a/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c 
b/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
index 5b7774825442..da283a06a54d 100644
--- a/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
+++ b/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
@@ -2,6 +2,7 @@

  /* { dg-do assemble } */
  /* { dg-require-effective-target arm_v8_1m_main_cde_mve_fp_ok } */
+/* { dg-options "-fpermissive" } */
  /* { dg-add-options arm_v8_1m_main_cde_mve_fp } */

  /* The error checking files are split since there are three kinds of
@@ -115,73 +116,73 @@ uint8x16_t test_bad_immediates (uint8x16_t n, uint8x16_t 
m, int someval,

/* `imm' is of wrong type.  */
accum += __arm_vcx1q_u8 (0, "");/* { dg-error {argument 
2 to '__builtin_arm_vcx1qv16qi' must be a constant immediate in range \[0-4095\]} } */
-  /* { dg-warning {passing argument 2 of '__builtin_arm_vcx1qv16qi' makes integer from 
pointer without a cast \[-Wint-conversion\]} "" { target *-*-* } 117 } */
+  /* { dg-warning {passing argument 2 of '__builtin_arm_vcx1qv16qi' makes integer from 
pointer without a cast \[-Wint-conversion\]} "" { target *-*-* } 118 } */


Absolute line numbers are a pain, but I think we can use '.-1' (without the 
quotes) in
these cases to minimize the churn.


That worked, thank you for the tip.


If that works, ok with that change.


I took the opportunity to request commit access to the GCC repo so that
I can commit the patch myself. Sorry for the delay. I'll commit it as
soon as I get it.

Thank you for the patch review! I'm including below the updated version.


I pushed this, thanks.

R.



--
Thiago


 From 78e70788da5ed849d7828b0219d3aa8955ad0fea Mon Sep 17 00:00:00 2001
From: Thiago Jung Bauermann 
Date: Sat, 13 Jan 2024 14:28:07 -0300
Subject: [PATCH v2] testsuite: Turn errors back into warnings in
  arm/acle/cde-mve-error-2.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Since commit 2c3db94d9fd ("c: Turn int-conversion warnings into
permerrors") the test fails with errors such as:

   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
32)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
33)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
34)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
35)
 ⋮
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 118 (test for 
warnings, line 117)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
119)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 120 (test for 
warnings, line 119)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
121)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 122 (test for 
warnings, line 121)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
123)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 124 (test for 
warnings, line 123)
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
125)
 ⋮
   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0  (test for excess errors)

There's a total of 1016 errors.  Here's a sample of the excess errors:

   Excess errors:
   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:117:31: 
error: passing argument 2 of '__builtin_arm_vcx1qv16qi' makes integer from 
pointer without a cast [-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:119:3: 
error: passing argument 3 of '__builtin_arm_vcx1qav16qi' makes integer from 
pointer without a cast [-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:121:3: 
error: passing argument 3 of '__builtin_arm_vcx2qv16qi' makes integer from 
pointer without a cast [-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:123:3: 
error: passing argument 3 of '__builtin_arm_vcx2qv16qi' makes integer from 
pointer without a cast [-Wint-conversion]

The test expects these messages to be warnings, not errors.  My first try
was to change it to expect them as errors instead.  This didn't work, IIUC
because the error prevents the compiler from continuing processing the file
and thus other errors which are expected by the test don't get emitted.

Therefore, add -fpermissive so that the test behaves as it did previously.
Because of the additional line in the header, the line numbers of the
expected warnings don't match anymore so replace them with ".-1" as
suggested by Richard Earnshaw.

Tested on armv8l-linux-gnueabihf.

gcc/testsuite/ChangeLog:
* gcc.target/arm/acle/cde-mve-error-

Re: [PATCH] aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]

2024-03-14 Thread Richard Earnshaw (lists)





On 14/03/2024 08:37, Jakub Jelinek wrote:

Hi!

The following testcase ICEs with LSE atomics.
The problem is that the @atomic_compare_and_swap expander uses
aarch64_reg_or_zero predicate for the desired operand, which is fine,
given that for most of the modes and even for TImode in some cases
it can handle zero immediate just fine, but the TImode
@aarch64_compare_and_swap_lse just uses register_operand for
that operand instead, again intentionally so, because the casp,
caspa, caspl and caspal instructions need to use a pair of consecutive
registers for the operand and xzr is just one register and we can't
just store zero into the link register to emulate pair of zeros.

So, the following patch fixes that by forcing the newval operand into
a register for the TImode LSE case.

Bootstrapped/regtested on aarch64-linux, ok for trunk?


An alternative fix would be to use a mode_attr to pick a different 
predicate for TImode.  But that's probably just a matter of taste; I'm 
not sure that one would be better than the other in reality.


OK (or with my suggestion if you prefer).

R.



2024-03-14  Jakub Jelinek  

PR target/114310
* config/aarch64/aarch64.cc (aarch64_expand_compare_and_swap): For
TImode force newval into a register.

* gcc.dg/pr114310.c: New test.

--- gcc/config/aarch64/aarch64.cc.jj2024-03-12 10:16:12.024101665 +0100
+++ gcc/config/aarch64/aarch64.cc   2024-03-13 18:55:39.147986554 +0100
@@ -24693,6 +24693,8 @@ aarch64_expand_compare_and_swap (rtx ope
  rval = copy_to_mode_reg (r_mode, oldval);
else
emit_move_insn (rval, gen_lowpart (r_mode, oldval));
+  if (mode == TImode)
+   newval = force_reg (mode, newval);
  
emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,

   newval, mod_s));
--- gcc/testsuite/gcc.dg/pr114310.c.jj  2024-03-13 19:09:25.322597418 +0100
+++ gcc/testsuite/gcc.dg/pr114310.c 2024-03-13 19:08:50.802073314 +0100
@@ -0,0 +1,20 @@
+/* PR target/114310 */
+/* { dg-do run { target int128 } } */
+
+volatile __attribute__((aligned (sizeof (__int128_t __int128_t v = 10;
+
+int
+main ()
+{
+#if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
+  if (__sync_val_compare_and_swap (, (__int128_t) 10, (__int128_t) 0) != 10)
+__builtin_abort ();
+  if (__sync_val_compare_and_swap (, (__int128_t) 10, (__int128_t) 15) != 0)
+__builtin_abort ();
+  if (__sync_val_compare_and_swap (, (__int128_t) 0, (__int128_t) 42) != 0)
+__builtin_abort ();
+  if (__sync_val_compare_and_swap (, (__int128_t) 31, (__int128_t) 35) != 42)
+__builtin_abort ();
+#endif
+  return 0;
+}

Jakub

Re: [PATCH v2] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Richard Earnshaw





On 13/03/2024 12:12, Maxim Kuvyrkov wrote:

Changes in v2:
- Better changelog entry.
- NFC.


This patch has been tested on
- aarch64-linux-gnu
- arm-linux-gnueabihf (VFP, NEON disabled by default),
- arm-none-eabi (Soft-FP)
with the following [expected] differences in the test results:

   - FAIL now PASS [FAIL => PASS]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
(test for excess errors)

   - UNSUPPORTED disappears[UNSUP=> ]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98

   - UNSUPPORTED appears   [ =>UNSUP]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98

   - UNRESOLVED disappears [UNRES=> ]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
compilation failed to produce executable
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to 
produce executable

This patch was motivated by gcc.dg/vect/pr113576.c, which currently
fails to compile for ARM targets without NEON.

=== CUT ===

Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
-mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
overwritten by dg-options directive, which can cause tests to fail.

Behavior of dg-options is documented in vect.exp files, but not
all developers look at the .exp file when adding a new testcase.
This caused a few dg-options directives to be used instead of
the more appropriate dg-additional-options.

This patch changes target-independent dg-options into
dg-additional-options.  This patch does not touch target-specific
dg-options and target-specific tests to avoid disturbing the gentle
balance of target-specific vectorization.

This patch also removes a couple of unneeded "dg-do run" directives
to avoid failures on compile-only targets.  Default action is, again,
set by check_vect_support_and_set_flags.

Lastly, I avoided renaming tests that use -O options to O-*
filename format because this support is not consistent between
gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
It seems dg-additional-options is cleaner.

This patch does the following,
1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
2. do not change { dg-options FOO { target { target-*-pattern } } };
3. do not remove { dg-do run { target { target-*-pattern } } };
4. change { dg-options FOO } to { dg-additional-options FOO };
5. remove { dg-do run } in several tests, where it is clearly not needed.

gcc/testsuite/ChangeLog:

PR testsuite/114307
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: Remove dg-run.
* gcc.dg/vect/complex/complex-operations-run.c: Likewise.
* gcc.dg/vect/pr113576.c: Remove dg-run.  Use dg-additional-options for
test-specific flags.
* gcc.dg/vect/gimplefe-40.c: Use dg-additional-options for
test-specific flags.
* gcc.dg/vect/gimplefe-41.c: Likewise.
* gcc.dg/vect/pr101145inf.c: Likewise.
* gcc.dg/vect/pr101145inf_1.c: Likewise.
* gcc.dg/vect/pr108316.c: Likewise.
* gcc.dg/vect/pr109011-1.c: Likewise.
* gcc.dg/vect/pr109011-2.c: Likewise.
* gcc.dg/vect/pr109011-3.c: Likewise.
* gcc.dg/vect/pr109011-4.c: Likewise.
* gcc.dg/vect/pr109011-5.c: Likewise.
* gcc.dg/vect/pr111846.c: Likewise.
* gcc.dg/vect/pr111860-2.c: Likewise.
* gcc.dg/vect/pr111860-3.c: Likewise.
* gcc.dg/vect/pr113002.c: Likewise.
* gcc.dg/vect/pr84711.c: Likewise.
* gcc.dg/vect/pr85597.c: Likewise.
* gcc.dg/vect/pr88497-1.c: Likewise.
* gcc.dg/vect/pr88497-2.c: Likewise.
* gcc.dg/vect/pr88497-3.c: Likewise.
* gcc.dg/vect/pr88497-4.c: Likewise.
* gcc.dg/vect/pr88497-5.c: Likewise.
* gcc.dg/vect/pr88497-7.c: Likewise.
* gcc.dg/vect/pr92347.c: Likewise.
* gcc.dg/vect/pr93069.c: Likewise.
* gcc.dg/vect/pr97241.c: Likewise.
* gcc.dg/vect/pr99102.c: Likewise.
* gcc.dg/vect/vect-early-break_65.c: Likewise.
* gcc.dg/vect/vect-fold-1.c: Likewise.
* gcc.dg/vect/vect-ifcvt-19.c: Likewise.
* gcc.dg/vect/vect-ifcvt-20.c: Likewise.
* gcc.dg/vect/vect-reduc-epilogue-gaps.c: Likewise.
* gcc.dg/vect/vect-singleton_1.c: Likewise.
* g++.dg/vect/pr84556.cc: Likewise.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise.
* gfortran.dg/vect/pr77848.f: Likewise.
* gfortran.dg/vect/pr90913.f90: Likewise.


OK.

(I wonder how many of the target-specific additional options are

Re: [PATCH][GCC] aarch64: Fix SCHEDULER_IDENT for Cortex-A520

2024-03-13 Thread Richard Earnshaw





On 12/03/2024 14:08, Richard Ball wrote:

The SCHEDULER_IDENT for this CPU was incorrectly
set to cortexa55, which is incorrect. This can cause
sub-optimal asm to be generated.

Ok for trunk?

gcc/ChangeLog:
PR target/114272
* config/aarch64/aarch64-cores.def (AARCH64_CORE):
Change SCHEDULER_IDENT from cortexa55 to cortexa53
for Cortex-A520.


I don't see having this as a separate patch to the one for Cortex-A510 
as having any value.


Please merge the two together.  A merged patch is pre-approved.

R.

Re: [PATCH] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Richard Earnshaw





On 13/03/2024 10:58, Maxim Kuvyrkov wrote:

This patch has been tested on
- aarch64-linux-gnu
- arm-linux-gnueabihf (VFP, NEON disabled by default),
- arm-none-eabi (Soft-FP)
with the following [expected] differences in the test results:

   - FAIL now PASS [FAIL => PASS]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
(test for excess errors)

   - UNSUPPORTED disappears[UNSUP=> ]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98

   - UNSUPPORTED appears   [ =>UNSUP]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98

   - UNRESOLVED disappears [UNRES=> ]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
compilation failed to produce executable
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to 
produce executable

This patch was motivated by gcc.dg/vect/pr113576.c, which currently
fails to compile for ARM targets without NEON.

=== CUT ===

Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
-mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
overwritten by dg-options directive, which can cause tests to fail.

Behavior of dg-options is documented in vect.exp files, but not
all developers look at the .exp file when adding a new testcase.
This caused a few dg-options directives to be used instead of
the more appropriate dg-additional-options.

This patch changes target-independent dg-options into
dg-additional-options.  This patch does not touch target-specific
dg-options and target-specific tests to avoid disturbing the gentle
balance of target-specific vectorization.

This patch also removes a couple of unneeded "dg-do run" directives
to avoid failures on compile-only targets.  Default action is, again,
set by check_vect_support_and_set_flags.

Lastly, I avoided renaming tests that use -O options to O-*
filename format because this support is not consistent between
gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
It seems dg-additional-options is cleaner.

This patch does the following,
1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
2. do not change { dg-options FOO { target { target-*-pattern } } };
3. do not remove { dg-do run { target { target-*-pattern } } };
4. change { dg-options FOO } to { dg-additional-options FOO };
5. remove { dg-do run } in several tests, where it is clearly not needed.

gcc/testsuite/ChangeLog:

PR testsuite/114307
* g++.dg/vect/pr84556.cc: Fixup.
* gcc.dg/vect/complex/complex-operations-run.c Fixup.
* gcc.dg/vect/gimplefe-40.c Fixup.
* gcc.dg/vect/gimplefe-41.c Fixup.
* gcc.dg/vect/pr101145inf.c Fixup.
* gcc.dg/vect/pr101145inf_1.c Fixup.
* gcc.dg/vect/pr108316.c Fixup.
* gcc.dg/vect/pr109011-1.c Fixup.
* gcc.dg/vect/pr109011-2.c Fixup.
* gcc.dg/vect/pr109011-3.c Fixup.
* gcc.dg/vect/pr109011-4.c Fixup.
* gcc.dg/vect/pr109011-5.c Fixup.
* gcc.dg/vect/pr111846.c Fixup.
* gcc.dg/vect/pr111860-2.c Fixup.
* gcc.dg/vect/pr111860-3.c Fixup.
* gcc.dg/vect/pr113002.c Fixup.
* gcc.dg/vect/pr113576.c Fixup.
* gcc.dg/vect/pr84711.c Fixup.
* gcc.dg/vect/pr85597.c Fixup.
* gcc.dg/vect/pr88497-1.c Fixup.
* gcc.dg/vect/pr88497-2.c Fixup.
* gcc.dg/vect/pr88497-3.c Fixup.
* gcc.dg/vect/pr88497-4.c Fixup.
* gcc.dg/vect/pr88497-5.c Fixup.
* gcc.dg/vect/pr88497-7.c Fixup.
* gcc.dg/vect/pr92347.c Fixup.
* gcc.dg/vect/pr93069.c Fixup.
* gcc.dg/vect/pr97241.c Fixup.
* gcc.dg/vect/pr99102.c Fixup.
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c Fixup.
* gcc.dg/vect/vect-early-break_65.c Fixup.
* gcc.dg/vect/vect-fold-1.c Fixup.
* gcc.dg/vect/vect-ifcvt-19.c Fixup.
* gcc.dg/vect/vect-ifcvt-20.c Fixup.
* gcc.dg/vect/vect-reduc-epilogue-gaps.c Fixup.
* gcc.dg/vect/vect-singleton_1.c Fixup.
* gfortran.dg/vect/fast-math-mgrid-resid.f Fixup.
* gfortran.dg/vect/pr77848.f Fixup.
* gfortran.dg/vect/pr90913.f90 Fixup.


Thanks for looking into this, I agree that changing to 
dg-additional-options looks the right choice.


The only thing to be wary of is that later 'dg-options' directives may 
override dg-additional-options directives; you might want to test at 
least one target where there are target-specific dg-options that you've 
not modified.


The patch is OK, but the ChangeLog is not!  Fixup doesn't

[COMMITTED] arm: testsuite: tweak bics_3.c [PR113542]

2024-03-08 Thread Richard Earnshaw


This test was too simple, which meant that the compiler was sometimes
able to find a better optimization of the code than using a BICS
instruction.  Fix this by changing the test slightly to produce a
sequence where BICS should always be the preferred solution.

gcc/testsuite:
PR target/113542
* gcc.target/arm/bics_3.c: Adjust code to something which should
always result in BICS.
---
 gcc/testsuite/gcc.target/arm/bics_3.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/bics_3.c b/gcc/testsuite/gcc.target/arm/bics_3.c
index e056b264e15..4d6938948a1 100644
--- a/gcc/testsuite/gcc.target/arm/bics_3.c
+++ b/gcc/testsuite/gcc.target/arm/bics_3.c
@@ -2,13 +2,11 @@
 /* { dg-options "-O2 --save-temps -fno-inline" } */
 /* { dg-require-effective-target arm32 } */
 
-extern void abort (void);
-
 int
 bics_si_test (int a, int b)
 {
-  if (a & ~b)
-return 1;
+  if ((a & ~b) >= 0)
+return 3;
   else
 return 0;
 }
@@ -16,8 +14,8 @@ bics_si_test (int a, int b)
 int
 bics_si_test2 (int a, int b)
 {
-  if (a & ~ (b << 2))
-return 1;
+  if ((a & ~ (b << 2)) >= 0)
+return 3;
   else
 return 0;
 }
@@ -28,13 +26,12 @@ main (void)
   int a = 5;
   int b = 5;
   int c = 20;
-  if (bics_si_test (a, b))
-abort ();
-  if (bics_si_test2 (c, b))
-abort ();
+  if (bics_si_test (a, b) != 3)
+__builtin_abort ();
+  if (bics_si_test2 (c, b) != 3)
+__builtin_abort ();
   return 0;
 }
 
 /* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2 } } */
 /* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, .sl #2" 1 } } */
-

Re: [PATCH] arm: fix c23 0-named-args caller-side stdarg

2024-03-07 Thread Richard Earnshaw (lists)

On 06/03/2024 20:28, Alexandre Oliva wrote:
> On Mar  1, 2024, "Richard Earnshaw (lists)"  wrote:
> 
>> On 01/03/2024 04:38, Alexandre Oliva wrote:
>>> Thanks for the review.
> 
>> For closure, Jakub has just pushed a patch to the generic code, so I
>> don't think we need this now.
> 
> ACK.  I see the c2x-stdarg-4.c test is now passing in our arm-eabi
> gcc-13 tree.  Thank you all.
> 
> Alas, the same nightly build showed a new riscv fail in c23-stdarg-6.c,
> that also got backported to gcc-13.  Presumably it's failing in the
> trunk as well, both riscv32-elf and riscv64-elf.
> 
> I haven't looked into whether it's a regression brought about by the
> patch or just a new failure mode that the new test exposed.  Either way,
> I'm not sure whether to link this new failure to any of the associated
> PRs or to file a new one, but, FTR, I'm going to look into it.
> 

I'd suggest a new pr.  It's easier to track than re-opening an existing on.

R.

> -- 
> Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/ 
> <https://FSFLA.org/blogs/lxo/>
>    Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive

[PATCH] gomp: testsuite: improve compatibility of bad-array-section-3.c [PR113428]

2024-03-06 Thread Richard Earnshaw


This test generates different warnings on ilp32 targets because the size
of an integer matches the size of a pointer.  Avoid this by using
signed char.

gcc/testsuite:

PR testsuite/113428
* gcc.dg/gomp/bad-array-section-c-3.c: Use signed char instead
of int.
---

I think this fixes the issues seen on ilp32 machines, without substantially
changing what the test does, but a second set of eyes wouldn't hurt.

 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c b/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
index 8be15ced8c0..431af71c422 100644
--- a/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
+++ b/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
@@ -1,15 +1,15 @@
 /* { dg-do compile } */
 
 struct S {
-  int *ptr;
+  signed char *ptr;
 };
 
 int main()
 {
-  int arr[20];
+  signed char arr[20];
 
   /* Reject array section in compound initialiser.  */
-#pragma omp target map( (struct S) { .ptr = (int *) arr[5:5] } )
+#pragma omp target map( (struct S) { .ptr = (signed char *) arr[5:5] } )
 /* { dg-error {expected '\]' before ':' token} "" { target *-*-* } .-1 } */
 /* { dg-warning {cast to pointer from integer of different size} "" { target *-*-* } .-2 } */
 /* { dg-message {sorry, unimplemented: unsupported map expression} "" { target *-*-* } .-3 } */

Re: [PATCH] arm: Support -mfdpic for more targets

2024-03-06 Thread Richard Earnshaw (lists)

On 06/03/2024 05:07, Fangrui Song wrote:
> On Fri, Feb 23, 2024 at 7:33 PM Fangrui Song  wrote:
>>
>> From: Fangrui Song 
>>
>> Targets that are not arm*-*-uclinuxfdpiceabi can use -S -mfdpic, but -c
>> -mfdpic does not pass --fdpic to gas.  This is an unnecessary
>> restriction.  Just define the ASM_SPEC in bpabi.h.
>>
>> Additionally, use armelf[b]_linux_fdpiceabi emulations for -mfdpic in
>> linux-eabi.h.  This will allow a future musl fdpic port to use the
>> desired BFD emulation.
>>
>> gcc/ChangeLog:
>>
>> * config/arm/bpabi.h (TARGET_FDPIC_ASM_SPEC): Transform -mfdpic.
>> * config/arm/linux-eabi.h (TARGET_FDPIC_LINKER_EMULATION): Define.
>> (SUBTARGET_EXTRA_LINK_SPEC): Use TARGET_FDPIC_LINKER_EMULATION
>> if -mfdpic.
>> ---
>>  gcc/config/arm/bpabi.h  | 2 +-
>>  gcc/config/arm/linux-eabi.h | 5 -
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
>> index 7a279f3ed3c..6778be1a8bf 100644
>> --- a/gcc/config/arm/bpabi.h
>> +++ b/gcc/config/arm/bpabi.h
>> @@ -55,7 +55,7 @@
>>  #define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*"\
>>    "|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}"
>>
>> -#define TARGET_FDPIC_ASM_SPEC ""
>> +#define TARGET_FDPIC_ASM_SPEC "%{mfdpic: --fdpic}"
>>
>>  #define BE8_LINK_SPEC  \
>>    "%{!r:%{!mbe32:%:be8_linkopt(%{mlittle-endian:little}"   \
>> diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
>> index eef791f6a02..0c5c58e4928 100644
>> --- a/gcc/config/arm/linux-eabi.h
>> +++ b/gcc/config/arm/linux-eabi.h
>> @@ -46,12 +46,15 @@
>>  #undef  TARGET_LINKER_EMULATION
>>  #if TARGET_BIG_ENDIAN_DEFAULT
>>  #define TARGET_LINKER_EMULATION "armelfb_linux_eabi"
>> +#define TARGET_FDPIC_LINKER_EMULATION "armelfb_linux_fdpiceabi"
>>  #else
>>  #define TARGET_LINKER_EMULATION "armelf_linux_eabi"
>> +#define TARGET_FDPIC_LINKER_EMULATION "armelf_linux_fdpiceabi"
>>  #endif
>>
>>  #undef  SUBTARGET_EXTRA_LINK_SPEC
>> -#define SUBTARGET_EXTRA_LINK_SPEC " -m " TARGET_LINKER_EMULATION
>> +#define SUBTARGET_EXTRA_LINK_SPEC " -m %{mfdpic: " \
>> +  TARGET_FDPIC_LINKER_EMULATION ";:" TARGET_LINKER_EMULATION "}"
>>
>>  /* GNU/Linux on ARM currently supports three dynamic linkers:
>> - ld-linux.so.2 - for the legacy ABI
>> --
>> 2.44.0.rc1.240.g4c46232300-goog
>>
> 
> Ping:)
> 

We're in stage4 at present and this is new material.  I'll look at it after the 
branch has been cut.

R.

> 
> -- 
> 宋方睿

[PATCH] arm: check for low register before applying peephole [PR113510]

2024-03-05 Thread Richard Earnshaw


For thumb1, when using a peephole to fuse

mov reg, #const
add reg, reg, SP

into

add reg, SP, #const

we must first check that reg is a low register, otherwise we will ICE
when trying to recognize the resulting insn.

gcc/ChangeLog:

PR target/113510
* config/arm/thumb1.md (peephole2 to fuse mov imm/add SP): Use
low_register_operand.
---

This appears to have gone latent again, but checked against the known
failing version.

 gcc/config/arm/thumb1.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 14d6df580af..d7074b43f60 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -113,7 +113,7 @@ (define_insn_and_split "*thumb1_addsi3"
 ;; Reloading and elimination of the frame pointer can
 ;; sometimes cause this optimization to be missed.
 (define_peephole2
-  [(set (match_operand:SI 0 "arm_general_register_operand" "")
+  [(set (match_operand:SI 0 "low_register_operand" "")
 	(match_operand:SI 1 "const_int_operand" ""))
(set (match_dup 0)
 	(plus:SI (match_dup 0) (reg:SI SP_REGNUM)))]

Re: [PATCH v2] testsuite, arm: Fix testcase arm/pr112337.c to check for the options first

2024-03-05 Thread Richard Earnshaw (lists)

On 19/02/2024 10:11, Saurabh Jha wrote:
> 
> On 2/9/2024 2:57 PM, Richard Earnshaw (lists) wrote:
>> On 30/01/2024 17:07, Saurabh Jha wrote:
>>> Hey,
>>>
>>> Previously, this test was added to fix this bug: 
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337. However, it did not 
>>> check the compilation options before using them, leading to errors.
>>>
>>> This patch fixes the test by first checking whether it can use the options 
>>> before using them.
>>>
>>> Tested for arm-none-eabi and found no regressions. The output of check-gcc 
>>> with RUNTESTFLAGS="arm.exp=*" changed like this:
>>>
>>> Before:
>>> # of expected passes  5963
>>> # of unexpected failures  64
>>>
>>> After:
>>> # of expected passes  5964
>>> # of unexpected failures  63
>>>
>>> Ok for master?
>>>
>>> Regards,
>>> Saurabh
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>  * gcc.target/arm/pr112337.c: Check whether we can use the 
>>> compilation options before using them.
>> My apologies for missing this earlier.  It didn't show up in patchwork. 
>> That's most likely because the attachment is a binary blob instead of 
>> text/plain.  That also means that the Linaro CI system hasn't seen this 
>> patch either.  Please can you fix your mailer to add plain text patch files.
>>
>> -/* { dg-options "-O2 -march=armv8.1-m.main+fp.dp+mve.fp -mfloat-abi=hard" } 
>> */
>> +/* { dg-require-effective-target arm_hard_ok } */
>> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
>> +/* { dg-options "-O2 -mfloat-abi=hard" } */
>> +/* { dg-add-options arm_v8_1m_mve } */
>>
>> This is moving in the right direction, but it adds more than necessary now: 
>> checking for, and adding -mfloat-abi=hard is not necessary any more as 
>> arm_v8_1m_mve_ok will work out what float-abi flags are needed to make the 
>> options work. (What's more, it will prevent the test from running if the 
>> base configuration of the compiler is incompatible with the hard float ABI, 
>> which is more than we need.).
>>
>> So please can you re-spin removing the hard-float check and removing that 
>> from dg-options.
>>
>> Thanks,
>> R.
> 
> Hi Richard,
> 
> Agreed with your comments. Please find the patch with the suggested changes 
> attached.
> 
> Regards,
> 
> Saurabh
> 


Thanks, I've pushed this.  Next time, please can you put the commit message 
inside the patch, so that I can apply things automatically.  Eg: 

>From 1c92c94074449929f40cea99a6450bcde3aec12f Mon Sep 17 00:00:00 2001
From: Saurabh Jha 
Date: Tue, 30 Jan 2024 15:03:36 +
Subject: [PATCH] Fix testcase pr112337.c to check the options [PR112337]

gcc.target/arm/pr112337.c was failing to validate that adding MVE options
was compatible with the test environment, so add the missing checks.

gcc/testsuite/ChangeLog:

PR target/112337
* gcc.target/arm/pr112337.c: Check for, then use the right MVE
options.

---
 gcc/testsuite/gcc.target/arm/pr112337.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr112337.c 
b/gcc/testsuite/gcc.target/arm/pr112337.c

...

Re: [PATCH v4] aarch64,arm: Move branch-protection data to targets

2024-03-01 Thread Richard Earnshaw (lists)

On 11/01/2024 14:35, Szabolcs Nagy wrote:
> The branch-protection types are target specific, not the same on arm
> and aarch64.  This currently affects pac-ret+b-key, but there will be
> a new type on aarch64 that is not relevant for arm.
> 
> After the move, change aarch_ identifiers to aarch64_ or arm_ as
> appropriate.
> 
> Refactor aarch_validate_mbranch_protection to take the target specific
> branch-protection types as an argument.
> 
> In case of invalid input currently no hints are provided: the way
> branch-protection types and subtypes can be mixed makes it difficult
> without causing confusion.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.md: Rename aarch_ to aarch64_.
>   * config/aarch64/aarch64.opt: Likewise.
>   * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Likewise.
>   * config/aarch64/aarch64.cc (aarch64_expand_prologue): Likewise.
>   (aarch64_expand_epilogue): Likewise.
>   (aarch64_post_cfi_startproc): Likewise.
>   (aarch64_handle_no_branch_protection): Copy and rename.
>   (aarch64_handle_standard_branch_protection): Likewise.
>   (aarch64_handle_pac_ret_protection): Likewise.
>   (aarch64_handle_pac_ret_leaf): Likewise.
>   (aarch64_handle_pac_ret_b_key): Likewise.
>   (aarch64_handle_bti_protection): Likewise.
>   (aarch64_override_options): Update branch protection validation.
>   (aarch64_handle_attr_branch_protection): Likewise.
>   * config/arm/aarch-common-protos.h (aarch_validate_mbranch_protection):
>   Pass branch protection type description as argument.
>   (struct aarch_branch_protect_type): Move from aarch-common.h.
>   * config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
>   Remove.
>   (aarch_handle_standard_branch_protection): Remove.
>   (aarch_handle_pac_ret_protection): Remove.
>   (aarch_handle_pac_ret_leaf): Remove.
>   (aarch_handle_pac_ret_b_key): Remove.
>   (aarch_handle_bti_protection): Remove.
>   (aarch_validate_mbranch_protection): Pass branch protection type
>   description as argument.
>   * config/arm/aarch-common.h (enum aarch_key_type): Remove.
>   (struct aarch_branch_protect_type): Remove.
>   * config/arm/arm-c.cc (arm_cpu_builtins): Remove aarch_ra_sign_key.
>   * config/arm/arm.cc (arm_handle_no_branch_protection): Copy and rename.
>   (arm_handle_standard_branch_protection): Likewise.
>   (arm_handle_pac_ret_protection): Likewise.
>   (arm_handle_pac_ret_leaf): Likewise.
>   (arm_handle_bti_protection): Likewise.
>   (arm_configure_build_target): Update branch protection validation.
>   * config/arm/arm.opt: Remove aarch_ra_sign_key.
> ---
> v4:
> - pass types as argument to validation.
> - make target specific types data static.
> 
>  gcc/config/aarch64/aarch64-c.cc  |  4 +-
>  gcc/config/aarch64/aarch64.cc| 75 
>  gcc/config/aarch64/aarch64.md|  2 +-
>  gcc/config/aarch64/aarch64.opt   |  2 +-
>  gcc/config/arm/aarch-common-protos.h | 19 ++-
>  gcc/config/arm/aarch-common.cc   | 71 --
>  gcc/config/arm/aarch-common.h| 20 
>  gcc/config/arm/arm-c.cc  |  2 -
>  gcc/config/arm/arm.cc| 55 +---
>  gcc/config/arm/arm.opt   |  3 --
>  10 files changed, 145 insertions(+), 108 deletions(-)
> 


OK

R.

Re: [PATCH v6 5/5] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2024-03-01 Thread Richard Earnshaw (lists)

On 27/02/2024 13:56, Andre Vieira wrote:
> 
> This patch adds support for MVE Tail-Predicated Low Overhead Loops by using 
> the
> doloop funcitonality added to support predicated vectorized hardware loops.
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm-protos.h (arm_target_bb_ok_for_lob): Change
>   declaration to pass basic_block.
>   (arm_attempt_dlstp_transform): New declaration.
>   * config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): Define targethook.
>   (TARGET_PREDICT_DOLOOP_P): Likewise.
>   (arm_target_bb_ok_for_lob): Adapt condition.
>   (arm_mve_get_vctp_lanes): New function.
>   (arm_dl_usage_type): New internal enum.
>   (arm_get_required_vpr_reg): New function.
>   (arm_get_required_vpr_reg_param): New function.
>   (arm_get_required_vpr_reg_ret_val): New function.
>   (arm_mve_get_loop_vctp): New function.
>   (arm_mve_insn_predicated_by): New function.
>   (arm_mve_across_lane_insn_p): New function.
>   (arm_mve_load_store_insn_p): New function.
>   (arm_mve_impl_pred_on_outputs_p): New function.
>   (arm_mve_impl_pred_on_inputs_p): New function.
>   (arm_last_vect_def_insn): New function.
>   (arm_mve_impl_predicated_p): New function.
>   (arm_mve_check_reg_origin_is_num_elems): New function.
>   (arm_mve_dlstp_check_inc_counter): New function.
>   (arm_mve_dlstp_check_dec_counter): New function.
>   (arm_mve_loop_valid_for_dlstp): New function.
>   (arm_predict_doloop_p): New function.
>   (arm_loop_unroll_adjust): New function.
>   (arm_emit_mve_unpredicated_insn_to_seq): New function.
>   (arm_attempt_dlstp_transform): New function.
>   * config/arm/arm.opt (mdlstp): New option.
>   * config/arm/iteratords.md (dlstp_elemsize, letp_num_lanes,
>   letp_num_lanes_neg, letp_num_lanes_minus_1): New attributes.
>   (DLSTP, LETP): New iterators.
>   (predicated_doloop_end_internal): New pattern.
>   (dlstp_insn): New pattern.
>   * config/arm/thumb2.md (doloop_end): Adapt to support tail-predicated
>   loops.
>   (doloop_begin): Likewise.
>   * config/arm/types.md (mve_misc): New mve type to represent
>   predicated_loop_end insn sequences.
>   * config/arm/unspecs.md:
>   (DLSTP8, DLSTP16, DLSTP32, DSLTP64,
>   LETP8, LETP16, LETP32, LETP64): New unspecs for DLSTP and LETP.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/lob.h: Add new helpers.
>   * gcc.target/arm/lob1.c: Use new helpers.
>   * gcc.target/arm/lob6.c: Likewise.
>   * gcc.target/arm/dlstp-compile-asm-1.c: New test.
>   * gcc.target/arm/dlstp-compile-asm-2.c: New test.
>   * gcc.target/arm/dlstp-compile-asm-3.c: New test.
>   * gcc.target/arm/dlstp-int8x16.c: New test.
>   * gcc.target/arm/dlstp-int8x16-run.c: New test.
>   * gcc.target/arm/dlstp-int16x8.c: New test.
>   * gcc.target/arm/dlstp-int16x8-run.c: New test.
>   * gcc.target/arm/dlstp-int32x4.c: New test.
>   * gcc.target/arm/dlstp-int32x4-run.c: New test.
>   * gcc.target/arm/dlstp-int64x2.c: New test.
>   * gcc.target/arm/dlstp-int64x2-run.c: New test.
>   * gcc.target/arm/dlstp-invalid-asm.c: New test.
> 
> Co-authored-by: Stam Markianos-Wright 
> ---
>  gcc/config/arm/arm-protos.h   |4 +-
>  gcc/config/arm/arm.cc | 1249 -
>  gcc/config/arm/arm.opt|3 +
>  gcc/config/arm/iterators.md   |   15 +
>  gcc/config/arm/mve.md |   50 +
>  gcc/config/arm/thumb2.md  |  138 +-
>  gcc/config/arm/types.md   |6 +-
>  gcc/config/arm/unspecs.md |   14 +-
>  gcc/testsuite/gcc.target/arm/lob.h|  128 +-
>  gcc/testsuite/gcc.target/arm/lob1.c   |   23 +-
>  gcc/testsuite/gcc.target/arm/lob6.c   |8 +-
>  .../gcc.target/arm/mve/dlstp-compile-asm-1.c  |  146 ++
>  .../gcc.target/arm/mve/dlstp-compile-asm-2.c  |  749 ++
>  .../gcc.target/arm/mve/dlstp-compile-asm-3.c  |   46 +
>  .../gcc.target/arm/mve/dlstp-int16x8-run.c|   44 +
>  .../gcc.target/arm/mve/dlstp-int16x8.c|   31 +
>  .../gcc.target/arm/mve/dlstp-int32x4-run.c|   45 +
>  .../gcc.target/arm/mve/dlstp-int32x4.c|   31 +
>  .../gcc.target/arm/mve/dlstp-int64x2-run.c|   48 +
>  .../gcc.target/arm/mve/dlstp-int64x2.c|   28 +
>  .../gcc.target/arm/mve/dlstp-int8x16-run.c|   44 +
>  .../gcc.target/arm/mve/dlstp-int8x16.c|   32 +
>  .../gcc.target/arm/mve/dlstp-invalid-asm.c|  521 +++
>  23 files changed, 3321 insertions(+), 82 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-3.c
>  create mode 100644

Re: [PATCH v6 4/5] doloop: Add support for predicated vectorized loops

2024-03-01 Thread Richard Earnshaw (lists)

On 27/02/2024 13:56, Andre Vieira wrote:
> 
> This patch adds support in the target agnostic doloop pass for the detection 
> of
> predicated vectorized hardware loops.  Arm is currently the only target that
> will make use of this feature.
> 
> gcc/ChangeLog:
> 
>   * df-core.cc (df_bb_regno_only_def_find): New helper function.
>   * df.h (df_bb_regno_only_def_find): Declare new function.
>   * loop-doloop.cc (doloop_condition_get): Add support for detecting
>   predicated vectorized hardware loops.
>   (doloop_modify): Add support for GTU condition checks.
>   (doloop_optimize): Update costing computation to support alterations to
>   desc->niter_expr by the backend.
> 
> Co-authored-by: Stam Markianos-Wright 
> ---
>  gcc/df-core.cc |  15 +
>  gcc/df.h   |   1 +
>  gcc/loop-doloop.cc | 164 +++--
>  3 files changed, 113 insertions(+), 67 deletions(-)
> 

As discussed, I think we should wait for gcc-15 for this[*]; I know it was 
initially submitted during stage1 but it's had to go through a lot of revision 
since then and we're very close to wanting to cut the release branch.

R.

[*] Unless an independent reviewer wants to sign this off anyway.

Re: [PATCH v6 3/5] arm: Fix a wrong attribute use and remove unused unspecs and iterators

2024-03-01 Thread Richard Earnshaw (lists)

On 27/02/2024 13:56, Andre Vieira wrote:
> 
> This patch fixes the erroneous use of a mode attribute without a mode iterator
> in the pattern and removes unused unspecs and iterators.
> 
> gcc/ChangeLog:
> 
>   * config/arm/iterators.md (supf): Remove VMLALDAVXQ_U, VMLALDAVXQ_P_U,
>   VMLALDAVAXQ_U cases.
>   (VMLALDAVXQ): Remove iterator.
>   (VMLALDAVXQ_P): Likewise.
>   (VMLALDAVAXQ): Likewise.
>   * config/arm/mve.md (mve_vstrwq_p_fv4sf): Replace use of 
>   mode iterator attribute with V4BI mode.
>   * config/arm/unspecs.md (VMLALDAVXQ_U, VMLALDAVXQ_P_U,
>   VMLALDAVAXQ_U): Remove unused unspecs.
> ---
>  gcc/config/arm/iterators.md | 9 +++--
>  gcc/config/arm/mve.md   | 2 +-
>  gcc/config/arm/unspecs.md   | 3 ---
>  3 files changed, 4 insertions(+), 10 deletions(-)
> 

OK

R.

Re: [PATCH v6 2/5] arm: Annotate instructions with mve_safe_imp_xlane_pred

2024-03-01 Thread Richard Earnshaw (lists)

On 27/02/2024 13:56, Andre Vieira wrote:
> 
> This patch annotates some MVE across lane instructions with a new attribute.
> We use this attribute to let the compiler know that these instructions can be
> safely implicitly predicated when tail predicating if their operands are
> guaranteed to have zeroed tail predicated lanes.  These instructions were
> selected because having the value 0 in those lanes or 'tail-predicating' those
> lanes have the same effect.
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm.md (mve_safe_imp_xlane_pred): New attribute.
>   * config/arm/iterators.md (mve_vmaxmin_safe_imp): New iterator
>   attribute.
>   * config/arm/mve.md (vaddvq_s, vaddvq_u, vaddlvq_s, vaddlvq_u,
>   vaddvaq_s, vaddvaq_u, vmaxavq_s, vmaxvq_u, vmladavq_s, vmladavq_u,
>   vmladavxq_s, vmlsdavq_s, vmlsdavxq_s, vaddlvaq_s, vaddlvaq_u,
>   vmlaldavq_u, vmlaldavq_s, vmlaldavq_u, vmlaldavxq_s, vmlsldavq_s,
>   vmlsldavxq_s, vrmlaldavhq_u, vrmlaldavhq_s, vrmlaldavhxq_s,
>   vrmlsldavhq_s, vrmlsldavhxq_s, vrmlaldavhaq_s, vrmlaldavhaq_u,
>   vrmlaldavhaxq_s, vrmlsldavhaq_s, vrmlsldavhaxq_s, vabavq_s, vabavq_u,
>   vmladavaq_u, vmladavaq_s, vmladavaxq_s, vmlsdavaq_s, vmlsdavaxq_s,
>   vmlaldavaq_s, vmlaldavaq_u, vmlaldavaxq_s, vmlsldavaq_s,
>   vmlsldavaxq_s): Added mve_safe_imp_xlane_pred.
> ---
>  gcc/config/arm/arm.md   |  6 ++
>  gcc/config/arm/iterators.md |  8 
>  gcc/config/arm/mve.md   | 12 
>  3 files changed, 26 insertions(+)
> 

OK

R.

Re: [PATCH v6 1/5] arm: Add define_attr to to create a mapping between MVE predicated and unpredicated insns

2024-03-01 Thread Richard Earnshaw (lists)

On 27/02/2024 13:56, Andre Vieira wrote:
> 
> This patch adds an attribute to the mve md patterns to be able to identify
> predicable MVE instructions and what their predicated and unpredicated 
> variants
> are.  This attribute is used to encode the icode of the unpredicated variant 
> of
> an instruction in its predicated variant.
> 
> This will make it possible for us to transform VPT-predicated insns in
> the insn chain into their unpredicated equivalents when transforming the loop
> into a MVE Tail-Predicated Low Overhead Loop. For example:
> `mve_vldrbq_z_ -> mve_vldrbq_`.
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm.md (mve_unpredicated_insn): New attribute.
>   * config/arm/arm.h (MVE_VPT_PREDICATED_INSN_P): New define.
>   (MVE_VPT_UNPREDICATED_INSN_P): Likewise.
>   (MVE_VPT_PREDICABLE_INSN_P): Likewise.
>   * config/arm/vec-common.md (mve_vshlq_): Add attribute.
>   * config/arm/mve.md (arm_vcx1q_p_v16qi): Add attribute.
>   (arm_vcx1qv16qi): Likewise.
>   (arm_vcx1qav16qi): Likewise.
>   (arm_vcx1qv16qi): Likewise.
>   (arm_vcx2q_p_v16qi): Likewise.
>   (arm_vcx2qv16qi): Likewise.
>   (arm_vcx2qav16qi): Likewise.
>   (arm_vcx2qv16qi): Likewise.
>   (arm_vcx3q_p_v16qi): Likewise.
>   (arm_vcx3qv16qi): Likewise.
>   (arm_vcx3qav16qi): Likewise.
>   (arm_vcx3qv16qi): Likewise.
>   (@mve_q_): Likewise.
>   (@mve_q_int_): Likewise.
>   (@mve_q_v4si): Likewise.
>   (@mve_q_n_): Likewise.
>   (@mve_q_r_): Likewise.
>   (@mve_q_f): Likewise.
>   (@mve_q_m_): Likewise.
>   (@mve_q_m_n_): Likewise.
>   (@mve_q_m_r_): Likewise.
>   (@mve_q_m_f): Likewise.
>   (@mve_q_int_m_): Likewise.
>   (@mve_q_p_v4si): Likewise.
>   (@mve_q_p_): Likewise.
>   (@mve_q_): Likewise.
>   (@mve_q_f): Likewise.
>   (@mve_q_m_): Likewise.
>   (@mve_q_m_f): Likewise.
>   (mve_vq_f): Likewise.
>   (mve_q): Likewise.
>   (mve_q_f): Likewise.
>   (mve_vadciq_v4si): Likewise.
>   (mve_vadciq_m_v4si): Likewise.
>   (mve_vadcq_v4si): Likewise.
>   (mve_vadcq_m_v4si): Likewise.
>   (mve_vandq_): Likewise.
>   (mve_vandq_f): Likewise.
>   (mve_vandq_m_): Likewise.
>   (mve_vandq_m_f): Likewise.
>   (mve_vandq_s): Likewise.
>   (mve_vandq_u): Likewise.
>   (mve_vbicq_): Likewise.
>   (mve_vbicq_f): Likewise.
>   (mve_vbicq_m_): Likewise.
>   (mve_vbicq_m_f): Likewise.
>   (mve_vbicq_m_n_): Likewise.
>   (mve_vbicq_n_): Likewise.
>   (mve_vbicq_s): Likewise.
>   (mve_vbicq_u): Likewise.
>   (@mve_vclzq_s): Likewise.
>   (mve_vclzq_u): Likewise.
>   (@mve_vcmp_q_): Likewise.
>   (@mve_vcmp_q_n_): Likewise.
>   (@mve_vcmp_q_f): Likewise.
>   (@mve_vcmp_q_n_f): Likewise.
>   (@mve_vcmp_q_m_f): Likewise.
>   (@mve_vcmp_q_m_n_): Likewise.
>   (@mve_vcmp_q_m_): Likewise.
>   (@mve_vcmp_q_m_n_f): Likewise.
>   (mve_vctpq): Likewise.
>   (mve_vctpq_m): Likewise.
>   (mve_vcvtaq_): Likewise.
>   (mve_vcvtaq_m_): Likewise.
>   (mve_vcvtbq_f16_f32v8hf): Likewise.
>   (mve_vcvtbq_f32_f16v4sf): Likewise.
>   (mve_vcvtbq_m_f16_f32v8hf): Likewise.
>   (mve_vcvtbq_m_f32_f16v4sf): Likewise.
>   (mve_vcvtmq_): Likewise.
>   (mve_vcvtmq_m_): Likewise.
>   (mve_vcvtnq_): Likewise.
>   (mve_vcvtnq_m_): Likewise.
>   (mve_vcvtpq_): Likewise.
>   (mve_vcvtpq_m_): Likewise.
>   (mve_vcvtq_from_f_): Likewise.
>   (mve_vcvtq_m_from_f_): Likewise.
>   (mve_vcvtq_m_n_from_f_): Likewise.
>   (mve_vcvtq_m_n_to_f_): Likewise.
>   (mve_vcvtq_m_to_f_): Likewise.
>   (mve_vcvtq_n_from_f_): Likewise.
>   (mve_vcvtq_n_to_f_): Likewise.
>   (mve_vcvtq_to_f_): Likewise.
>   (mve_vcvttq_f16_f32v8hf): Likewise.
>   (mve_vcvttq_f32_f16v4sf): Likewise.
>   (mve_vcvttq_m_f16_f32v8hf): Likewise.
>   (mve_vcvttq_m_f32_f16v4sf): Likewise.
>   (mve_vdwdupq_m_wb_u_insn): Likewise.
>   (mve_vdwdupq_wb_u_insn): Likewise.
>   (mve_veorq_s>): Likewise.
>   (mve_veorq_u>): Likewise.
>   (mve_veorq_f): Likewise.
>   (mve_vidupq_m_wb_u_insn): Likewise.
>   (mve_vidupq_u_insn): Likewise.
>   (mve_viwdupq_m_wb_u_insn): Likewise.
>   (mve_viwdupq_wb_u_insn): Likewise.
>   (mve_vldrbq_): Likewise.
>   (mve_vldrbq_gather_offset_): Likewise.
>   (mve_vldrbq_gather_offset_z_): Likewise.
>   (mve_vldrbq_z_): Likewise.
>   (mve_vldrdq_gather_base_v2di): Likewise.
>   (mve_vldrdq_gather_base_wb_v2di_insn): Likewise.
>   (mve_vldrdq_gather_base_wb_z_v2di_insn): Likewise.
>   (mve_vldrdq_gather_base_z_v2di): Likewise.
>   (mve_vldrdq_gather_offset_v2di): Likewise.
>   (mve_vldrdq_gather_offset_z_v2di): Likewise.
>   (mve_vldrdq_gather_shifted_offset_v2di): Likewise.
>   (mve_vldrdq_gather_shifted_offset_z_v2di): Likewise.
>   (mve_vldrhq_): Likewise.
>

Re: [PATCH] arm: Fixed C23 call compatibility with arm-none-eabi

2024-03-01 Thread Richard Earnshaw (lists)

On 19/02/2024 09:13, Torbjörn SVENSSON wrote:
> Ok for trunk and releases/gcc-13?
> Regtested on top of 945cb8490cb for arm-none-eabi, without any regression.
> 
> Backporting to releases/gcc-13 will change -std=c23 to -std=c2x.

Jakub has just pushed a different fix for this, so I don't think we need this 
now.

R.


> 
> --
> 
> In commit 4fe34cdcc80ac225b80670eabc38ac5e31ce8a5a, -std=c23 support was
> introduced to support functions without any named arguments.  For
> arm-none-eabi, this is not as simple as placing all arguments on the
> stack.  Align the caller to use r0, r1, r2 and r3 for arguments even for
> functions without any named arguments, as specified in the AAPCS.
> 
> Verify that the generic test case have the arguments are in the right
> order and add ARM specific test cases.
> 
> gcc/ChangeLog:
> 
>   * calls.h: Added the type of the function to function_arg_info.
>   * calls.cc: Save the type of the function.
>   * config/arm/arm.cc: Check in the AAPCS layout function if
>   function has no named args.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/torture/c23-stdarg-split-1a.c: Detect out of order
>   arguments.
>   * gcc.dg/torture/c23-stdarg-split-1b.c: Likewise.
>   * gcc.target/arm/aapcs/align_vaarg3.c: New test.
>   * gcc.target/arm/aapcs/align_vaarg4.c: New test.
> 
> Signed-off-by: Torbjörn SVENSSON 
> Co-authored-by: Yvan ROUX 
> ---
>  gcc/calls.cc  |  2 +-
>  gcc/calls.h   | 20 --
>  gcc/config/arm/arm.cc | 13 ---
>  .../gcc.dg/torture/c23-stdarg-split-1a.c  |  4 +-
>  .../gcc.dg/torture/c23-stdarg-split-1b.c  | 15 +---
>  .../gcc.target/arm/aapcs/align_vaarg3.c   | 37 +++
>  .../gcc.target/arm/aapcs/align_vaarg4.c   | 31 
>  7 files changed, 102 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/align_vaarg3.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/align_vaarg4.c
> 
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 01f44734743..a1cc283b952 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -1376,7 +1376,7 @@ initialize_argument_information (int num_actuals 
> ATTRIBUTE_UNUSED,
>with those made by function.cc.  */
>  
>/* See if this argument should be passed by invisible reference.  */
> -  function_arg_info arg (type, argpos < n_named_args);
> +  function_arg_info arg (type, fntype, argpos < n_named_args);
>if (pass_by_reference (args_so_far_pnt, arg))
>   {
> const bool callee_copies
> diff --git a/gcc/calls.h b/gcc/calls.h
> index 464a4e34e33..88836559ebe 100644
> --- a/gcc/calls.h
> +++ b/gcc/calls.h
> @@ -35,24 +35,33 @@ class function_arg_info
>  {
>  public:
>function_arg_info ()
> -: type (NULL_TREE), mode (VOIDmode), named (false),
> +: type (NULL_TREE), fntype (NULL_TREE), mode (VOIDmode), named (false),
>pass_by_reference (false)
>{}
>  
>/* Initialize an argument of mode MODE, either before or after promotion.  
> */
>function_arg_info (machine_mode mode, bool named)
> -: type (NULL_TREE), mode (mode), named (named), pass_by_reference (false)
> +: type (NULL_TREE), fntype (NULL_TREE), mode (mode), named (named),
> +pass_by_reference (false)
>{}
>  
>/* Initialize an unpromoted argument of type TYPE.  */
>function_arg_info (tree type, bool named)
> -: type (type), mode (TYPE_MODE (type)), named (named),
> +: type (type), fntype (NULL_TREE), mode (TYPE_MODE (type)), named 
> (named),
>pass_by_reference (false)
>{}
>  
> +  /* Initialize an unpromoted argument of type TYPE with a known function 
> type
> + FNTYPE.  */
> +  function_arg_info (tree type, tree fntype, bool named)
> +: type (type), fntype (fntype), mode (TYPE_MODE (type)), named (named),
> +pass_by_reference (false)
> +  {}
> +
>/* Initialize an argument with explicit properties.  */
>function_arg_info (tree type, machine_mode mode, bool named)
> -: type (type), mode (mode), named (named), pass_by_reference (false)
> +: type (type), fntype (NULL_TREE), mode (mode), named (named),
> +pass_by_reference (false)
>{}
>  
>/* Return true if the gimple-level type is an aggregate.  */
> @@ -96,6 +105,9 @@ public:
>   libgcc support functions).  */
>tree type;
>  
> +  /* The type of the function that has this argument, or null if not known.  
> */
> +  tree fntype;
> +
>/* The mode of the argument.  Depending on context, this might be
>   the mode of the argument type or the mode after promotion.  */
>machine_mode mode;
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 1cd69268ee9..98e149e5b7e 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -7006,7 +7006,7 @@ aapcs_libcall_value (machine_mode mode)
> numbers referred to here are those in the

Re: [PATCH] arm: fix c23 0-named-args caller-side stdarg

2024-03-01 Thread Richard Earnshaw (lists)

On 01/03/2024 04:38, Alexandre Oliva wrote:
> Hello, Matthew,
> 
> Thanks for the review.

For closure, Jakub has just pushed a patch to the generic code, so I don't 
think we need this now.

R.

> 
> On Feb 26, 2024, Matthew Malcomson  wrote:
> 
>> I think you're right that the AAPCS32 requires all arguments to be passed in
>> registers for this testcase.
>> (Nit on the commit-message: It says that your reading of the AAPCS32
>> suggests
>> that the *caller* is correct -- I believe based on the change you
>> suggested you
>> meant *callee* is correct in expecting arguments in registers.)
> 
> Ugh, yeah, sorry about the typo.
> 
>> The approach you suggest looks OK to me -- I do notice that it doesn't
>> fix the
>> legacy ABI's of `atpcs` and `apcs` and guess it would be nicer to have them
>> working at the same time though would defer to maintainers on how
>> important that
>> is.
>> (For the benefit of others reading) I don't believe there is any ABI concern
>> with this since it's fixing something that is currently not working at
>> all and
>> only applies to c23 (so a change shouldn't have too much of an impact).
> 
>> You mention you chose to make the change in the arm backend rather
>> than general
>> code due to hesitancy to change the generic ABI-affecting code. That makes
>> sense to me, certainly at this late stage in the development cycle.
> 
> *nod* I wrote the patch in the following context: I hit the problem on
> the very first toolchain I started transitioning to gcc-13.  I couldn't
> really fathom the notion that this breakage could have survived an
> entire release cycle if it affected many targets, and sort of held on to
> an assumption that the abi used by our arm-eabi toolchain had to be an
> uncommon one.
> 
> All of this hypothesizing falls apart by the now apparent knowledge that
> the test is faling elsewhere as well, even on other ARM ABIs, it just
> hadn't been addressed yet.  I'm glad we're getting there :-)
> 
>> From a quick check on c23-stdarg-4.c it does look like the below
>> change ends up
>> with the same codegen as your patch (except in the case of those
>> legacy ABI's,
>> where the below does make the caller and callee ABI match AFAICT):
> 
>> ```
>>   diff --git a/gcc/calls.cc b/gcc/calls.cc
>>   index 01f44734743..0b302f633ed 100644
>>   --- a/gcc/calls.cc
>>   +++ b/gcc/calls.cc
>>   @@ -2970,14 +2970,15 @@ expand_call (tree exp, rtx target, int ignore)
>>     we do not have any reliable way to pass unnamed args in
>>     registers, so we must force them into memory.  */
> 
>>   -  if (type_arg_types != 0
>>   +  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>>  && targetm.calls.strict_argument_naming (args_so_far))
>>    ;
>>  else if (type_arg_types != 0
>>  && ! targetm.calls.pretend_outgoing_varargs_named
>> (args_so_far))
>>    /* Don't include the last named arg.  */
>>    --n_named_args;
>>   -  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>>   +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
>>   +    && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>>    n_named_args = 0;
>>  else
>>    /* Treat all args as named.  */
>> ```
> 
>> Do you agree that this makes sense (i.e. is there something I'm
>> completely missing)?
> 
> Yeah, your argument is quite convincing, and the target knobs are indeed
> in line with the change you suggest, whereas the current code seems to
> deviate from them.
> 
> With my ABI designer hat on, however, I see that there's room for ABIs
> to make decisions about 0-args stdargs that go differently from stdargs
> with leading named args, from prototyped functions, and even from
> prototypeless functions, and we might end up needing more knobs to deal
> with such custom decisions.  We can cross that bridge if/when we get to
> it, though.
> 
>> (lm32 mcore msp430 gcn cris fr30 frv h8300 arm v850 rx pru)
> 
> Interesting that ppc64le is not on your list.  There's PR107453 about
> that, and another thread is discussing a fix for it that is somewhat
> different from what you propose (presumably because the way the problem
> manifests on ppc64le is different), but it also tweaks expand_call.
> 
> I'll copy you when following up there.
>

Re: [PATCH] testsuite: Turn errors back into warnings in arm/acle/cde-mve-error-2.c

2024-03-01 Thread Richard Earnshaw (lists)

On 13/01/2024 20:46, Thiago Jung Bauermann wrote:
> Since commit 2c3db94d9fd ("c: Turn int-conversion warnings into
> permerrors") the test fails with errors such as:
> 
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 32)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 33)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 34)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 35)
> ⋮
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 118 (test for 
> warnings, line 117)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 119)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 120 (test for 
> warnings, line 119)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 121)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 122 (test for 
> warnings, line 121)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 123)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 124 (test for 
> warnings, line 123)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 125)
> ⋮
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0  (test for excess errors)
> 
> There's a total of 1016 errors.  Here's a sample of the excess errors:
> 
>   Excess errors:
>   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:117:31: 
> error: passing argument 2 of '__builtin_arm_vcx1qv16qi' makes integer from 
> pointer without a cast [-Wint-conversion]
>   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:119:3: 
> error: passing argument 3 of '__builtin_arm_vcx1qav16qi' makes integer from 
> pointer without a cast [-Wint-conversion]
>   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:121:3: 
> error: passing argument 3 of '__builtin_arm_vcx2qv16qi' makes integer from 
> pointer without a cast [-Wint-conversion]
>   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:123:3: 
> error: passing argument 3 of '__builtin_arm_vcx2qv16qi' makes integer from 
> pointer without a cast [-Wint-conversion]
> 
> The test expects these messages to be warnings, not errors.  My first try
> was to change it to expect them as errors instead.  This didn't work, IIUC
> because the error prevents the compiler from continuing processing the file
> and thus other errors which are expected by the test don't get emitted.
> 
> Therefore, add -fpermissive so that the test behaves as it did previously.
> Because of the additional line in the header, I had to adjust the line
> numbers of the expected warnings.
> 
> Tested on armv8l-linux-gnueabihf.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/arm/acle/cde-mve-error-2.c: Add -fpermissive.
> ---
>  .../gcc.target/arm/acle/cde-mve-error-2.c | 63 ++-
>  1 file changed, 32 insertions(+), 31 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c 
> b/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
> index 5b7774825442..da283a06a54d 100644
> --- a/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
> +++ b/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
> @@ -2,6 +2,7 @@
>  
>  /* { dg-do assemble } */
>  /* { dg-require-effective-target arm_v8_1m_main_cde_mve_fp_ok } */
> +/* { dg-options "-fpermissive" } */
>  /* { dg-add-options arm_v8_1m_main_cde_mve_fp } */
>  
>  /* The error checking files are split since there are three kinds of
> @@ -115,73 +116,73 @@ uint8x16_t test_bad_immediates (uint8x16_t n, 
> uint8x16_t m, int someval,
>  
>/* `imm' is of wrong type.  */
>accum += __arm_vcx1q_u8 (0, "");/* { dg-error 
> {argument 2 to '__builtin_arm_vcx1qv16qi' must be a constant immediate in 
> range \[0-4095\]} } */
> -  /* { dg-warning {passing argument 2 of '__builtin_arm_vcx1qv16qi' makes 
> integer from pointer without a cast \[-Wint-conversion\]} "" { target *-*-* } 
> 117 } */
> +  /* { dg-warning {passing argument 2 of '__builtin_arm_vcx1qv16qi' makes 
> integer from pointer without a cast \[-Wint-conversion\]} "" { target *-*-* } 
> 118 } */

Absolute line numbers are a pain, but I think we can use '.-1' (without the 
quotes) in these cases to minimize the churn.

If that works, ok with that change.

R.

Re: [PATCH] testsuite: Fix fallout of turning warnings into errors on 32-bit Arm

2024-03-01 Thread Richard Earnshaw (lists)

On 01/03/2024 14:23, Andre Vieira (lists) wrote:
> Hi Thiago,
> 
> Thanks for this, LGTM but I can't approve this, CC'ing Richard.
> 
> Do have a nitpick, in the gcc/testsuite/ChangeLog: remove 'gcc/testsuite' 
> from bullet points 2-4.
> 

Yes, this is OK with the change Andre mentioned (your push will fail if you 
don't fix that).

R.

PS, if you've set up GCC git customizations (see 
contrib/gcc-git-customization.sh), you can verify things like this with 'git 
gcc-verify HEAD^..HEAD'


> Kind regards,
> Andre
> 
> On 13/01/2024 00:55, Thiago Jung Bauermann wrote:
>> Since commits 2c3db94d9fd ("c: Turn int-conversion warnings into
>> permerrors") and 55e94561e97e ("c: Turn -Wimplicit-function-declaration
>> into a permerror") these tests fail with errors such as:
>>
>>    FAIL: gcc.target/arm/pr59858.c (test for excess errors)
>>    FAIL: gcc.target/arm/pr65647.c (test for excess errors)
>>    FAIL: gcc.target/arm/pr65710.c (test for excess errors)
>>    FAIL: gcc.target/arm/pr97969.c (test for excess errors)
>>
>> Here's one example of the excess errors:
>>
>>    FAIL: gcc.target/arm/pr65647.c (test for excess errors)
>>    Excess errors:
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:17: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:51: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:62: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:7:48: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:8:9: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:24:5: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:25:5: error: 
>> initialization of 'int' from 'struct S1 *' makes integer from pointer 
>> without a cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:41:3: error: 
>> implicit declaration of function 'fn3'; did you mean 'fn2'? 
>> [-Wimplicit-function-declaration]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:46:3: error: 
>> implicit declaration of function 'fn5'; did you mean 'fn4'? 
>> [-Wimplicit-function-declaration]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:57:16: error: 
>> implicit declaration of function 'fn6'; did you mean 'fn4'? 
>> [-Wimplicit-function-declaration]
>>
>> PR rtl-optimization/59858 and PR target/65710 test the fix of an ICE.
>> PR target/65647 and PR target/97969 test for a compilation infinite loop.
>>
>> Therefore, add -fpermissive so that the tests behave as they did previously.
>> Tested on armv8l-linux-gnueabihf.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/arm/pr59858.c: Add -fpermissive.
>> * gcc/testsuite/gcc.target/arm/pr65647.c: Likewise.
>> * gcc/testsuite/gcc.target/arm/pr65710.c: Likewise.
>> * gcc/testsuite/gcc.target/arm/pr97969.c: Likewise.
>> ---
>>   gcc/testsuite/gcc.target/arm/pr59858.c | 2 +-
>>   gcc/testsuite/gcc.target/arm/pr65647.c | 2 +-
>>   gcc/testsuite/gcc.target/arm/pr65710.c | 2 +-
>>   gcc/testsuite/gcc.target/arm/pr97969.c | 2 +-
>>   4 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c 
>> b/gcc/testsuite/gcc.target/arm/pr59858.c
>> index 3360b48e8586..9336edfce277 100644
>> --- a/gcc/testsuite/gcc.target/arm/pr59858.c
>> +++ b/gcc/testsuite/gcc.target/arm/pr59858.c
>> @@ -1,5 +1,5 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
>> -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts 
>> -fPIC -w" } */
>> +/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
>> -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts 
>> -fPIC -w -fpermissive" } */
>>   /* { dg-require-effective-target fpic } */
>>   /* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft 
>> -mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
>>   /* { dg-require-effective-target arm_arch_v5te_thumb_ok } */
>> diff --git a/gcc/testsuite/gcc.target/arm/pr65647.c 
>> b/gcc/testsuite/gcc.target/arm/pr65647.c
>> index 26b4e399f6be..3cbf6b804ec0 100644
>> --- a/gcc/testsuite/gcc.target/arm/pr65647.c
>> +++ b/gcc/testsuite/gcc.target/arm/pr65647.c
>> @@ -1,7 +1,7 @@
>>   /* { dg-do compile } */
>>   /* { dg-require-effective-target arm_arch_v6m_ok } */
>>   /* {

Re: [PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-03-01 Thread Richard Earnshaw (lists)

On 29/02/2024 15:55, Jakub Jelinek wrote:
> On Thu, Feb 29, 2024 at 02:14:05PM +0000, Richard Earnshaw wrote:
>>> I tried the above on arm, aarch64 and x86_64 and that seems fine,
>>> including the new testcase you added.
>>>
>>
>> I should mention though, that INIT_CUMULATIVE_ARGS on arm ignores
>> n_named_args entirely, it doesn't need it (I don't think it even existed
>> when the AAPCS code was added).
> 
> So far I've just checked that the new testcase passes not just on
> x86_64/i686-linux, but also on {powerpc64le,s390x,aarch64}-linux
> with vanilla trunk.
> Haven't posted this patch in patch form, plus while I'm not really sure
> whether setting n_named_args to 0 or not changing in the
> !pretend_outgoing_varargs_named is right, the setting to 0 feels more
> correct to me.  If structure_value_addr_parm is 1, the function effectively
> has a single named argument and then ... args and if the target wants
> n_named_args to be number of named arguments except the last, then that
> should be 0 rather than 1.
> 
> Thus, is the following patch ok for trunk then?
> 
> 2024-02-29  Jakub Jelinek  
> 
>   PR target/107453

PR 114136

Would be more appropriate for this, I think.

Otherwise, OK.

R.

>   * calls.cc (expand_call): For TYPE_NO_NAMED_ARGS_STDARG_P set
>   n_named_args initially before INIT_CUMULATIVE_ARGS to
>   structure_value_addr_parm rather than 0, after it don't modify
>   it if strict_argument_naming and clear only if
>   !pretend_outgoing_varargs_named.
> 
> --- gcc/calls.cc.jj   2024-01-22 11:48:08.045847508 +0100
> +++ gcc/calls.cc  2024-02-29 16:24:47.799855912 +0100
> @@ -2938,7 +2938,7 @@ expand_call (tree exp, rtx target, int i
>/* Count the struct value address, if it is passed as a parm.  */
>+ structure_value_addr_parm);
>else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> -n_named_args = 0;
> +n_named_args = structure_value_addr_parm;
>else
>  /* If we know nothing, treat all args as named.  */
>  n_named_args = num_actuals;
> @@ -2970,14 +2970,15 @@ expand_call (tree exp, rtx target, int i
>   we do not have any reliable way to pass unnamed args in
>   registers, so we must force them into memory.  */
>  
> -  if (type_arg_types != 0
> +  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>&& targetm.calls.strict_argument_naming (args_so_far))
>  ;
>else if (type_arg_types != 0
>  && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  /* Don't include the last named arg.  */
>  --n_named_args;
> -  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
> +&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  n_named_args = 0;
>else
>  /* Treat all args as named.  */
> 
>   Jakub
>

Re: [PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-03-01 Thread Richard Earnshaw (lists)

On 29/02/2024 17:56, Jakub Jelinek wrote:
> On Thu, Feb 29, 2024 at 05:51:03PM +0000, Richard Earnshaw (lists) wrote:
>> Oh, but wait!  Perhaps that now falls into the initial 'if' clause and we 
>> never reach the point where you pick zero.  So perhaps I'm worrying about 
>> nothing.
> 
> If you are worried about the
> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
> +  && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  n_named_args = 0;
> case in the patch, we know at that point that the initial n_named_args is
> equal to structure_value_addr_parm, so either 0, in that case
> --n_named_args;
> would yield the undesirable negative value, so we want 0 instead; for that
> case we could as well just have ; in there instead of n_named_args = 0;,
> or it is 1, in that case --n_named_args; would turn that into 0.
> 
>   Jakub
> 

No, I was thinking about the case of strict_argument_naming when the first 
argument is the artificial return value pointer.  In that case we'd want 
n_named_args=1.

But I think it's a non-issue as that will be caught by 

  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
   && targetm.calls.strict_argument_naming (args_so_far))
 ;

R.

Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-29 Thread Richard Earnshaw (lists)

On 29/02/2024 17:55, Andrew Pinski (QUIC) wrote:
>> -Original Message-
>> From: Maxim Kuvyrkov 
>> Sent: Thursday, February 29, 2024 9:46 AM
>> To: Andrew Pinski (QUIC) 
>> Cc: Evgeny Karpov ; Andrew Pinski
>> ; Richard Sandiford ; gcc-
>> patc...@gcc.gnu.org; 10wa...@gmail.com; m...@harmstone.com; Zac
>> Walker ; Ron Riddle
>> ; Radek Barton 
>> Subject: Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW
>> environments for AArch64
>>
>> WARNING: This email originated from outside of Qualcomm. Please be wary
>> of any links or attachments, and do not enable macros.
>>
>>> On Feb 29, 2024, at 21:35, Andrew Pinski (QUIC)
>>  wrote:
>>>
>>>
>>>
 -Original Message-
 From: Evgeny Karpov 
 Sent: Thursday, February 29, 2024 8:46 AM
 To: Andrew Pinski 
 Cc: Richard Sandiford ; gcc-
 patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov
 ; m...@harmstone.com; Zac Walker
 ; Ron Riddle ;
 Radek Barton ; Andrew Pinski (QUIC)
 
 Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments
 for AArch64

 Wednesday, February 28, 2024 2:00 AM
 Andrew Pinski wrote:

> What does this mean with respect to C++ exceptions? Or you using
> SJLJ exceptions support or the dwarf unwinding ones without SEH
>> support?
> I am not sure if SJLJ exceptions is well tested any more in GCC either.
>
> Also I have a question if you ran the full GCC/G++ testsuites and
> what were the results?
> If you did run it, did you use a cross compiler or the native
> compiler? Did you do a bootstrap (GCC uses C++ but no exceptions
>> though)?

 As mentioned in the cover letter and the thread, the current
 contribution covers only the C scope.
 Exception handling is fully disabled for now.
 There is an experimental build with C++ and SEH, however, it is not
 included in the plan for the current contribution.

 https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-
>> build

> If you run using a cross compiler, did you use ssh or some other
> route to run the applications?
>
> Thanks,
> Andrew Pinski

 GitHub Actions are used to cross-compile toolchains, packages and
 tests, and execute tests on Windows Arm64.
>>>
>>> This does not answer my question because what you are running is just
>> simple testcases and not the FULL GCC testsuite.
>>> So again have you ran the GCC testsuite and do you have a dejagnu board to
>> be able to execute the binaries?
>>> I think without the GCC testsuite ran to find all of the known failures, 
>>> you are
>> going to be running into many issues.
>>> The GCC testsuite includes many tests for ABI corner cases and many
>> features that you will most likely not think about testing using your simple
>> testcases.
>>> In fact I suspect there will be some of the aarch64 testcases which will 
>>> need
>> to be modified for the windows ABI which you have not done yet.
>>
>> Hi Andrew,
>>
>> We (Linaro) have a prototype CI loop setup for testing aarch64-w64-
>> mingw32, and we have results for gcc-c and libatomic -- see [1].
>>
>> The results are far from clean, but that's expected.  This patch series aims 
>> at
>> enabling C hello-world only, and subsequent patch series will improve the
>> state of the port.
>>
>> [1] https://ci.linaro.org/job/tcwg_gnu_mingw_check_gcc--master-woa64-
>> build/6/artifact/artifacts/sumfiles/
> 
> Looking at these results, this port is not in any shape or form to be 
> upstreamed right now. Even simple -g will cause failures.
> Note we don't need a clean testsuite run but the patch series is not even 
> allowing enabling hello world due to the -g not being able to used.
> 

It seemed to me as though the patch was posted for comments, not for immediate 
inclusion.  I agree this isn't ready for committing yet, but neither should the 
submitters wait until it's perfect before posting it.

I think it's gcc-15 material, so now is about the right time to be thinking 
about it.

R.

> Thanks,
> Amdrew Pinski
> 
>>
>> Thanks,
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>

Re: [PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Richard Earnshaw (lists)

On 29/02/2024 17:38, Jakub Jelinek wrote:
> On Thu, Feb 29, 2024 at 05:23:25PM +0000, Richard Earnshaw (lists) wrote:
>> On 29/02/2024 15:55, Jakub Jelinek wrote:
>>> On Thu, Feb 29, 2024 at 02:14:05PM +, Richard Earnshaw wrote:
>>>>> I tried the above on arm, aarch64 and x86_64 and that seems fine,
>>>>> including the new testcase you added.
>>>>>
>>>>
>>>> I should mention though, that INIT_CUMULATIVE_ARGS on arm ignores
>>>> n_named_args entirely, it doesn't need it (I don't think it even existed
>>>> when the AAPCS code was added).
>>>
>>> So far I've just checked that the new testcase passes not just on
>>> x86_64/i686-linux, but also on {powerpc64le,s390x,aarch64}-linux
>>> with vanilla trunk.
>>> Haven't posted this patch in patch form, plus while I'm not really sure
>>> whether setting n_named_args to 0 or not changing in the
>>> !pretend_outgoing_varargs_named is right, the setting to 0 feels more
>>> correct to me.  If structure_value_addr_parm is 1, the function effectively
>>> has a single named argument and then ... args and if the target wants
>>> n_named_args to be number of named arguments except the last, then that
>>> should be 0 rather than 1.
>>>
>>> Thus, is the following patch ok for trunk then?
>>
>> The comment at the start of the section says
>>
>>   /* Now possibly adjust the number of named args.
>>  Normally, don't include the last named arg if anonymous args follow.
>>  We do include the last named arg if
>>  targetm.calls.strict_argument_naming() returns nonzero.
>>  (If no anonymous args follow, the result of list_length is actually
>>  one too large.  This is harmless.)
>>
>> So in the case of strict_argument_naming perhaps it should return 1, but 0 
>> for other cases.
> 
> The TYPE_NO_NAMED_ARGS_STDARG_P (funtype) case is as if type_arg_types != 0
> and list_length (type_arg_types) == 0, i.e. no user named arguments.
> As list_length (NULL) returns 0, perhaps it could be even handled just the
> by changing all the type_arg_types != 0 checks to
> type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
> There are just 2 cases I'm worried about, one is that I think rest of
> calls.cc nor the backends are prepared to see n_named_args -1 after the
> adjustments, I think it is better to use 0, and then the question is what
> the !strict_argument_naming && !pretend_outgoing_varargs_named case
> wants to do for the aggregate return.  The patch as posted for
> void foo (...); void bar () { foo (1, 2, 3); }
> will set n_named_args initially to 0 (no named args) and with the
> adjustments for strict_argument_naming 0, otherwise for !pretend
> 0 as well, otherwise 3.
> For
> struct { char buf[4096]; } baz (...); void qux () { baz (1, 2, 3); }
> the patch sets n_named_args initially to 1 (the hidden return) and
> with the arguments for strict keep it at 1, for !pretend 0 and otherwise
> 3.
> 
> So, which case do you think is handled incorrectly with that?

The way I was thinking about it (and testing it on Arm) was to look at 
n_named_args for the cases of a traditional varargs case, then reduce that by 
one (except it can't ever be negative).

So for 

void f(...);
void g(int, ...);
struct S { int a[32]; };

struct S h (...);
struct S i (int, ...);

void a ()
{
  struct S x;
  f(1, 2, 3, 4);
  g(1, 2, 3, 4);
  x = h (1, 2, 3, 4);
  x = i (1, 2, 3, 4);
}

There are various permutations that could lead to answers of 0, 1, 2, 4 and 5 
depending on how those various targets treat each case and how the result 
pointer address is handled.  My suspicion is that for a target that has strict 
argument naming and the result pointer passed as a first argument, the answer 
for the 'h()' call should be 1, not zero.  

Oh, but wait!  Perhaps that now falls into the initial 'if' clause and we never 
reach the point where you pick zero.  So perhaps I'm worrying about nothing.

R.

> 
>   Jakub
>

Re: [PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Richard Earnshaw (lists)

On 29/02/2024 15:55, Jakub Jelinek wrote:
> On Thu, Feb 29, 2024 at 02:14:05PM +0000, Richard Earnshaw wrote:
>>> I tried the above on arm, aarch64 and x86_64 and that seems fine,
>>> including the new testcase you added.
>>>
>>
>> I should mention though, that INIT_CUMULATIVE_ARGS on arm ignores
>> n_named_args entirely, it doesn't need it (I don't think it even existed
>> when the AAPCS code was added).
> 
> So far I've just checked that the new testcase passes not just on
> x86_64/i686-linux, but also on {powerpc64le,s390x,aarch64}-linux
> with vanilla trunk.
> Haven't posted this patch in patch form, plus while I'm not really sure
> whether setting n_named_args to 0 or not changing in the
> !pretend_outgoing_varargs_named is right, the setting to 0 feels more
> correct to me.  If structure_value_addr_parm is 1, the function effectively
> has a single named argument and then ... args and if the target wants
> n_named_args to be number of named arguments except the last, then that
> should be 0 rather than 1.
> 
> Thus, is the following patch ok for trunk then?

The comment at the start of the section says

  /* Now possibly adjust the number of named args.
 Normally, don't include the last named arg if anonymous args follow.
 We do include the last named arg if
 targetm.calls.strict_argument_naming() returns nonzero.
 (If no anonymous args follow, the result of list_length is actually
 one too large.  This is harmless.)

So in the case of strict_argument_naming perhaps it should return 1, but 0 for 
other cases.

R.

> 
> 2024-02-29  Jakub Jelinek  
> 
>   PR target/107453
>   * calls.cc (expand_call): For TYPE_NO_NAMED_ARGS_STDARG_P set
>   n_named_args initially before INIT_CUMULATIVE_ARGS to
>   structure_value_addr_parm rather than 0, after it don't modify
>   it if strict_argument_naming and clear only if
>   !pretend_outgoing_varargs_named.
> 
> --- gcc/calls.cc.jj   2024-01-22 11:48:08.045847508 +0100
> +++ gcc/calls.cc  2024-02-29 16:24:47.799855912 +0100
> @@ -2938,7 +2938,7 @@ expand_call (tree exp, rtx target, int i
>/* Count the struct value address, if it is passed as a parm.  */
>+ structure_value_addr_parm);
>else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> -n_named_args = 0;
> +n_named_args = structure_value_addr_parm;
>else
>  /* If we know nothing, treat all args as named.  */
>  n_named_args = num_actuals;
> @@ -2970,14 +2970,15 @@ expand_call (tree exp, rtx target, int i
>   we do not have any reliable way to pass unnamed args in
>   registers, so we must force them into memory.  */
>  
> -  if (type_arg_types != 0
> +  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>&& targetm.calls.strict_argument_naming (args_so_far))
>  ;
>else if (type_arg_types != 0
>  && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  /* Don't include the last named arg.  */
>  --n_named_args;
> -  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
> +&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  n_named_args = 0;
>else
>  /* Treat all args as named.  */
> 
>   Jakub
>

Re: [PATCH] calls: Fix up TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Richard Earnshaw




On 29/02/2024 14:10, Richard Earnshaw (lists) wrote:
> On 27/02/2024 17:25, Jakub Jelinek wrote:
>> On Tue, Feb 27, 2024 at 04:41:32PM +, Richard Earnshaw wrote:
>>>> 2023-01-09  Jakub Jelinek  
>>>>
>>>>PR target/107453
>>>>* calls.cc (expand_call): For calls with
>>>>TYPE_NO_NAMED_ARGS_STDARG_P (funtype) use zero for n_named_args.
>>>>Formatting fix.
>>>
>>> This one has been festering for a while; both Alexandre and Torbjorn have 
>>> attempted to fix it recently, but I'm not sure either is really right...
>>>
>>> On Arm this is causing all anonymous arguments to be passed on the stack,
>>> which is incorrect per the ABI.  On a target that uses
>>> 'pretend_outgoing_vararg_named', why is it correct to set n_named_args to
>>> zero?  Is it enough to guard both the statements you've added with
>>> !targetm.calls.pretend_outgoing_args_named?
>>
>> I'm afraid I haven't heard of that target hook before.
>> All I was doing with that change was fixing a regression reported in the PR
>> for ppc64le/sparc/nvptx/loongarch at least.
>>
>> The TYPE_NO_NAMED_ARGS_STDARG_P functions (C23 fns like void foo (...) {})
>> have NULL type_arg_types, so the list_length (type_arg_types) isn't done for
>> it, but it should be handled as if it was non-NULL but list length was 0.
>>
>> So, for the
>>   if (type_arg_types != 0)
>> n_named_args
>>   = (list_length (type_arg_types)
>>  /* Count the struct value address, if it is passed as a parm.  */
>>  + structure_value_addr_parm);
>>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>> n_named_args = 0;
>>   else
>> /* If we know nothing, treat all args as named.  */
>> n_named_args = num_actuals;
>> case, I think guarding it by any target hooks is wrong, although
>> I guess it should have been
>> n_named_args = structure_value_addr_parm;
>> instead of
>> n_named_args = 0;
>>
>> For the second
>>   if (type_arg_types != 0
>>   && targetm.calls.strict_argument_naming (args_so_far))
>> ;
>>   else if (type_arg_types != 0
>>&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>> /* Don't include the last named arg.  */
>> --n_named_args;
>>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>> n_named_args = 0;
>>   else
>> /* Treat all args as named.  */
>> n_named_args = num_actuals;
>> bet (but no testing done, don't even know which targets return what for
>> those hooks) we should treat those as if type_arg_types was non-NULL
>> with 0 elements in the list, except the --n_named_args doesn't make sense
>> because that would decrease it to -1.
>> So perhaps
>>   if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>>   && targetm.calls.strict_argument_naming (args_so_far))
>> ;
>>   else if (type_arg_types != 0
>>&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>> /* Don't include the last named arg.  */
>> --n_named_args;
>>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
>> && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far)))
>> ;
>>   else
>> /* Treat all args as named.  */
>> n_named_args = num_actuals;
> 
> I tried the above on arm, aarch64 and x86_64 and that seems fine, including 
> the new testcase you added.
> 

I should mention though, that INIT_CUMULATIVE_ARGS on arm ignores n_named_args 
entirely, it doesn't need it (I don't think it even existed when the AAPCS code 
was added).

R.

> R.
> 
>>
>> (or n_named_args = 0; instead of ; before the final else?  Dunno).
>> I guess we need some testsuite coverage for caller/callee ABI match of
>> struct S { char p[64]; };
>> struct S foo (...);
>>
>>  Jakub
>>
>

Re: [PATCH] calls: Fix up TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Richard Earnshaw (lists)

On 27/02/2024 17:25, Jakub Jelinek wrote:
> On Tue, Feb 27, 2024 at 04:41:32PM +0000, Richard Earnshaw wrote:
>>> 2023-01-09  Jakub Jelinek  
>>>
>>> PR target/107453
>>> * calls.cc (expand_call): For calls with
>>> TYPE_NO_NAMED_ARGS_STDARG_P (funtype) use zero for n_named_args.
>>> Formatting fix.
>>
>> This one has been festering for a while; both Alexandre and Torbjorn have 
>> attempted to fix it recently, but I'm not sure either is really right...
>>
>> On Arm this is causing all anonymous arguments to be passed on the stack,
>> which is incorrect per the ABI.  On a target that uses
>> 'pretend_outgoing_vararg_named', why is it correct to set n_named_args to
>> zero?  Is it enough to guard both the statements you've added with
>> !targetm.calls.pretend_outgoing_args_named?
> 
> I'm afraid I haven't heard of that target hook before.
> All I was doing with that change was fixing a regression reported in the PR
> for ppc64le/sparc/nvptx/loongarch at least.
> 
> The TYPE_NO_NAMED_ARGS_STDARG_P functions (C23 fns like void foo (...) {})
> have NULL type_arg_types, so the list_length (type_arg_types) isn't done for
> it, but it should be handled as if it was non-NULL but list length was 0.
> 
> So, for the
>   if (type_arg_types != 0)
> n_named_args
>   = (list_length (type_arg_types)
>  /* Count the struct value address, if it is passed as a parm.  */
>  + structure_value_addr_parm);
>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> n_named_args = 0;
>   else
> /* If we know nothing, treat all args as named.  */
> n_named_args = num_actuals;
> case, I think guarding it by any target hooks is wrong, although
> I guess it should have been
> n_named_args = structure_value_addr_parm;
> instead of
> n_named_args = 0;
> 
> For the second
>   if (type_arg_types != 0
>   && targetm.calls.strict_argument_naming (args_so_far))
> ;
>   else if (type_arg_types != 0
>&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
> /* Don't include the last named arg.  */
> --n_named_args;
>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> n_named_args = 0;
>   else
> /* Treat all args as named.  */
> n_named_args = num_actuals;
> bet (but no testing done, don't even know which targets return what for
> those hooks) we should treat those as if type_arg_types was non-NULL
> with 0 elements in the list, except the --n_named_args doesn't make sense
> because that would decrease it to -1.
> So perhaps
>   if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>   && targetm.calls.strict_argument_naming (args_so_far))
> ;
>   else if (type_arg_types != 0
>&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
> /* Don't include the last named arg.  */
> --n_named_args;
>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
>  && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far)))
> ;
>   else
> /* Treat all args as named.  */
> n_named_args = num_actuals;

I tried the above on arm, aarch64 and x86_64 and that seems fine, including the 
new testcase you added.

R.

> 
> (or n_named_args = 0; instead of ; before the final else?  Dunno).
> I guess we need some testsuite coverage for caller/callee ABI match of
> struct S { char p[64]; };
> struct S foo (...);
> 
>   Jakub
>

Re: [PATCH] calls: Fix up TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-27 Thread Richard Earnshaw

[resending, apologies, I accidentally CC'd the wrong person last time]

On 27/02/2024 16:41, Richard Earnshaw wrote:
> 
> 
> On 09/01/2023 10:32, Jakub Jelinek via Gcc-patches wrote:
>> Hi!
>>
>> On powerpc64le-linux, the following patch fixes
>> -FAIL: gcc.dg/c2x-stdarg-4.c execution test
>> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O0  execution test
>> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O1  execution test
>> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2  execution test
>> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2 -flto 
>> -fno-use-linker-plugin -flto-partition=none  execution test
>> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2 -flto -fuse-linker-plugin 
>> -fno-fat-lto-objects  execution test
>> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O3 -g  execution test
>> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -Os  execution test
>> The problem is mismatch between the caller and callee side.
>> On the callee side, we do:
>>   /* NAMED_ARG is a misnomer.  We really mean 'non-variadic'. */
>>   if (!cfun->stdarg)
>> data->arg.named = 1;  /* No variadic parms.  */
>>   else if (DECL_CHAIN (parm))
>> data->arg.named = 1;  /* Not the last non-variadic parm. */
>>   else if (targetm.calls.strict_argument_naming (all->args_so_far))
>> data->arg.named = 1;  /* Only variadic ones are unnamed.  */
>>   else
>> data->arg.named = 0;  /* Treat as variadic.  */
>> which is later passed to the target hooks to determine if a particular
>> argument is named or not.  Now, cfun->stdarg is determined from the stdarg_p
>> call, which for the new C2X TYPE_NO_NAMED_ARGS_STDARG_P function types
>> (rettype fn (...)) returns true.  Such functions have no named arguments,
>> so data->arg.named will be 0 in function.cc.  But on the caller side,
>> as TYPE_NO_NAMED_ARGS_STDARG_P function types have TYPE_ARG_TYPES NULL,
>> we instead treat those calls as unprototyped even when they are prototyped
>> - /* If we know nothing, treat all args as named.  */ n_named_args = 
>> num_actuals;
>> in 2 spots.  We need to treat the TYPE_NO_NAMED_ARGS_STDARG_P cases as
>> prototyped with no named arguments.
>>
>> Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux (where
>> it fixes the above failures), aarch64-linux and s390x-linux, ok for trunk?
>>
>> 2023-01-09  Jakub Jelinek  
>>
>>  PR target/107453
>>  * calls.cc (expand_call): For calls with
>>  TYPE_NO_NAMED_ARGS_STDARG_P (funtype) use zero for n_named_args.
>>  Formatting fix.
> 
> This one has been festering for a while; both Alexandre and Torbjorn have 
> attempted to fix it recently, but I'm not sure either is really right...
> 
> On Arm this is causing all anonymous arguments to be passed on the stack, 
> which is incorrect per the ABI.  On a target that uses 
> 'pretend_outgoing_vararg_named', why is it correct to set n_named_args to 
> zero?  Is it enough to guard both the statements you've added with 
> !targetm.calls.pretend_outgoing_args_named?
> 
> R.
> 
>>
>> --- gcc/calls.cc.jj  2023-01-02 09:32:28.834192105 +0100
>> +++ gcc/calls.cc 2023-01-06 14:52:14.740594896 +0100
>> @@ -2908,8 +2908,8 @@ expand_call (tree exp, rtx target, int i
>>  }
>>  
>>/* Count the arguments and set NUM_ACTUALS.  */
>> -  num_actuals =
>> -call_expr_nargs (exp) + num_complex_actuals + structure_value_addr_parm;
>> +  num_actuals
>> += call_expr_nargs (exp) + num_complex_actuals + 
>> structure_value_addr_parm;
>>  
>>/* Compute number of named args.
>>   First, do a raw count of the args for INIT_CUMULATIVE_ARGS.  */
>> @@ -2919,6 +2919,8 @@ expand_call (tree exp, rtx target, int i
>>= (list_length (type_arg_types)
>>   /* Count the struct value address, if it is passed as a parm.  */
>>   + structure_value_addr_parm);
>> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>> +n_named_args = 0;
>>else
>>  /* If we know nothing, treat all args as named.  */
>>  n_named_args = num_actuals;
>> @@ -2957,6 +2959,8 @@ expand_call (tree exp, rtx target, int i
>> && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>>  /* Don't include the last named arg.  */
>>  --n_named_args;
>> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>> +n_named_args = 0;
>>else
>>  /* Treat all args as named.  */
>>  n_named_args = num_actuals;
>>
>>  Jakub
>>

Re: [PATCH] calls: Fix up TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-27 Thread Richard Earnshaw




On 09/01/2023 10:32, Jakub Jelinek via Gcc-patches wrote:
> Hi!
> 
> On powerpc64le-linux, the following patch fixes
> -FAIL: gcc.dg/c2x-stdarg-4.c execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O0  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O1  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O3 -g  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -Os  execution test
> The problem is mismatch between the caller and callee side.
> On the callee side, we do:
>   /* NAMED_ARG is a misnomer.  We really mean 'non-variadic'. */
>   if (!cfun->stdarg)
> data->arg.named = 1;  /* No variadic parms.  */
>   else if (DECL_CHAIN (parm))
> data->arg.named = 1;  /* Not the last non-variadic parm. */
>   else if (targetm.calls.strict_argument_naming (all->args_so_far))
> data->arg.named = 1;  /* Only variadic ones are unnamed.  */
>   else
> data->arg.named = 0;  /* Treat as variadic.  */
> which is later passed to the target hooks to determine if a particular
> argument is named or not.  Now, cfun->stdarg is determined from the stdarg_p
> call, which for the new C2X TYPE_NO_NAMED_ARGS_STDARG_P function types
> (rettype fn (...)) returns true.  Such functions have no named arguments,
> so data->arg.named will be 0 in function.cc.  But on the caller side,
> as TYPE_NO_NAMED_ARGS_STDARG_P function types have TYPE_ARG_TYPES NULL,
> we instead treat those calls as unprototyped even when they are prototyped
> - /* If we know nothing, treat all args as named.  */ n_named_args = 
> num_actuals;
> in 2 spots.  We need to treat the TYPE_NO_NAMED_ARGS_STDARG_P cases as
> prototyped with no named arguments.
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux (where
> it fixes the above failures), aarch64-linux and s390x-linux, ok for trunk?
> 
> 2023-01-09  Jakub Jelinek  
> 
>   PR target/107453
>   * calls.cc (expand_call): For calls with
>   TYPE_NO_NAMED_ARGS_STDARG_P (funtype) use zero for n_named_args.
>   Formatting fix.

This one has been festering for a while; both Alexandre and Torbjorn have 
attempted to fix it recently, but I'm not sure either is really right...

On Arm this is causing all anonymous arguments to be passed on the stack, which 
is incorrect per the ABI.  On a target that uses 
'pretend_outgoing_vararg_named', why is it correct to set n_named_args to zero? 
 Is it enough to guard both the statements you've added with 
!targetm.calls.pretend_outgoing_args_named?

R.

> 
> --- gcc/calls.cc.jj   2023-01-02 09:32:28.834192105 +0100
> +++ gcc/calls.cc  2023-01-06 14:52:14.740594896 +0100
> @@ -2908,8 +2908,8 @@ expand_call (tree exp, rtx target, int i
>  }
>  
>/* Count the arguments and set NUM_ACTUALS.  */
> -  num_actuals =
> -call_expr_nargs (exp) + num_complex_actuals + structure_value_addr_parm;
> +  num_actuals
> += call_expr_nargs (exp) + num_complex_actuals + 
> structure_value_addr_parm;
>  
>/* Compute number of named args.
>   First, do a raw count of the args for INIT_CUMULATIVE_ARGS.  */
> @@ -2919,6 +2919,8 @@ expand_call (tree exp, rtx target, int i
>= (list_length (type_arg_types)
>/* Count the struct value address, if it is passed as a parm.  */
>+ structure_value_addr_parm);
> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> +n_named_args = 0;
>else
>  /* If we know nothing, treat all args as named.  */
>  n_named_args = num_actuals;
> @@ -2957,6 +2959,8 @@ expand_call (tree exp, rtx target, int i
>  && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  /* Don't include the last named arg.  */
>  --n_named_args;
> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> +n_named_args = 0;
>else
>  /* Treat all args as named.  */
>  n_named_args = num_actuals;
> 
>   Jakub
>

[PATCH] arm: warn about deprecation of iwmmx in mmintrin.h

2024-02-27 Thread Richard Earnshaw


GCC 13's changes file documents that iwmmx is deprecated.  Raise the bar
by warning when the mmintrin.h header is included by users, but provide
a way to suppress the warning.

gcc:
* config/arm/mmintrin.h: Warn if this header is included without
defining __ENABLE_DEPRECATED_IWMMXT.
---
 gcc/config/arm/mmintrin.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
index 07659502bf2..e9cc3ddd7ab 100644
--- a/gcc/config/arm/mmintrin.h
+++ b/gcc/config/arm/mmintrin.h
@@ -28,6 +28,9 @@
 #error mmintrin.h included without enabling WMMX/WMMX2 instructions (e.g. -march=iwmmxt or -march=iwmmxt2)
 #endif
 
+#ifndef __ENABLE_DEPRECATED_IWMMXT
+#warning support for WMMX/WMMX2 is deprecated and will be removed in GCC 15.  Define __ENABLE_DEPRECATED_IWMMXT to suppress this warning
+#endif
 
 #if defined __cplusplus
 extern "C" {

Re: [PATCH] ARM: Fix conditional execution [PR113915]

2024-02-26 Thread Richard Earnshaw (lists)

On 26/02/2024 16:05, Wilco Dijkstra wrote:
> Hi Richard,
> 
>> Did you test this on a thumb1 target?  It seems to me that the target parts 
>> that you've
>> removed were likely related to that.  In fact, I don't see why this test 
>> would need to be changed at all.
> 
> The testcase explicitly forces a Thumb-2 target (arm_arch_v6t2). The patterns
> were wrong for Thumb-2 indeed, and the testcase was explicitly testing for 
> this.
> There is a separate builtin-bswap-2.c for Thumb-1 target (arm_arch_v6m).
> 
> Cheers,
> Wilco

That's why statements like:

* gcc.target/arm/builtin-bswap-1.c: Fix test.

are less than helpful.  Perhaps if you'd said what you actually changed that 
would have made it more obvious.

So OK, but please fix the commit message to say what you did.

R.

Re: [PATCH] ARM: Fix conditional execution [PR113915]

2024-02-26 Thread Richard Earnshaw (lists)

On 23/02/2024 15:46, Wilco Dijkstra wrote:
> Hi Richard,
> 
>> This bit isn't.  The correct fix here is to fix the pattern(s) concerned to 
>> add the missing predicate.
>>
>> Note that builtin-bswap.x explicitly mentions predicated mnemonics in the 
>> comments.
> 
> I fixed the patterns in v2. There are likely some more, plus we could likely 
> merge many t1 and t2
> patterns where the only difference is predication. But those cleanups are for 
> another time...
> 
> Cheers,
> Wilco
> 
> v2: Add predicable to the rev patterns.
> 
> By default most patterns can be conditionalized on Arm targets.  However
> Thumb-2 predication requires the "predicable" attribute be explicitly
> set to "yes".  Most patterns are shared between Arm and Thumb(-2) and are
> marked with "predicable".  Given this sharing, it does not make sense to
> use a different default for Arm.  So only consider conditional execution
> of instructions that have the predicable attribute set to yes.  This ensures
> that patterns not explicitly marked as such are never conditionally executed.
> 
> Passes regress and bootstrap, OK for commit?
> 
> gcc/ChangeLog:
> PR target/113915
> * config/arm/arm.md (NOCOND): Improve comment.
> (arm_rev*) Add predicable.
> * config/arm/arm.cc (arm_final_prescan_insn): Add check for
> PREDICABLE_YES.
> 
> gcc/testsuite/ChangeLog:
> PR target/113915
> * gcc.target/arm/builtin-bswap-1.c: Fix test.
> 
> ---
> 
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 
> 1cd69268ee986a0953cc85ab259355d2191250ac..6a35fe44138135998877a9fb74c2a82a7f99dcd5
>  100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -25613,11 +25613,12 @@ arm_final_prescan_insn (rtx_insn *insn)
> break;
>  
>   case INSN:
> -   /* Instructions using or affecting the condition codes make it
> -  fail.  */
> +   /* Check the instruction is explicitly marked as predicable.
> +  Instructions using or affecting the condition codes are not.  
> */
> scanbody = PATTERN (this_insn);
> if (!(GET_CODE (scanbody) == SET
>   || GET_CODE (scanbody) == PARALLEL)
> +   || get_attr_predicable (this_insn) != PREDICABLE_YES
> || get_attr_conds (this_insn) != CONDS_NOCOND)
>   fail = TRUE;
> break;
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 
> 5816409f86f1106b410c5e21d77e599b485f85f2..81237a61d4a2ebcfb77e47c2bd29137aba28a521
>  100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -307,6 +307,8 @@
>  ;
>  ; NOCOND means that the instruction does not use or alter the condition
>  ;   codes but can be converted into a conditionally exectuted instruction.
> +;   Given that NOCOND is the default for most instructions if omitted,
> +;   the attribute predicable must be set to yes as well.
>  
>  (define_attr "conds" "use,set,clob,unconditional,nocond"
>   (if_then_else
> @@ -12547,6 +12549,7 @@
>revsh%?\t%0, %1"
>[(set_attr "arch" "t1,t2,32")
> (set_attr "length" "2,2,4")
> +   (set_attr "predicable" "no,yes,yes")
> (set_attr "type" "rev")]
>  )
>  
> @@ -12560,6 +12563,7 @@
> rev16%?\t%0, %1"
>[(set_attr "arch" "t1,t2,32")
> (set_attr "length" "2,2,4")
> +   (set_attr "predicable" "no,yes,yes")
> (set_attr "type" "rev")]
>  )
>  
> @@ -12584,6 +12588,7 @@
> rev16%?\t%0, %1"
>[(set_attr "arch" "t1,t2,32")
> (set_attr "length" "2,2,4")
> +   (set_attr "predicable" "no,yes,yes")
> (set_attr "type" "rev")]
>  )
>  
> @@ -12619,6 +12624,7 @@
> rev16%?\t%0, %1"
>[(set_attr "arch" "t1,t2,32")
> (set_attr "length" "2,2,4")
> +   (set_attr "predicable" "no,yes,yes")
> (set_attr "type" "rev")]
>  )
>  
> diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c 
> b/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c
> index 
> c1e7740d14d3ca4e93a71e38b12f82c19791a204..1a311a6a5af647d40abd553e5d0ba1273c76d288
>  100644
> --- a/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c
> +++ b/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c
> @@ -5,14 +5,11 @@
> of the instructions.  Add an -mtune option known to facilitate that.  */
>  /* { dg-additional-options "-O2 -mtune=cortex-a53" } */
>  /* { dg-final { scan-assembler-not "orr\[ \t\]" } } */
> -/* { dg-final { scan-assembler-times "revsh\\t" 1 { target { arm_nothumb } } 
> } }  */
> -/* { dg-final { scan-assembler-times "revshne\\t" 1 { target { arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "revsh\\t" 2 { target { ! arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "rev16\\t" 1 { target { arm_nothumb } } 
> } }  */
> -/* { dg-final { scan-assembler-times "rev16ne\\t" 1 { target { arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "rev16\\t" 2 { target { ! arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "rev\\t" 2

[committed] arm: fix ICE with vectorized reciprocal division [PR108120]

2024-02-23 Thread Richard Earnshaw


The expand pattern for reciprocal division was enabled for all math
optimization modes, but the patterns it was generating were not
enabled unless -funsafe-math-optimizations were enabled, this leads to
an ICE when the pattern we generate cannot be recognized.

Fixed by only enabling vector division when doing unsafe math.

gcc:

PR target/108120
* config/arm/neon.md (div3): Rename from div3.
Gate with ARM_HAVE_NEON__ARITH.

gcc/testsuite:
PR target/108120
* gcc.target/arm/neon-recip-div-1.c: New file.
---
 gcc/config/arm/neon.md  |  4 ++--
 gcc/testsuite/gcc.target/arm/neon-recip-div-1.c | 16 
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/neon-recip-div-1.c

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 17c90f436c6..fa4a7aeda35 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -553,11 +553,11 @@ (define_insn "*mul3_neon"
Enabled with -funsafe-math-optimizations -freciprocal-math
and disabled for -Os since it increases code size .  */
 
-(define_expand "div3"
+(define_expand "div3"
   [(set (match_operand:VCVTF 0 "s_register_operand")
 (div:VCVTF (match_operand:VCVTF 1 "s_register_operand")
 		  (match_operand:VCVTF 2 "s_register_operand")))]
-  "TARGET_NEON && !optimize_size
+  "ARM_HAVE_NEON__ARITH && !optimize_size
&& flag_reciprocal_math"
   {
 rtx rec = gen_reg_rtx (mode);
diff --git a/gcc/testsuite/gcc.target/arm/neon-recip-div-1.c b/gcc/testsuite/gcc.target/arm/neon-recip-div-1.c
new file mode 100644
index 000..e15c3ca5fe9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-recip-div-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3 -freciprocal-math -fno-unsafe-math-optimizations -save-temps" } */
+/* { dg-add-options arm_neon } */
+
+int *a;
+int n;
+void b() {
+  int c;
+  for (c = 0; c < 10; c++)
+a[c] = (float)c / n;
+}
+/* We should not ICE, or get a vectorized reciprocal instruction when unsafe
+   math optimizations are disabled.  */
+/* { dg-final { scan-assembler-not "vrecpe\\.f32\\t\[qd\].*" } } */
+/* { dg-final { scan-assembler-not "vrecps\\.f32\\t\[qd\].*" } } */

Re: [PATCH v1 00/13] Add aarch64-w64-mingw32 target

2024-02-22 Thread Richard Earnshaw (lists)

On 21/02/2024 17:47, Evgeny Karpov wrote:
> Hello,
> 
> We would like to take your attention to the review of changes for the
> new GCC target, aarch64-w64-mingw32. The new target will be
> supported, tested, added to CI, and maintained by Linaro. This marks
> the first of three planned patch series contributing to the GCC C
> compiler's support for Windows Arm64.
> 
> 1. Minimal aarch64-w64-mingw32 C implementation to cross-compile
> hello-world with libgcc for Windows Arm64 using MinGW.
> 2. Extension of the aarch64-w64-mingw32 C implementation to
> cross-compile OpenSSL, OpenBLAS, FFmpeg, and libjpeg-turbo. All
> packages successfully pass tests.
> 3. Addition of call stack support for debugging, resolution of
> optimization issues in the C compiler, and DLL export/import for the
> aarch64-w64-mingw32 target.
> 
> This patch series introduces the 1st point, which involves building
> hello-world for the aarch64-w64-mingw32 target. The patch depends on
> the binutils changes for the aarch64-w64-mingw32 target that have
> already been merged.
> 
> The binutils should include recent relocation fixes.
> f87eaf8ff3995a5888c6dc4996a20c770e6bcd36
> aarch64: Add new relocations and limit COFF AArch64 relocation offsets
> 
> The series is structured in a way to trivially show that it should not
> affect any other targets.
> 
> In this patch, several changes have been made to support the
> aarch64-w64-mingw32 target for GCC. The modifications include the
> definition of the MS ABI for aarch64, adjustments to FIXED_REGISTERS
> and STATIC_CHAIN_REGNUM for different ABIs, and specific definitions
> for COFF format on AArch64. Additionally, the patch reuses MinGW
>  types and definitions from i386, relocating them to a new
> mingw folder for shared usage between both targets.
> 
> MinGW-specific options have been introduced for AArch64, along with
> override options for aarch64-w64-mingw32. Builtin stack probing for
> override options for aarch64-w64-mingw32. Builtin stack probing for
> AArch64 has been enabled as an alternative for chkstk. Symbol name
> encoding and section information handling for aarch64-w64-mingw32 have
> been incorporated, and the MinGW environment has been added, which
> will also be utilized for defining the Cygwin environment in the
> future.
> 
> The patch includes renaming "x86 Windows Options" to "Cygwin and MinGW
> Options," which now encompasses AArch64 as well. AArch64-specific
> Cygwin and MinGW Options have been introduced for the unique
> requirements of the AArch64 architecture.
> 
> Function type declaration and named sections support have been added.
> The necessary objects for Cygwin and MinGW have been built for the
> aarch64-w64-mingw32 target, and relevant files such as msformat-c.cc
> and winnt-d.cc have been moved to the mingw folder for reuse in
> AArch64.
> 
> Furthermore, the aarch64-w64-mingw32 target has been included in both
> libatomic and libgcc, ensuring support for the AArch64 architecture
> within these libraries. These changes collectively enhance the
> capabilities of GCC for the specified target.
> 
> Coauthors: Zac Walker ,
> Mark Harmstone   and
> Ron Riddle 
> 
> Refactored, prepared, and validated by 
> Radek Barton  and 
> Evgeny Karpov 
> 
> Special thanks to the Linaro GNU toolchain team for internal review
> and assistance in preparing the patch series!
> 
> Regards,
> Evgeny

Thanks for posting this.

I've only read quickly through this patch series and responded where I think 
some action is obviously required.  That doesn't necessarily mean the other 
patches are perfect, though, just that nothing immediately caught my attention.

R.

> 
> 
> Zac Walker (13):
>   Introduce aarch64-w64-mingw32 target
>   aarch64: The aarch64-w64-mingw32 target implements the MS ABI
>   aarch64: Mark x18 register as a fixed register for MS ABI
>   aarch64: Add aarch64-w64-mingw32 COFF
>   Reuse MinGW from i386 for AArch64
>   Rename section and encoding functions from i386 which will be used in
> aarch64
>   Exclude i386 functionality from aarch64 build
>   aarch64: Add Cygwin and MinGW environments for AArch64
>   aarch64: Add SEH to machine_function
>   Rename "x86 Windows Options" to "Cygwin and MinGW Options"
>   aarch64: Build and add objects for Cygwin and MinGW for AArch64
>   aarch64: Add aarch64-w64-mingw32 target to libatomic
>   Add aarch64-w64-mingw32 target to libgcc
> 
>  fixincludes/mkfixinc.sh   |   3 +-
>  gcc/config.gcc|  47 +++--
>  gcc/config/aarch64/aarch64-coff.h |  92 +
>  gcc/config/aarch64/aarch64-opts.h |   7 +
>  gcc/config/aarch64/aarch64-protos.h   |   5 +
>  gcc/config/aarch64/aarch64.h  |  25 ++-
>  gcc/config/aarch64/cygming.h  | 178 ++
>  gcc/config/i386/cygming.h |  18 +-
>  gcc/config/i386/cygming.opt.urls  |  30 ---
>  gcc/config/i386/i386-protos.h

Re: [PATCH v1 13/13] Add aarch64-w64-mingw32 target to libgcc

2024-02-22 Thread Richard Earnshaw (lists)

On 21/02/2024 18:40, Evgeny Karpov wrote:
> 
+aarch64-*-mingw*)

This doesn't match the glob pattern you added to config.gcc in an earlier 
patch, but see my comment on that.  The two should really be consistent with 
each other or you might get build failures late on.

R.

Re: [PATCH v1 10/13] Rename "x86 Windows Options" to "Cygwin and MinGW Options"

2024-02-22 Thread Richard Earnshaw (lists)

On 21/02/2024 18:38, Evgeny Karpov wrote:
> 
For this change you might want to put some form of re-direct in the manual 
under the old name so that anybody used to looking for the old entry will know 
where things have been moved to.  Something like

x86 Windows Options
  See xref(Cygwin and MinGW Options).

R.

Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-22 Thread Richard Earnshaw (lists)

On 21/02/2024 18:36, Evgeny Karpov wrote:
> 
+/* GNU as supports weak symbols on PECOFF.  */
+#ifdef HAVE_GAS_WEAK

Can't we assume this is true?  It was most likely needed on i386 because 
support goes back longer than the assembler had this feature, but it looks like 
it was added in 2000, or thereabouts, so significantly before aarch64 was 
supported in the assembler.

+#ifndef HAVE_GAS_ALIGNED_COMM

And this was added to GCC in 2009, which probably means it predates 
aarch64-coff support in gas as well.

R.

Re: [PATCH v1 03/13] aarch64: Mark x18 register as a fixed register for MS ABI

2024-02-22 Thread Richard Earnshaw (lists)

On 21/02/2024 18:30, Evgeny Karpov wrote:
> 
+   tm_defines="${tm_defines} TARGET_ARM64_MS_ABI=1"

I missed this on first reading...

The GCC port name uses AARCH64, please use that internally rather than other 
names.  The only time when we should be using ARM64 is when it's needed for 
compatibility with other compilers and that doesn't apply here AFAICT.

R.

Re: [PATCH v1 03/13] aarch64: Mark x18 register as a fixed register for MS ABI

2024-02-22 Thread Richard Earnshaw (lists)

On 21/02/2024 18:30, Evgeny Karpov wrote:
> 
+/* X18 reserved for the TEB on Windows.  */
+#ifdef TARGET_ARM64_MS_ABI
+# define FIXED_X18 1
+# define CALL_USED_X18 0
+#else
+# define FIXED_X18 0
+# define CALL_USED_X18 1
+#endif

I'm not overly keen on ifdefs like this (and the one below), it can get quite 
confusing if we have to support more than a couple of ABIs.  Perhaps we could 
create a couple of new headers, one for the EABI (which all existing targets 
would then need to include) and one for the MS ABI.  Then the mingw port would 
use that instead of the EABI header.

An alternative is to make all this dynamic, based on the setting of the 
aarch64_calling_abi enum and to make the adjustments in 
aarch64_conditional_register_usage.

+# define CALL_USED_X18 0

Is that really correct?  If the register is really reserved, but some code 
modifies it anyway, this will cause the compiler to restore the old value at 
the end of a function; generally, for a reserved register, code that knows what 
it's doing would want to make permanent changes to this value.

+#ifdef TARGET_ARM64_MS_ABI
+# define STATIC_CHAIN_REGNUM   R17_REGNUM
+#else
+# define STATIC_CHAIN_REGNUM   R18_REGNUM
+#endif

If we went the enum way, we'd want something like

#define STATIC_CHAIN_REGNUM (calling_abi == AARCH64_CALLING_ABI_MS ? R17_REGNUM 
: R18_REGNUM)

R.

Re: [PATCH v1 02/13] aarch64: The aarch64-w64-mingw32 target implements

2024-02-22 Thread Richard Earnshaw (lists)

On 21/02/2024 18:26, Evgeny Karpov wrote:
> 
+/* Available call ABIs.  */
+enum calling_abi
+{
+  AARCH64_EABI = 0,
+  MS_ABI = 1
+};
+

The convention in this file seems to be that all enum types to start with 
aarch64.  Also, the enumeration values should start with the name of the 
enumeration type in upper case, so:

enum aarch64_calling_abi
{
  AARCH64_CALLING_ABI_EABI,
  AARCH64_CALLING_ABI_MS
};

or something very much like that.

R.

Re: [PATCH v1 01/13] Introduce aarch64-w64-mingw32 target

2024-02-22 Thread Richard Earnshaw (lists)

On 21/02/2024 18:16, Evgeny Karpov wrote:
> 
+aarch64*-*-mingw*)

Other targets are a bit inconsistent here as well, but, as Andrew mentioned, if 
you don't want to handle big-endian, it might be better to match 
aarch64-*-mingw* here.

R.

Re: [PATCH v1 05/13] Reuse MinGW from i386 for AArch64

2024-02-22 Thread Richard Earnshaw (lists)

On 21/02/2024 21:34, rep.dot@gmail.com wrote:
> On 21 February 2024 19:34:43 CET, Evgeny Karpov  
> wrote:
>>
> 
> Please use git send-email. Your mail ends up as empty as here, otherwise.

I don't see anything wrong with it; niether does patchwork 
(https://patchwork.sourceware.org/project/gcc/list/?series=31191) nor does the 
Linaro CI bot.  So perhaps it's your mailer that's misconfigured.

> 
> The ChangeLog has to be expressed in present tense, as mandated by the 
> standard; s/Moved/Move/g etc.

Agreed, but that's a detail that we can get to once the patch has been properly 
reviewed.

> 
> In any sane world ( and in gcc ) to fold, respectively a folder, is something 
> else compared to a directory ( which you probably mean when moving a file 
> from one directory to another directory as you seem to do ).
> 
> Most of the free world has left COFF behind since several decades, so I won't 
> comment on that. YMMV.

This isn't helpful.  Windows platforms use (a derivative of) COFF, so that's 
what the tools need to use when targetting that platform.

R.

Re: [PATCH] ARM: Fix conditional execution [PR113915]

2024-02-21 Thread Richard Earnshaw (lists)

On 21/02/2024 14:34, Wilco Dijkstra wrote:
> 
> By default most patterns can be conditionalized on Arm targets.  However
> Thumb-2 predication requires the "predicable" attribute be explicitly
> set to "yes".  Most patterns are shared between Arm and Thumb(-2) and are
> marked with "predicable".  Given this sharing, it does not make sense to
> use a different default for Arm.  So only consider conditional execution
> of instructions that have the predicable attribute set to yes.  This ensures
> that patterns not explicitly marked as such are never accidentally 
> conditionally executed like in the PR.
> 
> GLIBC codesize was ~0.014% worse due to atomic operations now being
> unconditional and a small number of patterns not setting "predicable".
> 
> Passes regress and bootstrap, OK for commit?
> 
> gcc/ChangeLog:
> PR target/113915
> * config/arm/arm.md (NOCOND): Improve comment.
> * config/arm/arm.cc (arm_final_prescan_insn): Add check for
> PREDICABLE_YES.
> 
> gcc/testsuite/ChangeLog:
> PR target/113915
> * gcc.target/arm/builtin-bswap-1.c: Fix test.
> 
> ---
> 
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 
> c44047c377a802d0c1dc1406df1b88a6b079607b..29771d284831a995adcf9adbb525396fbabb1ea2
>  100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -25610,11 +25610,12 @@ arm_final_prescan_insn (rtx_insn *insn)
> break;
>  
>   case INSN:
> -   /* Instructions using or affecting the condition codes make it
> -  fail.  */
> +   /* Check the instruction is explicitly marked as predicable.
> +  Instructions using or affecting the condition codes are not.  
> */
> scanbody = PATTERN (this_insn);
> if (!(GET_CODE (scanbody) == SET
>   || GET_CODE (scanbody) == PARALLEL)
> +   || get_attr_predicable (this_insn) != PREDICABLE_YES
> || get_attr_conds (this_insn) != CONDS_NOCOND)
>   fail = TRUE;
> break;
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 
> 5816409f86f1106b410c5e21d77e599b485f85f2..671f093862259c2c0df93a986fc22fa56a8ea6c7
>  100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -307,6 +307,8 @@
>  ;
>  ; NOCOND means that the instruction does not use or alter the condition
>  ;   codes but can be converted into a conditionally exectuted instruction.
> +;   Given that NOCOND is the default for most instructions if omitted,
> +;   the attribute predicable must be set to yes as well.
>  
>  (define_attr "conds" "use,set,clob,unconditional,nocond"
>   (if_then_else

While this is ok, 

> diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c 
> b/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c
> index 
> c1e7740d14d3ca4e93a71e38b12f82c19791a204..3de7cea81c1128c2fe5a9e1216e6b027d26bcab9
>  100644
> --- a/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c
> +++ b/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c
> @@ -5,14 +5,8 @@
> of the instructions.  Add an -mtune option known to facilitate that.  */
>  /* { dg-additional-options "-O2 -mtune=cortex-a53" } */
>  /* { dg-final { scan-assembler-not "orr\[ \t\]" } } */
> -/* { dg-final { scan-assembler-times "revsh\\t" 1 { target { arm_nothumb } } 
> } }  */
> -/* { dg-final { scan-assembler-times "revshne\\t" 1 { target { arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "revsh\\t" 2 { target { ! arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "rev16\\t" 1 { target { arm_nothumb } } 
> } }  */
> -/* { dg-final { scan-assembler-times "rev16ne\\t" 1 { target { arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "rev16\\t" 2 { target { ! arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "rev\\t" 2 { target { arm_nothumb } } } 
> }  */
> -/* { dg-final { scan-assembler-times "revne\\t" 2 { target { arm_nothumb } } 
> } }  */
> -/* { dg-final { scan-assembler-times "rev\\t" 4 { target { ! arm_nothumb } } 
> } }  */
> +/* { dg-final { scan-assembler-times "revsh\\t" 2 } }  */
> +/* { dg-final { scan-assembler-times "rev16\\t" 2 } }  */
> +/* { dg-final { scan-assembler-times "rev\\t" 4 } }  */
>  
>  #include "builtin-bswap.x"

This bit isn't.  The correct fix here is to fix the pattern(s) concerned to add 
the missing predicate.

Note that builtin-bswap.x explicitly mentions predicated mnemonics in the 
comments.

R.

Re: [PATCH]AArch64: xfail modes_1.f90 [PR107071]

2024-02-19 Thread Richard Earnshaw (lists)

On 19/02/2024 10:58, Tamar Christina wrote:
>> -Original Message-
>> From: Tamar Christina
>> Sent: Thursday, February 15, 2024 11:05 AM
>> To: Richard Earnshaw (lists) ; gcc-
>> patc...@gcc.gnu.org
>> Cc: nd ; Marcus Shawcroft ; Kyrylo
>> Tkachov ; Richard Sandiford
>> 
>> Subject: RE: [PATCH]AArch64: xfail modes_1.f90 [PR107071]
>>
>>> -Original Message-
>>> From: Richard Earnshaw (lists) 
>>> Sent: Thursday, February 15, 2024 11:01 AM
>>> To: Tamar Christina ; gcc-patches@gcc.gnu.org
>>> Cc: nd ; Marcus Shawcroft ;
>> Kyrylo
>>> Tkachov ; Richard Sandiford
>>> 
>>> Subject: Re: [PATCH]AArch64: xfail modes_1.f90 [PR107071]
>>>
>>> On 15/02/2024 10:57, Tamar Christina wrote:
>>>> Hi All,
>>>>
>>>> This test has never worked on AArch64 since the day it was committed.  It 
>>>> has
>>>> a number of issues that prevent it from working on AArch64:
>>>>
>>>> 1.  IEEE does not require that FP operations raise a SIGFPE for FP 
>>>> operations,
>>>>     only that an exception is raised somehow.
>>>>
>>>> 2. Most Arm designed cores don't raise SIGFPE and instead set a status 
>>>> register
>>>>    and some partner cores raise a SIGILL instead.
>>>>
>>>> 3. The way it checks for feenableexcept doesn't really work for AArch64.
>>>>
>>>> As such this test doesn't seem to really provide much value on AArch64 so 
>>>> we
>>>> should just xfail it.
>>>>
>>>> Regtested on aarch64-none-linux-gnu and no issues.
>>>>
>>>> Ok for master?
>>>
>>> Wouldn't it be better to just skip the test.  XFAIL just adds clutter to 
>>> verbose
>> output
>>> and suggests that someday the tools might be fixed for this case.
>>>
>>> Better still would be a new dg-requires fp_exceptions_raise_sigfpe as a 
>>> guard for
>>> the test.
>>
> 
> It looks like this is similar to 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314 so
> I'll just similarly skip it.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90 
> b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> index 
> 205c47f38007d06116289c19d6b23cf3bf83bd48..e29d8c678e6e51c3f2e5dac53c7703bb18a99ac4
>  100644
> --- a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> +++ b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> @@ -1,5 +1,5 @@
>  ! { dg-do run }
> -!
> +! { dg-skip-if "PR libfortran/78314" { aarch64*-*-gnu* arm*-*-gnueabi 
> arm*-*-gnueabihf } }
>  ! Test IEEE_MODES_TYPE, IEEE_GET_MODES and IEEE_SET_MODES
>  
> Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK, but please give the fortran maintainers 24hrs to comment before pushing.

R.

> 
> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
>   PR fortran/107071
>   * gfortran.dg/ieee/modes_1.f90: skip aarch64, arm.

Re: [PATCH][GCC][Arm] Missing optimization pattern for rev16 on architectures with thumb1

2024-02-15 Thread Richard Earnshaw (lists)

On 12/02/2024 13:48, Matthieu Longo wrote:
> This patch marks a rev16 test as XFAIL for architectures having only Thumb1 
> support. The generated code is functionally correct, but the optimization is 
> disabled when -mthumb is equivalent to Thumb1. Fixing the root issue would 
> requires changes that are not suitable for GCC14 stage 4.
> 
> More information at https://linaro.atlassian.net/browse/GNU-1141
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/arm/rev16_2.c: XFAIL when compiled with Thumb1.

Thanks, I've tweaked the commit message slightly and pushed this.

R.

Re: [PATCH]AArch64: xfail modes_1.f90 [PR107071]

2024-02-15 Thread Richard Earnshaw (lists)

On 15/02/2024 10:57, Tamar Christina wrote:
> Hi All,
> 
> This test has never worked on AArch64 since the day it was committed.  It has
> a number of issues that prevent it from working on AArch64:
> 
> 1.  IEEE does not require that FP operations raise a SIGFPE for FP operations,
>     only that an exception is raised somehow.
> 
> 2. Most Arm designed cores don't raise SIGFPE and instead set a status 
> register
>    and some partner cores raise a SIGILL instead.
> 
> 3. The way it checks for feenableexcept doesn't really work for AArch64.
> 
> As such this test doesn't seem to really provide much value on AArch64 so we
> should just xfail it.
> 
> Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

Wouldn't it be better to just skip the test.  XFAIL just adds clutter to 
verbose output and suggests that someday the tools might be fixed for this case.

Better still would be a new dg-requires fp_exceptions_raise_sigfpe as a guard 
for the test.

R.

> 
> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
>     PR fortran/107071
>     * gfortran.dg/ieee/modes_1.f90: xfail aarch64.
> 
> --- inline copy of patch --
> diff --git a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90 
> b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> index 
> 205c47f38007d06116289c19d6b23cf3bf83bd48..3667571969427ae7b2b96684ec1af8b3fdd4985f
>  100644
> --- a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> +++ b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> @@ -1,4 +1,4 @@
> -! { dg-do run }
> +! { dg-do run { xfail { aarch64*-*-* } } }
>  !
>  ! Test IEEE_MODES_TYPE, IEEE_GET_MODES and IEEE_SET_MODES
>  
> 
> 
> 
> 
> --

Re: [PATCH] Arm: Fix incorrect tailcall-generation for indirect calls [PR113780]

2024-02-14 Thread Richard Earnshaw (lists)

On 14/02/2024 09:20, Tejas Belagod wrote:
> On 2/7/24 11:41 PM, Richard Earnshaw (lists) wrote:
>> On 07/02/2024 07:59, Tejas Belagod wrote:
>>> This patch fixes a bug that causes indirect calls in PAC-enabled functions
>>> to be tailcalled incorrectly when all argument registers R0-R3 are used.
>>>
>>> Tested on arm-none-eabi for armv8.1-m.main. OK for trunk?
>>>
>>> 2024-02-07  Tejas Belagod  
>>>
>>> PR target/113780
>>> * gcc/config/arm.cc (arm_function_ok_for_sibcall): Don't allow tailcalls
>>>   for indirect calls with 4 or more arguments in pac-enabled functions.
>>>
>>> * gcc.target/arm/pac-sibcall.c: New.
>>> ---
>>>   gcc/config/arm/arm.cc  | 12 
>>>   gcc/testsuite/gcc.target/arm/pac-sibcall.c | 11 +++
>>>   2 files changed, 19 insertions(+), 4 deletions(-)
>>>   create mode 100644 gcc/testsuite/gcc.target/arm/pac-sibcall.c
>>>
>>> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
>>> index c44047c377a..c1f8286a4d4 100644
>>> --- a/gcc/config/arm/arm.cc
>>> +++ b/gcc/config/arm/arm.cc
>>> @@ -7980,10 +7980,14 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
>>>     && DECL_WEAK (decl))
>>>   return false;
>>>   -  /* We cannot do a tailcall for an indirect call by descriptor if all 
>>> the
>>> - argument registers are used because the only register left to load the
>>> - address is IP and it will already contain the static chain.  */
>>> -  if (!decl && CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
>>> +  /* We cannot do a tailcall for an indirect call by descriptor or for an
>>> + indirect call in a pac-enabled function if all the argument registers
>>> + are used because the only register left to load the address is IP and
>>> + it will already contain the static chain or the PAC signature in the
>>> + case of PAC-enabled functions.  */
>>
>> This comment is becoming a bit unwieldy.  I suggest restructuring it as:
>>
>> We cannot tailcall an indirect call by descriptor if all the call-clobbered
>> general registers are live (r0-r3 and ip).  This can happen when:
>>    - IP contains the static chain, or
>>    - IP is needed for validating the PAC signature.
>>
>>
>>> +  if (!decl
>>> +  && ((CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
>>> +  || arm_current_function_pac_enabled_p()))
>>>   {
>>>     tree fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
>>>     CUMULATIVE_ARGS cum;
>>> diff --git a/gcc/testsuite/gcc.target/arm/pac-sibcall.c 
>>> b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
>>> new file mode 100644
>>> index 000..c57bf7a952c
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
>>> @@ -0,0 +1,11 @@
>>> +/* Testing return address signing.  */
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target mbranch_protection_ok } */
>>> +/* { dg-options " -mcpu=cortex-m85 -mbranch-protection=pac-ret+leaf -O2" } 
>>> */
>>
>> No, you can't just add options like this, you need to first check that they 
>> won't result in conflicts with other options on the command line.  See 
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/644077.html for an 
>> example of how to handle this.
>>
> Thanks for the review, Richard. Respin attached.
> 
> Thanks,
> Tejas.
> 
>>> +
>>> +void fail(void (*f)(int, int, int, int))
>>> +{
>>> +  f(1, 2, 3, 4);
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-not "bx\tip\t@ indirect register sibling 
>>> call" } } */
>>
>> R.
>>
+++ b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
@@ -0,0 +1,14 @@
+/* If all call-clobbered general registers are live (r0-r3, ip), disable
+   indirect tail-call for a PAC-enabled function.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
This only checks if -mbranch-protection can work with the existing 
architecture/cpu; not with the flags you're about to add below.  You should 
check for arm_arch_v8_1m_main_pacbti_ok instead; then you can assume that 
-mbranch-protection can be added.

+/* { dg-add-options arm_arch_v8_1m_main_pacbti } */
+/* { dg-additional-options "-mbranch-protection=pac-ret+leaf -O2" } */

Otherwise this is OK if you fix the above.

R.

Re: [PATCH] testsuite: Disable test for incompatible Arm targets

2024-02-13 Thread Richard Earnshaw





On 13/02/2024 10:44, Torbjörn SVENSSON wrote:

Ok for trunk and releases/gcc-13?

The alternative approach (that is changing the result a bit) is to drop
the special treatment for arm*-*-*. I'm not sure if this is prefered or
just disable the test for incompatible flags for arm*-*-*.

--

The test assumes it's okay to supply -march=armv7-a+simd, but it depends
on what target you are running the tests for.  For example, running the
GCC testsuite for Cortex-M0 produces the follwing entry in the logs:


Running the testsuite with -mcpu= in runtest/site.exp flags will uncover 
a whole host of problems with tests that try to specify an architecture. 
 It's essentially broken/unsupported at present.


I have some ideas for how to fix this properly, but they will have to 
wait for gcc-15 now.  In the mean time, I'd rather we didn't try to 
paper over the problem by putting random changes into the tests right now.


R.



Testing gcc.dg/pr41574.c
doing compile
Executing on host: arm-none-eabi-gcc .../pr41574.c  -mthumb -march=armv6s-m 
-mcpu=cortex-m0 -mfloat-abi=soft   -fdiagnostics-plain-output  -O2 
-march=armv7-a -mfloat-abi=softfp -mfpu=neon -fno-unsafe-math-optimizations 
-fdump-rtl-combine -ffat-lto-objects -S -o pr41574.s(timeout = 800)
spawn -ignore SIGHUP arm-none-eabi-gcc .../pr41574.c -mthumb -march=armv6s-m 
-mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output -O2 -march=armv7-a 
-mfloat-abi=softfp -mfpu=neon -fno-unsafe-math-optimizations -fdump-rtl-combine 
-ffat-lto-objects -S -o pr41574.s
pid is 9799 -9799
cc1: warning: switch '-mcpu=cortex-m0' conflicts with switch 
'-march=armv7-a+simd'
pid is -1
output is cc1: warning: switch '-mcpu=cortex-m0' conflicts with switch 
'-march=armv7-a+simd'
  status 0
FAIL: gcc.dg/pr41574.c (test for excess errors)
Excess errors:
cc1: warning: switch '-mcpu=cortex-m0' conflicts with switch 
'-march=armv7-a+simd'
PASS: gcc.dg/pr41574.c scan-rtl-dump-not combine "\\(plus:DF \\(mult:DF"

Patch has been verified on Linux.

gcc/testsuite/ChangeLog:

* gcc.dg/pr41574.c: Disable test for Arm targets incompatible
with -march=armv7-a+simd.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/gcc.dg/pr41574.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr41574.c b/gcc/testsuite/gcc.dg/pr41574.c
index 062c0044532..f6af0c34273 100644
--- a/gcc/testsuite/gcc.dg/pr41574.c
+++ b/gcc/testsuite/gcc.dg/pr41574.c
@@ -1,6 +1,7 @@
  /* { dg-do compile } */
-/* { dg-options "-O2 -march=armv7-a -mfloat-abi=softfp -mfpu=neon 
-fno-unsafe-math-optimizations -fdump-rtl-combine" { target { arm*-*-* } } } */
-/* { dg-options "-O2 -fno-unsafe-math-optimizations -fdump-rtl-combine" { 
target { ! arm*-*-* } } } */
+/* { dg-options "-O2 -fno-unsafe-math-optimizations -fdump-rtl-combine" } */
+/* { dg-require-effective-target arm_arch_v7a_neon_multilib { target { 
arm*-*-* } } } */
+/* { dg-additional-options "-march=armv7-a -mfloat-abi=softfp -mfpu=neon" { 
target { arm*-*-* } } } */
  
  
  static const double one=1.0;

Re: [PATCH] testsuite, arm: Fix testcase arm/pr112337.c to check for the options first

2024-02-09 Thread Richard Earnshaw (lists)

On 30/01/2024 17:07, Saurabh Jha wrote:
> Hey,
> 
> Previously, this test was added to fix this bug: 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337. However, it did not 
> check the compilation options before using them, leading to errors.
> 
> This patch fixes the test by first checking whether it can use the options 
> before using them.
> 
> Tested for arm-none-eabi and found no regressions. The output of check-gcc 
> with RUNTESTFLAGS="arm.exp=*" changed like this:
> 
> Before:
> # of expected passes  5963
> # of unexpected failures  64
> 
> After:
> # of expected passes  5964
> # of unexpected failures  63
> 
> Ok for master?
> 
> Regards,
> Saurabh
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/arm/pr112337.c: Check whether we can use the compilation 
> options before using them.

My apologies for missing this earlier.  It didn't show up in patchwork. That's 
most likely because the attachment is a binary blob instead of text/plain.  
That also means that the Linaro CI system hasn't seen this patch either.  
Please can you fix your mailer to add plain text patch files.

-/* { dg-options "-O2 -march=armv8.1-m.main+fp.dp+mve.fp -mfloat-abi=hard" } */
+/* { dg-require-effective-target arm_hard_ok } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-options "-O2 -mfloat-abi=hard" } */
+/* { dg-add-options arm_v8_1m_mve } */

This is moving in the right direction, but it adds more than necessary now: 
checking for, and adding -mfloat-abi=hard is not necessary any more as 
arm_v8_1m_mve_ok will work out what float-abi flags are needed to make the 
options work. (What's more, it will prevent the test from running if the base 
configuration of the compiler is incompatible with the hard float ABI, which is 
more than we need.).

So please can you re-spin removing the hard-float check and removing that from 
dg-options.

Thanks,
R.

[PATCH] arm: testsuite: fix issues relating to fp16 alternative testing

2024-02-08 Thread Richard Earnshaw


The v*_fp16_xN_1.c tests on Arm have been unstable since they were
added.  This is not a problem with the tests themselves, or even the
patches that were added, but with the testsuite infrastructure.  It
turned out that another set of dg- tests for fp16 were corrupting the
cached set of options used by the new tests, leading to running the
tests with incorrect flags.

So the primary goal of this patch is to fix the incorrect internal
caching of the options needed to enable fp16 alternative format on
Arm: the code was storing the result in the same variable that was
being used for neon_fp16 and this was leading to testsuite instability
for tests that were checking for neon with fp16.

But in cleaning this up I also noted that we weren't then applying the
flags correctly having detected what they were, so we also address
that.

I suspect there are still some further issues to address here, since
the framework does not correctly test that the multilibs and startup
code enable alternative format; but this is still an improvement over
what we had before.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_fp16_alternative_ok_nocache): Use
et_arm_fp16_alternative_flags to cache the result.  Improve test
for FP16 availability.
(add_options_for_arm_fp16_alternative): Use
et_arm_fp16_alternative_flags.
* g++.dg/ext/arm-fp16/arm-fp16-ops-3.C: Update dg-* flags.
* g++.dg/ext/arm-fp16/arm-fp16-ops-4.C: Likewise.
* gcc.dg/torture/arm-fp16-int-convert-alt.c: Likewise.
* gcc.dg/torture/arm-fp16-ops-3.c: Likewise.
* gcc.dg/torture/arm-fp16-ops-4.c: Likewise.
* gcc.target/arm/fp16-aapcs-3.c: Likewise.
* gcc.target/arm/fp16-aapcs-4.c: Likewise.
* gcc.target/arm/fp16-compile-alt-1.c: Likewise.
* gcc.target/arm/fp16-compile-alt-10.c: Likewise.
* gcc.target/arm/fp16-compile-alt-11.c: Likewise.
* gcc.target/arm/fp16-compile-alt-12.c: Likewise.
* gcc.target/arm/fp16-compile-alt-2.c: Likewise.
* gcc.target/arm/fp16-compile-alt-3.c: Likewise.
* gcc.target/arm/fp16-compile-alt-4.c: Likewise.
* gcc.target/arm/fp16-compile-alt-5.c: Likewise.
* gcc.target/arm/fp16-compile-alt-6.c: Likewise.
* gcc.target/arm/fp16-compile-alt-7.c: Likewise.
* gcc.target/arm/fp16-compile-alt-8.c: Likewise.
* gcc.target/arm/fp16-compile-alt-9.c: Likewise.
* gcc.target/arm/fp16-rounding-alt-1.c: Likewise.
---
 .../g++.dg/ext/arm-fp16/arm-fp16-ops-3.C |  2 +-
 .../g++.dg/ext/arm-fp16/arm-fp16-ops-4.C |  3 ++-
 .../gcc.dg/torture/arm-fp16-int-convert-alt.c|  2 +-
 gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c|  2 +-
 gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c|  3 ++-
 gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c  |  3 ++-
 gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c  |  3 ++-
 .../gcc.target/arm/fp16-compile-alt-1.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-10.c |  3 ++-
 .../gcc.target/arm/fp16-compile-alt-11.c |  3 ++-
 .../gcc.target/arm/fp16-compile-alt-12.c |  2 +-
 .../gcc.target/arm/fp16-compile-alt-2.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-3.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-4.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-5.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-6.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-7.c  |  3 ++-
 .../gcc.target/arm/fp16-compile-alt-8.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-9.c  |  2 +-
 .../gcc.target/arm/fp16-rounding-alt-1.c |  4 +++-
 gcc/testsuite/lib/target-supports.exp| 16 
 21 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C
index 29080c7514f..5eceb3074df 100644
--- a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C
+++ b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C
@@ -1,6 +1,6 @@
 /* Test various operators on __fp16 and mixed __fp16/float operands.  */
 /* { dg-do run { target arm*-*-* } } */
 /* { dg-require-effective-target arm_fp16_alternative_ok } */
-/* { dg-options "-mfp16-format=alternative" } */
+/* { dg-add-options arm_fp16_alternative } */
 
 #include "arm-fp16-ops.h"
diff --git a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C
index 4be8883faad..d86019f1469 100644
--- a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C
+++ b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C
@@ -1,6 +1,7 @@
 /* Test various operators on __fp16 and mixed __fp16/float operands.  */
 /* { dg-do run { target arm*-*-* } } */
 /* { dg-require-effective-target arm_fp16_alternative_ok } */
-/* { dg-options "-mfp16-format=alternative -ffast-math" } */
+/* { dg-options "-ffast-math" } */
+/* {

Re: [PATCH] Arm: Fix incorrect tailcall-generation for indirect calls [PR113780]

2024-02-07 Thread Richard Earnshaw (lists)

On 07/02/2024 07:59, Tejas Belagod wrote:
> This patch fixes a bug that causes indirect calls in PAC-enabled functions
> to be tailcalled incorrectly when all argument registers R0-R3 are used.
> 
> Tested on arm-none-eabi for armv8.1-m.main. OK for trunk?
> 
> 2024-02-07  Tejas Belagod  
> 
>   PR target/113780
>   * gcc/config/arm.cc (arm_function_ok_for_sibcall): Don't allow tailcalls
>   for indirect calls with 4 or more arguments in pac-enabled functions.
> 
>   * gcc.target/arm/pac-sibcall.c: New.
> ---
>  gcc/config/arm/arm.cc  | 12 
>  gcc/testsuite/gcc.target/arm/pac-sibcall.c | 11 +++
>  2 files changed, 19 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pac-sibcall.c
> 
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index c44047c377a..c1f8286a4d4 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -7980,10 +7980,14 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
>&& DECL_WEAK (decl))
>  return false;
>  
> -  /* We cannot do a tailcall for an indirect call by descriptor if all the
> - argument registers are used because the only register left to load the
> - address is IP and it will already contain the static chain.  */
> -  if (!decl && CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
> +  /* We cannot do a tailcall for an indirect call by descriptor or for an
> + indirect call in a pac-enabled function if all the argument registers
> + are used because the only register left to load the address is IP and
> + it will already contain the static chain or the PAC signature in the
> + case of PAC-enabled functions.  */

This comment is becoming a bit unwieldy.  I suggest restructuring it as:

We cannot tailcall an indirect call by descriptor if all the call-clobbered
general registers are live (r0-r3 and ip).  This can happen when:
  - IP contains the static chain, or
  - IP is needed for validating the PAC signature.


> +  if (!decl
> +  && ((CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
> +   || arm_current_function_pac_enabled_p()))
>  {
>tree fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
>CUMULATIVE_ARGS cum;
> diff --git a/gcc/testsuite/gcc.target/arm/pac-sibcall.c 
> b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
> new file mode 100644
> index 000..c57bf7a952c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pac-sibcall.c
> @@ -0,0 +1,11 @@
> +/* Testing return address signing.  */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target mbranch_protection_ok } */
> +/* { dg-options " -mcpu=cortex-m85 -mbranch-protection=pac-ret+leaf -O2" } */

No, you can't just add options like this, you need to first check that they 
won't result in conflicts with other options on the command line.  See 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/644077.html for an 
example of how to handle this.

> +
> +void fail(void (*f)(int, int, int, int))
> +{
> +  f(1, 2, 3, 4);
> +}
> +
> +/* { dg-final { scan-assembler-not "bx\tip\t@ indirect register sibling 
> call" } } */

R.

Re: [PATCH v2] arm: Fix missing bti instruction for virtual thunks

2024-02-02 Thread Richard Earnshaw (lists)

On 26/01/2024 15:31, Richard Ball wrote:
> v2: Formatting and test options fix.
> 
> Adds missing bti instruction at the beginning of a virtual
> thunk, when bti is enabled.
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm.cc (arm_output_mi_thunk): Emit
>   insn for bti_c when bti is enabled.
> 
> gcc/testsuite/ChangeLog:
> 
> * lib/target-supports.exp: Add v8_1_m_main_pacbti.
> * g++.target/arm/bti_thunk.C: New test.

OK, thanks.

R.

Re: [PATCH v3 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2024-01-31 Thread Richard Earnshaw (lists)

On 30/01/2024 14:09, Andre Simoes Dias Vieira wrote:
> Hi Richard,
> 
> Thanks for the reviews, I'm making these changes but just a heads up.
> 
> When hardcoding LR_REGNUM like this we need to change the way we compare the 
> register in doloop_condition_get. This function currently compares the rtx 
> nodes by address, which I think happens to be fine before we assign hard 
> registers, as I suspect we always share the rtx node for the same pseudo, but 
> when assigning registers it seems like we create copies, so things like:
> `XEXP (inc_src, 0) == reg` will fail for
> inc_src: (plus (reg LR) (const_int -n)'
> reg: (reg LR)
> 
> Instead I will substitute the operand '==' with calls to 'rtx_equal_p (op1, 
> op2, NULL)'.

Yes, that's fine.

R.

> 
> Sound good?
> 
> Kind regards,
> Andre
> 
> 
> From: Richard Earnshaw (lists) 
> Sent: Tuesday, January 30, 2024 11:36 AM
> To: Andre Simoes Dias Vieira; gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Stam Markianos-Wright
> Subject: Re: [PATCH v3 2/2] arm: Add support for MVE Tail-Predicated Low 
> Overhead Loops
> 
> On 19/01/2024 14:40, Andre Vieira wrote:
>>
>> Respin after comments from Kyrill and rebase. I also removed an if-then-else
>> construct in arm_mve_check_reg_origin_is_num_elems similar to the other 
>> functions
>> Kyrill pointed out.
>>
>> After an earlier comment from Richard Sandiford I also added comments to the
>> two tail predication patterns added to explain the need for the unspecs.
> 
> [missing ChangeLog]
> 
> I'm just going to focus on loop-doloop.c in this reply, I'll respond to the 
> other bits in a follow-up.
> 
>   2)  (set (reg) (plus (reg) (const_int -1))
> - (set (pc) (if_then_else (reg != 0)
> -(label_ref (label))
> -(pc))).
> +(set (pc) (if_then_else (reg != 0)
> +(label_ref (label))
> +(pc))).
> 
>   Some targets (ARM) do the comparison before the branch, as in the
>   following form:
> 
> - 3) (parallel [(set (cc) (compare ((plus (reg) (const_int -1), 0)))
> -   (set (reg) (plus (reg) (const_int -1)))])
> -(set (pc) (if_then_else (cc == NE)
> ...
> 
> 
> This comment is becoming confusing.  Really the text leading up to 3)... 
> should be inside 3.  Something like:
> 
>   3) Some targets (ARM) do the comparison before the branch, as in the
>   following form:
> 
>   (parallel [(set (cc) (compare (plus (reg) (const_int -1)) 0))
>  (set (reg) (plus (reg) (const_int -1)))])
>   (set (pc) (if_then_else (cc == NE)
>   (label_ref (label))
>   (pc)))])
> 
> 
> The same issue on the comment structure also applies to the new point 4...
> 
> +  The ARM target also supports a special case of a counter that 
> decrements
> +  by `n` and terminating in a GTU condition.  In that case, the compare 
> and
> +  branch are all part of one insn, containing an UNSPEC:
> +
> +  4) (parallel [
> +   (set (pc)
> +   (if_then_else (gtu (unspec:SI [(plus:SI (reg:SI 14 lr)
> +   (const_int -n))])
> +  (const_int n-1]))
> +   (label_ref)
> +   (pc)))
> +   (set (reg:SI 14 lr)
> +(plus:SI (reg:SI 14 lr)
> + (const_int -n)))
> + */
> 
> I think this needs a bit more clarification.  Specifically that this 
> construct supports a predicated vectorized do loop.  Also, the placement of 
> the unspec inside the comparison is ugnly and unnecessary.  It should be 
> sufficient to have the unspec inside a USE expression, which the mid-end can 
> then ignore entirely.  So
> 
> (parallel
>  [(set (pc) (if_then_else (gtu (plus (reg) (const_int -n))
>(const_int n-1))
>   (label_ref) (pc)))
>   (set (reg) (plus (reg) (const_int -n)))
>   (additional clobbers and uses)])
> 
> For Arm, we then add a (use (unspec [(const_int 0)] N)) that is specific to 
> this pattern to stop anything else from matching it.
> 
> Note that we don't need to mention that the register is 'LR' or the modes, 
> those are specific to a particular backend, not the generic pattern we want 
> to match.
> 
> +  || !CONST_INT_P (XEXP (inc_src, 1))
> +  || INTVAL (XEXP (inc_src, 1)) >= 0)
>  re

Re: [PATCH v3 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2024-01-30 Thread Richard Earnshaw (lists)

On 19/01/2024 14:40, Andre Vieira wrote:
> 
> Respin after comments from Kyrill and rebase. I also removed an if-then-else
> construct in arm_mve_check_reg_origin_is_num_elems similar to the other 
> functions
> Kyrill pointed out.
> 
> After an earlier comment from Richard Sandiford I also added comments to the
> two tail predication patterns added to explain the need for the unspecs.

[missing ChangeLog]

I'm just going to focus on loop-doloop.c in this reply, I'll respond to the 
other bits in a follow-up.

  2)  (set (reg) (plus (reg) (const_int -1))
- (set (pc) (if_then_else (reg != 0)
-(label_ref (label))
-(pc))).  
+(set (pc) (if_then_else (reg != 0)
+(label_ref (label))
+(pc))).

  Some targets (ARM) do the comparison before the branch, as in the
  following form:

- 3) (parallel [(set (cc) (compare ((plus (reg) (const_int -1), 0)))
-   (set (reg) (plus (reg) (const_int -1)))])
-(set (pc) (if_then_else (cc == NE)
...

This comment is becoming confusing.  Really the text leading up to 3)... should 
be inside 3.  Something like:

  3) Some targets (ARM) do the comparison before the branch, as in the
  following form:

  (parallel [(set (cc) (compare (plus (reg) (const_int -1)) 0))
 (set (reg) (plus (reg) (const_int -1)))])
  (set (pc) (if_then_else (cc == NE)
  (label_ref (label))
  (pc)))])

The same issue on the comment structure also applies to the new point 4...

+  The ARM target also supports a special case of a counter that decrements
+  by `n` and terminating in a GTU condition.  In that case, the compare and
+  branch are all part of one insn, containing an UNSPEC:
+
+  4) (parallel [
+   (set (pc)
+   (if_then_else (gtu (unspec:SI [(plus:SI (reg:SI 14 lr)
+   (const_int -n))])
+  (const_int n-1]))
+   (label_ref)
+   (pc)))
+   (set (reg:SI 14 lr)
+(plus:SI (reg:SI 14 lr)
+ (const_int -n)))
+ */

I think this needs a bit more clarification.  Specifically that this construct 
supports a predicated vectorized do loop.  Also, the placement of the unspec 
inside the comparison is ugnly and unnecessary.  It should be sufficient to 
have the unspec inside a USE expression, which the mid-end can then ignore 
entirely.  So

(parallel
 [(set (pc) (if_then_else (gtu (plus (reg) (const_int -n))
   (const_int n-1))
  (label_ref) (pc)))
  (set (reg) (plus (reg) (const_int -n)))
  (additional clobbers and uses)])

For Arm, we then add a (use (unspec [(const_int 0)] N)) that is specific to 
this pattern to stop anything else from matching it.

Note that we don't need to mention that the register is 'LR' or the modes, 
those are specific to a particular backend, not the generic pattern we want to 
match.

+  || !CONST_INT_P (XEXP (inc_src, 1))
+  || INTVAL (XEXP (inc_src, 1)) >= 0)
 return 0;
+  int dec_num = abs (INTVAL (XEXP (inc_src, 1)));

We can just use '-INTVAL(...)' here, we've verified just above that the 
constant is negative.

-  if ((XEXP (condition, 0) == reg)
+  /* For the ARM special case of having a GTU: re-form the condition without
+ the unspec for the benefit of the middle-end.  */
+  if (GET_CODE (condition) == GTU)
+{
+  condition = gen_rtx_fmt_ee (GTU, VOIDmode, inc_src,
+ GEN_INT (dec_num - 1));
+  return condition;
+}

If you make the change I mentioned above, this re-forming isn't needed any 
more, so the arm-specific comment goes away

-   {
+{
  if (GET_CODE (pattern) != PARALLEL)
  /*  For the second form we expect:

You've fixed the indentation of the brace (good), but the body of the braced 
expression needs re-indenting as well.

R.

Re: [PATCH v3 1/2] arm: Add define_attr to to create a mapping between MVE predicated and unpredicated insns

2024-01-30 Thread Richard Earnshaw (lists)

On 19/01/2024 14:40, Andre Vieira wrote:
> 
> Reposting for testing purposes, no changes from v2 (other than rebase).

We seem to have lost the ChangeLog for this hunk :(

The code itself looks OK, though.

Re: [PATCH][GCC][Arm] Add pattern for bswap + rotate -> rev16 [Bug 108933]

2024-01-29 Thread Richard Earnshaw


On 29/01/2024 14:14, Matthieu Longo wrote:

Hi Richard,

Please find below the new patch where I addressed your comments and 
updated the changelog.


rev16 pattern was not recognised anymore as a change in the bswap tree
pass was introducing a new GIMPLE form, not recognized by the assembly
final transformation pass.

More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933

gcc/ChangeLog:

 PR target/108933
 * config/arm/arm.md (arm_rev16si2): Convert to define_insn.
 Correct generated RTL.
 (arm_rev16si2_alt1): Correctly handle conditional execution.
 (arm_rev16si2_alt2): Likewise.

gcc/testsuite/ChangeLog:

 PR target/108933
 * gcc.target/arm/rev16.c: Moved to...
 * gcc.target/arm/rev16_1.c: ...here.
 * gcc.target/arm/rev16_2.c: New test to check that rev16 is
 emitted.


Thanks.  I've tweaked the commit message very slightly and pushed this.

Could you please prepare backports for gcc-11 thru 13?  It should just 
be a matter of cherry-picking the commit.


R.



On 2024-01-22 16:25, Richard Earnshaw (lists) wrote:

On 22/01/2024 12:18, Matthieu Longo wrote:

rev16 pattern was not recognised anymore as a change in the bswap tree
pass was introducing a new GIMPLE form, not recognized by the assembly
final transformation pass.

More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933

gcc/ChangeLog:

 PR target/108933
 * config/arm/arm.md (*arm_rev16si2_alt3): new pattern to 
convert

   a bswap + rotate by 16 bits into rev16


ChangeLog entries need to be written as sentences, so start with a 
capital letter and end with a full stop; continuation lines should 
start in column 8 (one hard tab, don't use spaces).  But in this case, 
"New pattern." is sufficient.




gcc/testsuite/ChangeLog:

 PR target/108933
 * gcc.target/arm/rev16.c: Moved to...
 * gcc.target/arm/rev16_1.c: ...here.
 * gcc.target/arm/rev16_2.c: New test to check that rev16 is
   emitted.



+;; Similar pattern to match (rotate (bswap) 16)
+(define_insn "*arm_rev16si2_alt3"
+  [(set (match_operand:SI 0 "register_operand" "=l,r")
+    (rotate:SI (bswap:SI (match_operand:SI 1 "register_operand" 
"l,r"))

+ (const_int 16)))]
+  "arm_arch6"
+  "rev16\\t%0, %1"
+  [(set_attr "arch" "t,32")
+   (set_attr "length" "2,4")
+   (set_attr "type" "rev")]
+)
+

Unfortunately, this is insufficient.  When generating Arm or Thumb2 
code (but not thumb1) we also have to handle conditional execution: we 
need to have '%?' in the output template at the point where a 
condition code might be needed.  That means we need separate output 
templates for all three alternatives (as we need a 16-bit variant for 
thumb2 that's conditional and a 16-bit for thumb1 that isn't).  See 
the output of arm_rev16 for a guide of what is really needed.


I note that the arm_rev16si2_alt1, and arm_rev16si2_alt2 patterns are 
incorrect in this regard as well; that will need fixing.


I also see that arm_rev16si2 currently expands to the alt1 variant 
above; given that the preferred canonical form would now appear to use 
bswap + rotate, we should change that as well.  In fact, we can merge 
your new pattern with the expand entirely and eliminate the need to 
call gen_arm_rev16si2_alt1.  Something like:


(define_insn "arm_rev16si2"
   [(set (match_operand:SI 0 "s_register_operand")
 (rotate:SI (bswap:SI (match_operand:SI 1 
"s_register_operand")) (const_int 16))]

   "arm_arch6"
   "@
   rev16...
   ...


R.

Re: [PATCH] Make gcc.target/arm/bics_3.c testcase a bit more generic [PR113542]

2024-01-25 Thread Richard Earnshaw (lists)

On 25/01/2024 10:29, Maxim Kuvyrkov wrote:
> After fwprop improvement in r14-8319-g86de9b66480, codegen in
> bics_3.c test changed from "bics" to "bic" instruction, with
> the overall instruction stream remaining at the same quality.
> 
> This patch makes the scan-assembler directive accept both
> "bics" and "bic".
> 
> BEFORE r14-8319-g86de9b66480:
>   bicsr0, r0, r1 @ 9  [c=4 l=4]  *andsi_notsi_si_compare0_scratch
>   mov r0, #1  @ 23[c=4 l=4]  *thumb2_movsi_vfp/1
>   it  eq
>   moveq   r0, #0  @ 26[c=8 l=4]  *p *thumb2_movsi_vfp/2
>   bx  lr  @ 29[c=8 l=4]  *thumb2_return
> 
> AFTER r14-8319-g86de9b66480:
>   bic r0, r0, r1  @ 8 [c=4 l=4]  andsi_notsi_si
>   subsr0, r0, #0  @ 22[c=4 l=4]  cmpsi2_addneg/0
>   it  ne
>   movne   r0, #1  @ 23[c=8 l=4]  *p *thumb2_movsi_vfp/2
>   bx  lr  @ 26[c=8 l=4]  *thumb2_return
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/113542
>   * gcc.target/arm/bics_3.c: Update scan-assembler directive.
> ---
>  gcc/testsuite/gcc.target/arm/bics_3.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/bics_3.c 
> b/gcc/testsuite/gcc.target/arm/bics_3.c
> index e056b264e15..c5bed3c92d2 100644
> --- a/gcc/testsuite/gcc.target/arm/bics_3.c
> +++ b/gcc/testsuite/gcc.target/arm/bics_3.c
> @@ -35,6 +35,6 @@ main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 
> 2 } } */
> -/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, 
> .sl #2" 1 } } */
> +/* { dg-final { scan-assembler-times "bics?\tr\[0-9\]+, r\[0-9\]+, 
> r\[0-9\]+" 2 } } */
> +/* { dg-final { scan-assembler-times "bics?\tr\[0-9\]+, r\[0-9\]+, 
> r\[0-9\]+, .sl #2" 1 } } */
>  


The test was added (r6-823-g0454e698401a3e) specifically to check that a BICS 
instruction was being generated.  Whether or not that is right is somewhat 
debatable, but this change seems to be papering over a different issue.

Either we should generate BICS, making this change incorrect, or we should 
disable the test for thumb code on the basis that this isn't really a win.

But really, we should fix the compiler to do better here.  We really want 
something like

BICS  r0, r0, r1  // r0 is 0 or non-zero
MOVNE r0, #1  // convert all non-zero to 1

in Arm state (ie using the BICS instruction to set the result to zero); and in 
thumb2, perhaps something like:

BICS  r0, r0, r1
ITne
MOVNE r0, #1

or maybe even better:

BIC  r0, r0, r1
SUBS r1, r0, #1
SBC  r0, r0, r1

which is slightly better than BICS because SUBS breaks a condition-code chain 
(all the flag bits are set).

There are similar quality issues for other NE(arith-op, 0) cases; we just don't 
have tests for those.

R.

Re: [PATCH] arm: Fix missing bti instruction for virtual thunks

2024-01-24 Thread Richard Earnshaw (lists)

On 23/01/2024 15:53, Richard Ball wrote:
> Adds missing bti instruction at the beginning of a virtual
> thunk, when bti is enabled.
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm.cc (arm_output_mi_thunk): Emit
>   insn for bti_c when bti is enabled.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.target/arm/bti_thunk.C: New test.


diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 
e5a944486d7bd583627b0e22dfe8f95862e975bb..91eee8be7c1a59118fbf443557561fb3e0689d61
 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -29257,6 +29257,8 @@ arm_output_mi_thunk (FILE *file, tree thunk, 
HOST_WIDE_INT delta,
   const char *fnname = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (thunk));
 
   assemble_start_function (thunk, fnname);
+  if (aarch_bti_enabled ())
+emit_insn (aarch_gen_bti_c());

Missing space between ...bit_c and the parenthesis.

   if (TARGET_32BIT)
 arm32_output_mi_thunk (file, thunk, delta, vcall_offset, function);
   else

diff --git a/gcc/testsuite/g++.target/arm/bti_thunk.C 
b/gcc/testsuite/g++.target/arm/bti_thunk.C
new file mode 100644
index 
..5c4a8e5a8d74581eca2b877c000a5b34ddca0e9b
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/bti_thunk.C
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8.1-m.main+pacbti -O1 -mbranch-protection=bti 
--save-temps" } */

You can't just add options like this; they may not work with other options 
passed by the testsuite framework.  Instead, you should a suitable entry to 
lib/target-supports.exp in the table starting "foreach { armfunc armflag 
armdefs } {" that tests whether the options can be safely added, and then use 
dg-require-effective-target and dg-add-options for your new set of options.

\ No newline at end of file

Please add one :)

R.

Re: [PATCH][GCC][Arm] Add pattern for bswap + rotate -> rev16 [Bug 108933]

2024-01-22 Thread Richard Earnshaw (lists)

On 22/01/2024 12:18, Matthieu Longo wrote:
> rev16 pattern was not recognised anymore as a change in the bswap tree
> pass was introducing a new GIMPLE form, not recognized by the assembly
> final transformation pass.
> 
> More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933
> 
> gcc/ChangeLog:
> 
>     PR target/108933
>     * config/arm/arm.md (*arm_rev16si2_alt3): new pattern to convert
>   a bswap + rotate by 16 bits into rev16

ChangeLog entries need to be written as sentences, so start with a capital 
letter and end with a full stop; continuation lines should start in column 8 
(one hard tab, don't use spaces).  But in this case, "New pattern." is 
sufficient.

> 
> gcc/testsuite/ChangeLog:
> 
>     PR target/108933
>     * gcc.target/arm/rev16.c: Moved to...
>     * gcc.target/arm/rev16_1.c: ...here.
>     * gcc.target/arm/rev16_2.c: New test to check that rev16 is
>   emitted.

+;; Similar pattern to match (rotate (bswap) 16)
+(define_insn "*arm_rev16si2_alt3"
+  [(set (match_operand:SI 0 "register_operand" "=l,r")
+(rotate:SI (bswap:SI (match_operand:SI 1 "register_operand" "l,r"))
+ (const_int 16)))]
+  "arm_arch6"
+  "rev16\\t%0, %1"
+  [(set_attr "arch" "t,32")
+   (set_attr "length" "2,4")
+   (set_attr "type" "rev")]
+)
+

Unfortunately, this is insufficient.  When generating Arm or Thumb2 code (but 
not thumb1) we also have to handle conditional execution: we need to have '%?' 
in the output template at the point where a condition code might be needed.  
That means we need separate output templates for all three alternatives (as we 
need a 16-bit variant for thumb2 that's conditional and a 16-bit for thumb1 
that isn't).  See the output of arm_rev16 for a guide of what is really needed.

I note that the arm_rev16si2_alt1, and arm_rev16si2_alt2 patterns are incorrect 
in this regard as well; that will need fixing.

I also see that arm_rev16si2 currently expands to the alt1 variant above; given 
that the preferred canonical form would now appear to use bswap + rotate, we 
should change that as well.  In fact, we can merge your new pattern with the 
expand entirely and eliminate the need to call gen_arm_rev16si2_alt1.  
Something like:

(define_insn "arm_rev16si2"
  [(set (match_operand:SI 0 "s_register_operand")
(rotate:SI (bswap:SI (match_operand:SI 1 "s_register_operand")) 
(const_int 16))]
  "arm_arch6"
  "@
  rev16...
  ...

R.

Re: [PATCH] arm: Fix parsecpu.awk for aliases [PR113030]

2024-01-22 Thread Richard Earnshaw (lists)

On 21/01/2024 07:29, Andrew Pinski wrote:
> So the problem here is the 2 functions check_cpu and check_arch use
> the wrong variable to check if an alias is valid for that cpu/arch.
> check_cpu uses cpu_optaliases instead of cpu_opt_alias. cpu_optaliases
> is an array of index'ed by the cpuname that contains all of the valid aliases
> for that cpu but cpu_opt_alias is an double index array which is index'ed
> by cpuname and the alias which provides what is the alias for that option.
> Similar thing happens for check_arch and arch_optaliases vs arch_optaliases.
> 
> Tested by running:
> ```
> awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+simd" 
> config/arm/arm-cpus.in
> awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon" 
> config/arm/arm-cpus.in
> awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon-vfpv3" 
> config/arm/arm-cpus.in
> ```
> And they don't return error back.
> 
> gcc/ChangeLog:
> 
>   PR target/113030
>   * config/arm/parsecpu.awk (check_cpu): Use cpu_opt_alias
>   instead of cpu_optaliases.
>   (check_arch): Use arch_opt_alias instead of arch_optaliases.

OK

Thanks,

R.

> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/arm/parsecpu.awk | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk
> index ddd4f3b440a..384462bdb5b 100644
> --- a/gcc/config/arm/parsecpu.awk
> +++ b/gcc/config/arm/parsecpu.awk
> @@ -529,7 +529,7 @@ function check_cpu (name) {
>  
>  for (n = 2; n <= exts; n++) {
>   if (!((cpu_name, extensions[n]) in cpu_opt_remove)  \
> - && !((cpu_name, extensions[n]) in cpu_optaliases)) {
> + && !((cpu_name, extensions[n]) in cpu_opt_alias)) {
>   return "error"
>   }
>  }
> @@ -552,7 +552,7 @@ function check_arch (name) {
>  
>  for (n = 2; n <= exts; n++) {
>   if (!((extensions[1], extensions[n]) in arch_opt_remove)\
> - && !((extensions[1], extensions[n]) in arch_optaliases)) {
> + && !((extensions[1], extensions[n]) in arch_opt_alias)) {
>   return "error"
>   }
>  }

Re: [PATCH v3 00/12] [GCC] arm: vld1q vst1 vst1q vst1 intrinsics

2024-01-12 Thread Richard Earnshaw (lists)

On 02/01/2024 09:23, ezra.sito...@arm.com wrote:
> From: Ezra Sitorus 
> 
> Add vld1q, vst1, vst1q and vst1 intrinsics to arm port.
> 
> Ezra Sitorus (12):
>   [GCC] arm: vld1q_types_x2 ACLE intrinsics
>   [GCC] arm: vld1q_types_x3 ACLE intrinsics
>   [GCC] arm: vld1q_types_x4 ACLE intrinsics
>   [GCC] arm: vst1_types_x2 ACLE intrinsics
>   [GCC] arm: vst1_types_x3 ACLE intrinsics
>   [GCC] arm: vst1_types_x4 ACLE intrinsics
>   [GCC] arm: vst1q_types_x2 ACLE intrinsics
>   [GCC] arm: vst1q_types_x3 ACLE intrinsics
>   [GCC] arm: vst1q_types_x4 ACLE intrinsics
>   [GCC] arm: vld1_types_x2 ACLE intrinsics
>   [GCC] arm: vld1_types_x3 ACLE intrinsics
>   [GCC] arm: vld1_types_x4 ACLE intrinsics
> 
>  gcc/config/arm/arm_neon.h | 2032 ++---
>  gcc/config/arm/arm_neon_builtins.def  |   12 +
>  gcc/config/arm/iterators.md   |6 +
>  gcc/config/arm/neon.md|  249 ++
>  gcc/config/arm/unspecs.md |8 +
>  .../gcc.target/arm/simd/vld1_base_xN_1.c  |  176 ++
>  .../gcc.target/arm/simd/vld1_bf16_xN_1.c  |   23 +
>  .../gcc.target/arm/simd/vld1_fp16_xN_1.c  |   23 +
>  .../gcc.target/arm/simd/vld1_p64_xN_1.c   |   23 +
>  .../gcc.target/arm/simd/vld1q_base_xN_1.c |  183 ++
>  .../gcc.target/arm/simd/vld1q_bf16_xN_1.c |   24 +
>  .../gcc.target/arm/simd/vld1q_fp16_xN_1.c |   24 +
>  .../gcc.target/arm/simd/vld1q_p64_xN_1.c  |   24 +
>  .../gcc.target/arm/simd/vst1_base_xN_1.c  |  176 ++
>  .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |   22 +
>  .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |   23 +
>  .../gcc.target/arm/simd/vst1_p64_xN_1.c   |   23 +
>  .../gcc.target/arm/simd/vst1q_base_xN_1.c |  185 ++
>  .../gcc.target/arm/simd/vst1q_bf16_xN_1.c |   24 +
>  .../gcc.target/arm/simd/vst1q_fp16_xN_1.c |   24 +
>  .../gcc.target/arm/simd/vst1q_p64_xN_1.c  |   24 +
>  21 files changed, 3018 insertions(+), 290 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_base_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_bf16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_fp16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_p64_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_base_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_bf16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_fp16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_p64_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_base_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_bf16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_fp16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_p64_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_base_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_bf16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_fp16_xN_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_p64_xN_1.c
> 

Thanks, I've pushed this series.

Reviewing this series did highlight a couple of issues with the existing code 
base (not your patch); I'll follow up on these separately.

R.

Re: [PATCH v2 7/7] aarch64,arm: Move branch-protection data to targets

2024-01-11 Thread Richard Earnshaw (lists)

On 11/01/2024 14:43, Szabolcs Nagy wrote:
> The 12/07/2023 13:13, Richard Earnshaw wrote:
>> On 03/11/2023 15:36, Szabolcs Nagy wrote:
>>> * config/aarch64/aarch64.cc (aarch_handle_no_branch_protection): Copy.
>>> (aarch_handle_standard_branch_protection): Copy.
>>> (aarch_handle_pac_ret_protection): Copy.
>>> (aarch_handle_pac_ret_leaf): Copy.
>>> (aarch_handle_pac_ret_b_key): Copy.
>>> (aarch_handle_bti_protection): Copy.
>>
>> I think all of the above functions that have been moved back from
>> aarch-common should be renamed back to aarch64_..., unless they are directly
>> referenced statically by code in aarch-common.c.
> 
> done.
> 
>>> +const struct aarch_branch_protect_type aarch_branch_protect_types[] = {
>>
>> can this be made static now?  And maybe pass the structure as a parameter if
>> that's not done already.
> 
> done in v4.
> 
>> It would be nice if, when we raise an error, we could print out the list of
>> valid options (and modifiers), much like we do on Arm for -march/-mcpu.
>>
>> eg.
>> $ gcc -mcpu=crotex-a8
>> cc1: error: unrecognised -mcpu target: crotex-a8
>> cc1: note: valid arguments are: arm8 arm810 strongarm strongarm110 fa526
>> [...rest of list]; did you mean ‘cortex-a8’?
> 
> i implemented this with candidates_list_and_hint but it does
> not work very well if the typo is in a subtype, so i think
> this should be done in a separate patch if at all.
> 

I'd build the candidates list from all the types + subtypes, so that the 
suggestion code has a full list to pick from; but fair enough.

R.

Re: [libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].

2024-01-10 Thread Richard Earnshaw





On 08/01/2024 16:07, Roger Sayle wrote:


Bootstrapping GCC on arm-linux-gnueabihf with --with-arch=armv6 currently
has a large number of FAILs in libatomic (regressions since last time I
attempted this).  The failure mode is related to IFUNC handling with the
file tas_8_2_.o containing an unresolved reference to the function
libat_test_and_set_1_i2.

Bearing in mind I've no idea what's going on, the following one line
change, to build tas_1_2_.o when building tas_8_2_.o, resolves the problem
for me and restores the libatomic testsuite to 44 expected passes and 5
unsupported tests [from 22 unexpected failures and 22 unresolved testcases].

If this looks like the correct fix, I'm not confident with rebuilding
Makefile.in with correct version of automake, so I'd very much appreciate
it if someone/the reviewer/mainainer could please check this in for me.
Thanks in advance.


2024-01-08  Roger Sayle  

libatomic/ChangeLog
 * Makefile.am: Build tas_1_2_.o on ARCH_ARM_LINUX
 * Makefile.in: Regenerate.


Roger
--



Hi Roger,

I don't really understand all this make foo :( so I'm not sure if this 
is the right fix either.  If this is, as you say, a regression, have you 
been able to track down when it first started to occur?  That might also 
help me to understand what changed to cause this.


Perhaps we should have a PR for this, to make tracking the fixes easier.

R.

Re: [PATCH][GCC][Arm] Define __ARM_FEATURE_BF16 when +bf16 feature is enabled

2024-01-10 Thread Richard Earnshaw





On 08/01/2024 17:21, Matthieu Longo wrote:

Hi,

Arm GCC backend does not define __ARM_FEATURE_BF16 when +bf16 is 
specified (via -march option, or target pragma) whereas it is supposed 
to be tested before including arm_bf16.h (as specified in ACLE document: 
https://arm-software.github.io/acle/main/acle.html#arm_bf16h).


gcc/ChangeLog:

     * config/arm/arm-c.cc (arm_cpu_builtins): define 
__ARM_FEATURE_BF16

     * config/arm/arm.h: define TARGET_BF16

Ok for master ?

Matthieu
index 
2e181bf7f36bab1209d5358e65d9513541683632..21ca22ac71119eda4ff01709aa95002ca13b1813 
100644

--- a/gcc/config/arm/arm-c.cc
+++ b/gcc/config/arm/arm-c.cc
@@ -425,12 +425,14 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   arm_arch_cde_coproc);

   def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
+
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16", TARGET_BF16);
+  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
+ TARGET_BF16_FP);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
  TARGET_BF16_FP);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC",
  TARGET_BF16_SIMD);
-  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
- TARGET_BF16_FP || TARGET_BF16_SIMD);

Why is the definition of __ARM_BF16_FORMAT_ALTERNATIVE changed?  And why 
is there explanation of that change?  It doesn't seem directly related 
to $subject.


R.

 }

 void

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 2597 matches

Mail list logo