[PATCH] Fix ICE in rtl check due to CONST_WIDE_INT in CONST_VECTOR_DUPLICATE_P

2024-06-10 Thread liuhongt
In theory, const_wide_int could also be handled with an extra check for each
component of the HOST_WIDE_INT array, and that check would be needed for both
the shift and the bit_and operands.
I assume the optimization opportunity is rare, so the patch just adds an
extra check to make sure GET_MODE_INNER (mode) fits into a
HOST_WIDE_INT.

gcc/ChangeLog:

PR target/115384
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Only do the simplification of (AND (ASHIFTRT A imm) mask)
to (LSHIFTRT A imm) when inner mode fits HOST_WIDE_INT.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115384.c: New test.
---
 gcc/simplify-rtx.cc  |  4 +++-
 gcc/testsuite/gcc.target/i386/pr115384.c | 12 
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr115384.c

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 9bc3ef9ad9f..4992bee7506 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -4074,7 +4074,9 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
  || (GET_CODE (XEXP (op0, 1)) == CONST_VECTOR
  && CONST_VECTOR_DUPLICATE_P (XEXP (op0, 1
  && GET_CODE (op1) == CONST_VECTOR
- && CONST_VECTOR_DUPLICATE_P (op1))
+ && CONST_VECTOR_DUPLICATE_P (op1)
+ && (GET_MODE_PRECISION (GET_MODE_INNER (mode))
+ <= HOST_BITS_PER_WIDE_INT))
{
  unsigned HOST_WIDE_INT shift_count
= (CONST_INT_P (XEXP (op0, 1))
diff --git a/gcc/testsuite/gcc.target/i386/pr115384.c 
b/gcc/testsuite/gcc.target/i386/pr115384.c
new file mode 100644
index 000..3ba7a0b8115
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115384.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O" } */
+
+typedef __attribute__((__vector_size__(sizeof(__int128)))) __int128 W;
+
+W w;
+
+void
+foo()
+{
+  w = w >> 4 & 18446744073709551600llu;
+}
-- 
2.31.1



Re: [PATCH] AVX-512: Pacify -Wshift-overflow=2. [PR115409]

2024-06-10 Thread Hongtao Liu
On Mon, Jun 10, 2024 at 2:37 PM Collin Funk  wrote:
>
> Shifting a 1 into bit 31 of a signed int is undefined behavior.  Since unsigned
> int is 32 bits wide, this change fixes it and silences the warning.
Ok.
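For reference, a minimal standalone illustration (hypothetical, not part of
the patch):

  int bad (void) { return 1 << 31; }        /* shifts a 1 into the sign bit of a
                                               signed int: undefined behavior,
                                               flagged by -Wshift-overflow=2 */
  unsigned good (void) { return 1U << 31; } /* unsigned shift, well defined,
                                               value 0x80000000 */
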
>
> gcc/ChangeLog:
>
> PR target/115409
> * config/i386/avx512fp16intrin.h (_mm512_conj_pch): Make the
> constant unsigned before shifting.
> * config/i386/avx512fp16vlintrin.h (_mm256_conj_pch): Likewise.
> (_mm_conj_pch): Likewise.
>
> Signed-off-by: Collin Funk 
> ---
>  gcc/config/i386/avx512fp16intrin.h   | 2 +-
>  gcc/config/i386/avx512fp16vlintrin.h | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/i386/avx512fp16intrin.h 
> b/gcc/config/i386/avx512fp16intrin.h
> index f86050b2087..1869a920dd3 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -3355,7 +3355,7 @@ extern __inline __m512h
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_conj_pch (__m512h __A)
>  {
> -  return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 
> (1<<31));
> +  return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 
> (1U<<31));
>  }
>
>  extern __inline __m512h
> diff --git a/gcc/config/i386/avx512fp16vlintrin.h 
> b/gcc/config/i386/avx512fp16vlintrin.h
> index a1e1cb567ff..405a06bbb9e 100644
> --- a/gcc/config/i386/avx512fp16vlintrin.h
> +++ b/gcc/config/i386/avx512fp16vlintrin.h
> @@ -181,7 +181,7 @@ extern __inline __m256h
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm256_conj_pch (__m256h __A)
>  {
> -  return (__m256h) _mm256_xor_epi32 ((__m256i) __A, _mm256_avx512_set1_epi32 
> (1<<31));
> +  return (__m256h) _mm256_xor_epi32 ((__m256i) __A, _mm256_avx512_set1_epi32 
> (1U<<31));
>  }
>
>  extern __inline __m256h
> @@ -209,7 +209,7 @@ extern __inline __m128h
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_conj_pch (__m128h __A)
>  {
> -  return (__m128h) _mm_xor_epi32 ((__m128i) __A, _mm_avx512_set1_epi32 
> (1<<31));
> +  return (__m128h) _mm_xor_epi32 ((__m128i) __A, _mm_avx512_set1_epi32 
> (1U<<31));
>  }
>
>  extern __inline __m128h
> --
> 2.45.2
>


-- 
BR,
Hongtao


[committed] [RISC-V] Drop dead test

2024-06-10 Thread Jeff Law
This test is no longer useful.  It doesn't test what it was originally 
intended to test and there's really no way to recover it sanely.


We agreed in the patchwork meeting last week that if we want to test Zfa 
that we'll write a new test for that.  Similarly if we want to do deeper 
testing of the non-Zfa sequences in this space that we'd write new tests 
for those as well (execution tests in particular).


So dropping this test.

Jeff

commit 95161c6abfbd7ba9fab0b538ccc885f5980efbee
Author: Jeff Law 
Date:   Mon Jun 10 22:39:40 2024 -0600

[committed] [RISC-V] Drop dead round_32 test

This test is no longer useful.  It doesn't test what it was originally 
intended
to test and there's really no way to recover it sanely.

We agreed in the patchwork meeting last week that if we want to test Zfa 
that
we'll write a new test for that.  Similarly if we want to do deeper testing 
of
the non-Zfa sequences in this space that we'd write new tests for those as 
well
(execution tests in particular).

So dropping this test.

gcc/testsuite
* gcc.target/riscv/round_32.c: Delete.

diff --git a/gcc/testsuite/gcc.target/riscv/round_32.c 
b/gcc/testsuite/gcc.target/riscv/round_32.c
deleted file mode 100644
index 88ff77aff2e..000
--- a/gcc/testsuite/gcc.target/riscv/round_32.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* { dg-do compile { target { riscv32*-*-* } } } */
-/* { dg-require-effective-target glibc } */
-/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-math-errno 
-funsafe-math-optimizations -fno-inline" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
-
-#include "round.c"
-
-/* { dg-final { scan-assembler-times {\mfcvt.w.s} 15 } } */
-/* { dg-final { scan-assembler-times {\mfcvt.s.w} 5 } } */
-/* { dg-final { scan-assembler-times {\mfcvt.d.w} 65 } } */
-/* { dg-final { scan-assembler-times {\mfcvt.w.d} 15 } } */
-/* { dg-final { scan-assembler-times {,rup} 6 } } */
-/* { dg-final { scan-assembler-times {,rmm} 6 } } */
-/* { dg-final { scan-assembler-times {,rdn} 6 } } */
-/* { dg-final { scan-assembler-times {,rtz} 6 } } */
-/* { dg-final { scan-assembler-not {\mfcvt.l.d} } } */
-/* { dg-final { scan-assembler-not {\mfcvt.d.l} } } */
-/* { dg-final { scan-assembler-not "\\sceil\\s" } } */
-/* { dg-final { scan-assembler-not "\\sfloor\\s" } } */
-/* { dg-final { scan-assembler-not "\\sround\\s" } } */
-/* { dg-final { scan-assembler-not "\\snearbyint\\s" } } */
-/* { dg-final { scan-assembler-not "\\srint\\s" } } */
-/* { dg-final { scan-assembler-not "\\stail\\s" } } */


Re: [PATCH v3 0/3] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-10 Thread Jeff Law




On 6/10/24 3:46 PM, Patrick O'Neill wrote:

The A extension has been split into two parts: Zaamo and Zalrsc.
This patch adds basic support by making the A extension imply Zaamo and
Zalrsc.

Zaamo/Zalrsc spec: https://github.com/riscv/riscv-zaamo-zalrsc/tags
Ratification: https://jira.riscv.org/browse/RVS-1995

v2:
Rebased and updated some testcases that rely on the ISA string.

v3:
Regex-ify temp registers in added testcases.
Remove unintentional whitespace changes.
Add riscv_{a|ztso|zaamo|zalrsc} docs to sourcebuild.texi (and move core-v bi
extension doc into appropriate section).

Edwin Lu (1):
   RISC-V: Add basic Zaamo and Zalrsc support

Patrick O'Neill (2):
   RISC-V: Add Zalrsc and Zaamo testsuite support
   RISC-V: Add Zalrsc amo-op patterns

  gcc/common/config/riscv/riscv-common.cc   |  11 +-
  gcc/config/riscv/arch-canonicalize|   1 +
  gcc/config/riscv/riscv.opt|   6 +-
  gcc/config/riscv/sync.md  | 152 +++---
  gcc/doc/sourcebuild.texi  |  16 +-
  .../riscv/amo-table-a-6-amo-add-1.c   |   2 +-
  .../riscv/amo-table-a-6-amo-add-2.c   |   2 +-
  .../riscv/amo-table-a-6-amo-add-3.c   |   2 +-
  .../riscv/amo-table-a-6-amo-add-4.c   |   2 +-
  .../riscv/amo-table-a-6-amo-add-5.c   |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-1.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-2.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-3.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-4.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-5.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-6.c  |   2 +-
  .../riscv/amo-table-a-6-compare-exchange-7.c  |   2 +-
  .../riscv/amo-table-a-6-subword-amo-add-1.c   |   2 +-
  .../riscv/amo-table-a-6-subword-amo-add-2.c   |   2 +-
  .../riscv/amo-table-a-6-subword-amo-add-3.c   |   2 +-
  .../riscv/amo-table-a-6-subword-amo-add-4.c   |   2 +-
  .../riscv/amo-table-a-6-subword-amo-add-5.c   |   2 +-
  .../riscv/amo-table-ztso-amo-add-1.c  |   2 +-
  .../riscv/amo-table-ztso-amo-add-2.c  |   2 +-
  .../riscv/amo-table-ztso-amo-add-3.c  |   2 +-
  .../riscv/amo-table-ztso-amo-add-4.c  |   2 +-
  .../riscv/amo-table-ztso-amo-add-5.c  |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-1.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-2.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-3.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-4.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-5.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-6.c |   2 +-
  .../riscv/amo-table-ztso-compare-exchange-7.c |   2 +-
  .../riscv/amo-table-ztso-subword-amo-add-1.c  |   2 +-
  .../riscv/amo-table-ztso-subword-amo-add-2.c  |   2 +-
  .../riscv/amo-table-ztso-subword-amo-add-3.c  |   2 +-
  .../riscv/amo-table-ztso-subword-amo-add-4.c  |   2 +-
  .../riscv/amo-table-ztso-subword-amo-add-5.c  |   2 +-
  .../riscv/amo-zaamo-preferred-over-zalrsc.c   |  17 ++
  .../gcc.target/riscv/amo-zalrsc-amo-add-1.c   |  19 +++
  .../gcc.target/riscv/amo-zalrsc-amo-add-2.c   |  19 +++
  .../gcc.target/riscv/amo-zalrsc-amo-add-3.c   |  19 +++
  .../gcc.target/riscv/amo-zalrsc-amo-add-4.c   |  19 +++
  .../gcc.target/riscv/amo-zalrsc-amo-add-5.c   |  19 +++
  gcc/testsuite/gcc.target/riscv/attribute-15.c |   2 +-
  gcc/testsuite/gcc.target/riscv/attribute-16.c |   2 +-
  gcc/testsuite/gcc.target/riscv/attribute-17.c |   2 +-
  gcc/testsuite/gcc.target/riscv/attribute-18.c |   2 +-
  gcc/testsuite/gcc.target/riscv/pr110696.c |   2 +-
  .../gcc.target/riscv/rvv/base/pr114352-1.c|   4 +-
  .../gcc.target/riscv/rvv/base/pr114352-3.c|   8 +-
  gcc/testsuite/lib/target-supports.exp |  48 +-
  53 files changed, 366 insertions(+), 70 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-2.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-3.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-4.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-5.c

This series is OK for the trunk.

jeff



Re: [PATCH v3 0/3] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-10 Thread Jeff Law




On 6/10/24 6:15 PM, Andrea Parri wrote:

On Mon, Jun 10, 2024 at 02:46:54PM -0700, Patrick O'Neill wrote:

The A extension has been split into two parts: Zaamo and Zalrsc.
This patch adds basic support by making the A extension imply Zaamo and
Zalrsc.

Zaamo/Zalrsc spec: https://github.com/riscv/riscv-zaamo-zalrsc/tags
Ratification: https://jira.riscv.org/browse/RVS-1995

v2:
Rebased and updated some testcases that rely on the ISA string.

v3:
Regex-ify temp registers in added testcases.
Remove unintentional whitespace changes.
Add riscv_{a|ztso|zaamo|zalrsc} docs to sourcebuild.texi (and move core-v bi
extension doc into appropriate section).

Edwin Lu (1):
   RISC-V: Add basic Zaamo and Zalrsc support

Patrick O'Neill (2):
   RISC-V: Add Zalrsc and Zaamo testsuite support
   RISC-V: Add Zalrsc amo-op patterns


While providing a proper/detailed review of the series goes above my
"GCC internals" skills, I've applied the series and checked that the
generated code for some atomic operations meets expectations (expecta-
tions which, w/ "only Zaamo", are arguably quite low as mentioned in
v2 and elsewhere):
Thanks for taking the time.  We realize you're not a GCC expert, but 
having an extra pair of eyes on the atomics is always appreciated.




Tested-by: Andrea Parri 

   Andrea


P.S. Unrelated to the changes at stake, but perhaps worth mentioning:
w/ and w/o these changes, the following

[ ... ]
I'll leave this to Patrick to decide if he wants to update.  I'm always 
hesitant to weaken this stuff as I'm sure there's somebody, somewhere 
that assumes the stronger primitives.


Jeff



Re: [PATCH] c-family: Introduce the -Winvalid-noreturn flag from clang with extra tuneability

2024-06-10 Thread Jason Merrill

On 6/10/24 03:13, Julian Waters wrote:

Hi Jason,

Thanks for the reply. I'm a little bit overwhelmed with university at
the moment, would it be ok if I delay implementing this a little bit?


Sure, we're still early in GCC 15 development, no time pressure.


On Tue, Jun 4, 2024 at 1:04 AM Jason Merrill  wrote:


On 6/1/24 11:31, Julian Waters wrote:

Hi Jason,

Thanks for the reply! I'll address your comments soon. I have a
question: if there is an option defined in c.opt as an Enum, like
fstrong-eval-order, and the -no variant of the option is passed, would
the Var somehow reflect the negated option? E.g.

Winvalid-noreturn=
C ObjC C++ ObjC++ Var(warn_invalid_noreturn) Joined
Enum(invalid_noreturn) Warning

Enum
Name(invalid_noreturn) Type(int)

EnumValue
Enum(invalid_noreturn) String(explicit) Value(0)


-fstrong-eval-order has

fstrong-eval-order
C++ ObjC++ Common Alias(fstrong-eval-order=, all, none)

to represent that plain -fstrong-eval-order is equivalent to
-fstrong-eval-order=all, and -fno-strong-eval-order is equivalent to =none.


Would warn_invalid_noreturn then be != 0 if
-Wno-invalid-noreturn=explicit is passed? Or is there a way to make a
warning call depend on 2 different OPT_ entries?


Typically = options will specify RejectNegative so the driver will
reject e.g. -Wno-invalid-noreturn=explicit.

Jason


best regards,
Julian

On Sat, Jun 1, 2024 at 4:57 AM Jason Merrill  wrote:


On 5/29/24 09:58, Julian Waters wrote:

Currently, gcc warns about noreturn-marked functions that return both
explicitly and implicitly, with no way to turn this warning off. clang does
have an option for these classes of warnings, -Winvalid-noreturn. However, we
can do better. Instead of just having one option that switches the warnings for
both on and off, we can define an extra layer of granularity and have
separate options for implicit returns and explicit returns, as in
-Winvalid-noreturn=explicit and -Winvalid-noreturn=implicit. This patch adds both
to gcc, for compatibility with clang.
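
For illustration, a minimal (hypothetical) example of the two cases the
proposed options would distinguish:

  __attribute__((noreturn)) void f (int c)
  {
    if (c)
      return;   /* explicit return from a noreturn function
                   -> -Winvalid-noreturn=explicit */
  }             /* control can fall off the end (implicit return)
                   -> -Winvalid-noreturn=implicit */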


Thanks!


Do note that I am relatively new to gcc's codebase, and as such couldn't figure
out how to cleanly define a general -Winvalid-noreturn warning that switches both
on and off, for better compatibility with clang. If someone points out
how to do so, I'll happily rewrite my patch.


See -fstrong-eval-order for an example of an option that can be used
with or without =arg.


I also do not have write access to gcc, and will need help pushing this patch 
once the green light is given


Good to know, I can take care of that.


best regards,
Julian

gcc/c-family/ChangeLog:

* c.opt: Introduce -Winvalid-noreturn=explicit and 
-Winvalid-noreturn=implicit

gcc/ChangeLog:

* tree-cfg.cc (pass_warn_function_return::execute): Use it

gcc/c/ChangeLog:

* c-typeck.cc (c_finish_return): Use it
* gimple-parser.cc (c_finish_gimple_return): Use it

gcc/config/mingw/ChangeLog:

* mingw32.h (EXTRA_OS_CPP_BUILTINS): Fix semicolons

gcc/cp/ChangeLog:

* coroutines.cc (finish_co_return_stmt): Use it
* typeck.cc (check_return_expr): Use it

gcc/doc/ChangeLog:

* invoke.texi: Document new options

   From 4daf884f8bbc1e318ba93121a6fdf4139da80b64 Mon Sep 17 00:00:00 2001
From: TheShermanTanker 
Date: Wed, 29 May 2024 21:32:08 +0800
Subject: [PATCH] Introduce the -Winvalid-noreturn flag from clang with extra
tuneability


The rationale and ChangeLog entries should be part of the commit message
(and so the git format-patch output).



Signed-off-by: TheShermanTanker 


A DCO sign-off can't use a pseudonym, sorry; please either sign off
using your real name or file a copyright assignment for the pseudonym
with the FSF.

See https://gcc.gnu.org/contribute.html#legal for more detail.


---
gcc/c-family/c.opt |  8 
gcc/c/c-typeck.cc  |  2 +-
gcc/c/gimple-parser.cc |  2 +-
gcc/config/mingw/mingw32.h |  6 +++---
gcc/cp/coroutines.cc   |  2 +-
gcc/cp/typeck.cc   |  2 +-
gcc/doc/invoke.texi| 13 +
gcc/tree-cfg.cc|  2 +-
8 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index fb34c3b7031..32a2859fdcc 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -886,6 +886,14 @@ Winvalid-constexpr
C++ ObjC++ Var(warn_invalid_constexpr) Init(-1) Warning
Warn when a function never produces a constant expression.

+Winvalid-noreturn=explicit
+C ObjC C++ ObjC++ Warning
+Warn when a function marked noreturn returns explicitly.
+
+Winvalid-noreturn=implicit
+C ObjC C++ ObjC++ Warning
+Warn when a function marked noreturn returns implicitly.
+
Winvalid-offsetof
C++ ObjC++ Var(warn_invalid_offsetof) Init(1) Warning
Warn about invalid uses of the \"offsetof\" macro.
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index ad4c7add562..1941fbc44cb 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ 

Re: [PATCH] c++: remove Concepts TS code

2024-06-10 Thread Jason Merrill

On 6/10/24 11:13, Marek Polacek wrote:

On Mon, Jun 10, 2024 at 10:22:11AM -0400, Patrick Palka wrote:

On Fri, 7 Jun 2024, Marek Polacek wrote:

@@ -3940,9 +3936,6 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
 parameter pack (14.6.3), or the type-specifier-seq of a type-id that
 is a pack expansion, the invented template parameter is a template
 parameter pack.  */


This comment should be removed too I think.


Removed in my local tree.
  

-  if (flag_concepts_ts && ppd->type_pack_expansion_p && is_auto (t)


(BTW this seems to be the only actual user of type_pack_expansion_p so we
can in turn remove that field too.)


Oh neat.  I can do that as a follow-up, unless y'all think it should be
part of this patch.  Thanks,


It probably makes sense for it to be part of this patch.


One exception I'm aware of is template-introductions, as in:

  template
  concept C = true;

  C{T} void foo ();

where we warn by default, but accept the code, and my patch does not
remove the support just yet.


I think let's go ahead and remove it as well.


+// ??? This used to be a link test with Concepts TS, but now we
+// get:
+// undefined reference to `_Z2f5ITk1C1XEvT_Q1DIS1_E'
+// undefined reference to `_Z2f6ITk1C1XEvT_Q1DIS1_E'
+// so it's a compile test only.


That means the test is failing, and we shouldn't in general change tests 
to stop testing the thing that fails; better to xfail.


In this case, however, C++20 doesn't establish the equivalence that it's 
testing; that's another thing that wasn't adopted from the Concepts TS.


Note that this area is in question currently; see CWG2802.  But I think 
the equivalence is unlikely to return.


So let's move main() to the bottom of the test and test for the 
ambiguity errors that we get because they aren't equivalent.



--- a/gcc/testsuite/g++.dg/concepts/pr67595.C
+++ /dev/null
@@ -1,14 +0,0 @@
-// { dg-do compile { target c++17_only } }
-// { dg-options "-fconcepts-ts" }
-
-template  concept bool allocatable = requires{{new X}->X *; };
-template  concept bool semiregular = allocatable;
-template  concept bool readable = requires{requires semiregular;};
-template  int weak_input_iterator = requires{{0}->readable;};
-template  bool input_iterator{weak_input_iterator}; // { dg-prune-output 
"narrowing conversion" }
-template  bool forward_iterator{input_iterator};
-template  bool bidirectional_iterator{forward_iterator};
-template 
-concept bool random_access_iterator{bidirectional_iterator}; // { dg-error 
"constant" }
-void fn1(random_access_iterator);
-int main() { fn1(0); }  // { dg-error "" }


Why remove this test?  The main issue I see is that {new X}->X* needs to 
change to {new X}->convertible_to (or same_as)
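
That is, something along these lines for the first requirement (a sketch
assuming C++20 and #include <concepts>):

  template<class X>
  concept allocatable = requires { { new X } -> std::convertible_to<X*>; };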



+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-requires5.C
@@ -1,7 +1,5 @@
 // { dg-do compile { target c++20 } }
-// { dg-additional-options "-fconcepts-ts -fconcepts-diagnostics-depth=2" }
-
-// Test conversion requirements (not in C++20)


This one could get the same adjustment instead of adding dg-errors.  Or 
perhaps the error could suggest that adjustment, and this testcase could 
check that?


Jason



RE: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Li, Pan2
Got it, thanks. Let me prepare the patch after test.

Pan

-Original Message-
From: Jeff Law  
Sent: Tuesday, June 11, 2024 9:42 AM
To: Li, Pan2 ; Sam James 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI 
match



On 6/10/24 7:28 PM, Li, Pan2 wrote:
> Hi Sam,
> 
>> This testcase ICEs for me on x86-64 too (without your patch) with just -O2.
>> Can you move it out of the riscv suite? (I suspect the other fails on x86-64 
>> too).
> 
> Sure thing, but do you have any suggestion about where I should put these 2 
> cases?
> There are lots of sub-directories under gcc/testsuite, and I am not very 
> familiar with which would be the most reasonable location.
gcc.dg/torture would be the most natural location I think.

jeff



Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Jeff Law




On 6/10/24 7:28 PM, Li, Pan2 wrote:

Hi Sam,


This testcase ICEs for me on x86-64 too (without your patch) with just -O2.
Can you move it out of the riscv suite? (I suspect the other fails on x86-64 
too).


Sure thing, but do you have any suggestion about where I should put these 2 
cases?
There are lots of sub-directories under gcc/testsuite, and I am not very 
familiar with which would be the most reasonable location.

gcc.dg/torture would be the most natural location I think.

jeff



RE: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Li, Pan2
Hi Sam,

> This testcase ICEs for me on x86-64 too (without your patch) with just -O2.
> Can you move it out of the riscv suite? (I suspect the other fails on x86-64 
> too).

Sure thing, but do you have any suggestion about where I should put these 2 
cases?
There are lots of sub-directories under gcc/testsuite, and I am not very 
familiar with which would be the most reasonable location.

Pan

-Original Message-
From: Sam James  
Sent: Monday, June 10, 2024 11:33 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.guent...@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI 
match

pan2...@intel.com writes:

> From: Pan Li 
>
> When the PHI handling for COND_EXPR is enabled,  we need to insert the gcall
> to replace the PHI node.  Unfortunately,  I made the mistake of inserting
> the gcall before the last stmt of the bb.  See the gimple below,  the PHI
> is located at no.1 but we insert the gcall (aka no.9) to the end of
> the bb.  Then the use of _9 in no.2 will have no def and will trigger
> ICE when verify_ssa.
>
>   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>   2. prephitmp_36 = (char *) _9;
>   3. buf.write_base = string_13(D);
>   4. buf.write_ptr = string_13(D);
>   5. buf.write_end = prephitmp_36;
>   6. buf.written = 0;
>   7. buf.mode = 3;
>   8. _7 = buf.write_end;
>   9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb 
> by mistake
>
> This patch would like to insert the gcall before the first stmt of the bb,
> to ensure that any possible use of the PHI result will have an existing def.
> After this patch the above gimple will be:
>
>   0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall at the start
> of the bb
>   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>   2. prephitmp_36 = (char *) _9;
>   3. buf.write_base = string_13(D);
>   4. buf.write_ptr = string_13(D);
>   5. buf.write_end = prephitmp_36;
>   6. buf.written = 0;
>   7. buf.mode = 3;
>   8. _7 = buf.write_end;
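> 
> In GIMPLE-iterator terms the fix boils down to roughly the following sketch
> (not the actual hunk; `call` stands for the already-built replacement gcall):
> 
>   gimple_stmt_iterator gsi = gsi_start_bb (bb);   /* first non-PHI stmt of bb */
>   gsi_insert_before (&gsi, call, GSI_SAME_STMT);  /* rather than at the gsi of
>                                                      gsi_last_bb (bb), so the def
>                                                      dominates every use of the
>                                                      PHI result being replaced */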
>
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test with newlib.
> * The rv64gcv build with glibc.
> * The x86 regression test with newlib.
> * The x86 bootstrap test with newlib.
>
>   PR target/115387
>
> gcc/ChangeLog:
>
>   * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): Take
>   the gsi of start_bb instead of last_bb.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/pr115387-1.c: New test.
>   * gcc.target/riscv/pr115387-2.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/gcc.target/riscv/pr115387-1.c | 35 +
>  gcc/testsuite/gcc.target/riscv/pr115387-2.c | 18 +++
>  gcc/tree-ssa-math-opts.cc   |  2 +-
>  3 files changed, 54 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-2.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
> new file mode 100644
> index 000..a1c926977c4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
> @@ -0,0 +1,35 @@
> +/* Test there is no ICE when compile.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#define PRINTF_CHK 0x34
> +
> +typedef unsigned long uintptr_t;
> +
> +struct __printf_buffer {
> +  char *write_ptr;
> +  int status;
> +};
> +
> +extern void __printf_buffer_init_end (struct __printf_buffer *, char *, char 
> *);
> +
> +void
> +test (char *string, unsigned long maxlen, unsigned mode_flags)
> +{
> +  struct __printf_buffer buf;
> +
> +  if ((mode_flags & PRINTF_CHK) != 0)
> +{
> +  string[0] = '\0';
> +  uintptr_t end;
> +
> +  if (__builtin_add_overflow ((uintptr_t) string, maxlen, &end))
> + end = -1;
> +
> +  __printf_buffer_init_end (&buf, string, (char *) end);
> +}
> +  else
> +__printf_buffer_init_end (&buf, string, (char *) ~(uintptr_t) 0);
> +
> +  *buf.write_ptr = '\0';
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-2.c 
> b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
> new file mode 100644
> index 000..7183bf18dfd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
> @@ -0,0 +1,18 @@
> +/* Test there is no ICE when compile.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#include 
> +#include 
> +
> +char *
> +test (char *string, size_t maxlen)
> +{
> +  string[0] = '\0';
> +  uintptr_t end;
> +
> +  if (__builtin_add_overflow ((uintptr_t) string, maxlen, &end))
> +end = -1;
> +
> +  return (char *) end;
> +}

This testcase ICEs for me on x86-64 too (without your patch) with just -O2.

Can you move it out of the riscv suite? (I suspect the other fails on x86-64 
too).

> diff --git a/gcc/tree-ssa-math-opts.cc 

RE: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Li, Pan2
Thank a lot, Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Tuesday, June 11, 2024 4:15 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI 
match



On 6/10/24 8:49 AM, pan2...@intel.com wrote:
> When the PHI handling for COND_EXPR is enabled,  we need to insert the gcall
> to replace the PHI node.  Unfortunately,  I made the mistake of inserting
> the gcall before the last stmt of the bb.  See the gimple below,  the PHI
> is located at no.1 but we insert the gcall (aka no.9) to the end of
> the bb.  Then the use of _9 in no.2 will have no def and will trigger
> ICE when verify_ssa.
> 
>1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>2. prephitmp_36 = (char *) _9;
>3. buf.write_base = string_13(D);
>4. buf.write_ptr = string_13(D);
>5. buf.write_end = prephitmp_36;
>6. buf.written = 0;
>7. buf.mode = 3;
>8. _7 = buf.write_end;
>9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb 
> by mistake
> 
> This patch would like to insert the gcall before the first stmt of the bb,
> to ensure that any possible use of the PHI result will have an existing def.
> After this patch the above gimple will be:
> 
>    0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall at the start of the bb
>1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>2. prephitmp_36 = (char *) _9;
>3. buf.write_base = string_13(D);
>4. buf.write_ptr = string_13(D);
>5. buf.write_end = prephitmp_36;
>6. buf.written = 0;
>7. buf.mode = 3;
>8. _7 = buf.write_end;
> 
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test with newlib.
> * The rv64gcv build with glibc.
> * The x86 regression test with newlib.
> * The x86 bootstrap test with newlib.
> 
>   PR target/115387
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): Take
>   the gsi of start_bb instead of last_bb.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/pr115387-1.c: New test.
>   * gcc.target/riscv/pr115387-2.c: New test.
I did a fresh x86_64 bootstrap and regression test and pushed this.

jeff



Re: [x86 PATCH] PR target/115397: AVX512 ternlog vs. -m32 -fPIC constant pool.

2024-06-10 Thread Hongtao Liu
On Mon, Jun 10, 2024 at 3:20 PM Roger Sayle  wrote:
>
>
> This patch fixes PR target/115397, a recent regression caused by my
> ternlog patch that results in an ICE (building numpy) with -m32 -fPIC.
> The problem is that ix86_broadcast_from_constant, which calls
> get_pool_constant, doesn't handle the UNSPEC_GOTOFF that's created by
> calling validize_mem when using -fPIC on i686.  The logic here is a bit
> convoluted (and my future patches will clean some of this up), but the
> simplest fix is to call ix86_broadcast_from_constant after the call to
> force_const_mem and before the call to validize_mem.
>
> Perhaps a better solution might be to call targetm.delegitimize_address
> from the middle-end's get_pool_constant, but ultimately the best approach
> would be to not place things in the constant pool if we don't need to.
> My plans to move (broadcast) constant handling from expand to split1
> should simplify this.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
Ok.
>
>
> 2024-06-10  Roger Sayle  
>
> gcc/ChangeLog
> PR target/115397
> * config/i386/i386-expand.cc (ix86_expand_ternlog): Move call to
> ix86_broadcast_from_constant before call to validize_mem, but after
> call to force_const_mem.
>
> gcc/testsuite/ChangeLog
> PR target/115397
> * gcc.target/i386/pr115397.c: New test case.
>
>
> Thanks in advance (and sorry for any inconvenience),
> Roger
>


-- 
BR,
Hongtao


Re: [PATCH v3 0/3] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-10 Thread Andrea Parri
On Mon, Jun 10, 2024 at 02:46:54PM -0700, Patrick O'Neill wrote:
> The A extension has been split into two parts: Zaamo and Zalrsc.
> This patch adds basic support by making the A extension imply Zaamo and
> Zalrsc.
> 
> Zaamo/Zalrsc spec: https://github.com/riscv/riscv-zaamo-zalrsc/tags
> Ratification: https://jira.riscv.org/browse/RVS-1995
> 
> v2:
> Rebased and updated some testcases that rely on the ISA string.
> 
> v3:
> Regex-ify temp registers in added testcases.
> Remove unintentional whitespace changes.
> Add riscv_{a|ztso|zaamo|zalrsc} docs to sourcebuild.texi (and move core-v bi
> extension doc into appropriate section).
> 
> Edwin Lu (1):
>   RISC-V: Add basic Zaamo and Zalrsc support
> 
> Patrick O'Neill (2):
>   RISC-V: Add Zalrsc and Zaamo testsuite support
>   RISC-V: Add Zalrsc amo-op patterns

While providing a proper/detailed review of the series goes above my
"GCC internals" skills, I've applied the series and checked that the
generated code for some atomic operations meets expectations (expecta-
tions which, w/ "only Zaamo", are arguably quite low as mentioned in
v2 and elsewhere):

Tested-by: Andrea Parri 

  Andrea


P.S. Unrelated to the changes at stake, but perhaps worth mentioning:
w/ and w/o these changes, the following

#include 

void foo(atomic_flag *x)
{
atomic_flag_clear_explicit(x, memory_order_seq_cst);
}

... gets mapped to

foo:
fence   rw,rw
sb  zero,0(a0)
fence   rw,rw
ret

(w/o and w/ Ztso) while the current psABI spec is suggesting slightly
"less synchronization", respectively,

(w/o Ztso)
foo:
fence   rw,w<- "release" fence
sb  zero,0(a0)
fence   rw,rw
ret

(w/ Ztso)
foo:
sb  zero,0(a0)
fence   rw,rw
ret


[pushed] modula2: Fix typos, grammar, and a link

2024-06-10 Thread Gerald Pfeifer
Pushed.

Gerald


gcc:
* doc/gm2.texi (Documentation): Fix typos, grammar, and a link.
---
 gcc/doc/gm2.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/gm2.texi b/gcc/doc/gm2.texi
index 8661fcb8728..c532339fbb8 100644
--- a/gcc/doc/gm2.texi
+++ b/gcc/doc/gm2.texi
@@ -2935,9 +2935,9 @@ you wish to see something different please email
 @node Documentation, Regression tests, Release map, Using
 @section Documentation
 
-The GNU Modula-2 documentation is available on line
-@url{https://gcc.gnu.org/onlinedocs}
-or in the pdf, info, html file format.
+The GNU Modula-2 documentation is available online at
+@url{https://gcc.gnu.org/onlinedocs/}
+in the PDF, info, and HTML file formats.
 
 @node Regression tests, Limitations, Documentation, Using
 @section Regression tests for gm2 in the repository
-- 
2.45.2


[pushed] wwwdocs: news: Use https for our Wiki

2024-06-10 Thread Gerald Pfeifer
Pushed.

Gerald
---
 htdocs/news.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/news.html b/htdocs/news.html
index 5f652d90..de92bdf6 100644
--- a/htdocs/news.html
+++ b/htdocs/news.html
@@ -384,7 +384,7 @@
 
 The Vtable Verification Feature is now in GCC
 [2013-09-08] wwwdocs:
-The http://gcc.gnu.org/wiki/vtv;>vtable verification
+The https://gcc.gnu.org/wiki/vtv;>vtable verification
 branch has been merged into trunk.  This work was contributed by
 Caroline Tice, Luis Lozano and Geoff Pike of Google, and
 Benjamin Kosnik of Red Hat.
-- 
2.45.2


[PATCH v3 1/3] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-10 Thread Patrick O'Neill
From: Edwin Lu 

There is a proposal to split the A extension into two parts: Zaamo and Zalrsc.
This patch adds basic support by making the A extension imply Zaamo and
Zalrsc.

Proposal: https://github.com/riscv/riscv-zaamo-zalrsc/tags

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Zaamo and Zalrsc.
* config/riscv/arch-canonicalize: Make A imply Zaamo and Zalrsc.
* config/riscv/riscv.opt: Add Zaamo and Zalrsc
* config/riscv/sync.md: Convert TARGET_ATOMIC to TARGET_ZAAMO and
TARGET_ZALRSC.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-15.c: Adjust expected arch string.
* gcc.target/riscv/attribute-16.c: Ditto.
* gcc.target/riscv/attribute-17.c: Ditto.
* gcc.target/riscv/attribute-18.c: Ditto.
* gcc.target/riscv/pr110696.c: Ditto.

Signed-off-by: Edwin Lu 
Co-authored-by: Patrick O'Neill 
---
 gcc/common/config/riscv/riscv-common.cc   | 11 +--
 gcc/config/riscv/arch-canonicalize|  1 +
 gcc/config/riscv/riscv.opt|  6 +++-
 gcc/config/riscv/sync.md  | 30 +--
 gcc/testsuite/gcc.target/riscv/attribute-15.c |  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-16.c |  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-17.c |  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-18.c |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110696.c |  2 +-
 .../gcc.target/riscv/rvv/base/pr114352-1.c|  4 +--
 .../gcc.target/riscv/rvv/base/pr114352-3.c|  8 ++---
 11 files changed, 41 insertions(+), 29 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 88204393fde..78dfd6b1470 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -79,6 +79,9 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"f", "zicsr"},
   {"d", "zicsr"},

+  {"a", "zaamo"},
+  {"a", "zalrsc"},
+
   {"zdinx", "zfinx"},
   {"zfinx", "zicsr"},
   {"zdinx", "zicsr"},
@@ -255,6 +258,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"za64rs",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"za128rs", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zaamo", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zalrsc", ISA_SPEC_CLASS_NONE, 1, 0},

   {"zba", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1616,9 +1621,11 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zifencei", _options::x_riscv_zi_subext, MASK_ZIFENCEI},
   {"zicond",   _options::x_riscv_zi_subext, MASK_ZICOND},

-  {"za64rs", _options::x_riscv_za_subext, MASK_ZA64RS},
+  {"za64rs",  _options::x_riscv_za_subext, MASK_ZA64RS},
   {"za128rs", _options::x_riscv_za_subext, MASK_ZA128RS},
-  {"zawrs", _options::x_riscv_za_subext, MASK_ZAWRS},
+  {"zawrs",   _options::x_riscv_za_subext, MASK_ZAWRS},
+  {"zaamo",   _options::x_riscv_za_subext, MASK_ZAAMO},
+  {"zalrsc",  _options::x_riscv_za_subext, MASK_ZALRSC},

   {"zba",_options::x_riscv_zb_subext, MASK_ZBA},
   {"zbb",_options::x_riscv_zb_subext, MASK_ZBB},
diff --git a/gcc/config/riscv/arch-canonicalize 
b/gcc/config/riscv/arch-canonicalize
index 8f7d040cdeb..6c10d1aa81b 100755
--- a/gcc/config/riscv/arch-canonicalize
+++ b/gcc/config/riscv/arch-canonicalize
@@ -40,6 +40,7 @@ LONG_EXT_PREFIXES = ['z', 's', 'h', 'x']
 #
 IMPLIED_EXT = {
   "d" : ["f", "zicsr"],
+  "a" : ["zaamo", "zalrsc"],
   "f" : ["zicsr"],
   "zdinx" : ["zfinx", "zicsr"],
   "zfinx" : ["zicsr"],
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 78cb1c37e69..b13e993c47a 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -256,7 +256,11 @@ Mask(ZICCRSE) Var(riscv_zi_subext)
 TargetVariable
 int riscv_za_subext

-Mask(ZAWRS) Var(riscv_za_subext)
+Mask(ZAWRS)  Var(riscv_za_subext)
+
+Mask(ZAAMO)  Var(riscv_za_subext)
+
+Mask(ZALRSC) Var(riscv_za_subext)

 Mask(ZA64RS)  Var(riscv_za_subext)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 6f0b5aae08d..c9544176ead 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -93,7 +93,7 @@
 (match_operand:GPR 1 "reg_or_0_operand" "rJ"))
   (match_operand:SI 2 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
-  "TARGET_ATOMIC"
+  "TARGET_ZAAMO"
   "amo.%A2\tzero,%z1,%0"
   [(set_attr "type" "atomic")
(set (attr "length") (const_int 4))])
@@ -107,7 +107,7 @@
 (match_operand:GPR 2 "reg_or_0_operand" "rJ"))
   (match_operand:SI 3 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
-  "TARGET_ATOMIC"
+  "TARGET_ZAAMO"
   "amo.%A3\t%0,%z2,%1"
   [(set_attr "type" "atomic")
(set (attr "length") (const_int 4))])
@@ -125,7 +125,7 @@
 (match_operand:SI 5 "register_operand" "rI")  ;; not_mask
 (clobber (match_scratch:SI 6 "="))  ;; tmp_1
 (clobber 

[PATCH v3 0/3] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-10 Thread Patrick O'Neill
The A extension has been split into two parts: Zaamo and Zalrsc.
This patch adds basic support by making the A extension imply Zaamo and
Zalrsc.

Zaamo/Zalrsc spec: https://github.com/riscv/riscv-zaamo-zalrsc/tags
Ratification: https://jira.riscv.org/browse/RVS-1995

v2:
Rebased and updated some testcases that rely on the ISA string.

v3:
Regex-ify temp registers in added testcases.
Remove unintentional whitespace changes.
Add riscv_{a|ztso|zaamo|zalrsc} docs to sourcebuild.texi (and move core-v bi
extension doc into appropriate section).

Edwin Lu (1):
  RISC-V: Add basic Zaamo and Zalrsc support

Patrick O'Neill (2):
  RISC-V: Add Zalrsc and Zaamo testsuite support
  RISC-V: Add Zalrsc amo-op patterns

 gcc/common/config/riscv/riscv-common.cc   |  11 +-
 gcc/config/riscv/arch-canonicalize|   1 +
 gcc/config/riscv/riscv.opt|   6 +-
 gcc/config/riscv/sync.md  | 152 +++---
 gcc/doc/sourcebuild.texi  |  16 +-
 .../riscv/amo-table-a-6-amo-add-1.c   |   2 +-
 .../riscv/amo-table-a-6-amo-add-2.c   |   2 +-
 .../riscv/amo-table-a-6-amo-add-3.c   |   2 +-
 .../riscv/amo-table-a-6-amo-add-4.c   |   2 +-
 .../riscv/amo-table-a-6-amo-add-5.c   |   2 +-
 .../riscv/amo-table-a-6-compare-exchange-1.c  |   2 +-
 .../riscv/amo-table-a-6-compare-exchange-2.c  |   2 +-
 .../riscv/amo-table-a-6-compare-exchange-3.c  |   2 +-
 .../riscv/amo-table-a-6-compare-exchange-4.c  |   2 +-
 .../riscv/amo-table-a-6-compare-exchange-5.c  |   2 +-
 .../riscv/amo-table-a-6-compare-exchange-6.c  |   2 +-
 .../riscv/amo-table-a-6-compare-exchange-7.c  |   2 +-
 .../riscv/amo-table-a-6-subword-amo-add-1.c   |   2 +-
 .../riscv/amo-table-a-6-subword-amo-add-2.c   |   2 +-
 .../riscv/amo-table-a-6-subword-amo-add-3.c   |   2 +-
 .../riscv/amo-table-a-6-subword-amo-add-4.c   |   2 +-
 .../riscv/amo-table-a-6-subword-amo-add-5.c   |   2 +-
 .../riscv/amo-table-ztso-amo-add-1.c  |   2 +-
 .../riscv/amo-table-ztso-amo-add-2.c  |   2 +-
 .../riscv/amo-table-ztso-amo-add-3.c  |   2 +-
 .../riscv/amo-table-ztso-amo-add-4.c  |   2 +-
 .../riscv/amo-table-ztso-amo-add-5.c  |   2 +-
 .../riscv/amo-table-ztso-compare-exchange-1.c |   2 +-
 .../riscv/amo-table-ztso-compare-exchange-2.c |   2 +-
 .../riscv/amo-table-ztso-compare-exchange-3.c |   2 +-
 .../riscv/amo-table-ztso-compare-exchange-4.c |   2 +-
 .../riscv/amo-table-ztso-compare-exchange-5.c |   2 +-
 .../riscv/amo-table-ztso-compare-exchange-6.c |   2 +-
 .../riscv/amo-table-ztso-compare-exchange-7.c |   2 +-
 .../riscv/amo-table-ztso-subword-amo-add-1.c  |   2 +-
 .../riscv/amo-table-ztso-subword-amo-add-2.c  |   2 +-
 .../riscv/amo-table-ztso-subword-amo-add-3.c  |   2 +-
 .../riscv/amo-table-ztso-subword-amo-add-4.c  |   2 +-
 .../riscv/amo-table-ztso-subword-amo-add-5.c  |   2 +-
 .../riscv/amo-zaamo-preferred-over-zalrsc.c   |  17 ++
 .../gcc.target/riscv/amo-zalrsc-amo-add-1.c   |  19 +++
 .../gcc.target/riscv/amo-zalrsc-amo-add-2.c   |  19 +++
 .../gcc.target/riscv/amo-zalrsc-amo-add-3.c   |  19 +++
 .../gcc.target/riscv/amo-zalrsc-amo-add-4.c   |  19 +++
 .../gcc.target/riscv/amo-zalrsc-amo-add-5.c   |  19 +++
 gcc/testsuite/gcc.target/riscv/attribute-15.c |   2 +-
 gcc/testsuite/gcc.target/riscv/attribute-16.c |   2 +-
 gcc/testsuite/gcc.target/riscv/attribute-17.c |   2 +-
 gcc/testsuite/gcc.target/riscv/attribute-18.c |   2 +-
 gcc/testsuite/gcc.target/riscv/pr110696.c |   2 +-
 .../gcc.target/riscv/rvv/base/pr114352-1.c|   4 +-
 .../gcc.target/riscv/rvv/base/pr114352-3.c|   8 +-
 gcc/testsuite/lib/target-supports.exp |  48 +-
 53 files changed, 366 insertions(+), 70 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-5.c

--
2.34.1



[PATCH v3 3/3] RISC-V: Add Zalrsc amo-op patterns

2024-06-10 Thread Patrick O'Neill
All amo patterns can be represented with lrsc sequences.
Add these patterns as a fallback when Zaamo is not enabled.
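
As a rough illustration (hypothetical example, not taken from the patch), a
relaxed fetch-and-add such as:

  long fetch_add (long *p, long v)
  {
    return __atomic_fetch_add (p, v, __ATOMIC_RELAXED);
  }

compiles to a single amoadd.d when Zaamo is available, whereas with only
Zalrsc the new lrsc_atomic_fetch pattern expands it to an lr.d / add /
sc.d / bnez retry loop, as shown in the templates below.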

gcc/ChangeLog:

* config/riscv/sync.md (atomic_): New expand 
pattern.
(amo_atomic_): Rename amo pattern.
(atomic_fetch_): New lrsc sequence pattern.
(lrsc_atomic_): New expand pattern.
(amo_atomic_fetch_): Rename amo pattern.
(lrsc_atomic_fetch_): New lrsc sequence pattern.
(atomic_exchange): New expand pattern.
(amo_atomic_exchange): Rename amo pattern.
(lrsc_atomic_exchange): New lrsc sequence pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-1.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-2.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-3.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-4.c: New test.
* gcc.target/riscv/amo-zalrsc-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill 
---
rv64imfdc_zalrsc has the same testsuite results as rv64imafdc after this
patch is applied.
---
v3 Changelog:
Use more flexible regex for temp register.
---
 gcc/config/riscv/sync.md  | 124 +-
 .../riscv/amo-zaamo-preferred-over-zalrsc.c   |  17 +++
 .../gcc.target/riscv/amo-zalrsc-amo-add-1.c   |  19 +++
 .../gcc.target/riscv/amo-zalrsc-amo-add-2.c   |  19 +++
 .../gcc.target/riscv/amo-zalrsc-amo-add-3.c   |  19 +++
 .../gcc.target/riscv/amo-zalrsc-amo-add-4.c   |  19 +++
 .../gcc.target/riscv/amo-zalrsc-amo-add-5.c   |  19 +++
 7 files changed, 231 insertions(+), 5 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-5.c

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index c9544176ead..4df9d0b5a5f 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -86,7 +86,24 @@
 DONE;
   })

-(define_insn "atomic_"
+;; AMO ops
+
+(define_expand "atomic_"
+  [(any_atomic:GPR (match_operand:GPR 0 "memory_operand");; mem location
+  (match_operand:GPR 1 "reg_or_0_operand")) ;; value for op
+   (match_operand:SI 2 "const_int_operand")];; model
+  "TARGET_ZAAMO || TARGET_ZALRSC"
+{
+  if (TARGET_ZAAMO)
+emit_insn (gen_amo_atomic_ (operands[0], operands[1],
+   operands[2]));
+  else
+emit_insn (gen_lrsc_atomic_ (operands[0], operands[1],
+operands[2]));
+  DONE;
+})
+
+(define_insn "amo_atomic_"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
(unspec_volatile:GPR
  [(any_atomic:GPR (match_dup 0)
@@ -98,7 +115,44 @@
   [(set_attr "type" "atomic")
(set (attr "length") (const_int 4))])

-(define_insn "atomic_fetch_"
+(define_insn "lrsc_atomic_"
+  [(set (match_operand:GPR 0 "memory_operand" "+A")
+   (unspec_volatile:GPR
+ [(any_atomic:GPR (match_dup 0)
+(match_operand:GPR 1 "reg_or_0_operand" "rJ"))
+  (match_operand:SI 2 "const_int_operand")] ;; model
+UNSPEC_SYNC_OLD_OP))
+   (clobber (match_scratch:GPR 3 "="))]   ;; tmp_1
+  "!TARGET_ZAAMO && TARGET_ZALRSC"
+  {
+return "1:\;"
+  "lr.%I2\t%3, %0\;"
+  "\t%3, %3, %1\;"
+  "sc.%J2\t%3, %3, %0\;"
+  "bnez\t%3, 1b";
+  }
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 16))])
+
+;; AMO fetch ops
+
+(define_expand "atomic_fetch_"
+  [(match_operand:GPR 0 "register_operand") ;; old value at mem
+   (any_atomic:GPR (match_operand:GPR 1 "memory_operand");; mem location
+  (match_operand:GPR 2 "reg_or_0_operand")) ;; value for op
+   (match_operand:SI 3 "const_int_operand")];; model
+  "TARGET_ZAAMO || TARGET_ZALRSC"
+  {
+if (TARGET_ZAAMO)
+  emit_insn (gen_amo_atomic_fetch_ (operands[0], 
operands[1],
+   operands[2], 
operands[3]));
+else
+  emit_insn (gen_lrsc_atomic_fetch_ (operands[0], 
operands[1],
+operands[2], 
operands[3]));
+DONE;
+  })
+
+(define_insn "amo_atomic_fetch_"
   [(set (match_operand:GPR 0 "register_operand" "=")
(match_operand:GPR 1 "memory_operand" "+A"))
(set (match_dup 1)
@@ -112,6 +166,27 @@
   [(set_attr "type" "atomic")
(set (attr "length") (const_int 4))])

+(define_insn "lrsc_atomic_fetch_"
+  [(set (match_operand:GPR 0 "register_operand" "=")
+   (match_operand:GPR 1 

[PATCH v3 2/3] RISC-V: Add Zalrsc and Zaamo testsuite support

2024-06-10 Thread Patrick O'Neill
Convert testsuite infrastructure to use Zalrsc and Zaamo rather than A.

gcc/ChangeLog:

* doc/sourcebuild.texi: Add docs for atomic extension testsuite infra.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: Use Zaamo rather than A.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: Use Zalrsc rather
than A.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: Use Zaamo rather
than A.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add Zaamo option.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Use Zalrsc 
rather
than A.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.
* lib/target-supports.exp: Add testsuite infrastructure support for
Zaamo and Zalrsc.

Signed-off-by: Patrick O'Neill 
---
v3 Changelog:
Add docs for atomic related testsuite infra (riscv_a, etc.).
---
 gcc/doc/sourcebuild.texi  | 16 ++-
 .../riscv/amo-table-a-6-amo-add-1.c   |  2 +-
 .../riscv/amo-table-a-6-amo-add-2.c   |  2 +-
 .../riscv/amo-table-a-6-amo-add-3.c   |  2 +-
 .../riscv/amo-table-a-6-amo-add-4.c   |  2 +-
 .../riscv/amo-table-a-6-amo-add-5.c   |  2 +-
 .../riscv/amo-table-a-6-compare-exchange-1.c  |  2 +-
 .../riscv/amo-table-a-6-compare-exchange-2.c  |  2 +-
 .../riscv/amo-table-a-6-compare-exchange-3.c  |  2 +-
 .../riscv/amo-table-a-6-compare-exchange-4.c  |  2 +-
 .../riscv/amo-table-a-6-compare-exchange-5.c  |  2 +-
 .../riscv/amo-table-a-6-compare-exchange-6.c  |  2 +-
 .../riscv/amo-table-a-6-compare-exchange-7.c  |  2 +-
 .../riscv/amo-table-a-6-subword-amo-add-1.c   |  2 +-
 .../riscv/amo-table-a-6-subword-amo-add-2.c   |  2 +-
 .../riscv/amo-table-a-6-subword-amo-add-3.c   |  2 +-
 .../riscv/amo-table-a-6-subword-amo-add-4.c   |  2 +-
 .../riscv/amo-table-a-6-subword-amo-add-5.c   |  2 +-
 .../riscv/amo-table-ztso-amo-add-1.c  |  2 +-
 .../riscv/amo-table-ztso-amo-add-2.c  |  2 +-
 .../riscv/amo-table-ztso-amo-add-3.c  |  2 +-
 .../riscv/amo-table-ztso-amo-add-4.c  |  2 +-
 .../riscv/amo-table-ztso-amo-add-5.c  |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-1.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-2.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-3.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-4.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-5.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-6.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-7.c |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-1.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-2.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-3.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-4.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-5.c  |  2 +-
 gcc/testsuite/lib/target-supports.exp | 48 ++-
 36 files changed, 95 insertions(+), 37 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index e997dbec333..e37fb85f3b3 100644
--- 

[to-be-committed] [RISC-V] Improve (1 << N) | C for rv64

2024-06-10 Thread Jeff Law

Another improvement for generating Zbs instructions.

In this case we're looking at stuff like (1 << N) | C where N varies and 
C is a single bit constant.


In this pattern the (1 << N) happens in SImode, but is zero extended out 
to DImode before the bit manipulation.  The fact that we're modifying a 
DImode object in the logical op is important as it means we don't have 
to worry about whether or not the resulting value is sign extended from 
SI to DI.
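
As a quick worked example (hypothetical values): for N == 31 the zero-extended
SImode result 1U << 31 is 0x0000000080000000 in DImode, whereas a sign-extended
SImode result would be 0xffffffff80000000.  The zero_extend guarantees the upper
32 bits are clear, so the DImode IOR/XOR with a single-bit constant can be
rewritten directly as a Zbs bit operation.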


This has run through Ventana's CI system.  I'll wait for it to roll 
through pre-commit CI before moving forward.


Jeff



gcc/
* config/riscv/bitmanip.md ((1 << N) | C): New splitter for IOR/XOR of
a single bit and a DImode object.

gcc/testsuite/

* gcc.target/riscv/zbs-zext.c: New test.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 6c2736454aa..3cc244898e7 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -727,6 +727,21 @@ (define_insn "*bsetidisi"
   "bseti\t%0,%1,%S2"
   [(set_attr "type" "bitmanip")])
 
+;; We can easily handle zero extensions
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+(any_or:DI (zero_extend:DI
+(ashift:SI (const_int 1)
+   (match_operand:QI 1 "register_operand")))
+  (match_operand:DI 2 "single_bit_mask_operand")))
+   (clobber (match_operand:DI 3 "register_operand"))]
+  "TARGET_64BIT && TARGET_ZBS"
+  [(set (match_dup 3)
+(match_dup 2))
+   (set (match_dup 0)
+ (any_or:DI (ashift:DI (const_int 1) (match_dup 1))
+   (match_dup 3)))])
+
 (define_insn "*bclr"
   [(set (match_operand:X 0 "register_operand" "=r")
(and:X (rotate:X (const_int -2)
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-zext.c 
b/gcc/testsuite/gcc.target/riscv/zbs-zext.c
new file mode 100644
index 000..5773b15d298
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-zext.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+typedef unsigned long uint64_t;
+typedef unsigned int uint32_t;
+
+uint64_t bset (const uint32_t i)
+{
+  uint64_t checks = 8;
+  checks |= 1U << i;
+  return checks;
+}
+
+uint64_t binv (const uint32_t i)
+{
+  uint64_t checks = 8;
+  checks ^= 1U << i;
+  return checks;
+}
+
+uint64_t bclr (const uint32_t i)
+{
+  uint64_t checks = 10;
+  checks &= ~(1U << i);
+  return checks;
+}
+
+/* { dg-final { scan-assembler-times "bset\t" 1 } } */
+/* { dg-final { scan-assembler-times "binv\t" 1 } } */
+/* { dg-final { scan-assembler-times "bclr\t" 1 } } */
+/* { dg-final { scan-assembler-not "sllw\t"} } */


Re: [PATCH] Move array_bounds warnings into its own pass.

2024-06-10 Thread Andrew MacLeod

pushed as 74ee12ff68243bb177fb8653474dff80c3792139


fyi, the 2 testcases depending on the VRP flag were:

c-c++-common/Warray-bounds-2.c   (-warray-bounds  -fno-tree-vrp :-P)
and
g++.dg/warn/string1.C   (-O1 -Wall)

Andrew

On 6/10/24 16:12, Jeff Law wrote:



On 6/10/24 1:24 PM, Andrew MacLeod wrote:
The array bounds warning pass was originally attached to the VRP pass 
because it wanted to leverage the context sensitive ranges available 
there.


With ranger, we can make it a pass of its own for very little cost. 
This patch does that. It removes the array_bounds_checker from VRP 
and makes it a solo pass that runs immediately after VRP1.


The original version had VRP add any un-executable edge flags it 
found, but I could not find a case where after VRP cleans up the CFG 
the new pass needed that.  I also did not find a case where 
activating SCEV again for the warning pass made a difference after 
VRP had run.  So this patch does neither of those things.


It simple enough to later add SCEV and loop analysis again if it 
turns out to be important.


My primary motivation for removing it was to remove the second DOM 
walk the checker performs which depends on on-demand ranges 
pre-cached by ranger.   This prevented VRP from choosing an 
alternative VRP solution when basic block counts are very high (PR 
114855).  I also know Siddesh wants to experiment with moving the pass 
later in the pipeline as well; making that task simpler is a secondary 
rationale.


I didn't want to mess with the internal code much, for a multitude of 
reasons.  I did change it so that it always uses the current 
range_query object instead of passing one in to the constructor.  And 
then I cleaned up the VRP code to no longer take a flag on whether to 
invoke the warning code or not.


The final bit is that the pass is set to only run when flag_tree_vrp is 
on.  I did this primarily to preserve existing functionality, and 
some tests depend on it.  I.e., some tests turn on -Warray-bounds and 
disable the tree-vrp pass (which means the bounds checker doesn't run), 
which changes the expected warnings from the strlen pass.  I'm 
not going there.  There are also tests which run at -O1 and -Wall 
that do not expect the bounds checker to run either.  So this 
dependence on the vrp flag is documented in the code and preserves 
existing behavior.


Does anyone have any issues with any of this?
No, in fact, quite the opposite.  I think we very much want the 
warning out of VRP into its own little pass that we can put wherever 
it makes sense in the pipeline rather than having it be tied to VRP.


I'd probably look at the -O1 vs -Wall stuff independently so that we 
could (in theory) eventually remove the dependence on flag_vrp.


jeff






Re: [PATCH] Move array_bounds warnings into its own pass.

2024-06-10 Thread Andrew MacLeod



On 6/10/24 16:12, Jeff Law wrote:





Does anyone have any issues with any of this?
No, in fact, quite the opposite.  I think we very much want the 
warning out of VRP into its own little pass that we can put wherever 
it makes sense in the pipeline rather than having it be tied to VRP.


I'd probably look at the -O1 vs -Wall stuff independently so that we 
could (in theory) eventually remove the dependence on flag_vrp.



I figured as much about removing the dependence on flag_tree_vrp.  I'm 
just tired of poking at it :-)  It shouldn't be difficult.


I'll check it in now and see what fallout ensues.


Andrew



Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Jeff Law




On 6/10/24 8:49 AM, pan2...@intel.com wrote:

When the PHI handling for COND_EXPR is enabled, we need to insert the gcall
to replace the PHI node.  Unfortunately, I made a mistake and inserted
the gcall before the last stmt of the bb.  See the gimple below: the PHI
is located at no.1 but we insert the gcall (aka no.9) at the end of
the bb.  Then the use of _9 in no.2 will have no def and will trigger
an ICE in verify_ssa.

   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
   2. prephitmp_36 = (char *) _9;
   3. buf.write_base = string_13(D);
   4. buf.write_ptr = string_13(D);
   5. buf.write_end = prephitmp_36;
   6. buf.written = 0;
   7. buf.mode = 3;
   8. _7 = buf.write_end;
   9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb by 
mistake

This patch instead inserts the gcall before the first stmt of the bb,
to ensure that any possible use of the PHI result has an existing def.
After this patch the above gimple will be:

   0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // gcall now correctly
inserted at the start of the bb
   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
   2. prephitmp_36 = (char *) _9;
   3. buf.write_base = string_13(D);
   4. buf.write_ptr = string_13(D);
   5. buf.write_end = prephitmp_36;
   6. buf.written = 0;
   7. buf.mode = 3;
   8. _7 = buf.write_end;

The below test suites are passed for this patch:
* The rv64gcv fully regression test with newlib.
* The rv64gcv build with glibc.
* The x86 regression test with newlib.
* The x86 bootstrap test with newlib.

PR target/115387

gcc/ChangeLog:

* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): Take
the gsi of start_bb instead of last_bb.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115387-1.c: New test.
* gcc.target/riscv/pr115387-2.c: New test.

I did a fresh x86_64 bootstrap and regression test and pushed this.

jeff



Re: [PATCH] Move array_bounds warnings into its own pass.

2024-06-10 Thread Jeff Law




On 6/10/24 1:24 PM, Andrew MacLeod wrote:
The array bounds warning pass was originally attached to the VRP pass 
because it wanted to leverage the context sensitive ranges available there.


With ranger, we can make it a pass of its own for very little cost. This 
patch does that. It removes the array_bounds_checker from VRP and makes 
it a solo pass that runs immediately after VRP1.


The original version had VRP add any un-executable edge flags it found, 
but I could not find a case where after VRP cleans up the CFG the new 
pass needed that.  I also did not find a case where activating SCEV 
again for the warning pass made a difference after VRP had run.  So this 
patch does neither of those things.


It is simple enough to later add SCEV and loop analysis again if it turns 
out to be important.


My primary motivation for removing it was to remove the second DOM walk 
the checker performs, which depends on on-demand ranges pre-cached by 
ranger.   This prevented VRP from choosing an alternative VRP solution 
when basic block counts are very high (PR 114855).  I also know Siddesh 
wants to experiment with moving the pass later in the pipeline as well, 
which will make that task much simpler as a secondary rationale.


I didn't want to mess with the internal code much, for a multitude of 
reasons.  I did change it so that it always uses the current range_query 
object instead of passing one in to the constructor.  And then I cleaned 
up the VRP code to no longer take a flag on whether to invoke the 
warning code or not.


The final bit is that the pass is set to only run when flag_tree_vrp is on. 
I did this primarily to preserve existing functionality, and some tests 
depend on it.  I.e., some tests turn on -Warray-bounds and disable the 
tree-vrp pass (which means the bounds checker doesn't run), which changes 
the expected warnings from the strlen pass.  I'm not going there.  There 
are also tests which run at -O1 and -Wall that do not expect the bounds 
checker to run either.  So this dependence on the vrp flag is 
documented in the code and preserves existing behavior.


Does anyone have any issues with any of this?
No, in fact, quite the opposite.  I think we very much want the warning 
out of VRP into its own little pass that we can put wherever it makes 
sense in the pipeline rather than having it be tied to VRP.


I'd probably look at the -O1 vs -Wall stuff independently so that we 
could (in theory) eventually remove the dependence on flag_vrp.


jeff




RE: [PATCH] aarch64: Improve popcount for bytes [PR113042]

2024-06-10 Thread Andrew Pinski (QUIC)


> -Original Message-
> From: Kyrylo Tkachov 
> Sent: Monday, June 10, 2024 12:26 AM
> To: Andrew Pinski (QUIC) ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH] aarch64: Improve popcount for bytes
> [PR113042]
> 
> Hi Andrew
> 
> -Original Message-
> From: Andrew Pinski  >
> Date: Monday, 10 June 2024 at 06:05
> To: "gcc-patches@gcc.gnu.org  patc...@gcc.gnu.org>"  >
> Cc: Andrew Pinski  >
> Subject: [PATCH] aarch64: Improve popcount for bytes
> [PR113042]
> 
> 
> For popcount for bytes, we don't need the reduction addition
> after the vector cnt instruction as we are only counting one
> byte's popcount.
> This implements a new define_expand to handle that.
> 
> 
> Bootstrapped and tested on aarch64-linux-gnu with no
> regressions.
> 
> 
> PR target/113042
> 
> 
> gcc/ChangeLog:
> 
> 
> * config/aarch64/aarch64.md (popcountqi2): New pattern.
> 
> 
> gcc/testsuite/ChangeLog:
> 
> 
> * gcc.target/aarch64/popcnt5.c: New test.
> 
> 
> Signed-off-by: Andrew Pinski  >
> ---
> gcc/config/aarch64/aarch64.md | 26
> ++
> gcc/testsuite/gcc.target/aarch64/popcnt5.c | 19
> 
> 2 files changed, 45 insertions(+)
> create mode 100644
> gcc/testsuite/gcc.target/aarch64/popcnt5.c
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.md
> b/gcc/config/aarch64/aarch64.md index
> 389a1906e23..ebaf7ec9970 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5358,6 +5358,32 @@ (define_expand
> "popcount2"
> }
> })
> 
> 
> +/* Popcount for byte can remove the reduction part after the
> popcount.
> + For optimization reasons, enabling this for CSSC. */
> (define_expand
> +"popcountqi2"
> + [(set (match_operand:QI 0 "register_operand" "=w")
> (popcount:QI
> +(match_operand:QI 1 "register_operand" "w")))]
> "TARGET_CSSC ||
> +TARGET_SIMD"
> +{
> + rtx in = operands[1];
> + rtx out = operands[0];
> + if (TARGET_CSSC)
> + {
> + rtx tmp = gen_reg_rtx (SImode);
> + rtx out1 = gen_reg_rtx (SImode);
> + emit_insn (gen_zero_extendqisi2 (tmp, in));  emit_insn
> +(gen_popcountsi2 (out1, tmp));  emit_move_insn (out,
> gen_lowpart
> +(QImode, out1));  DONE;  }  rtx v = gen_reg_rtx (V8QImode);
> rtx v1 =
> +gen_reg_rtx (V8QImode);  emit_move_insn (v, gen_lowpart
> (V8QImode,
> +in));  emit_insn (gen_popcountv8qi2 (v1, v));
> emit_move_insn (out,
> +gen_lowpart (QImode, v1));  DONE;
> +})
> 
> TBH I'd rather merge it with the GPI popcount pattern that
> looks almost identical. You could extend it with the ALLI
> iterator and handle HImode as well quite easily.

I was thinking about that beforehand, but I was trying for the simplified patch 
at the time.
Anyways I posted the updated version: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654115.html

And it includes the CSSC testcases too to make sure the generated code is 
correct.

Thanks,
Andrew Pinski



> Thanks,
> Kyrill
> 
> 
> +
> (define_insn "clrsb2"
> [(set (match_operand:GPI 0 "register_operand" "=r")
> (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt5.c
> b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
> new file mode 100644
> index 000..406369d9b29
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +/* PR target/113042 */
> +
> +#pragma GCC target "+nocssc"
> +
> +/*
> +** h8:
> +** ldr b[0-9]+, \[x0\]
> +** cnt v[0-9]+.8b, v[0-9]+.8b
> +** smov w0, v[0-9]+.b\[0\]
> +** ret
> +*/
> +/* We should not need the addv here since we only need a
> byte popcount.
> +*/
> +
> +unsigned h8 (const unsigned char *a) {
> + return __builtin_popcountg (a[0]);
> +}
> --
> 2.42.0
> 
> 
> 
> 



[PATCH] Move array_bounds warnings into its own pass.

2024-06-10 Thread Andrew MacLeod
The array bounds warning pass was originally attached to the VRP pass 
because it wanted to leverage the context sensitive ranges available there.


With ranger, we can make it a pass of its own for very little cost.  
This patch does that. It removes the array_bounds_checker from VRP and 
makes it a solo pass that runs immediately after VRP1.


The original version had VRP add any un-executable edge flags it found, 
but I could not find a case where after VRP cleans up the CFG the new 
pass needed that.  I also did not find a case where activating SCEV 
again for the warning pass made a difference after VRP had run.  So this 
patch does neither of those things.


It is simple enough to later add SCEV and loop analysis again if it turns 
out to be important.


My primary motivation for removing it was to remove the second DOM walk 
the checker performs, which depends on on-demand ranges pre-cached by 
ranger.   This prevented VRP from choosing an alternative VRP solution 
when basic block counts are very high (PR 114855).  I also know Siddesh 
wants to experiment with moving the pass later in the pipeline as well, 
which will make that task much simpler as a secondary rationale.


I didn't want to mess with the internal code much, for a multitude of 
reasons.  I did change it so that it always uses the current range_query 
object instead of passing one in to the constructor.  And then I cleaned 
up the VRP code to no longer take a flag on whether to invoke the 
warning code or not.


The final bit is that the pass is set to only run when flag_tree_vrp is on.  
I did this primarily to preserve existing functionality, and some tests 
depend on it.  I.e., some tests turn on -Warray-bounds and disable the 
tree-vrp pass (which means the bounds checker doesn't run), which changes 
the expected warnings from the strlen pass.  I'm not going there.  There 
are also tests which run at -O1 and -Wall that do not expect the bounds 
checker to run either.  So this dependence on the vrp flag is 
documented in the code and preserves existing behavior.


Does anyone have any issues with any of this?

Bootstraps on x86_64-pc-linux-gnu with no regressions.

Andrew
From aa0259784b0c0884956a627b78c0f5025d76c931 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 5 Jun 2024 15:12:27 -0400
Subject: [PATCH 2/2] Move array_bounds warnings into its own pass.

Array bounds checking is currently tied to VRP.  This causes issues with
using alternate VRP algorithms as well as experimenting with moving
the location of the warnings later.   This moves it to its own pass
and cleans up the vrp_pass object.

	* gimple-array-bounds.cc (array_bounds_checker::array_bounds_checker):
	Always use current range_query.
	(pass_data_array_bounds): New.
	(pass_array_bounds): New.
	(make_pass_array_bounds): New.
	* gimple-array-bounds.h  (array_bounds_checker): Adjust prototype.
	* timevar.def (TV_TREE_ARRAY_BOUNDS): New timevar.
	* tree-pass.h (make_pass_array_bounds): Add prototype.
	* tree-vrp.cc (execute_ranger_vrp): Remove warning param and do
	not invoke array bounds warning pass.
	(pass_vrp::pass_vrp): Adjust params.
	(pass_vrp::close): Adjust parameters.
	(pass_vrp::warn_array_bounds_p): Remove.
	(make_pass_vrp): Remove warning param.
	(make_pass_early_vrp): Remove warning param.
	(make_pass_fast_vrp): Remove warning param.
---
 gcc/gimple-array-bounds.cc | 63 +-
 gcc/gimple-array-bounds.h  |  2 +-
 gcc/passes.def |  1 +
 gcc/timevar.def|  1 +
 gcc/tree-pass.h|  1 +
 gcc/tree-vrp.cc| 40 +---
 6 files changed, 67 insertions(+), 41 deletions(-)

diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 008071cd546..1d14abf3ca1 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -38,10 +38,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "domwalk.h"
 #include "tree-cfg.h"
 #include "attribs.h"
+#include "tree-pass.h"
+#include "gimple-range.h"
 
-array_bounds_checker::array_bounds_checker (struct function *func,
-	range_query *qry)
-  : fun (func), m_ptr_qry (qry)
+// Always use the current range query for the bounds checker.
+array_bounds_checker::array_bounds_checker (struct function *func)
+  : fun (func), m_ptr_qry (get_range_query (func))
 {
   /* No-op.  */
 }
@@ -838,11 +840,7 @@ class check_array_bounds_dom_walker : public dom_walker
 {
 public:
   check_array_bounds_dom_walker (array_bounds_checker *checker)
-: dom_walker (CDI_DOMINATORS,
-		  /* Discover non-executable edges, preserving EDGE_EXECUTABLE
-		 flags, so that we can merge in information on
-		 non-executable edges from vrp_folder .  */
-		  REACHABLE_BLOCKS_PRESERVING_FLAGS),
+: dom_walker (CDI_DOMINATORS, REACHABLE_BLOCKS),
 checker (checker) { }
   ~check_array_bounds_dom_walker () {}
 
@@ -888,3 +886,52 @@ array_bounds_checker::check ()
   check_array_bounds_dom_walker w (this);
   

[PATCH v2] aarch64: Improve popcount for bytes [PR113042]

2024-06-10 Thread Andrew Pinski
For popcount for bytes, we don't need the reduction addition
after the vector cnt instruction as we are only counting one
byte's popcount.
This changes the popcount expand pattern to cover all ALLI modes rather than just GPI.
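
As a small illustration (mirroring the included popcnt5.c test rather than
adding anything new): a single-byte popcount such as the one below can never
exceed 8, so the result already fits in one vector lane and no addv reduction
is needed after the vector cnt instruction in the non-CSSC case.

unsigned
byte_popcount (unsigned char x)
{
  /* QImode popcount: expected to expand to cnt plus a lane extract.  */
  return __builtin_popcountg (x);
}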

Changes since v1:
* v2 - Use ALLI iterator and combine all into one pattern.
   Add new testcases popcnt[6-8].c.

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

PR target/113042

gcc/ChangeLog:

* config/aarch64/aarch64.md (popcount2): Update pattern
to support ALLI modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt5.c: New test.
* gcc.target/aarch64/popcnt6.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.md  | 52 +++---
 gcc/testsuite/gcc.target/aarch64/popcnt5.c | 19 
 gcc/testsuite/gcc.target/aarch64/popcnt6.c | 19 
 gcc/testsuite/gcc.target/aarch64/popcnt7.c | 18 
 gcc/testsuite/gcc.target/aarch64/popcnt8.c | 18 
 5 files changed, 119 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt8.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 389a1906e23..dd88fd891b5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5332,28 +5332,66 @@ (define_insn "*aarch64_popcount2_cssc_insn"
 ;; MOV w0, v2.b[0]
 
 (define_expand "popcount2"
-  [(set (match_operand:GPI 0 "register_operand")
-   (popcount:GPI (match_operand:GPI 1 "register_operand")))]
+  [(set (match_operand:ALLI 0 "register_operand")
+   (popcount:ALLI (match_operand:ALLI 1 "register_operand")))]
   "TARGET_CSSC || TARGET_SIMD"
 {
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if (TARGET_CSSC
+  && (mode == HImode
+  || mode == QImode))
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  rtx out1 = gen_reg_rtx (SImode);
+  if (mode == HImode)
+emit_insn (gen_zero_extendhisi2 (tmp, in));
+  else
+emit_insn (gen_zero_extendqisi2 (tmp, in));
+  emit_insn (gen_popcountsi2 (out1, tmp));
+  emit_move_insn (out, gen_lowpart (mode, out1));
+  DONE;
+}
   if (!TARGET_CSSC)
 {
   rtx v = gen_reg_rtx (V8QImode);
   rtx v1 = gen_reg_rtx (V8QImode);
   rtx in = operands[1];
   rtx out = operands[0];
-  if(mode == SImode)
+  /* SImode and HImode should be zero extended to DImode. */
+  if (mode == SImode || mode == HImode)
{
  rtx tmp;
  tmp = gen_reg_rtx (DImode);
- /* If we have SImode, zero extend to DImode, pop count does
-not change if we have extra zeros. */
- emit_insn (gen_zero_extendsidi2 (tmp, in));
+ /* If we have SImode, zero extend to DImode,
+pop count does not change if we have extra zeros. */
+ if (mode == SImode)
+   emit_insn (gen_zero_extendsidi2 (tmp, in));
+ else
+   emit_insn (gen_zero_extendhidi2 (tmp, in));
  in = tmp;
}
   emit_move_insn (v, gen_lowpart (V8QImode, in));
   emit_insn (gen_popcountv8qi2 (v1, v));
-  emit_insn (gen_aarch64_zero_extend_reduc_plus_v8qi (out, v1));
+  /* QImode, just extract from the v8qi vector.  */
+  if (mode == QImode)
+   {
+ emit_move_insn (out, gen_lowpart (QImode, v1));
+   }
+  /* HI and SI, reduction is zero extended to SImode. */
+  else if (mode == SImode || mode == HImode)
+   {
+ rtx out1;
+ out1 = gen_reg_rtx (SImode);
+ emit_insn (gen_aarch64_zero_extendsi_reduc_plus_v8qi (out1, v1));
+ emit_move_insn (out, gen_lowpart (mode, out1));
+   }
+  /* DImode, reduction is zero extended to DImode. */
+  else
+   {
+ gcc_assert (mode == DImode);
+ emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v8qi (out, v1));
+   }
   DONE;
 }
 })
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt5.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
new file mode 100644
index 000..406369d9b29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** h8:
+** ldr b[0-9]+, \[x0\]
+** cnt v[0-9]+.8b, v[0-9]+.8b
+** smovw0, v[0-9]+.b\[0\]
+** ret
+*/
+/* We should not need the addv here since we only need a byte popcount. */
+
+unsigned h8 (const unsigned char *a) {
+ return __builtin_popcountg (a[0]);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt6.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt6.c
new file mode 100644
index 000..e882cb24126
--- /dev/null
+++ 

Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-10 Thread Jeff Law




On 6/10/24 12:27 PM, Philipp Tomsich wrote:



This change is what I briefly hinted as "the complete solution" that
we had on the drawing board when we briefly talked last November in
Santa Clara.
I haven't any recollection of that part of the discussion, but I was a 
bit frazzled as you probably noticed.




  We have looked at all of SPEC2017, especially for coverage (i.e.,
making sure we see a significant number of uses of the transformation)
and correctness.  The gcc_r and parest_r components triggered in a
number of "interesting" ways (e.g., motivating the case of
load-elimination).  If it helps, we could share the statistics for how
often the pass triggers on compiling each of the SPEC2017 components?
Definitely helpful.  I may be able to juggle some priorities internally 
to lend a larger hand on testing and helping move this forward.  It's an 
area we're definitely interested in.


Jeff


Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-10 Thread Philipp Tomsich
On Mon, 10 Jun 2024 at 20:03, Jeff Law  wrote:
>
>
>
> On 6/10/24 1:55 AM, Manolis Tsamis wrote:
>
> >>
> > There was an older submission of a load-pair specific pass but this is
> > a complete reimplementation and indeed significantly more general.
> > Apart from being target independant, it addresses a number of
> > important restrictions and can handle multiple store forwardings per
> > load.
> > It should be noted that it cannot handle the load-pair cases as these
> > need special handling, but that's something we're planning to do in
> > the future by reusing this infrastructure.
> ACK.  Thanks for the additional background.
>
>
> >
> >>
> >>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> >>> index 4e8967fd8ab..c769744d178 100644
> >>> --- a/gcc/doc/invoke.texi
> >>> +++ b/gcc/doc/invoke.texi
> >>> @@ -12657,6 +12657,15 @@ loop unrolling.
> >>>This option is enabled by default at optimization levels @option{-O1},
> >>>@option{-O2}, @option{-O3}, @option{-Os}.
> >>>
> >>> +@opindex favoid-store-forwarding
> >>> +@item -favoid-store-forwarding
> >>> +@itemx -fno-avoid-store-forwarding
> >>> +Many CPUs will stall for many cycles when a load partially depends on 
> >>> previous
> >>> +smaller stores.  This pass tries to detect such cases and avoid the 
> >>> penalty by
> >>> +changing the order of the load and store and then fixing up the loaded 
> >>> value.
> >>> +
> >>> +Disabled by default.
> >> Is there any particular reason why this would be off by default at -O1
> >> or higher?  It would seem to me that on modern cores that this
> >> transformation should easily be a win.  Even on an old in-order core,
> >> avoiding the load with the bit insert is likely profitable, just not as
> >> much so.
> >>
> > I don't have a strong opinion for that but I believe Richard's
> > suggestion to decide this on a per-target basis also makes a lot of
> > sense.
> > Deciding whether the transformation is profitable is tightly tied to
> > the architecture in question (i.e. how large the stall is and what
> > sort of bit-insert instructions are available).
> > In order to make this more widely applicable, I think we'll need a
> > target hook that decides in which case the forwarded stores incur a
> > penalty and thus the transformation makes sense.
> You and Richi are probably right.   I'm not a big fan of passes being
> enabled/disabled on particular targets, but it may make sense here.
>
>
>
> > Afaik, for each CPU there may be cases that store forwarding is
> > handled efficiently.
> Absolutely.   But forwarding from a smaller store to a wider load is
> painful from a hardware standpoint and if we can avoid it from a codegen
> standpoint, we should.

This change is what I briefly hinted as "the complete solution" that
we had on the drawing board when we briefly talked last November in
Santa Clara.

> Did y'all look at spec2017 at all for this patch?  I've got our hardware
> guys to expose a signal for this case so that we can (in a month or so)
> get some hard data on how often it's happening in spec2017 and evaluate
> how this patch helps the most affected workloads.  But if y'all already
> have some data we can use it as a starting point.

 We have looked at all of SPEC2017, especially for coverage (i.e.,
making sure we see a significant number of uses of the transformation)
and correctness.  The gcc_r and parest_r components triggered in a
number of "interesting" ways (e.g., motivating the case of
load-elimination).  If it helps, we could share the statistics for how
often the pass triggers on compiling each of the SPEC2017 components?

Philipp.


Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-10 Thread Jeff Law




On 6/10/24 1:55 AM, Manolis Tsamis wrote:




There was an older submission of a load-pair specific pass but this is
a complete reimplementation and indeed significantly more general.
Apart from being target independent, it addresses a number of
important restrictions and can handle multiple store forwardings per
load.
It should be noted that it cannot handle the load-pair cases as these
need special handling, but that's something we're planning to do in
the future by reusing this infrastructure.

ACK.  Thanks for the additional background.







diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4e8967fd8ab..c769744d178 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12657,6 +12657,15 @@ loop unrolling.
   This option is enabled by default at optimization levels @option{-O1},
   @option{-O2}, @option{-O3}, @option{-Os}.

+@opindex favoid-store-forwarding
+@item -favoid-store-forwarding
+@itemx -fno-avoid-store-forwarding
+Many CPUs will stall for many cycles when a load partially depends on previous
+smaller stores.  This pass tries to detect such cases and avoid the penalty by
+changing the order of the load and store and then fixing up the loaded value.
+
+Disabled by default.

Is there any particular reason why this would be off by default at -O1
or higher?  It would seem to me that on modern cores that this
transformation should easily be a win.  Even on an old in-order core,
avoiding the load with the bit insert is likely profitable, just not as
much so.


I don't have a strong opinion for that but I believe Richard's
suggestion to decide this on a per-target basis also makes a lot of
sense.
Deciding whether the transformation is profitable is tightly tied to
the architecture in question (i.e. how large the stall is and what
sort of bit-insert instructions are available).
In order to make this more widely applicable, I think we'll need a
target hook that decides in which case the forwarded stores incur a
penalty and thus the transformation makes sense.
You and Richi are probably right.   I'm not a big fan of passes being 
enabled/disabled on particular targets, but it may make sense here.





Afaik, for each CPU there may be cases that store forwarding is
handled efficiently.
Absolutely.   But forwarding from a smaller store to a wider load is 
painful from a hardware standpoint and if we can avoid it from a codegen 
standpoint, we should.
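
To make the hardware issue concrete, here is a minimal C sketch of the
store-forwarding pattern being discussed (illustrative only, not code from
the patch or from SPEC): a narrow store immediately followed by a wider,
overlapping load.

#include <stdint.h>
#include <string.h>

struct packet { uint8_t tag; uint8_t payload[7]; };

uint64_t
touch (struct packet *p, uint8_t t)
{
  uint64_t whole;
  p->tag = t;                        /* 1-byte store                        */
  memcpy (&whole, p, sizeof whole);  /* 8-byte load overlapping the store   */
  return whole;                      /* load partially depends on the store */
}

Per the invoke.texi hunk quoted above, the pass would reorder the load ahead
of the store and then patch the loaded value up with bit-insert style
operations, so the load no longer waits on forwarding of the just-written byte.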


Did y'all look at spec2017 at all for this patch?  I've got our hardware 
guys to expose a signal for this case so that we can (in a month or so) 
get some hard data on how often it's happening in spec2017 and evaluate 
how this patch helps the most affected workloads.  But if y'all already 
have some data we can use it as a starting point.


jeff


Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-06-10 Thread Richard Sandiford
Robin Dapp  writes:
>> Is there any way we can avoid using pattern_cost here?  Using it means
>> that we can make use of targetm.insn_cost for the jump but circumvent
>> it for the condition, giving a bit of a mixed metric.
>> 
>> (I realise there are existing calls to pattern_cost in ifcvt.cc,
>> but if possible I think we should try to avoid adding more.)
>
> Yes, I believe there is.  In addition, what I did with
> if_info->cond wasn't what I intended to do.
>
> The whole point of the exercise is that noce_convert_multiple_sets
> can re-use the CC comparison that is already present (because it
> is used in the jump pattern).  Therefore I want to split costs
> into a jump part and a CC-setting part so the final costing
> decision for multiple sets can be:
>
>  insn_cost (jump) + n * insn_cost (set)
> vs
>  n * insn_cost ("cmov")
>
> Still, the original costs should be:
>  insn_cost (set_cc) + insn_cost (jump)
> and with the split we can just remove insn_cost (set_cc) before
> the multiple-set cost comparison and re-add it afterwards.
>
> For non-CC targets this is not necessary.
>
> So what I'd hope is better is to use
> insn_cost (if_info.earliest_cond)
> which is indeed the CC-set/comparison if it exists.

I agree that's probably good enough in practice.  It doesn't cope
with things like:

/* Handle sequences like:

   (set op0 (xor X Y))
   ...(eq|ne op0 (const_int 0))...

   in which case:

   (eq op0 (const_int 0)) reduces to (eq X Y)
   (ne op0 (const_int 0)) reduces to (ne X Y)

   This is the form used by MIPS16, for example.  */

but then neither does the current code.  But...

> The attached v2 was bootstrapped and regtested on x86, aarch64 and
> power10 and regtested on riscv64.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
>   * ifcvt.cc (noce_process_if_block): Subtract condition pattern
>   cost if applicable.
>   (noce_find_if_block): Use insn_cost and pattern_cost for
>   original cost.
> ---
>  gcc/ifcvt.cc | 31 ---
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 58ed42673e5..ebb838fd82c 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3931,16 +3931,16 @@ noce_process_if_block (struct noce_if_info *if_info)
>   to calculate a value for x.
>   ??? For future expansion, further expand the "multiple X" rules.  */
>  
> -  /* First look for multiple SETS.  The original costs already include
> - a base cost of COSTS_N_INSNS (2): one instruction for the compare
> - (which we will be needing either way) and one instruction for the
> - branch.  When comparing costs we want to use the branch instruction
> - cost and the sets vs. the cmovs generated here.  Therefore subtract
> - the costs of the compare before checking.
> - ??? Actually, instead of the branch instruction costs we might want
> - to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
> -
> -  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
> +  /* First look for multiple SETS.
> + The original costs already include costs for the jump insn as well
> + as for a CC comparison if there is any.
> + We want to allow the backend to re-use the existing CC comparison
> + and therefore don't consider it for the cost comparison (as it is
> + then needed for both the jump as well as the cmov sequence).  */
> +
> +  unsigned potential_cost = if_info->original_cost;
> +  if (if_info->cond_earliest && if_info->jump != if_info->cond_earliest)
> +potential_cost -= insn_cost (if_info->cond_earliest, if_info->speed_p);
>unsigned old_cost = if_info->original_cost;
>if (!else_bb
>&& HAVE_conditional_move

...why do we do the adjustment here?  Doesn't noce_convert_multiple_sets_1
know for certain (or at least with more certainty) whether any of the
new instructions use the old CC result?  It seems like we could record
that and do the adjustment around the call to
targetm.noce_conversion_profitable_p.

> @@ -4703,11 +4703,12 @@ noce_find_if_block (basic_block test_bb, edge 
> then_edge, edge else_edge,
>  = targetm.max_noce_ifcvt_seq_cost (then_edge);
>/* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
>   that they are valid to transform.  We can't easily get back to the insn
> - for COND (and it may not exist if we had to canonicalize to get COND),
> - and jump_insns are always given a cost of 1 by seq_cost, so treat
> - both instructions as having cost COSTS_N_INSNS (1).  */
> -  if_info.original_cost = COSTS_N_INSNS (2);
> -
> + for COND (and it may not exist if we had to canonicalize to get COND).
> + jump insn that is costed via insn_cost.  It is assumed that the
^^
Looks like this part of the comment got a bit garbled.

Thanks,
Richard

> + costs of a jump insn are 
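
For readers following the cost discussion, a rough C-level sketch of the
"multiple sets" case (an illustration of the description above, not an
example from the patch): a branch guarding several assignments becomes
per-variable conditional moves that all reuse one comparison, which is where
the insn_cost (jump) + n * insn_cost (set) versus n * insn_cost (cmov)
trade-off comes from.

/* Before if-conversion: one compare, one branch, two plain sets.  */
long
before (long a, long b, long c, long d)
{
  long x = 1, y = 2;
  if (a > b)
    {
      x = c;
      y = d;
    }
  return x + y;
}

/* What the multiple-set conversion aims for, conceptually: the same
   comparison feeds two conditional moves and the branch disappears.  */
long
after (long a, long b, long c, long d)
{
  long x = (a > b) ? c : 1;
  long y = (a > b) ? d : 2;
  return x + y;
}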

Re: [PATCH v3 6/6] aarch64: Add DLL import/export to AArch64 target

2024-06-10 Thread Richard Sandiford
Thanks for the update.  Parts 1-5 look good to me.  Some minor comments
below about part 6:

Evgeny Karpov  writes:
> This patch reuses the MinGW implementation to enable DLL import/export
> functionality for the aarch64-w64-mingw32 target. It also modifies
> environment configurations for MinGW.
>
> gcc/ChangeLog:
>
>   * config.gcc: Add winnt-dll.o, which contains the DLL
>   import/export implementation.
>   * config/aarch64/aarch64.cc (aarch64_legitimize_pe_coff_symbol):
>   Add a conditional function that reuses the MinGW implementation
>   for COFF and does nothing otherwise.
>   (aarch64_load_symref_appropriately): Add dllimport
>   implementation.
>   (aarch64_expand_call): Likewise.
>   (aarch64_legitimize_address): Likewise.
>   * config/aarch64/cygming.h (SYMBOL_FLAG_DLLIMPORT): Modify MinGW
>   environment to support DLL import/export.
>   (SYMBOL_FLAG_DLLEXPORT): Likewise.
>   (SYMBOL_REF_DLLIMPORT_P): Likewise.
>   (SYMBOL_FLAG_STUBVAR): Likewise.
>   (SYMBOL_REF_STUBVAR_P): Likewise.
>   (TARGET_VALID_DLLIMPORT_ATTRIBUTE_P): Likewise.
>   (TARGET_ASM_FILE_END): Likewise.
>   (SUB_TARGET_RECORD_STUB): Likewise.
>   (GOT_ALIAS_SET): Likewise.
>   (PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED): Likewise.
>   (HAVE_64BIT_POINTERS): Likewise.
> ---
>  gcc/config.gcc|  4 +++-
>  gcc/config/aarch64/aarch64.cc | 37 +++
>  gcc/config/aarch64/cygming.h  | 26 ++--
>  3 files changed, 64 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index d053b98efa8..331285b7b6d 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1276,10 +1276,12 @@ aarch64-*-mingw*)
>   tm_file="${tm_file} mingw/mingw32.h"
>   tm_file="${tm_file} mingw/mingw-stdint.h"
>   tm_file="${tm_file} mingw/winnt.h"
> + tm_file="${tm_file} mingw/winnt-dll.h"
>   tmake_file="${tmake_file} aarch64/t-aarch64"
>   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
> + target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
>   extra_options="${extra_options} mingw/cygming.opt mingw/mingw.opt"
> - extra_objs="${extra_objs} winnt.o"
> + extra_objs="${extra_objs} winnt.o winnt-dll.o"
>   c_target_objs="${c_target_objs} msformat-c.o"
>   d_target_objs="${d_target_objs} winnt-d.o"
>   tmake_file="${tmake_file} mingw/t-cygming"
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 3418e57218f..5706b9aeb6b 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -860,6 +860,10 @@ static const attribute_spec aarch64_gnu_attributes[] =
>{ "Advanced SIMD type", 1, 1, false, true,  false, true,  NULL, NULL },
>{ "SVE type",3, 3, false, true,  false, true,  NULL, NULL 
> },
>{ "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL },
> +#if TARGET_DLLIMPORT_DECL_ATTRIBUTES
> +  { "dllimport", 0, 0, false, false, false, false, handle_dll_attribute, 
> NULL },
> +  { "dllexport", 0, 0, false, false, false, false, handle_dll_attribute, 
> NULL },
> +#endif
>  #ifdef SUBTARGET_ATTRIBUTE_TABLE
>SUBTARGET_ATTRIBUTE_TABLE
>  #endif
> @@ -2819,6 +2823,15 @@ tls_symbolic_operand_type (rtx addr)
>return tls_kind;
>  }
>  
> +rtx aarch64_legitimize_pe_coff_symbol (rtx addr, bool inreg)
> +{
> +#if TARGET_PECOFF
> +  return legitimize_pe_coff_symbol (addr, inreg);
> +#else
> +  return NULL_RTX;
> +#endif
> +}
> +

I wondered whether we should try to abstract this behind
SUBTARGET_* stuff, e.g.:

  SUBTARGET_LEGITIMIZE_ADDRESS(ADDR) (the inreg==true case)
  SUBTARGET_LEGITIMIZE_CALLEE(ADDR)  (the inreg==false case)

But I don't think it falls out naturally with the way GCC's code is
organised.  I agree having direct references to PECOFF is probably the
least worst option under the circumstances.

Since there is no AArch64-specific handling, I think it'd be
better to have:

#if !TARGET_PECOFF
rtx legitimize_pe_coff_symbol (rtx, bool) { return NULL_RTX; }
#endif

This avoids warning about unused arguments in the !TARGET_PECOFF case.

>  /* We'll allow lo_sum's in addresses in our legitimate addresses
> so that combine would take care of combining addresses where
> necessary, but for generation purposes, we'll generate the address
> @@ -2865,6 +2878,17 @@ static void
>  aarch64_load_symref_appropriately (rtx dest, rtx imm,
>  enum aarch64_symbol_type type)
>  {
> +  /* If legitimize returns a value
> + copy it directly to the destination and return.  */

I don't think the comment really adds anything.

> +
> +  rtx tmp = aarch64_legitimize_pe_coff_symbol (imm, true);
> +

Sorry for pushing personal preference, but I think it's slightly
easier to read without this blank line (following the style used
later in aarch64_legitimize_address).

> +  if (tmp)
> +   

Re: [PATCH v2 2/3] RISC-V: Add Zalrsc and Zaamo testsuite support

2024-06-10 Thread Patrick O'Neill



On 6/7/24 16:04, Jeff Law wrote:



On 6/3/24 3:53 PM, Patrick O'Neill wrote:

Convert testsuite infrastructure to use Zalrsc and Zaamo rather than A.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: Use Zaamo rather 
than A.

* gcc.target/riscv/amo-table-a-6-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: Use Zalrsc 
rather

than A.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: Use Zaamo 
rather

than A.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add Zaamo option.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Use 
Zalrsc rather

than A.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.
* lib/target-supports.exp: Add testsuite infrastructure support for
Zaamo and Zalrsc.
So there's a lot of whitespace changes going on in target-supports.exp 
that make it harder to find the real changes.


There's always a bit of a judgement call for that kind of thing.  This 
one probably goes past what I would generally recommend, meaning that the 
formatting stuff would be a separate patch.


A reasonable starting point would be if you're not changing the 
function in question, then fixing formatting in it probably should be 
a distinct patch.


You probably should update the docs in sourcebuild.texi for the new 
target-supports tests.


So OK for the trunk (including the whitespace fixes) with a suitable 
change to sourcebuild.texi.


Sorry about that - the whitespace changes snuck in when resolving a 
merge conflict and were unintentional.


I'll post a v3 with the sourcebuild.texi changes and patch 3/3 changes 
later today.


I'll split the target-supports.exp trailing whitespace removal into a 
separate patch after this series lands.


Patrick



jeff


Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-10 Thread Jeff Law




On 6/10/24 8:52 AM, Li, Pan2 wrote:

Not sure if the float eq implementation below from sail-riscv is useful or not, 
but it looks like there is some special handling for NaN, as well as sNaN.

https://github.com/riscv/sail-riscv/blob/master/c_emulator/SoftFloat-3e/source/f32_eq.c

Yes, but it's symmetrical, which is what we'd want to see.

jeff



Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-10 Thread Jeff Law




On 6/10/24 10:16 AM, Demin Han wrote:

Hi,

I'm currently on vacation.
I will return in a few days and submit a new patch with the test.

No problem.  Enjoy your vacation, this can certainly wait until you return.

jeff



Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Jeff Law




On 6/10/24 8:49 AM, pan2...@intel.com wrote:

From: Pan Li 

When the PHI handling for COND_EXPR is enabled, we need to insert the gcall
to replace the PHI node.  Unfortunately, I made a mistake and inserted
the gcall before the last stmt of the bb.  See the gimple below: the PHI
is located at no.1 but we insert the gcall (aka no.9) at the end of
the bb.  Then the use of _9 in no.2 will have no def and will trigger
an ICE in verify_ssa.

   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
   2. prephitmp_36 = (char *) _9;
   3. buf.write_base = string_13(D);
   4. buf.write_ptr = string_13(D);
   5. buf.write_end = prephitmp_36;
   6. buf.written = 0;
   7. buf.mode = 3;
   8. _7 = buf.write_end;
   9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb by 
mistake

This patch instead inserts the gcall before the first stmt of the bb,
to ensure that any possible use of the PHI result has an existing def.
After this patch the above gimple will be:

   0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // gcall now correctly
inserted at the start of the bb
   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
   2. prephitmp_36 = (char *) _9;
   3. buf.write_base = string_13(D);
   4. buf.write_ptr = string_13(D);
   5. buf.write_end = prephitmp_36;
   6. buf.written = 0;
   7. buf.mode = 3;
   8. _7 = buf.write_end;

The below test suites are passed for this patch:
* The rv64gcv fully regression test with newlib.
* The rv64gcv build with glibc.
* The x86 regression test with newlib.
* The x86 bootstrap test with newlib.
So the patch looks fine.  I'm just trying to parse the testing.  If you 
did an x86 bootstrap & regression test, you wouldn't be using newlib. 
That would be a native bootstrap & regression test which would use 
whatever C library is already installed on the system.  I'm assuming 
that's what you did.


If my assumption is correct, then this is fine for the trunk.

jeff



Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-10 Thread Demin Han
Hi,

I'm currently on vacation.
I will return in a few days and submit a new patch with the test.

Regards,
Demin







From: Jeff Law 
Sent: Monday, June 10, 2024, 9:49 PM
To: Robin Dapp ; Demin Han ; 
钟居哲 ; gcc-patches 
Cc: kito.cheng ; Li, Pan2 
Subject: Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern



On 6/10/24 1:33 AM, Robin Dapp wrote:
>> But isn't canonicalization of EQ/NE safe, even for IEEE NaN and +-0.0?
>>
>> target = (a == b) ? x : y
>> target = (a != b) ? y : x
>>
>> Are equivalent, even for IEEE IIRC.
>
> Yes, that should be fine.  My concern was not that we do a
> canonicalization but that we might not do it for some of the
> vector cases.  In particular when one of the operands is wrapped
> in a vec_duplicate and we end up with it first rather than
> second.
>
> My general feeling is that the patch is good but I wasn't entirely
> sure about all cases (in particular in case we transform something
> after expand).  That's why I would have liked to see at least some
> small test cases for it along with the patch (for the combinations
> we don't test yet).
Ah, OK.

Demin, can you add some additional test coverage, guided by Robin's concerns
above?

Thanks,
jeff
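
A minimal sketch of the kind of extra coverage being asked for, based on
Robin's concern about a vec_duplicate operand ending up first rather than
second (hypothetical, not part of any posted patch):

/* eq/ne against a broadcast scalar, written with both operand orders so
   the reversed form is exercised after vectorization as well.  */
void
eq_vs (float *r, const float *a, float b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = (a[i] == b) ? 1.0f : 2.0f;
}

void
ne_sv (float *r, const float *a, float b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = (b != a[i]) ? 1.0f : 2.0f;
}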



[PING][PATCH v2] docs: Update function multiversioning documentation

2024-06-10 Thread Andrew Carlotti


On Tue, Apr 30, 2024 at 05:10:45PM +0100, Andrew Carlotti wrote:
> Add target_version attribute to Common Function Attributes and update
> target and target_clones documentation.  Move shared detail and examples
> to the Function Multiversioning page.  Add target-specific details to
> target-specific pages.
> 
> ---
> 
> Changes since v1:
> - Various typo fixes.
> - Reordered content in 'Function multiversioning' section to put 
> implementation
>   details at the end (as suggested in review).
> - Dropped links to outdated wiki page, and a couple of other unhelpful
>   sentences that the previous version preserved.
> 
> I've built and rechecked the info output.  Ok for master?  And is this ok for
> the GCC-14 branch too?
> 
> gcc/ChangeLog:
> 
>   * doc/extend.texi (Common Function Attributes): Update target
>   and target_clones documentation, and add target_version.
>   (AArch64 Function Attributes): Add ACLE reference and list
>   supported features.
>   (PowerPC Function Attributes): List supported features.
>   (x86 Function Attributes): Mention function multiversioning.
>   (Function Multiversioning): Update, and move shared detail here.
> 
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 
> e290265d68d33f86a7e7ee9882cc0fd6bed00143..fefac70b5fffc350bf23db74a8fc88fa3bb99bd5
>  100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -4178,17 +4178,16 @@ and @option{-Wanalyzer-tainted-size}.
>  Multiple target back ends implement the @code{target} attribute
>  to specify that a function is to
>  be compiled with different target options than specified on the
> -command line.  The original target command-line options are ignored.
> -One or more strings can be provided as arguments.
> -Each string consists of one or more comma-separated suffixes to
> -the @code{-m} prefix jointly forming the name of a machine-dependent
> -option.  @xref{Submodel Options,,Machine-Dependent Options}.
> -
> +command line.  One or more strings can be provided as arguments.
>  The @code{target} attribute can be used for instance to have a function
>  compiled with a different ISA (instruction set architecture) than the
> -default.  @samp{#pragma GCC target} can be used to specify target-specific
> -options for more than one function.  @xref{Function Specific Option Pragmas},
> -for details about the pragma.
> +default.
> +
> +The options supported by the @code{target} attribute are specific to each
> +target; refer to @ref{x86 Function Attributes}, @ref{PowerPC Function
> +Attributes}, @ref{ARM Function Attributes}, @ref{AArch64 Function 
> Attributes},
> +@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> +for details.
>  
>  For instance, on an x86, you could declare one function with the
>  @code{target("sse4.1,arch=core2")} attribute and another with
> @@ -4211,39 +4210,26 @@ multiple options is equivalent to separating the 
> option suffixes with
>  a comma (@samp{,}) within a single string.  Spaces are not permitted
>  within the strings.
>  
> -The options supported are specific to each target; refer to @ref{x86
> -Function Attributes}, @ref{PowerPC Function Attributes},
> -@ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
> -@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
> -for details.
> +@samp{#pragma GCC target} can be used to specify target-specific
> +options for more than one function.  @xref{Function Specific Option Pragmas},
> +for details about the pragma.
> +
> +On x86, the @code{target} attribute can also be used to create multiple
> +versions of a function, compiled with different target-specific options.
> +@xref{Function Multiversioning} for more details.
>  
>  @cindex @code{target_clones} function attribute
>  @item target_clones (@var{options})
>  The @code{target_clones} attribute is used to specify that a function
> -be cloned into multiple versions compiled with different target options
> -than specified on the command line.  The supported options and restrictions
> -are the same as for @code{target} attribute.
> -
> -For instance, on an x86, you could compile a function with
> -@code{target_clones("sse4.1,avx")}.  GCC creates two function clones,
> -one compiled with @option{-msse4.1} and another with @option{-mavx}.
> -
> -On a PowerPC, you can compile a function with
> -@code{target_clones("cpu=power9,default")}.  GCC will create two
> -function clones, one compiled with @option{-mcpu=power9} and another
> -with the default options.  GCC must be configured to use GLIBC 2.23 or
> -newer in order to use the @code{target_clones} attribute.
> -
> -It also creates a resolver function (see
> -the @code{ifunc} attribute above) that dynamically selects a clone
> -suitable for current architecture.  The resolver is created only if there
> -is a usage of a function with @code{target_clones} attribute.
> -
> -Note that any subsequent call of a function without @code{target_clone}
> 

Re: [PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread Sam James
pan2...@intel.com writes:

> From: Pan Li 
>
> When the PHI handling for COND_EXPR is enabled, we need to insert the gcall
> to replace the PHI node.  Unfortunately, I made a mistake and inserted
> the gcall before the last stmt of the bb.  See the gimple below: the PHI
> is located at no.1 but we insert the gcall (aka no.9) at the end of
> the bb.  Then the use of _9 in no.2 will have no def and will trigger
> an ICE in verify_ssa.
>
>   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>   2. prephitmp_36 = (char *) _9;
>   3. buf.write_base = string_13(D);
>   4. buf.write_ptr = string_13(D);
>   5. buf.write_end = prephitmp_36;
>   6. buf.written = 0;
>   7. buf.mode = 3;
>   8. _7 = buf.write_end;
>   9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // Insert gcall to last bb 
> by mistake
>
> This patch instead inserts the gcall before the first stmt of the bb,
> to ensure that any possible use of the PHI result has an existing def.
> After this patch the above gimple will be:
>
>   0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // gcall now correctly
> inserted at the start of the bb
>   1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be 
> deleted.
>   2. prephitmp_36 = (char *) _9;
>   3. buf.write_base = string_13(D);
>   4. buf.write_ptr = string_13(D);
>   5. buf.write_end = prephitmp_36;
>   6. buf.written = 0;
>   7. buf.mode = 3;
>   8. _7 = buf.write_end;
>
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test with newlib.
> * The rv64gcv build with glibc.
> * The x86 regression test with newlib.
> * The x86 bootstrap test with newlib.
>
>   PR target/115387
>
> gcc/ChangeLog:
>
>   * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): Take
>   the gsi of start_bb instead of last_bb.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/pr115387-1.c: New test.
>   * gcc.target/riscv/pr115387-2.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/gcc.target/riscv/pr115387-1.c | 35 +
>  gcc/testsuite/gcc.target/riscv/pr115387-2.c | 18 +++
>  gcc/tree-ssa-math-opts.cc   |  2 +-
>  3 files changed, 54 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-2.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
> new file mode 100644
> index 000..a1c926977c4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
> @@ -0,0 +1,35 @@
> +/* Test there is no ICE when compile.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#define PRINTF_CHK 0x34
> +
> +typedef unsigned long uintptr_t;
> +
> +struct __printf_buffer {
> +  char *write_ptr;
> +  int status;
> +};
> +
> +extern void __printf_buffer_init_end (struct __printf_buffer *, char *, char 
> *);
> +
> +void
> +test (char *string, unsigned long maxlen, unsigned mode_flags)
> +{
> +  struct __printf_buffer buf;
> +
> +  if ((mode_flags & PRINTF_CHK) != 0)
> +{
> +  string[0] = '\0';
> +  uintptr_t end;
> +
> +  if (__builtin_add_overflow ((uintptr_t) string, maxlen, ))
> + end = -1;
> +
> +  __printf_buffer_init_end (, string, (char *) end);
> +}
> +  else
> +__printf_buffer_init_end (, string, (char *) ~(uintptr_t) 0);
> +
> +  *buf.write_ptr = '\0';
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-2.c 
> b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
> new file mode 100644
> index 000..7183bf18dfd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
> @@ -0,0 +1,18 @@
> +/* Test there is no ICE when compile.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#include 
> +#include 
> +
> +char *
> +test (char *string, size_t maxlen)
> +{
> +  string[0] = '\0';
> +  uintptr_t end;
> +
> +  if (__builtin_add_overflow ((uintptr_t) string, maxlen, ))
> +end = -1;
> +
> +  return (char *) end;
> +}

This testcase ICEs for me on x86-64 too (without your patch) with just -O2.

Can you move it out of the riscv suite? (I suspect the other fails on x86-64 
too).

> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 173b0366f5e..fbb8e0ea306 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -6102,7 +6102,7 @@ math_opts_dom_walker::after_dom_children (basic_block 
> bb)
>for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);
>  gsi_next ())
>  {
> -  gimple_stmt_iterator gsi = gsi_last_bb (bb);
> +  gimple_stmt_iterator gsi = gsi_start_bb (bb);
>match_unsigned_saturation_add (, psi.phi ());
>  }


signature.asc
Description: PGP signature


[PUSHED] Fix pr115388.c: plain char could be unsigned by default [PR115415]

2024-06-10 Thread Andrew Pinski
This is a simple fix to the testcase as plain `char` could be
unsigned by default on some targets (e.g. aarch64 and powerpc).
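
A short illustration (not part of the commit) of why the signedness of
plain char matters here:

#include <stdio.h>

int
main (void)
{
  char e = -1;  /* signedness of plain char is target-dependent */
  if (e >= 0)
    printf ("plain char is unsigned here: e == %d\n", e);  /* e.g. aarch64, powerpc */
  else
    printf ("plain char is signed here: e == %d\n", e);    /* e.g. x86_64 */
  return 0;
}

In pr115388.c this meant the `while (e >= 0)` loop was entered on
unsigned-char targets but skipped on signed-char ones, so spelling the
declaration as `signed char` keeps the test behaving the same everywhere.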

Committed as obvious after a quick test of the testcase on both aarch64 and 
x86_64.

gcc/testsuite/ChangeLog:

PR testsuite/115415
PR tree-optimization/115388
* gcc.dg/torture/pr115388.c: Use `signed char` directly instead
of plain `char`.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/torture/pr115388.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr115388.c 
b/gcc/testsuite/gcc.dg/torture/pr115388.c
index c7c902888da..17b3f1bcd90 100644
--- a/gcc/testsuite/gcc.dg/torture/pr115388.c
+++ b/gcc/testsuite/gcc.dg/torture/pr115388.c
@@ -2,7 +2,7 @@
 
 int printf(const char *, ...);
 int a[10], b, c, d[0], h, i, j, k, l;
-char e = -1, g;
+signed char e = -1, g;
 volatile int f;
 static void n() {
   while (e >= 0)
-- 
2.43.0



Re: [PATCH] c++: remove Concepts TS code

2024-06-10 Thread Marek Polacek
On Mon, Jun 10, 2024 at 10:22:11AM -0400, Patrick Palka wrote:
> On Fri, 7 Jun 2024, Marek Polacek wrote:
> > @@ -3940,9 +3936,6 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
> > void* data)
> >  parameter pack (14.6.3), or the type-specifier-seq of a type-id that
> >  is a pack expansion, the invented template parameter is a template
> >  parameter pack.  */
> 
> This comment should be removed too I think.

Removed in my local tree.
 
> > -  if (flag_concepts_ts && ppd->type_pack_expansion_p && is_auto (t)
> 
> (BTW this seems to be the only actual user of type_pack_expansion_p so we
> can in turn remove that field too.)

Oh neat.  I can do that as a follow-up, unless y'all think it should be
part of this patch.  Thanks,

Marek



RE: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-10 Thread Li, Pan2
Not sure if the float eq implementation below from sail-riscv is useful or not, 
but it looks like there is some special handling for NaN, as well as sNaN.

https://github.com/riscv/sail-riscv/blob/master/c_emulator/SoftFloat-3e/source/f32_eq.c

Pan

-Original Message-
From: Jeff Law  
Sent: Monday, June 10, 2024 9:50 PM
To: Robin Dapp ; Demin Han ; 
钟居哲 ; gcc-patches 
Cc: kito.cheng ; Li, Pan2 
Subject: Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern



On 6/10/24 1:33 AM, Robin Dapp wrote:
>> But isn't canonicalization of EQ/NE safe, even for IEEE NaN and +-0.0?
>>
>> target = (a == b) ? x : y
>> target = (a != b) ? y : x
>>
>> Are equivalent, even for IEEE IIRC.
> 
> Yes, that should be fine.  My concern was not that we do a
> canonicalization but that we might not do it for some of the
> vector cases.  In particular when one of the operands is wrapped
> in a vec_duplicate and we end up with it first rather than
> second.
> 
> My general feeling is that the patch is good but I wasn't entirely
> sure about all cases (in particular in case we transform something
> after expand).  That's why I would have liked to see at least some
> small test cases for it along with the patch (for the combinations
> we don't test yet).
Ah, OK.

Demin, can you add some additional test coverage, guided by Robin's concerns 
above?

Thanks,
jeff



[PATCH v1] Widening-Mul: Fix one ICE of gcall insertion for PHI match

2024-06-10 Thread pan2 . li
From: Pan Li 

When the PHI handling for COND_EXPR is enabled,  we need to insert the gcall
that replaces the PHI node.  Unfortunately,  I made the mistake of inserting
the gcall before the last stmt of the bb.  See the gimple below:  the PHI
is located at no.1, but we insert the gcall (aka no.9) at the end of
the bb.  The use of _9 in no.2 then has no def and triggers an ICE in
verify_ssa.

  1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
  2. prephitmp_36 = (char *) _9;
  3. buf.write_base = string_13(D);
  4. buf.write_ptr = string_13(D);
  5. buf.write_end = prephitmp_36;
  6. buf.written = 0;
  7. buf.mode = 3;
  8. _7 = buf.write_end;
  9. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // gcall wrongly inserted at
the end of the bb

This patch instead inserts the gcall before the first stmt of the bb,
so that any possible use of the PHI result has a def.
After this patch the above gimple will be:

  0. _9 = .SAT_ADD (string.0_2, maxlen_15(D));   // gcall now inserted at the
start of the bb
  1. # _9 = PHI <_3(4), 18446744073709551615(3)> // The PHI node to be deleted.
  2. prephitmp_36 = (char *) _9;
  3. buf.write_base = string_13(D);
  4. buf.write_ptr = string_13(D);
  5. buf.write_end = prephitmp_36;
  6. buf.written = 0;
  7. buf.mode = 3;
  8. _7 = buf.write_end;

The following test suites pass for this patch:
* The rv64gcv fully regression test with newlib.
* The rv64gcv build with glibc.
* The x86 regression test with newlib.
* The x86 bootstrap test with newlib.

PR target/115387

gcc/ChangeLog:

* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): Take
the gsi of start_bb instead of last_bb.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115387-1.c: New test.
* gcc.target/riscv/pr115387-2.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/pr115387-1.c | 35 +
 gcc/testsuite/gcc.target/riscv/pr115387-2.c | 18 +++
 gcc/tree-ssa-math-opts.cc   |  2 +-
 3 files changed, 54 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr115387-2.c

diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-1.c 
b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
new file mode 100644
index 000..a1c926977c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115387-1.c
@@ -0,0 +1,35 @@
+/* Test there is no ICE when compile.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#define PRINTF_CHK 0x34
+
+typedef unsigned long uintptr_t;
+
+struct __printf_buffer {
+  char *write_ptr;
+  int status;
+};
+
+extern void __printf_buffer_init_end (struct __printf_buffer *, char *, char 
*);
+
+void
+test (char *string, unsigned long maxlen, unsigned mode_flags)
+{
+  struct __printf_buffer buf;
+
+  if ((mode_flags & PRINTF_CHK) != 0)
+{
+  string[0] = '\0';
+  uintptr_t end;
+
+  if (__builtin_add_overflow ((uintptr_t) string, maxlen, ))
+   end = -1;
+
+  __printf_buffer_init_end (, string, (char *) end);
+}
+  else
+__printf_buffer_init_end (, string, (char *) ~(uintptr_t) 0);
+
+  *buf.write_ptr = '\0';
+}
diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-2.c 
b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
new file mode 100644
index 000..7183bf18dfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr115387-2.c
@@ -0,0 +1,18 @@
+/* Test there is no ICE when compile.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include 
+#include 
+
+char *
+test (char *string, size_t maxlen)
+{
+  string[0] = '\0';
+  uintptr_t end;
+
+  if (__builtin_add_overflow ((uintptr_t) string, maxlen, ))
+end = -1;
+
+  return (char *) end;
+}
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 173b0366f5e..fbb8e0ea306 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -6102,7 +6102,7 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
   for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);
 gsi_next ())
 {
-  gimple_stmt_iterator gsi = gsi_last_bb (bb);
+  gimple_stmt_iterator gsi = gsi_start_bb (bb);
   match_unsigned_saturation_add (, psi.phi ());
 }
 
-- 
2.34.1



Re: [PATCH] htdocs/contribute.html: correct disctinct->distinct spelling

2024-06-10 Thread Gerald Pfeifer
On Sat, 2 Dec 2023, Jonny Grant wrote:
> Correct a spelling mistake this page:
> https://gcc.gnu.org/contribute.html

Superseded by 

  Author: Jonathan Grant 
  Date:   Sat Jun 8 21:26:04 2024 +0200

*: Correct spelling

which I just pushed.

Gerald


Re: [PATCH] htdocs: correct spelling and use https in examples

2024-06-10 Thread Gerald Pfeifer
On Wed, 6 Dec 2023, Jonny Grant wrote:
> ChangeLog:
> 
>   htdocs: correct spelling and use https in examples.

I noticed this hasn't been applied yet, so went ahead and pushed (nearly 
all of) it.

Just the "use https in examples" part feels orthogonal, so better a 
separate issue, and I'm not sure we should be making that change?

Thank you, and sorry for the delay. Below the patch as pushed (with proper 
attribution).

Gerald


>From 3f37542935165c3dc74f41d4c11c3621f8386b59 Mon Sep 17 00:00:00 2001
From: Jonathan Grant 
Date: Sat, 8 Jun 2024 21:26:04 +0200
Subject: [PATCH] *: Correct spelling

---
 htdocs/bugs/management.html   | 2 +-
 htdocs/codingrationale.html   | 2 +-
 htdocs/contribute.html| 6 +++---
 htdocs/gccmission.html| 2 +-
 htdocs/projects/cfg.html  | 2 +-
 htdocs/projects/cli.html  | 2 +-
 htdocs/projects/cxx-reflection/index.html | 2 +-
 htdocs/projects/optimize.html | 6 +++---
 htdocs/projects/tree-profiling.html   | 2 +-
 htdocs/testing/index.html | 2 +-
 10 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/htdocs/bugs/management.html b/htdocs/bugs/management.html
index 28dfa76a..b2bb740e 100644
--- a/htdocs/bugs/management.html
+++ b/htdocs/bugs/management.html
@@ -64,7 +64,7 @@ perspective, these are the relevant ones and what their 
values mean:
 The status and resolution fields define and track the life cycle of a
 bug.  In addition to their https://gcc.gnu.org/bugzilla/page.cgi?id=fields.html;>regular
-descriptions, we also use two adition status values:
+descriptions, we also use two additional status values:
 
 
 
diff --git a/htdocs/codingrationale.html b/htdocs/codingrationale.html
index 6cc76885..c51c9da4 100644
--- a/htdocs/codingrationale.html
+++ b/htdocs/codingrationale.html
@@ -155,7 +155,7 @@ Wide use of implicit conversion can cause some very 
surprising results.
 
 
 C++03 has no explicit conversion operators,
-and hence using them cannot avoid suprises.
+and hence using them cannot avoid surprises.
 Wait for C++11.
 
 
diff --git a/htdocs/contribute.html b/htdocs/contribute.html
index e8137edc..7d85d885 100644
--- a/htdocs/contribute.html
+++ b/htdocs/contribute.html
@@ -300,7 +300,7 @@ followed by a colon.  For example,
 
 
 Some large components may be subdivided into sub-components.  If
-the subcomponent name is not disctinct in its own right, you can use the
+the subcomponent name is not distinct in its own right, you can use the
 form component/sub-component:.
 
 Series identifier
@@ -330,7 +330,7 @@ the commit message so that Bugzilla will correctly notice 
the
 commit.  If your patch relates to two bugs, then write
 [PRn, PRm].  For multiple
 bugs, just cite the most relevant one in the summary and use an
-elipsis instead of the second, or subsequent PR numbers; list all the
+ellipsis instead of the second, or subsequent PR numbers; list all the
 related PRs in the body of the commit message in the normal way.
 
 It is not necessary to cite bugs that are closed as duplicates of
@@ -355,7 +355,7 @@ together.
 If you submit a new version of a patch series, then you should
 start a new email thread (don't reply to the original patch series).
 This avoids email threads becoming confused between discussions of the
-first and subsequent revisions of the patch set.  Your cover leter
+first and subsequent revisions of the patch set.  Your cover letter
 (0/nnn) should explain clearly what has been changed between
 the two patch series.  Also state if some of the patches are unchanged
 between revisions; this saves maintainers having to re-review the
diff --git a/htdocs/gccmission.html b/htdocs/gccmission.html
index 58a12755..1124fe9f 100644
--- a/htdocs/gccmission.html
+++ b/htdocs/gccmission.html
@@ -55,7 +55,7 @@ GCC.
  Patches will be considered equally based on their
  technical merits.
  All individuals and companies are welcome to contribute
- as long as they accept the groundrules.
+ as long as they accept the ground rules.
  
 Open mailing lists.
 Developer friendly tools and procedures (i.e. [version control], multiple
diff --git a/htdocs/projects/cfg.html b/htdocs/projects/cfg.html
index b1ee1f34..b695766e 100644
--- a/htdocs/projects/cfg.html
+++ b/htdocs/projects/cfg.html
@@ -83,7 +83,7 @@ to peel more than one iteration.
 
 The current loop optimizer uses information passed by the front end
 to discover loop constructs to simplify flow analysis.
-It is difficult to keep the information up-to-date and nowday
+It is difficult to keep the information up-to-date and nowadays
 it is easy to implement the loop discovery code on CFG.
 
 
diff --git a/htdocs/projects/cli.html b/htdocs/projects/cli.html
index 47ddb362..4f0baa0b 100644
--- a/htdocs/projects/cli.html
+++ b/htdocs/projects/cli.html
@@ -152,7 +152,7 @@ front end and the CLI binutils (both Mono based and DotGnu 
based) .
 
 The CLI back end
 

[PATCH v3 0/2] arm: Zero/Sign extends for CMSE security on Armv8-M.baseline [PR115253]

2024-06-10 Thread Torbjörn SVENSSON


Hi,

Changes in v3:
Dropped the special case for thumb1_extendqisi2 as it's only thumb1_extendhisi2 that
causes a problem for gen_rtx_SIGN_EXTEND.

Changes in v2:
Updated the patch to also fix the Cortex-M55 issue reported in PR115253 and
updated the commit message to mention the PR number.

Initial issue reported at https://linaro.atlassian.net/browse/GNU-1205.

Ok for these branches?

- releases/gcc-11
- releases/gcc-12
- releases/gcc-13
- releases/gcc-14
- trunk

Kind regards,
Torbjörn and Yvan




[PATCH v3 1/2] arm: Zero/Sign extends for CMSE security on Armv8-M.baseline [PR115253]

2024-06-10 Thread Torbjörn SVENSSON
Properly handle zero and sign extension for Armv8-M.baseline as
Cortex-M23 can have the security extension active.
Currently, there is an internal compiler error on Cortex-M23 for the
epilogue processing of sign extension.

This patch addresses the following CVE-2024-0151 for Armv8-M.baseline.
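
A minimal reproducer sketch for the ICE described above (illustrative only:
the real coverage is the test updated in patch 2/2, and the type and
function names here are made up).  Compile for an Armv8-M.baseline core
such as Cortex-M23 with -mcmse:

typedef signed short __attribute__ ((cmse_nonsecure_call)) ns_sshort_fn (void);

signed short
call_ns (ns_sshort_fn *fn)
{
  /* After the non-secure call the caller must sign-extend the returned
     HImode value in r0 itself; this is the path that used to ICE.  */
  return fn ();
}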

gcc/ChangeLog:

PR target/115253
* config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear):
Sign extend for Thumb1.
(thumb1_expand_prologue): Add zero/sign extend.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
 gcc/config/arm/arm.cc | 71 ++-
 1 file changed, 63 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index ea0c963a4d6..e7b4caf1083 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -19220,17 +19220,22 @@ cmse_nonsecure_call_inline_register_clear (void)
  || TREE_CODE (ret_type) == BOOLEAN_TYPE)
  && known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 4))
{
- machine_mode ret_mode = TYPE_MODE (ret_type);
+ rtx ret_reg = gen_rtx_REG (TYPE_MODE (ret_type), R0_REGNUM);
+ rtx si_reg = gen_rtx_REG (SImode, R0_REGNUM);
  rtx extend;
  if (TYPE_UNSIGNED (ret_type))
-   extend = gen_rtx_ZERO_EXTEND (SImode,
- gen_rtx_REG (ret_mode, 
R0_REGNUM));
+   extend = gen_rtx_SET (si_reg, gen_rtx_ZERO_EXTEND (SImode,
+  ret_reg));
  else
-   extend = gen_rtx_SIGN_EXTEND (SImode,
- gen_rtx_REG (ret_mode, 
R0_REGNUM));
- emit_insn_after (gen_rtx_SET (gen_rtx_REG (SImode, R0_REGNUM),
-extend), insn);
-
+   /* Signed-extension is a special case because of
+  thumb1_extendhisi2.  */
+   if (TARGET_THUMB1
+   && known_ge (GET_MODE_SIZE (TYPE_MODE (ret_type)), 2))
+ extend = gen_thumb1_extendhisi2 (si_reg, ret_reg);
+   else
+ extend = gen_rtx_SET (si_reg, gen_rtx_SIGN_EXTEND (SImode,
+ret_reg));
+ emit_insn_after (extend, insn);
}
 
 
@@ -27250,6 +27255,56 @@ thumb1_expand_prologue (void)
   live_regs_mask = offsets->saved_regs_mask;
   lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
 
+  /* The AAPCS requires the callee to widen integral types narrower
+ than 32 bits to the full width of the register; but when handling
+ calls to non-secure space, we cannot trust the callee to have
+ correctly done so.  So forcibly re-widen the result here.  */
+  if (IS_CMSE_ENTRY (func_type))
+{
+  function_args_iterator args_iter;
+  CUMULATIVE_ARGS args_so_far_v;
+  cumulative_args_t args_so_far;
+  bool first_param = true;
+  tree arg_type;
+  tree fndecl = current_function_decl;
+  tree fntype = TREE_TYPE (fndecl);
+  arm_init_cumulative_args (_so_far_v, fntype, NULL_RTX, fndecl);
+  args_so_far = pack_cumulative_args (_so_far_v);
+  FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter)
+   {
+ rtx arg_rtx;
+
+ if (VOID_TYPE_P (arg_type))
+   break;
+
+ function_arg_info arg (arg_type, /*named=*/true);
+ if (!first_param)
+   /* We should advance after processing the argument and pass
+  the argument we're advancing past.  */
+   arm_function_arg_advance (args_so_far, arg);
+ first_param = false;
+ arg_rtx = arm_function_arg (args_so_far, arg);
+ gcc_assert (REG_P (arg_rtx));
+ if ((TREE_CODE (arg_type) == INTEGER_TYPE
+ || TREE_CODE (arg_type) == ENUMERAL_TYPE
+ || TREE_CODE (arg_type) == BOOLEAN_TYPE)
+ && known_lt (GET_MODE_SIZE (GET_MODE (arg_rtx)), 4))
+   {
+ rtx res_reg = gen_rtx_REG (SImode, REGNO (arg_rtx));
+ if (TYPE_UNSIGNED (arg_type))
+   emit_set_insn (res_reg, gen_rtx_ZERO_EXTEND (SImode, arg_rtx));
+ else
+   /* Signed-extension is a special case because of
+  thumb1_extendhisi2.  */
+   if (known_ge (GET_MODE_SIZE (GET_MODE (arg_rtx)), 2))
+ emit_insn (gen_thumb1_extendhisi2 (res_reg, arg_rtx));
+   else
+ emit_set_insn (res_reg,
+gen_rtx_SIGN_EXTEND (SImode, arg_rtx));
+   }
+   }
+}
+
   /* Extract a mask of the ones we can give to the Thumb's push instruction.  
*/
   l_mask = live_regs_mask & 0x40ff;
   /* Then count how many other high registers will need to be pushed.  */
-- 
2.25.1



[PATCH v3 2/2] testsuite: Fix expand-return CMSE test for Armv8.1-M [PR115253]

2024-06-10 Thread Torbjörn SVENSSON
For Armv8.1-M, the clearing of the registers is handled differently than
for Armv8-M, so update the test case accordingly.

gcc/testsuite/ChangeLog:

PR target/115253
* gcc.target/arm/cmse/extend-return.c: Update test case
condition for Armv8.1-M.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
 .../gcc.target/arm/cmse/extend-return.c   | 62 +--
 1 file changed, 56 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c 
b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
index 081de0d699f..2288d166bd3 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-mcmse -fshort-enums" } */
+/* ARMv8-M expectation with target { ! arm_cmse_clear_ok }.  */
+/* ARMv8.1-M expectation with target arm_cmse_clear_ok.  */
 /* { dg-final { check-function-bodies "**" "" "" } } */
 
 #include 
@@ -20,7 +22,15 @@ typedef enum offset __attribute__ ((cmse_nonsecure_call)) 
ns_enum_foo_t (void);
 typedef bool __attribute__ ((cmse_nonsecure_call)) ns_bool_foo_t (void);
 
 /*
-**unsignNonsecure0:
+**unsignNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxtbr0, r0
+** ...
+*/
+/*
+**unsignNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** uxtbr0, r0
@@ -32,7 +42,15 @@ unsigned char unsignNonsecure0 (ns_unsign_foo_t * ns_foo_p)
 }
 
 /*
-**signNonsecure0:
+**signNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** sxtbr0, r0
+** ...
+*/
+/*
+**signNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** sxtbr0, r0
@@ -44,7 +62,15 @@ signed char signNonsecure0 (ns_sign_foo_t * ns_foo_p)
 }
 
 /*
-**shortUnsignNonsecure0:
+**shortUnsignNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxthr0, r0
+** ...
+*/
+/*
+**shortUnsignNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** uxthr0, r0
@@ -56,7 +82,15 @@ unsigned short shortUnsignNonsecure0 (ns_short_unsign_foo_t 
* ns_foo_p)
 }
 
 /*
-**shortSignNonsecure0:
+**shortSignNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** sxthr0, r0
+** ...
+*/
+/*
+**shortSignNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** sxthr0, r0
@@ -68,7 +102,15 @@ signed short shortSignNonsecure0 (ns_short_sign_foo_t * 
ns_foo_p)
 }
 
 /*
-**enumNonsecure0:
+**enumNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxtbr0, r0
+** ...
+*/
+/*
+**enumNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** uxtbr0, r0
@@ -80,7 +122,15 @@ unsigned char __attribute__((noipa)) enumNonsecure0 
(ns_enum_foo_t * ns_foo_p)
 }
 
 /*
-**boolNonsecure0:
+**boolNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxtbr0, r0
+** ...
+*/
+/*
+**boolNonsecure0: { target { ! arm_cmse_clear_ok } }
 ** ...
 ** bl  __gnu_cmse_nonsecure_call
 ** uxtbr0, r0
-- 
2.25.1



Re: [PATCH v2] vect: Merge loop mask and cond_op mask in fold-left, reduction [PR115382].

2024-06-10 Thread Richard Sandiford
Robin Dapp  writes:
>> Actually, as Richard mentioned in the PR, it would probably be better
>> to use prepare_vec_mask instead.  It should work in this context too
>> and would avoid redundant double masking.
>
> Attached is v2 that uses prepare_vec_mask.
>
> Regtested on riscv64 and armv8.8-a+sve via qemu.
> Bootstrap and regtest running on x86 and aarch64.
>
> Regards
>  Robin
>
>
> Currently we discard the cond-op mask when the loop is fully masked
> which causes wrong code in
> gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> when compiled with
> -O3 -march=cascadelake --param vect-partial-vector-usage=2.
>
> This patch ANDs both masks.
>
> gcc/ChangeLog:
>
>   PR tree-optimization/115382
>
>   * tree-vect-loop.cc (vectorize_fold_left_reduction): Merge loop
>   mask and cond-op mask.
> ---
>  gcc/tree-vect-loop.cc  | 10 +-
>  gcc/tree-vect-stmts.cc |  2 +-
>  gcc/tree-vectorizer.h  |  2 ++
>  3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 028692614bb..c9b037b8daf 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7215,7 +7215,15 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>tree len = NULL_TREE;
>tree bias = NULL_TREE;
>if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> - mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, 
> i);
> + {
> +   tree loop_mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
> +vec_num, vectype_in, i);
> +   if (is_cond_op)
> + mask = prepare_vec_mask (loop_vinfo, TREE_TYPE (loop_mask),
> +  loop_mask, vec_opmask[i], gsi);
> +   else
> + mask = loop_mask;
> + }
>else if (is_cond_op)
>   mask = vec_opmask[i];
>if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 5098b7fab6a..124a3462753 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1643,7 +1643,7 @@ check_load_store_for_partial_vectors (loop_vec_info 
> loop_vinfo, tree vectype,
> MASK_TYPE is the type of both masks.  If new statements are needed,
> insert them before GSI.  */
>  
> -static tree
> +tree
>  prepare_vec_mask (loop_vec_info loop_vinfo, tree mask_type, tree loop_mask,
> tree vec_mask, gimple_stmt_iterator *gsi)
>  {
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 97ec9c341e7..1f87c6c8ca2 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2508,6 +2508,8 @@ extern void vect_free_slp_tree (slp_tree);
>  extern bool compatible_calls_p (gcall *, gcall *);
>  extern int vect_slp_child_index_for_operand (const gimple *, int op, bool);
>  
> +extern tree prepare_vec_mask (loop_vec_info, tree, tree, tree, 
> gimple_stmt_iterator *);

Nit: long line.

OK with that fixed, thanks.

Richard

> +
>  /* In tree-vect-patterns.cc.  */
>  extern void
>  vect_mark_pattern_stmts (vec_info *, stmt_vec_info, gimple *, tree);


Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-10 Thread Jeff Law




On 6/10/24 1:33 AM, Robin Dapp wrote:

But isn't canonicalization of EQ/NE safe, even for IEEE NaN and +-0.0?

target = (a == b) ? x : y
target = (a != b) ? y : x

Are equivalent, even for IEEE IIRC.


Yes, that should be fine.  My concern was not that we do a
canonicalization but that we might not do it for some of the
vector cases.  In particular when one of the operands is wrapped
in a vec_duplicate and we end up with it first rather than
second.

My general feeling is that the patch is good but I wasn't entirely
sure about all cases (in particular in case we transform something
after expand).  That's why I would have liked to see at least some
small test cases for it along with the patch (for the combinations
we don't test yet).

Ah, OK.

Demin, can you add some additional test coverage, guided by Robin's concerns 
above?


Thanks,
jeff
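
For what it's worth, a rough sketch of the kind of coverage Robin's concern
points at, i.e. float eq/ne against a broadcast scalar with the scalar on
either side of the comparison.  The dg directives and the exact shape are
just an assumption here, not something taken from this thread:

/* { dg-do compile } */
/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */

#define N 128
float a[N];
int r1[N], r2[N];

void
eq_scalar_first (float x)
{
  for (int i = 0; i < N; i++)
    r1[i] = (x == a[i]) ? 1 : 2;
}

void
ne_scalar_second (float x)
{
  for (int i = 0; i < N; i++)
    r2[i] = (a[i] != x) ? 1 : 2;
}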



[to-be-committed][RISC-V] Generate bclr more often for rv64

2024-06-10 Thread Jeff Law
Another of Raphael's patches to improve our ability to safely generate a 
Zbs instruction, bclr in this instance.


In this case we have something like ~(1 << N) & C where N is variable, 
but C is a constant.  If C has 33 or more leading zeros, then no matter 
what bit we clear via bclr, the result will always have at least bits 
31..63 clear.  So we don't have to worry about any of the extension 
issues with SI objects in rv64.


Odds are this was seen in spec at some point by the RAU team, thus 
leading to Raphael's pattern.


Anyway, this has been through Ventana's CI system in the past.  I'll 
wait for it to work through upstream pre-commit CI before taking further 
action, but the plan is to commit after successful CI run.


Jeff



gcc/

* config/riscv/bitmanip.md ((~1 << N) & C): New splitter.

gcc/testsuite/

* gcc.target/riscv/zbs-ext.c: New test.


diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 6559d4d6950..4361be1c265 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -784,6 +784,23 @@ (define_insn_and_split "*bclridisi_nottwobits"
 }
 [(set_attr "type" "bitmanip")])
 
+;; An outer AND with a constant where bits 31..63 are 0 can be seen as
+;; a virtual zero extension from 31 to 64 bits.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+(and:DI (not:DI (subreg:DI
+ (ashift:SI (const_int 1)
+(match_operand:QI 1 "register_operand")) 0))
+(match_operand:DI 2 "arith_operand")))
+   (clobber (match_operand:DI 3 "register_operand"))]
+  "TARGET_64BIT && TARGET_ZBS
+   && clz_hwi (INTVAL (operands[2])) >= 33"
+  [(set (match_dup 3)
+(match_dup 2))
+   (set (match_dup 0)
+ (and:DI (rotate:DI (const_int -2) (match_dup 1))
+ (match_dup 3)))])
+
 (define_insn "*binv"
   [(set (match_operand:X 0 "register_operand" "=r")
(xor:X (ashift:X (const_int 1)
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-ext.c 
b/gcc/testsuite/gcc.target/riscv/zbs-ext.c
new file mode 100644
index 000..65f42545b5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-ext.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+typedef unsigned long uint64_t;
+typedef unsigned int uint32_t;
+
+uint64_t bclr (const uint32_t i)
+{
+  uint64_t checks = 10;
+  checks &= ~(1U << i);
+  return checks;
+}
+
+/* { dg-final { scan-assembler-times "bclr\t" 1 } } */
+/* { dg-final { scan-assembler-not "sllw\t"} } */


[PATCH v2] vect: Merge loop mask and cond_op mask in fold-left, reduction [PR115382].

2024-06-10 Thread Robin Dapp
> Actually, as Richard mentioned in the PR, it would probably be better
> to use prepare_vec_mask instead.  It should work in this context too
> and would avoid redundant double masking.

Attached is v2 that uses prepare_vec_mask.

Regtested on riscv64 and armv8.8-a+sve via qemu.
Bootstrap and regtest running on x86 and aarch64.

Regards
 Robin


Currently we discard the cond-op mask when the loop is fully masked
which causes wrong code in
gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
when compiled with
-O3 -march=cascadelake --param vect-partial-vector-usage=2.

This patch ANDs both masks.

gcc/ChangeLog:

PR tree-optimization/115382

* tree-vect-loop.cc (vectorize_fold_left_reduction): Merge loop
mask and cond-op mask.
---
 gcc/tree-vect-loop.cc  | 10 +-
 gcc/tree-vect-stmts.cc |  2 +-
 gcc/tree-vectorizer.h  |  2 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 028692614bb..c9b037b8daf 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7215,7 +7215,15 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   tree len = NULL_TREE;
   tree bias = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-   mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, 
i);
+   {
+ tree loop_mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
+  vec_num, vectype_in, i);
+ if (is_cond_op)
+   mask = prepare_vec_mask (loop_vinfo, TREE_TYPE (loop_mask),
+loop_mask, vec_opmask[i], gsi);
+ else
+   mask = loop_mask;
+   }
   else if (is_cond_op)
mask = vec_opmask[i];
   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5098b7fab6a..124a3462753 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1643,7 +1643,7 @@ check_load_store_for_partial_vectors (loop_vec_info 
loop_vinfo, tree vectype,
MASK_TYPE is the type of both masks.  If new statements are needed,
insert them before GSI.  */
 
-static tree
+tree
 prepare_vec_mask (loop_vec_info loop_vinfo, tree mask_type, tree loop_mask,
  tree vec_mask, gimple_stmt_iterator *gsi)
 {
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 97ec9c341e7..1f87c6c8ca2 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2508,6 +2508,8 @@ extern void vect_free_slp_tree (slp_tree);
 extern bool compatible_calls_p (gcall *, gcall *);
 extern int vect_slp_child_index_for_operand (const gimple *, int op, bool);
 
+extern tree prepare_vec_mask (loop_vec_info, tree, tree, tree, 
gimple_stmt_iterator *);
+
 /* In tree-vect-patterns.cc.  */
 extern void
 vect_mark_pattern_stmts (vec_info *, stmt_vec_info, gimple *, tree);
-- 
2.45.1


Re: [PATCH] Rearrange SLP nodes with duplicate statements. [PR98138]

2024-06-10 Thread Richard Biener
On Mon, 10 Jun 2024, Manolis Tsamis wrote:

> On Wed, Jun 5, 2024 at 11:07 AM Richard Biener  wrote:
> >
> > On Tue, 4 Jun 2024, Manolis Tsamis wrote:
> >
> > > This change adds a function that checks for SLP nodes with multiple 
> > > occurrences
> > > of the same statement (e.g. {A, B, A, B, ...}) and tries to rearrange the 
> > > node
> > > so that there are no duplicates. A vec_perm is then introduced to 
> > > recreate the
> > > original ordering. These duplicates can appear due to how two_operators 
> > > nodes
> > > are handled, and they prevent vectorization in some cases.
> >
> > So the trick is that when we have two operands we elide duplicate lanes
> > so we can do discovery for a single combined operand instead which we
> > then decompose into the required two again.  That's a nice one.
> >
> > But as implemented this will fail SLP discovery if the combined operand
> > fails discovery possibly because of divergence in downstream defs.  That
> > is, it doesn't fall back to separate discovery.  I suspect the situation
> > of duplicate lanes isn't common but then I would also suspect that
> > divergence _is_ common.
> >
> That's a good point; I checked out and at least for the x264 testcase
> provided SLP discovery succeeds in both cases but in one case
> vectorization fails later on due to the unsupported load permutations
> among others.
> I think that's what Tamar also mentioned and it makes it hard to
> decide whether to apply the pattern based on if discovery fails.
> 
> > The discovery code is already quite complex with the way it possibly
> > swaps operands of lanes, fitting in this as another variant to try (first)
> > is likely going to be a bit awkward.  A way out might be to split the
> > function or to make the re-try in the caller which could indicate whether
> > to apply this pattern trick or not.  That said - can you try to get
> > data on how often the trick applies and discovery succeeds and how
> > often discovery fails but discovery would suceed without applying the
> > pattern (say, on SPEC)?
> >
> I checked out SPEC and this pattern only triggers on x264 and in that
> case discovery succeeds. So we don't have any data on the pattern
> applying but discovery failing.
> 
> > I also suppose instead of hardcoding three patterns for a fixed
> > size it should be possible to see there's
> > only (at most) half unique lanes in both operands (and one less in one
> > operand if the number of lanes is odd) and compute the un-swizzling lane
> > permutes during this discovery, removing the need of the explicit enum
> > and open-coding each case?
> >
> Yes, that's a fair point. I will change that in the next iteration.
> 
> > Another general note is that trying (and then undo on fail) such ticks
> > eats at the discovery limit we have in place to avoid exponential run-off
> > in exactly this degenerate cases.
> >
> 
> So, most importantly, the points you and Tamar mentioned got me
> thinking about the transformation again, why it is useful and when it
> applies.
> In this initial implementation I tried to make this independant from
> the two_operators logic and apply it when possible, which brings up
> all these issues about discovery and usefulness of the pattern in
> general.
> E.g. If we had just [a, b, a, b] + [c, d, c, d] without two_operators
> I sort of doubt it would be worth it to apply the transformation in
> most cases (except of course if it enables vectorization, but as I
> understand it it is hard to tell when that happens).
> On the other hand, if we know that we're dealing with two_operators
> nodes then the argument changes, as we know that we'll duplicate these
> nodes.
> 
> In turn, it may be best to try to see this as a 'two_operators
> lowering strategy' improvement instead of a generic rearrangement
> pattern.
> Specifically for x264, we're given code like
> 
> int t0 = s0 + s1;
> int t1 = s0 - s1;
> int t2 = s2 + s3;
> int t3 = s2 - s3;
> 
> and currently we lower that to VEC_PERM<(A + B), (A - B)>(...) with A
> = [s0, s0, s2, s2], B = [s1, s1, s3, s3] which doesn't work very well
> (due to element duplication).
> With this patch we do VEC_PERM<(A + B), (A - B)>(...) with A =
> VEC_PERM(...), B = VEC_PERM(...),  C = [s0, s1, s2, s3]
> instead which works good.

So I'm not sure I buy the argument that [a, b, a, b] + [c, d, c, d]
is much different from the two-operators version.
[a, b, a, b] + [c, d, c, d] is one of the variants (the plus) we
discover.

Isn't this really about us stupidly trying to force the use of
a larger vector rather than doing the two-operator handling by
doing

VEC_PERM <{s0, s2} + {s1, s3}, {s0, s2} - {s1, s3}, { 0, 2, 1, 3 }>

and thus recursing with smaller group size and then "interleaving"
the result?  Which might only work well in exactly the case where
we have the same number of + and -.

Of course in both forms 'A' and 'B' (or the smaller vectors) should
be SLP nodes re-used.

> But it is obvious that there are other strategies to lower 

Re: [PATCH v2 1/2] arm: Zero/Sign extends for CMSE security on Armv8-M.baseline [PR115253]

2024-06-10 Thread Andre Vieira (lists)

Hi,

So, you talk about gen_thumb1_extendhisi2, but there is also 
gen_thumb1_extendqisi2. Will it actually be cleaner if the block is 
indented one level?
The comment can be added in the "if (TARGET_THUMB1)" block regardless to 
indicate that gen_rtx_SIGN_EXTEND can't be used.




gen_rtx_SIGN_EXTEND (I see I used wrong caps above before sorry!) will 
work for the case of QImode -> SImode for Thumb1. The reason it doesn't 
work for HImode -> SImode in thumb1 is because thumb1_extendhisi2 uses a 
scratch register, see:

(define_insn "thumb1_extendhisi2"
  [(set (match_operand:SI 0 "register_operand" "=l,l")
(sign_extend:SI (match_operand:HI 1 "nonimmediate_operand" "l,m")))
   (clobber (match_scratch:SI 2 "=X,l"))]

 meaning that the pattern generated with gen_rtx_SIGN_EXTEND:
[(set (SImode ...) (sign_extend:SImode (HImode))]
does not match this, and there are no other thumb1 patterns that match 
this either, so the compiler ICEs. For thumb1_extendqisi2 the pattern 
doesn't have the scratch register, so a:

 gen_rtx_SET (, gen_rtx_SIGN_EXTEND (SImode, ))
will generate an RTL pattern that will match thumb1_extendqisi2.

Hope that makes it clearer.



With the patch you have in mind, will it be possible to call 
gen_rtx_SIGN_EXTEND  for THUMB1 too? Or do we need to keep calling 
thumb1_extendhisi2 and thumb1_extendqisi2?


With the patch I have in mind, which I will post soon, you could use 
gen_rtx_SIGN_EXTEND for HImode in thumb1; as I explained before, for 
QImode it's already possible.


That series will have another patch that also removes all calls to 
gen_thumb1_extend* and replaces them with gen_rtx_SET + 
gen_rtx_SIGN_EXTEND, and renames the "thumb1_extend{hisi2,qisi2}" patterns 
to "*thumb1_extend{hisi2,qisi2}".


However, we won't backport that patch, but we should backport yours, which 
is why it's worth having your patch with the thumb1_extendhisi2 workaround.


Thanks,

Andre


Re: [PATCH] internal-fn: Force to reg if operand doesn't match.

2024-06-10 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Jun 10, 2024 at 9:35 AM Robin Dapp  wrote:
>>
>> Hi,
>>
>> despite looking good on cfarm185 and Linaro's pre-commit CI
>> gcc-15-638-g7ca35f2e430 now appears to have caused several
>> regressions on arm-eabi cortex-m55 as found by Linaro's CI:
>>
>> https://linaro.atlassian.net/browse/GNU-1252
>>
>> I'm assuming this target is not tested as regularly and thus
>> the failures went unnoticed until now.
>>
>> So it looks like we do need the insn_operand_matches after all?
>
> But why does expand_vec_cond_optab_fn get away without?
> (note we want to get rid of that variant)
>
> Almost no other expander checks this either, though some
> can_* functions validate.  It's not exactly clear to me whether
> we are just lucky and really always need to validate or whether
> it's a bug in the target?

Sounds like a bug in the target (although I'm not sure from a quick
glance what it would be).

expand_insn is responsible for making sure that operands satisfy
predicates.  We shouldn't need to enforce the predicates beforehand.

Thanks,
Richard

>
> Richard.
>
>> This patch only forces to register if the respective operands
>> do not already match.
>>
>> Bootstrap and regtest on aarch64 and x86 in progress.
>> Regtested on riscv64.
>>
>> Regards
>>  Robin
>>
>> gcc/ChangeLog:
>>
>> * internal-fn.cc (expand_vec_cond_mask_optab_fn): Only force to
>> reg if operand does not already match.
>> ---
>>  gcc/internal-fn.cc | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index 4948b48bde8..fa85fa69f5a 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -3162,7 +3162,13 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall 
>> *stmt, convert_optab optab)
>>gcc_assert (icode != CODE_FOR_nothing);
>>
>>mask = expand_normal (op0);
>> +  if (!insn_operand_matches (icode, 3, mask))
>> +mask = force_reg (mask_mode, mask);
>> +
>>rtx_op1 = expand_normal (op1);
>> +  if (!insn_operand_matches (icode, 1, rtx_op1))
>> +rtx_op1 = force_reg (mode, rtx_op1);
>> +
>>rtx_op2 = expand_normal (op2);
>>
>>rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>> --
>> 2.45.1


Re: [PATCH v2 1/2] arm: Zero/Sign extends for CMSE security on Armv8-M.baseline [PR115253]

2024-06-10 Thread Torbjorn SVENSSON

Hi Andre,

Thanks for the review!
Please see my questions below.

On 2024-06-10 12:37, Andre Vieira (lists) wrote:

Hi Torbjorn,

Thanks for this, I have some comments below.

On 07/06/2024 09:56, Torbjörn SVENSSON wrote:

Properly handle zero and sign extension for Armv8-M.baseline as
Cortex-M23 can have the security extension active.
Currently, there is an internal compiler error on Cortex-M23 for the
epilogue processing of sign extension.

This patch addresses the following CVE-2024-0151 for Armv8-M.baseline.

gcc/ChangeLog:

PR target/115253
* config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear):
Sign extend for Thumb1.
(thumb1_expand_prologue): Add zero/sign extend.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
  gcc/config/arm/arm.cc | 68 ++-
  1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index ea0c963a4d6..d1bb173c135 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -19220,17 +19220,23 @@ cmse_nonsecure_call_inline_register_clear 
(void)

    || TREE_CODE (ret_type) == BOOLEAN_TYPE)
    && known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 4))
  {
-  machine_mode ret_mode = TYPE_MODE (ret_type);
+  rtx ret_mode = gen_rtx_REG (TYPE_MODE (ret_type), R0_REGNUM);
+  rtx si_mode = gen_rtx_REG (SImode, R0_REGNUM);


I'd rename ret_mode and si_mode to ret_reg and si_reg, so it's clear they 
are registers and not actually mode types.


Okay, will be changed before push and/or a V3 of the patches.


    rtx extend;
    if (TYPE_UNSIGNED (ret_type))
-    extend = gen_rtx_ZERO_EXTEND (SImode,
-  gen_rtx_REG (ret_mode, R0_REGNUM));
+    extend = gen_rtx_SET (si_mode, gen_rtx_ZERO_EXTEND (SImode,
+    ret_mode));
+  else if (TARGET_THUMB1)
+    {
+  if (known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 2))
+    extend = gen_thumb1_extendqisi2 (si_mode, ret_mode);
+  else
+    extend = gen_thumb1_extendhisi2 (si_mode, ret_mode);
+    }
    else
-    extend = gen_rtx_SIGN_EXTEND (SImode,
-  gen_rtx_REG (ret_mode, R0_REGNUM));
-  emit_insn_after (gen_rtx_SET (gen_rtx_REG (SImode, R0_REGNUM),
- extend), insn);
-
+    extend = gen_rtx_SET (si_mode, gen_rtx_SIGN_EXTEND (SImode,
+    ret_mode));
+  emit_insn_after (extend, insn);
  }


Using gen_rtx_SIGN_EXTEND should work for both, the reason it doesn't is 
because of some weird code in thumb1_extendhisi2, which I'm actually 
gonna look at removing, but I don't think we should block this fix as 
we'd want to backport it ASAP.


But for clearness we should re-order this code so it's clear we only 
need it for that specific case.

Can you maybe do:
if (TYPE_UNSIGNED ..)
{
}
else
{
    /*  Signed-extension is a special case because of 
thumb1_extendhisi2.  */

    if (TARGET_THUMB1
    && known_gt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 2))
  {
     //call the gen_thumb1_extendhisi2
  }
     else
  {
     // use gen_RTX_SIGN_EXTEND
  }
}


So, you talk about gen_thumb1_extendhisi2, but there is also 
gen_thumb1_extendqisi2. Will it actually be cleaner if the block is 
indented one level?
The comment can be added in the "if (TARGET_THUMB1)" block regardless to 
indicate that gen_rtx_SIGN_EXTEND can't be used.



@@ -27250,6 +27256,52 @@ thumb1_expand_prologue (void)
    live_regs_mask = offsets->saved_regs_mask;
    lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
+  /* The AAPCS requires the callee to widen integral types narrower
+ than 32 bits to the full width of the register; but when handling
+ calls to non-secure space, we cannot trust the callee to have
+ correctly done so.  So forcibly re-widen the result here.  */
+  if (IS_CMSE_ENTRY (func_type))
+    {
+  function_args_iterator args_iter;
+  CUMULATIVE_ARGS args_so_far_v;
+  cumulative_args_t args_so_far;
+  bool first_param = true;
+  tree arg_type;
+  tree fndecl = current_function_decl;
+  tree fntype = TREE_TYPE (fndecl);
+  arm_init_cumulative_args (_so_far_v, fntype, NULL_RTX, 
fndecl);

+  args_so_far = pack_cumulative_args (_so_far_v);
+  FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter)
+    {
+  rtx arg_rtx;
+
+  if (VOID_TYPE_P (arg_type))
+    break;
+
+  function_arg_info arg (arg_type, /*named=*/true);
+  if (!first_param)
+    /* We should advance after processing the argument and pass
+   the argument we're advancing past.  */
+    arm_function_arg_advance (args_so_far, arg);
+  first_param = false;
+  arg_rtx = arm_function_arg (args_so_far, arg);
+  gcc_assert (REG_P (arg_rtx));
+  if ((TREE_CODE (arg_type) == 

Re: [PATCH] fixincludes: bypass the math_exception fix on __cplusplus

2024-06-10 Thread Rainer Orth
Hi FX,

>> However, please note that the comment states
>> * This should be bypassed on __cplusplus, but some supposedly C++
>> * aware headers, such as Solaris 8 and 9, don't wrap their struct
>> It's "such as Solaris 8 and 9", so there may well be others.
>
> I know, but that was 24 years ago, and I could find zero documentation
> anywhere (mailing-list or bugzilla) of what those other targets could be. I
> don’t think it’s unreasonable, for the benefit of all the other working
> targets, to reverse now. It is early in stage 1, and the fix be restored if
> needed on specific targets.

I know, and I certainly won't oppose removing such cruft, just stating
that you won't find out what breaks until something breaks ;-)  The
problem is: if those fixes are only needed on lesser-used (and -tested)
targets, it may take quite some time until one does find out...

I'd have loved to remove fixes that mention obsolete Solaris versions,
but refrained from doing so when there was no way of knowing that no
innocent would be harmed.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] fixincludes: bypass the math_exception fix on __cplusplus

2024-06-10 Thread FX Coudert
> However, please note that the comment states
> * This should be bypassed on __cplusplus, but some supposedly C++
> * aware headers, such as Solaris 8 and 9, don't wrap their struct
> It's "such as Solaris 8 and 9", so there may well be others.

I know, but that was 24 years ago, and I could find zero documentation anywhere 
(mailing-list or bugzilla) of what those other targets could be. I don’t think 
it’s unreasonable, for the benefit of all the other working targets, to reverse 
now. It is early in stage 1, and the fix can be restored if needed on specific 
targets.

Best,
FX

Re: [PATCH] libstdc++: Introduce scale factor in 30_threads/future/members/poll.cc [PR98678]

2024-06-10 Thread Jonathan Wakely
On Mon, 10 Jun 2024 at 12:54, Rainer Orth  wrote:
>
> 30_threads/future/members/poll.cc consistently FAILs on Solaris/x86
> (both 32 and 64-bit):
>
> FAIL: 30_threads/future/members/poll.cc  -std=gnu++17 execution test

I see this one failing under x86_64-linux under high load. So I think
we might simply want a better test.


>
> /vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/30_threads/future/members/poll.cc:95:
>  int main(): Assertion 'wait_until_sys_min < (ready * 100)' failed.
> wait_for(0s): 11892ns for 200 calls, avg 59.46ns per call
> wait_until(system_clock minimum): 1304458ns for 200 calls, avg 6522.29ns per 
> call
> wait_until(steady_clock minimum): 1403221ns for 200 calls, avg 7016.1ns per 
> call
> wait_until(system_clock epoch): 3343806ns for 200 calls, avg 16719ns per call
> wait_until(steady_clock epoch: 2959581ns for 200 calls, avg 14797.9ns per call
> wait_for when ready: 10969ns for 200 calls, avg 54.845ns per call
>
> As reported in the PR, across a considerable range of CPUs the test
> doesn't complete in the expected time.  Therefore, this patch introduces
> a Solaris/x86 specific scale factor to allow for that.
>
> There's no such issue on Solaris/SPARC, though.
>
> Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.
>
> Ok for trunk?
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-06-04  Rainer Orth  
>
> libstdc++-v3:
> PR libstdc++/98678
> * testsuite/30_threads/future/members/poll.cc (main): Introduce
> scale factor.
>



[PATCH] libstdc++: Introduce scale factor in 30_threads/future/members/poll.cc [PR98678]

2024-06-10 Thread Rainer Orth
30_threads/future/members/poll.cc consistently FAILs on Solaris/x86
(both 32 and 64-bit):

FAIL: 30_threads/future/members/poll.cc  -std=gnu++17 execution test

/vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/30_threads/future/members/poll.cc:95:
 int main(): Assertion 'wait_until_sys_min < (ready * 100)' failed.
wait_for(0s): 11892ns for 200 calls, avg 59.46ns per call
wait_until(system_clock minimum): 1304458ns for 200 calls, avg 6522.29ns per 
call
wait_until(steady_clock minimum): 1403221ns for 200 calls, avg 7016.1ns per call
wait_until(system_clock epoch): 3343806ns for 200 calls, avg 16719ns per call
wait_until(steady_clock epoch: 2959581ns for 200 calls, avg 14797.9ns per call
wait_for when ready: 10969ns for 200 calls, avg 54.845ns per call

As reported in the PR, across a considerable range of CPUs the test
doesn't complete in the expected time.  Therefore, this patch introduces
a Solaris/x86 specific scale factor to allow for that.

There's no such issue on Solaris/SPARC, though.

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-06-04  Rainer Orth  

libstdc++-v3:
PR libstdc++/98678
* testsuite/30_threads/future/members/poll.cc (main): Introduce
scale factor.

# HG changeset patch
# Parent  8f086e53ab093c8919708b1689a86f2bc407
libstdc++: Introduce scale factor in 30_threads/future/members/poll.cc [PR98678]

diff --git a/libstdc++-v3/testsuite/30_threads/future/members/poll.cc b/libstdc++-v3/testsuite/30_threads/future/members/poll.cc
--- a/libstdc++-v3/testsuite/30_threads/future/members/poll.cc
+++ b/libstdc++-v3/testsuite/30_threads/future/members/poll.cc
@@ -129,14 +129,21 @@ int main()
   VERIFY( wait_for_0 < (ready * 30) );
 
   // Polling before ready using wait_until(min) should not be terribly slow.
-  VERIFY( wait_until_sys_min < (ready * 100) );
-  VERIFY( wait_until_steady_min < (ready * 100) );
+  // These tests consistently time out on a couple of targets, so provide
+  // scale factor.
+#if defined(__sun) && defined(__svr4__) && (defined(__i386__) || defined(__x86_64__))
+  double scale = 2.5;
+#else
+  double scale = 1.0;
+#endif
+  VERIFY( wait_until_sys_min < (ready * 100 * scale) );
+  VERIFY( wait_until_steady_min < (ready * 100 * scale) );
 
   // The following two tests fail with GCC 11, see
   // https://gcc.gnu.org/pipermail/libstdc++/2020-November/051422.html
 #if 0
   // Polling before ready using wait_until(epoch) should not be terribly slow.
-  VERIFY( wait_until_sys_epoch < (ready * 100) );
-  VERIFY( wait_until_steady_epoch < (ready * 100) );
+  VERIFY( wait_until_sys_epoch < (ready * 100 * scale) );
+  VERIFY( wait_until_steady_epoch < (ready * 100 * scale) );
 #endif
 }


Re: [PATCH] Rearrange SLP nodes with duplicate statements. [PR98138]

2024-06-10 Thread Manolis Tsamis
On Wed, Jun 5, 2024 at 11:07 AM Richard Biener  wrote:
>
> On Tue, 4 Jun 2024, Manolis Tsamis wrote:
>
> > This change adds a function that checks for SLP nodes with multiple 
> > occurrences
> > of the same statement (e.g. {A, B, A, B, ...}) and tries to rearrange the 
> > node
> > so that there are no duplicates. A vec_perm is then introduced to recreate 
> > the
> > original ordering. These duplicates can appear due to how two_operators 
> > nodes
> > are handled, and they prevent vectorization in some cases.
>
> So the trick is that when we have two operands we elide duplicate lanes
> so we can do discovery for a single combined operand instead which we
> then decompose into the required two again.  That's a nice one.
>
> But as implemented this will fail SLP discovery if the combined operand
> fails discovery possibly because of divergence in downstream defs.  That
> is, it doesn't fall back to separate discovery.  I suspect the situation
> of duplicate lanes isn't common but then I would also suspect that
> divergence _is_ common.
>
That's a good point; I checked, and at least for the provided x264 testcase
SLP discovery succeeds in both cases, but in one case vectorization fails
later on due to unsupported load permutations, among other things.
I think that's what Tamar also mentioned, and it makes it hard to
decide whether to apply the pattern based on whether discovery fails.

> The discovery code is already quite complex with the way it possibly
> swaps operands of lanes, fitting in this as another variant to try (first)
> is likely going to be a bit awkward.  A way out might be to split the
> function or to make the re-try in the caller which could indicate whether
> to apply this pattern trick or not.  That said - can you try to get
> data on how often the trick applies and discovery succeeds and how
> often discovery fails but discovery would suceed without applying the
> pattern (say, on SPEC)?
>
I checked SPEC and this pattern only triggers on x264, and in that
case discovery succeeds. So we don't have any data on the pattern
applying but discovery failing.

> I also suppose instead of hardcoding three patterns for a fixed
> size it should be possible to see there's
> only (at most) half unique lanes in both operands (and one less in one
> operand if the number of lanes is odd) and compute the un-swizzling lane
> permutes during this discovery, removing the need of the explicit enum
> and open-coding each case?
>
Yes, that's a fair point. I will change that in the next iteration.

> Another general note is that trying (and then undo on fail) such ticks
> eats at the discovery limit we have in place to avoid exponential run-off
> in exactly this degenerate cases.
>

So, most importantly, the points you and Tamar mentioned got me
thinking about the transformation again, why it is useful and when it
applies.
In this initial implementation I tried to make this independent from
the two_operators logic and apply it when possible, which brings up
all these issues about discovery and usefulness of the pattern in
general.
E.g. If we had just [a, b, a, b] + [c, d, c, d] without two_operators
I sort of doubt it would be worth it to apply the transformation in
most cases (except of course if it enables vectorization, but as I
understand it it is hard to tell when that happens).
On the other hand, if we know that we're dealing with two_operators
nodes then the argument changes, as we know that we'll duplicate these
nodes.

In turn, it may be best to try to see this as a 'two_operators
lowering strategy' improvement instead of a generic rearrangement
pattern.
Specifically for x264, we're given code like

int t0 = s0 + s1;
int t1 = s0 - s1;
int t2 = s2 + s3;
int t3 = s2 - s3;

and currently we lower that to VEC_PERM<(A + B), (A - B)>(...) with A
= [s0, s0, s2, s2], B = [s1, s1, s3, s3] which doesn't work very well
(due to element duplication).
With this patch we do VEC_PERM<(A + B), (A - B)>(...) with A =
VEC_PERM(...), B = VEC_PERM(...),  C = [s0, s1, s2, s3]
instead, which works well.
But it is obvious that there are other strategies to lower this too
and they may be even better (by taking advantage of the fact that we
know we're dealing with a two_operators node *and* have duplicate
elements).
For example doing VEC_PERM<(A + B), (A - B)>(...) with A = [s0, s1,
s2, s3] and B = VEC_PERM(1, 0, 3, 2) looks interesting too and
is only possible because we combine two_operators and rearrangement.

Do you believe that narrowing this to a "two_operators lowering
improvement" makes more sense and addresses at least some of the
issues mentioned?
I'm currently testing to see the code that we generate with other
strategies and will reach out once I have new results.

Thanks,
Manolis

> Thanks,
> Richard.
>
> > This targets the vectorization of the SPEC2017 x264 pixel_satd functions.
> > In some processors a larger than 10% improvement on x264 has been observed.
> >
> > See also: 

[PATCH] tree-optimization/115388 - wrong DSE in irreductible regions

2024-06-10 Thread Richard Biener
The following fixes a latent bug in DSE with regard to variant
array accesses where the code avoiding bogus DSE in loops fails to
handle irreducible regions.  For those we need to make sure backedges
are marked and discover a header for the irreducible region to check
invariantness.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Unfortunately this doesn't seem to fix PR115256, the miscompare
of 502.gcc_r bisected to the same rev.

PR tree-optimization/115388
* tree-ssa-dse.cc (dse_classify_store): Handle irreducible
regions.
(pass_dse::execute): Make sure to mark backedges.

* gcc.dg/torture/pr115388.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr115388.c | 34 ++
 gcc/tree-ssa-dse.cc | 61 -
 2 files changed, 74 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr115388.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr115388.c 
b/gcc/testsuite/gcc.dg/torture/pr115388.c
new file mode 100644
index 000..c7c902888da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr115388.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+
+int printf(const char *, ...);
+int a[10], b, c, d[0], h, i, j, k, l;
+char e = -1, g;
+volatile int f;
+static void n() {
+  while (e >= 0)
+while (1)
+  ;
+  for (b = 2; b >= 0; b--) {
+for (k = 0; k < 4; k++) {
+  if (e || i)
+continue;
+  for (h = 0; h < 2; h++)
+f;
+}
+for (l = 2; l >= 0; l--)
+  g = 0;
+for (; g < 1; g++)
+  if (c)
+d[l] = 1;
+a[9] = 0;
+a[b] = 1;
+while (j)
+  printf("\n");
+  }
+}
+int main() {
+  n();
+  if (a[1] != 1)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 9252ca34050..63bf4491cf6 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -1018,8 +1018,11 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
  if (defvar == stop_at_vuse)
return DSE_STORE_LIVE;
 
- FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar)
+ use_operand_p usep;
+ FOR_EACH_IMM_USE_FAST (usep, ui, defvar)
{
+ use_stmt = USE_STMT (usep);
+
  /* Limit stmt walking.  */
  if (++cnt > param_dse_max_alias_queries_per_store)
{
@@ -1031,31 +1034,43 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
 have to be careful with loops and with memory references
 containing operands that are also operands of PHI nodes.
 See gcc.c-torture/execute/20051110-*.c.  */
- if (gimple_code (use_stmt) == GIMPLE_PHI)
+ if (gphi *phi = dyn_cast  (use_stmt))
{
  /* Look through single-argument PHIs.  */
- if (gimple_phi_num_args (use_stmt) == 1)
-   worklist.safe_push (gimple_phi_result (use_stmt));
-
- /* If we already visited this PHI ignore it for further
-processing.  */
- else if (!bitmap_bit_p (visited,
- SSA_NAME_VERSION
-   (PHI_RESULT (use_stmt
+ if (gimple_phi_num_args (phi) == 1)
+   worklist.safe_push (gimple_phi_result (phi));
+ else
{
  /* If we visit this PHI by following a backedge then we
 have to make sure ref->ref only refers to SSA names
 that are invariant with respect to the loop
-represented by this PHI node.  */
- if (dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
- gimple_bb (use_stmt))
- && !for_each_index (ref->ref ? >ref : >base,
- check_name, gimple_bb (use_stmt)))
-   return DSE_STORE_LIVE;
- defs.safe_push (use_stmt);
- if (!first_phi_def)
-   first_phi_def = as_a  (use_stmt);
- last_phi_def = as_a  (use_stmt);
+represented by this PHI node.  We handle irreducible
+regions by relying on backedge marking and identifying
+the head of the (sub-)region.  */
+ edge e = gimple_phi_arg_edge
+(phi, PHI_ARG_INDEX_FROM_USE (usep));
+ if (e->flags & EDGE_DFS_BACK)
+   {
+ basic_block rgn_head
+   = nearest_common_dominator (CDI_DOMINATORS,
+   gimple_bb (phi),
+   e->src);
+ if (!for_each_index (ref->ref
+

Re: [PATCH] fixincludes: bypass the math_exception fix on __cplusplus

2024-06-10 Thread Rainer Orth
Hi FX,

> The fixincludes fix “math_exception” is being applied overly broadly,
> including many targets which don’t need it, like darwin (and probably all
> non-glibc targets). I’m not sure if it is still needed on any target, but
> because I can’t be absolutely positive about that, I don’t want to remove
> it. But it dates from before 1998.

right: that's the ugly thing about those fixes that aren't restricted to
particular targets.  Even if they were introduced for a specific set of
targets initially, there's no way of knowing if they aren't needed
elsewhere, too.  In fact, they *may* be needed and nobody noticed
because the fix was already in place.  This makes them almost impossible
to get rid of.

> In subsequent times (2000) it was bypassed on glibc headers, as well as
> Solaris 10. It was still needed on Solaris 8 and 9, which are (AFAICT)
> unsupported nowadays. The fix was originally bypassed on __cplusplus, which
> is the correct thing to do, but that bypass was neutralized to cater to a
> bug on Solaris 8 and 9 headers. Now that those are gone… let’s revert to

Solaris 8 and 9 support is long gone from trunk.  In fact, the only
Solaris version supported there is 11.4.

However, please note that the comment states

 * This should be bypassed on __cplusplus, but some supposedly C++
 * aware headers, such as Solaris 8 and 9, don't wrap their struct

It's "such as Solaris 8 and 9", so there may well be others.  Commercial
OSes in particular can retain their headers (bugs and all) for very long
times ;-(  It may take quite some insistence from customers or
developers to have them changed (I know all too well from the Solaris
side of things, and you probably have similar experiences for Darwin).

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] aarch64: Add fcsel to cmov integer and csel to float cmov [PR98477]

2024-06-10 Thread Richard Sandiford
Andrew Pinski  writes:
> This patch adds an alternative to the integer cmov and one to the floating
> point cmov so we avoid some more moves between the register files.
>
>   PR target/98477
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (*cmov_insn[GPI]): Add 'w'
>   alternative.
>   (*cmov_insn[GPF]): Add 'r' alternative.
>   * config/aarch64/iterators.md (wv): New mode attr.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/csel_1.c: New test.
>   * gcc.target/aarch64/fcsel_2.c: New test.

This seems a bit dangerous while PR114766 remains unresolved.
The problem (AIUI) is that adding r and w alternatives to the
same pattern means that, when computing the cost of each register
class, r and w are equally cheap for each operand in isolation,
without the interdependencies being modelled.  (This is because
class preferences are calculated on a per-register basis.)

E.g. if:

- insn I1 is op0 = fn(op1, op2)
- I1 provides r and w alternatives
- other uses of op0 and op1 prefer r
- other uses of op2 prefer w

then I1 will not influence the costs of op0, op1 or op2, and so
I1 effectively will provide a zero-cost cross-over between r and w.

There again, we already have other patterns with this problem.  And I
think we do need to make it work somehow.  It's just a question of
whether we should pause adding more "r or w" alternatives until the
problem is fixed, or whether we should treat adding more alternatives
and fixing the RA problem as parallel work.

> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.md  | 13 +++
>  gcc/config/aarch64/iterators.md|  4 
>  gcc/testsuite/gcc.target/aarch64/csel_1.c  | 27 ++
>  gcc/testsuite/gcc.target/aarch64/fcsel_2.c | 20 
>  4 files changed, 59 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/csel_1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fcsel_2.c
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 2bdd443e71d..a6cedd0f1b8 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4404,6 +4404,7 @@ (define_insn "*cmov_insn"
>   [ r, Ui1 , rZ  ; csel] csinc\t%0, %4, zr, %M1
>   [ r, UsM , UsM ; mov_imm ] mov\t%0, -1
>   [ r, Ui1 , Ui1 ; mov_imm ] mov\t%0, 1
> + [ w, w   , w   ; fcsel   ] fcsel\t%0, %3, %4, 
> %m1
>}
>  )
>  
> @@ -4464,15 +4465,17 @@ (define_insn "*cmovdi_insn_uxtw"
>  )
>  
>  (define_insn "*cmov_insn"
> -  [(set (match_operand:GPF 0 "register_operand" "=w")
> +  [(set (match_operand:GPF 0 "register_operand" "=r,w")
>   (if_then_else:GPF
>(match_operator 1 "aarch64_comparison_operator"
> [(match_operand 2 "cc_register" "") (const_int 0)])
> -  (match_operand:GPF 3 "register_operand" "w")
> -  (match_operand:GPF 4 "register_operand" "w")))]
> +  (match_operand:GPF 3 "register_operand" "r,w")
> +  (match_operand:GPF 4 "register_operand" "r,w")))]
>"TARGET_FLOAT"
> -  "fcsel\\t%0, %3, %4, %m1"
> -  [(set_attr "type" "fcsel")]
> +  "@
> +   csel\t%0, %3, %4, %m1
> +   fcsel\\t%0, %3, %4, %m1"
> +  [(set_attr "type" "fcsel,csel")]
>  )

I think we should use the new syntax for all new insns with more than
one alternative.

Thanks,
Richard

>  
>  (define_expand "movcc"
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 99cde46f1ba..42303f2ec02 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -1147,6 +1147,10 @@ (define_mode_attr e [(CCFP "") (CCFPE "e")])
>  ;; 32-bit version and "%x0" in the 64-bit version.
>  (define_mode_attr w [(QI "w") (HI "w") (SI "w") (DI "x") (SF "s") (DF "d")])
>  
> +;; For cmov template to be used with fcsel instruction
> +(define_mode_attr wv [(QI "s") (HI "s") (SI "s") (DI "d") (SF "s") (DF "d")])
> +
> +
>  ;; The size of access, in bytes.
>  (define_mode_attr ldst_sz [(SI "4") (DI "8")])
>  ;; Likewise for load/store pair.
> diff --git a/gcc/testsuite/gcc.target/aarch64/csel_1.c 
> b/gcc/testsuite/gcc.target/aarch64/csel_1.c
> new file mode 100644
> index 000..5848e5be2ff
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/csel_1.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-ssa-phiopt" } */
> +/* PR target/98477 */
> +
> +/* We should be able to produce csel followed by a store
> +   and not move between the GPRs and simd registers. */
> +/* Note -fno-ssa-phiopt is needed, otherwise the tree level
> +   does the VCE after the cmov which allowed to use the csel
> +   instruction. */
> +_Static_assert (sizeof(long long) == sizeof(double));
> +void
> +foo (int a, double *b, long long c, long long d)
> +{
> +  double ct;
> +  double dt;
> +  __builtin_memcpy(, , sizeof(long long));
> +  __builtin_memcpy(, , sizeof(long long));
> +  double t = a ? ct : dt;
> +  *b = t;
> +}
> +
> +/* { 

Re: [PATCH] aarch64: Add vector floating point trunc pattern

2024-06-10 Thread Richard Sandiford
Pengxuan Zheng  writes:
> This patch is a follow-up of r15-1079-g230d62a2cdd16c to add vector floating
> point trunc pattern for V2DF->V2SF and V4SF->V4HF conversions by renaming the
> existing aarch64_float_truncate_lo_ pattern to the 
> standard
> optab one, i.e., trunc2. This allows the vectorizer
> to vectorize certain floating point narrowing operations for the aarch64 
> target.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (VAR1): Remap float_truncate_lo_
>   builtin codes to standard optab ones.
>   * config/aarch64/aarch64-simd.md 
> (aarch64_float_truncate_lo_):
>   Rename to...
>   (trunc2): ... This.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/trunc-vec.c: New test.

OK, thanks.

Richard

> Signed-off-by: Pengxuan Zheng 
> ---
>  gcc/config/aarch64/aarch64-builtins.cc   |  7 +++
>  gcc/config/aarch64/aarch64-simd.md   |  6 +++---
>  gcc/testsuite/gcc.target/aarch64/trunc-vec.c | 21 
>  3 files changed, 31 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/trunc-vec.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 25189888d17..d589e59defc 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -543,6 +543,13 @@ BUILTIN_VDQ_BHSI (uhadd, uavg, _floor, 0)
>  VAR1 (float_extend_lo_, extend, v2sf, v2df)
>  VAR1 (float_extend_lo_, extend, v4hf, v4sf)
>  
> +/* __builtin_aarch64_float_truncate_lo_ should be expanded through the
> +   standard optabs CODE_FOR_trunc2. */
> +constexpr insn_code CODE_FOR_aarch64_float_truncate_lo_v4hf
> += CODE_FOR_truncv4sfv4hf2;
> +constexpr insn_code CODE_FOR_aarch64_float_truncate_lo_v2sf
> += CODE_FOR_truncv2dfv2sf2;
> +
>  #undef VAR1
>  #define VAR1(T, N, MAP, FLAG, A) \
>{#N #A, UP (A), CF##MAP (N, A), 0, TYPES_##T, FLAG_##FLAG},
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index c5e2c9f00d0..f644bd1731e 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3197,7 +3197,7 @@ (define_expand "aarch64_float_trunc_rodd_hi_v4sf"
>  }
>  )
>  
> -(define_insn "aarch64_float_truncate_lo_"
> +(define_insn "trunc2"
>[(set (match_operand:VDF 0 "register_operand" "=w")
>(float_truncate:VDF
>   (match_operand: 1 "register_operand" "w")))]
> @@ -3256,7 +3256,7 @@ (define_expand "vec_pack_trunc_v2df"
>  int lo = BYTES_BIG_ENDIAN ? 2 : 1;
>  int hi = BYTES_BIG_ENDIAN ? 1 : 2;
>  
> -emit_insn (gen_aarch64_float_truncate_lo_v2sf (tmp, operands[lo]));
> +emit_insn (gen_truncv2dfv2sf2 (tmp, operands[lo]));
>  emit_insn (gen_aarch64_float_truncate_hi_v4sf (operands[0],
>  tmp, operands[hi]));
>  DONE;
> @@ -3272,7 +3272,7 @@ (define_expand "vec_pack_trunc_df"
>{
>  rtx tmp = gen_reg_rtx (V2SFmode);
>  emit_insn (gen_aarch64_vec_concatdf (tmp, operands[1], operands[2]));
> -emit_insn (gen_aarch64_float_truncate_lo_v2sf (operands[0], tmp));
> +emit_insn (gen_truncv2dfv2sf2 (operands[0], tmp));
>  DONE;
>}
>  )
> diff --git a/gcc/testsuite/gcc.target/aarch64/trunc-vec.c 
> b/gcc/testsuite/gcc.target/aarch64/trunc-vec.c
> new file mode 100644
> index 000..05e8af7912d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/trunc-vec.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* { dg-final { scan-assembler-times {fcvtn\tv[0-9]+.2s, v[0-9]+.2d} 1 } } */
> +void
> +f (double *__restrict a, float *__restrict b)
> +{
> +  b[0] = a[0];
> +  b[1] = a[1];
> +}
> +
> +/* { dg-final { scan-assembler-times {fcvtn\tv[0-9]+.4h, v[0-9]+.4s} 1 } } */
> +void
> +f1 (float *__restrict a, _Float16 *__restrict b)
> +{
> +
> +  b[0] = a[0];
> +  b[1] = a[1];
> +  b[2] = a[2];
> +  b[3] = a[3];
> +}


Re: [PATCH] internal-fn: Force to reg if operand doesn't match.

2024-06-10 Thread Richard Biener
On Mon, Jun 10, 2024 at 9:35 AM Robin Dapp  wrote:
>
> Hi,
>
> despite looking good on cfarm185 and Linaro's pre-commit CI
> gcc-15-638-g7ca35f2e430 now appears to have caused several
> regressions on arm-eabi cortex-m55 as found by Linaro's CI:
>
> https://linaro.atlassian.net/browse/GNU-1252
>
> I'm assuming this target is not tested as regularly and thus
> the failures went unnoticed until now.
>
> So it looks like we do need the insn_operand_matches after all?

But why does expand_vec_cond_optab_fn get away without?
(note we want to get rid of that variant)

Almost no other expander checks this either, though some
can_* functions validate.  It's not exactly clear to me whether
we are just lucky and really always need to validate or whether
it's a bug in the target?

Richard.

> This patch only forces to register if the respective operands
> do not already match.
>
> Bootstrap and regtest on aarch64 and x86 in progress.
> Regtested on riscv64.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_vec_cond_mask_optab_fn): Only force to
> reg if operand does not already match.
> ---
>  gcc/internal-fn.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 4948b48bde8..fa85fa69f5a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3162,7 +3162,13 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>gcc_assert (icode != CODE_FOR_nothing);
>
>mask = expand_normal (op0);
> +  if (!insn_operand_matches (icode, 3, mask))
> +mask = force_reg (mask_mode, mask);
> +
>rtx_op1 = expand_normal (op1);
> +  if (!insn_operand_matches (icode, 1, rtx_op1))
> +rtx_op1 = force_reg (mode, rtx_op1);
> +
>rtx_op2 = expand_normal (op2);
>
>rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> --
> 2.45.1


Re: [PATCH v2 1/2] arm: Zero/Sign extends for CMSE security on Armv8-M.baseline [PR115253]

2024-06-10 Thread Andre Vieira (lists)

Hi Torbjorn,

Thanks for this, I have some comments below.

On 07/06/2024 09:56, Torbjörn SVENSSON wrote:

Properly handle zero and sign extension for Armv8-M.baseline as
Cortex-M23 can have the security extension active.
Currently, there is a internal compiler error on Cortex-M23 for the
epilog processing of sign extension.

This patch addresses the following CVE-2024-0151 for Armv8-M.baseline.
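
As a rough illustration (made-up example, not part of the patch), the
prologue side of this matters for a secure entry function taking a
narrow argument, e.g. built with -mcmse -mcpu=cortex-m23:

/* The non-secure caller cannot be trusted to have sign-extended X to
   32 bits as the AAPCS requires, so the secure prologue must re-widen
   r0 itself.  */
__attribute__((cmse_nonsecure_entry)) int
add_one (short x)
{
  return x + 1;
}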

gcc/ChangeLog:

PR target/115253
* config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear):
Sign extend for Thumb1.
(thumb1_expand_prologue): Add zero/sign extend.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
  gcc/config/arm/arm.cc | 68 ++-
  1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index ea0c963a4d6..d1bb173c135 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -19220,17 +19220,23 @@ cmse_nonsecure_call_inline_register_clear (void)
  || TREE_CODE (ret_type) == BOOLEAN_TYPE)
  && known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 4))
{
- machine_mode ret_mode = TYPE_MODE (ret_type);
+ rtx ret_mode = gen_rtx_REG (TYPE_MODE (ret_type), R0_REGNUM);
+ rtx si_mode = gen_rtx_REG (SImode, R0_REGNUM);


I'd rename ret_mode and si_mode to ret_reg and si_reg, so it's clear they 
are registers and not actually mode types.



  rtx extend;
  if (TYPE_UNSIGNED (ret_type))
-   extend = gen_rtx_ZERO_EXTEND (SImode,
- gen_rtx_REG (ret_mode, 
R0_REGNUM));
+   extend = gen_rtx_SET (si_mode, gen_rtx_ZERO_EXTEND (SImode,
+   ret_mode));
+ else if (TARGET_THUMB1)
+   {
+ if (known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 2))
+   extend = gen_thumb1_extendqisi2 (si_mode, ret_mode);
+ else
+   extend = gen_thumb1_extendhisi2 (si_mode, ret_mode);
+   }
  else
-   extend = gen_rtx_SIGN_EXTEND (SImode,
- gen_rtx_REG (ret_mode, 
R0_REGNUM));
- emit_insn_after (gen_rtx_SET (gen_rtx_REG (SImode, R0_REGNUM),
-extend), insn);
-
+   extend = gen_rtx_SET (si_mode, gen_rtx_SIGN_EXTEND (SImode,
+   ret_mode));
+ emit_insn_after (extend, insn);
}


Using gen_rtx_SIGN_EXTEND should work for both; the reason it doesn't is
some weird code in thumb1_extendhisi2, which I'm actually going to look
at removing.  But I don't think we should block this fix on that, as
we'd want to backport it ASAP.


But for clarity we should re-order this code so it's clear we only
need it for that specific case.

Can you maybe do:
if (TYPE_UNSIGNED ..)
  {
  }
else
  {
    /* Sign-extension is a special case because of
       thumb1_extendhisi2.  */
    if (TARGET_THUMB1
        && known_gt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 2))
      {
        // call gen_thumb1_extendhisi2
      }
    else
      {
        // use gen_rtx_SIGN_EXTEND
      }
  }
  
  
@@ -27250,6 +27256,52 @@ thumb1_expand_prologue (void)

live_regs_mask = offsets->saved_regs_mask;
lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
  
+  /* The AAPCS requires the callee to widen integral types narrower

+ than 32 bits to the full width of the register; but when handling
+ calls to non-secure space, we cannot trust the callee to have
+ correctly done so.  So forcibly re-widen the result here.  */
+  if (IS_CMSE_ENTRY (func_type))
+{
+  function_args_iterator args_iter;
+  CUMULATIVE_ARGS args_so_far_v;
+  cumulative_args_t args_so_far;
+  bool first_param = true;
+  tree arg_type;
+  tree fndecl = current_function_decl;
+  tree fntype = TREE_TYPE (fndecl);
+  arm_init_cumulative_args (_so_far_v, fntype, NULL_RTX, fndecl);
+  args_so_far = pack_cumulative_args (_so_far_v);
+  FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter)
+   {
+ rtx arg_rtx;
+
+ if (VOID_TYPE_P (arg_type))
+   break;
+
+ function_arg_info arg (arg_type, /*named=*/true);
+ if (!first_param)
+   /* We should advance after processing the argument and pass
+  the argument we're advancing past.  */
+   arm_function_arg_advance (args_so_far, arg);
+ first_param = false;
+ arg_rtx = arm_function_arg (args_so_far, arg);
+ gcc_assert (REG_P (arg_rtx));
+ if ((TREE_CODE (arg_type) == INTEGER_TYPE
+ || TREE_CODE (arg_type) == ENUMERAL_TYPE
+ || TREE_CODE (arg_type) == BOOLEAN_TYPE)
+ && known_lt (GET_MODE_SIZE (GET_MODE 

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-10 Thread Richard Sandiford
Ajit Agarwal  writes:
> On 10/06/24 3:20 pm, Richard Sandiford wrote:
>> Ajit Agarwal  writes:
>>> On 10/06/24 2:52 pm, Richard Sandiford wrote:
 Ajit Agarwal  writes:
> On 10/06/24 2:12 pm, Richard Sandiford wrote:
>> Ajit Agarwal  writes:
>>> +
>>> + rtx set = single_set (insn);
>>> + if (set == NULL_RTX)
>>> +   return false;
>>> +
>>> + rtx op0 = SET_SRC (set);
>>> + rtx_code code = GET_CODE (op0);
>>> +
>>> + // This check is added as register pairs are not 
>>> generated
>>> + // by RA for neg:V2DF (fma: V2DF (reg1)
>>> + //  (reg2)
>>> + //  (neg:V2DF (reg3)))
>>> + if (GET_RTX_CLASS (code) == RTX_UNARY)
>>> +   return false;
>>
>> What's special about (neg (fma ...))?
>>
>
> I am not sure why register allocator fails allocating register 
> pairs with
> NEG Unary operation with fma operand. I have not debugged 
> register allocator why the NEG
> Unary operation with fma operand. 

>>>
>>> For neg (fma ...) cases because of subreg 128 bits from OOmode 256 
>>> bits are
>>> set correctly. 
>>> IRA marked them spill candidates as spill priority is zero.
>>>
>>> Due to this LRA reload pass couldn't allocate register pairs.
>>
>> I think this is just restating the symptom though.  I suppose the 
>> same
>> kind of questions apply here too: what was the instruction before the
>> pass runs, what was the instruction after the pass runs, and why is
>> the rtl change incorrect (by the meaning above)?
>>
>
> Original case where we dont do load fusion, spill happens, in that
> case we dont require sequential register pairs to be generated for 2 
> loads
> for. Hence it worked.
>
> rtl change is correct and there is no error.
>
> for load fusion spill happens and we dont generate sequential 
> register pairs
> because pf spill candidate and lxvp gives incorrect results as 
> sequential register
> pairs are required for lxvp.

 Can you go into more detail?  How is the lxvp represented?  And how do
 we end up not getting a sequential register pair?  What does the rtl
 look like (before and after things have gone wrong)?

 It seems like either the rtl is not describing the result of the fusion
 correctly or there is some problem in the .md description of lxvp.

>>>
>>> After fusion pass:
>>>
>>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>>> [240])
>>> (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
>>> (const_int 16 [0x10])) [1 MEM  
>>> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 
>>> {vsx_movv2df_64bit}
>>>  (nil))
>>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>>> [240])
>>> (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM >> real(kind=8)> [(real(kind=8) *)_4050]+16 ])
>>> (reg:V2DF 44 12 [3119])
>>> (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>>> [240]) {*vsx_nfmsv2df4}
>>>  (nil))
>>>
>>> In LRA reload.
>>>
>>> (insn 2472 2461 2412 161 (set (reg:OO 2572 [ vect__300.543_236 ])
>>> (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] [1285]) [1 MEM 
>>>  [(real(kind=8) *)_4188]+0 S16 A64])) 
>>> "shell_lam.fppized.f":238:72 2187 {*movoo}
>>>  (expr_list:REG_EQUIV (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] 
>>> [1285]) [1 MEM  [(real(kind=8) *)_4188]+0 S16 
>>> A64])
>>> (nil)))
>>> (insn 2412 2472 2477 161 (set (reg:V2DF 240 [ vect__302.545 ])
>>> (neg:V2DF (fma:V2DF (subreg:V2DF (reg:OO 2561 [ MEM >> real(kind=8)> [(real(kind=8) *)_4050] ]) 16)
>>> (reg:V2DF 4283 [3119])
>>> (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 
>>> ]) 16)  {*vsx_nfmsv2df4}
>>>  (nil))
>>>
>>>
>>> In LRA reload sequential registers are not generated as r2572 is splled 
>>> and move to spill location
>>> in stack and subsequent uses loads from stack. Hence sequential 
>>> registers pairs are not generated.
>>>
>>> lxvp vsx0, 0(r1).
>>>
>>> It loads from from r1+0 into vsx0 and vsx1 and appropriate uses use 
>>> sequential register pairs.
>>>
>>> Without load fusion since 2 loads exists and 2 loads 

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-10 Thread Ajit Agarwal
Hello Richard:

On 10/06/24 3:20 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> Hello Richard:
>>
>> On 10/06/24 2:52 pm, Richard Sandiford wrote:
>>> Ajit Agarwal  writes:
 On 10/06/24 2:12 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> +
>> +  rtx set = single_set (insn);
>> +  if (set == NULL_RTX)
>> +return false;
>> +
>> +  rtx op0 = SET_SRC (set);
>> +  rtx_code code = GET_CODE (op0);
>> +
>> +  // This check is added as register pairs are not 
>> generated
>> +  // by RA for neg:V2DF (fma: V2DF (reg1)
>> +  //  (reg2)
>> +  //  (neg:V2DF (reg3)))
>> +  if (GET_RTX_CLASS (code) == RTX_UNARY)
>> +return false;
>
> What's special about (neg (fma ...))?
>

 I am not sure why register allocator fails allocating register 
 pairs with
 NEG Unary operation with fma operand. I have not debugged register 
 allocator why the NEG
 Unary operation with fma operand. 
>>>
>>
>> For neg (fma ...) cases because of subreg 128 bits from OOmode 256 
>> bits are
>> set correctly. 
>> IRA marked them spill candidates as spill priority is zero.
>>
>> Due to this LRA reload pass couldn't allocate register pairs.
>
> I think this is just restating the symptom though.  I suppose the same
> kind of questions apply here too: what was the instruction before the
> pass runs, what was the instruction after the pass runs, and why is
> the rtl change incorrect (by the meaning above)?
>

 Original case where we dont do load fusion, spill happens, in that
 case we dont require sequential register pairs to be generated for 2 
 loads
 for. Hence it worked.

 rtl change is correct and there is no error.

 for load fusion spill happens and we dont generate sequential register 
 pairs
 because pf spill candidate and lxvp gives incorrect results as 
 sequential register
 pairs are required for lxvp.
>>>
>>> Can you go into more detail?  How is the lxvp represented?  And how do
>>> we end up not getting a sequential register pair?  What does the rtl
>>> look like (before and after things have gone wrong)?
>>>
>>> It seems like either the rtl is not describing the result of the fusion
>>> correctly or there is some problem in the .md description of lxvp.
>>>
>>
>> After fusion pass:
>>
>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>> [240])
>> (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
>> (const_int 16 [0x10])) [1 MEM  
>> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 
>> {vsx_movv2df_64bit}
>>  (nil))
>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>> [240])
>> (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM > real(kind=8)> [(real(kind=8) *)_4050]+16 ])
>> (reg:V2DF 44 12 [3119])
>> (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>> [240]) {*vsx_nfmsv2df4}
>>  (nil))
>>
>> In LRA reload.
>>
>> (insn 2472 2461 2412 161 (set (reg:OO 2572 [ vect__300.543_236 ])
>> (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] [1285]) [1 MEM 
>>  [(real(kind=8) *)_4188]+0 S16 A64])) 
>> "shell_lam.fppized.f":238:72 2187 {*movoo}
>>  (expr_list:REG_EQUIV (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] 
>> [1285]) [1 MEM  [(real(kind=8) *)_4188]+0 S16 
>> A64])
>> (nil)))
>> (insn 2412 2472 2477 161 (set (reg:V2DF 240 [ vect__302.545 ])
>> (neg:V2DF (fma:V2DF (subreg:V2DF (reg:OO 2561 [ MEM > real(kind=8)> [(real(kind=8) *)_4050] ]) 16)
>> (reg:V2DF 4283 [3119])
>> (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 
>> ]) 16)  {*vsx_nfmsv2df4}
>>  (nil))
>>
>>
>> In LRA reload sequential registers are not generated as r2572 is splled 
>> and move to spill location
>> in stack and subsequent uses loads from stack. Hence sequential 
>> registers pairs are not generated.
>>
>> lxvp vsx0, 0(r1).
>>
>> It loads from from r1+0 into vsx0 and vsx1 and appropriate uses use 
>> sequential register pairs.
>>
>> Without load fusion since 2 loads exists and 2 loads need not require 
>> sequential registers
>> hence it worked but with load fusion and using 

Re: [PATCH v2 1/2] driver: Use -as/ld/objcopy as final fallback instead of native ones for cross

2024-06-10 Thread YunQiang Su
Richard Sandiford  于2024年6月6日周四 17:54写道:
>
> YunQiang Su  writes:
> > YunQiang Su  于2024年5月29日周三 10:02写道:
> >>
> >> Richard Sandiford  于2024年5月29日周三 05:28写道:
> >> >
> >> > YunQiang Su  writes:
> >> > > If `find_a_program` cannot find `as/ld/objcopy` and we are a cross 
> >> > > toolchain,
> >> > > the final fallback is `as/ld` of system.  In fact, we can have a try 
> >> > > with
> >> > > -as/ld/objcopy before fallback to native as/ld/objcopy.
> >> > >
> >> > > This patch is derivatived from Debian's patch:
> >> > >   gcc-search-prefixed-as-ld.diff
> >> >
> >> > I'm probably making you repeat a previous discussion, sorry, but could
> >> > you describe the use case in more detail?  The current approach to
> >> > handling cross toolchains has been used for many years.  Presumably
> >> > this patch is supporting a different way of organising things,
> >> > but I wasn't sure from the description what it was.
> >> >
> >> > AIUI, we currently assume that cross as, ld and objcopy will be
> >> > installed under those names in $prefix/$target_alias/bin (aka 
> >> > $tooldir/bin).
> >> > E.g.:
> >> >
> >> >bin/aarch64-elf-as = aarch64-elf/bin/as
> >> >
> >> > GCC should then find as in aarch64-elf/bin.
> >> >
> >> > Is that not true in your case?
> >> >
> >>
> >> Yes. This patch is only about the final fallback. I mean aarch64-elf/bin/as
> >> still has higher priority than bin/aarch64-elf-as.
> >>
> >> In the current code, we find gas with:
> >> /prefix/aarch64-elf/bin/as > $PATH/as
> >>
> >> And this patch a new one between them:
> >> /prefix/aarch64-elf/bin/as > $PATH/aarch64-elf-as > $PATH/as
> >>
> >> > To be clear, I'm not saying the patch is wrong.  I'm just trying to
> >> > understand why the patch is needed.
> >> >
> >>
> >> Yes. If gcc is configured correctly, it is not so useful.
> >> In some case for some lazy user, it may be useful,
> >> for example, the binutils installed into different prefix with libc etc.
> >>
> >> For example, binutils is installed into /usr/aarch64-elf/bin, while
> >> libc is installed into /usr/local/aarch64-elf/.
> >>
> >
> > Any idea about it? Is it a use case making sense?
>
> Yeah, I think it makes sense.  GCC and binutils are separate packages.
> Users could cherry-pick a GCC installation and a separate binutils
> installation rather than bundling them together into a single
> toolchain.  And not everyone will have permission to change $tooldir.
>
> So I agree we should support searching the user's path for an
> as/ld/etc. based on the tool prefix.  Unfortunately, I don't think
> I understand the code & constraints well enough to do a review.
>
> In particular, it seems unfortunate that we need to do a trial
> subcommand invocation before committing to the prefixed name.
> And, if we continue to search for "as" in the user's path as a fallback,
> it's not 100% obvious that "${triple}-as" later in the path should trump
> "as" earlier in the path.
>
> In some ways, it seems more consistent to do the replacement without
> first doing a trial invocation.  But I don't know whether that would
> break existing use cases.  (To be clear, I wouldn't feel comfortable

Yes. This is also my worry as some users may set $PATH manually
to a path which contains target `as`, such as
   export PATH="/usr/aarch64-linux-gnu/bin:$PATH"

> approving a patch to do that without buy-in from other maintainers.)
>
> Thanks,
> Richard


Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-10 Thread Richard Sandiford
Ajit Agarwal  writes:
> Hello Richard:
>
> On 10/06/24 2:52 pm, Richard Sandiford wrote:
>> Ajit Agarwal  writes:
>>> On 10/06/24 2:12 pm, Richard Sandiford wrote:
 Ajit Agarwal  writes:
> +
> +   rtx set = single_set (insn);
> +   if (set == NULL_RTX)
> + return false;
> +
> +   rtx op0 = SET_SRC (set);
> +   rtx_code code = GET_CODE (op0);
> +
> +   // This check is added as register pairs are not generated
> +   // by RA for neg:V2DF (fma: V2DF (reg1)
> +   //  (reg2)
> +   //  (neg:V2DF (reg3)))
> +   if (GET_RTX_CLASS (code) == RTX_UNARY)
> + return false;

 What's special about (neg (fma ...))?

>>>
>>> I am not sure why register allocator fails allocating register 
>>> pairs with
>>> NEG Unary operation with fma operand. I have not debugged register 
>>> allocator why the NEG
>>> Unary operation with fma operand. 
>>
>
> For neg (fma ...) cases because of subreg 128 bits from OOmode 256 
> bits are
> set correctly. 
> IRA marked them spill candidates as spill priority is zero.
>
> Due to this LRA reload pass couldn't allocate register pairs.

 I think this is just restating the symptom though.  I suppose the same
 kind of questions apply here too: what was the instruction before the
 pass runs, what was the instruction after the pass runs, and why is
 the rtl change incorrect (by the meaning above)?

>>>
>>> Original case where we dont do load fusion, spill happens, in that
>>> case we dont require sequential register pairs to be generated for 2 
>>> loads
>>> for. Hence it worked.
>>>
>>> rtl change is correct and there is no error.
>>>
>>> for load fusion spill happens and we dont generate sequential register 
>>> pairs
>>> because pf spill candidate and lxvp gives incorrect results as 
>>> sequential register
>>> pairs are required for lxvp.
>>
>> Can you go into more detail?  How is the lxvp represented?  And how do
>> we end up not getting a sequential register pair?  What does the rtl
>> look like (before and after things have gone wrong)?
>>
>> It seems like either the rtl is not describing the result of the fusion
>> correctly or there is some problem in the .md description of lxvp.
>>
>
> After fusion pass:
>
> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
> [240])
> (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
> (const_int 16 [0x10])) [1 MEM  
> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 
> {vsx_movv2df_64bit}
>  (nil))
> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
> [240])
> (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM  
> [(real(kind=8) *)_4050]+16 ])
> (reg:V2DF 44 12 [3119])
> (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
> [240]) {*vsx_nfmsv2df4}
>  (nil))
>
> In LRA reload.
>
> (insn 2472 2461 2412 161 (set (reg:OO 2572 [ vect__300.543_236 ])
> (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] [1285]) [1 MEM 
>  [(real(kind=8) *)_4188]+0 S16 A64])) 
> "shell_lam.fppized.f":238:72 2187 {*movoo}
>  (expr_list:REG_EQUIV (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] 
> [1285]) [1 MEM  [(real(kind=8) *)_4188]+0 S16 
> A64])
> (nil)))
> (insn 2412 2472 2477 161 (set (reg:V2DF 240 [ vect__302.545 ])
> (neg:V2DF (fma:V2DF (subreg:V2DF (reg:OO 2561 [ MEM  real(kind=8)> [(real(kind=8) *)_4050] ]) 16)
> (reg:V2DF 4283 [3119])
> (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 
> ]) 16)  {*vsx_nfmsv2df4}
>  (nil))
>
>
> In LRA reload sequential registers are not generated as r2572 is splled 
> and move to spill location
> in stack and subsequent uses loads from stack. Hence sequential registers 
> pairs are not generated.
>
> lxvp vsx0, 0(r1).
>
> It loads from from r1+0 into vsx0 and vsx1 and appropriate uses use 
> sequential register pairs.
>
> Without load fusion since 2 loads exists and 2 loads need not require 
> sequential registers
> hence it worked but with load fusion and using lxvp it requires 
> sequential register pairs.

 Do you mean that this is a performance regression?  I.e. the fact that
 lxvp requires sequential registers causes extra spilling, due to having
 less allocation freedom?

 Or is it a correctness problem?  If 

Re: [PATCH] vect: Merge loop mask and cond_op mask in fold-left, reduction.

2024-06-10 Thread Richard Sandiford
Richard Sandiford  writes:
> Robin Dapp  writes:
>> Hi,
>>
>> currently we discard the cond-op mask when the loop is fully masked
>> which causes wrong code in
>> gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>> when compiled with
>> -O3 -march=cascadelake --param vect-partial-vector-usage=2.
>>
>> This patch ANDs both masks instead.
>>
>> Bootstrapped and regtested on x86, aarch64 and power10.
>> Regtested on riscv64 and armv8.8-a+sve via qemu.
>>
>> Regards
>>  Robin
>>
>> gcc/ChangeLog:
>>
>>  * tree-vect-loop.cc (vectorize_fold_left_reduction): Merge loop
>>  mask and cond-op mask.
>
> OK, thanks.

Actually, as Richard mentioned in the PR, it would probably be better
to use prepare_vec_mask instead.  It should work in this context too
and would avoid redundant double masking.
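
Roughly, a sketch of what that hunk could look like instead (just a
sketch; it assumes prepare_vec_mask is reachable from here and keeps
its current (loop_vinfo, mask_type, loop_mask, vec_mask, gsi)
arguments):

  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
    {
      tree loop_mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
                                           vec_num, vectype_in, i);
      if (is_cond_op)
        /* prepare_vec_mask ANDs the loop mask and the cond-op mask and
           reuses an existing AND of the same pair if there is one.  */
        mask = prepare_vec_mask (loop_vinfo, TREE_TYPE (loop_mask),
                                 loop_mask, vec_opmask[i], gsi);
      else
        mask = loop_mask;
    }
  else if (is_cond_op)
    mask = vec_opmask[i];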

Thanks,
Richard

>
>> ---
>>  gcc/tree-vect-loop.cc | 16 +++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index 028692614bb..f9bf6a45611 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -7215,7 +7215,21 @@ vectorize_fold_left_reduction (loop_vec_info 
>> loop_vinfo,
>>tree len = NULL_TREE;
>>tree bias = NULL_TREE;
>>if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>> -mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, 
>> i);
>> +{
>> +  tree mask_loop = vect_get_loop_mask (loop_vinfo, gsi, masks,
>> +   vec_num, vectype_in, i);
>> +  if (is_cond_op)
>> +{
>> +  /* Merge the loop mask and the cond_op mask.  */
>> +  mask = make_ssa_name (TREE_TYPE (mask_loop));
>> +  gassign *and_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
>> +   mask_loop,
>> +   vec_opmask[i]);
>> +  gsi_insert_before (gsi, and_stmt, GSI_SAME_STMT);
>> +}
>> +  else
>> +mask = mask_loop;
>> +}
>>else if (is_cond_op)
>>  mask = vec_opmask[i];
>>if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))


Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-10 Thread Ajit Agarwal
Hello Richard:

On 10/06/24 2:52 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> On 10/06/24 2:12 pm, Richard Sandiford wrote:
>>> Ajit Agarwal  writes:
 +
 +rtx set = single_set (insn);
 +if (set == NULL_RTX)
 +  return false;
 +
 +rtx op0 = SET_SRC (set);
 +rtx_code code = GET_CODE (op0);
 +
 +// This check is added as register pairs are not generated
 +// by RA for neg:V2DF (fma: V2DF (reg1)
 +//  (reg2)
 +//  (neg:V2DF (reg3)))
 +if (GET_RTX_CLASS (code) == RTX_UNARY)
 +  return false;
>>>
>>> What's special about (neg (fma ...))?
>>>
>>
>> I am not sure why register allocator fails allocating register pairs 
>> with
>> NEG Unary operation with fma operand. I have not debugged register 
>> allocator why the NEG
>> Unary operation with fma operand. 
>

 For neg (fma ...) cases because of subreg 128 bits from OOmode 256 
 bits are
 set correctly. 
 IRA marked them spill candidates as spill priority is zero.

 Due to this LRA reload pass couldn't allocate register pairs.
>>>
>>> I think this is just restating the symptom though.  I suppose the same
>>> kind of questions apply here too: what was the instruction before the
>>> pass runs, what was the instruction after the pass runs, and why is
>>> the rtl change incorrect (by the meaning above)?
>>>
>>
>> Original case where we dont do load fusion, spill happens, in that
>> case we dont require sequential register pairs to be generated for 2 
>> loads
>> for. Hence it worked.
>>
>> rtl change is correct and there is no error.
>>
>> for load fusion spill happens and we dont generate sequential register 
>> pairs
>> because pf spill candidate and lxvp gives incorrect results as 
>> sequential register
>> pairs are required for lxvp.
>
> Can you go into more detail?  How is the lxvp represented?  And how do
> we end up not getting a sequential register pair?  What does the rtl
> look like (before and after things have gone wrong)?
>
> It seems like either the rtl is not describing the result of the fusion
> correctly or there is some problem in the .md description of lxvp.
>

 After fusion pass:

 (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
 [240])
 (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
 (const_int 16 [0x10])) [1 MEM  
 [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 
 {vsx_movv2df_64bit}
  (nil))
 (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
 [240])
 (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM  
 [(real(kind=8) *)_4050]+16 ])
 (reg:V2DF 44 12 [3119])
 (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
 [240]) {*vsx_nfmsv2df4}
  (nil))

 In LRA reload.

 (insn 2472 2461 2412 161 (set (reg:OO 2572 [ vect__300.543_236 ])
 (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] [1285]) [1 MEM 
  [(real(kind=8) *)_4188]+0 S16 A64])) 
 "shell_lam.fppized.f":238:72 2187 {*movoo}
  (expr_list:REG_EQUIV (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] 
 [1285]) [1 MEM  [(real(kind=8) *)_4188]+0 S16 A64])
 (nil)))
 (insn 2412 2472 2477 161 (set (reg:V2DF 240 [ vect__302.545 ])
 (neg:V2DF (fma:V2DF (subreg:V2DF (reg:OO 2561 [ MEM >>> real(kind=8)> [(real(kind=8) *)_4050] ]) 16)
 (reg:V2DF 4283 [3119])
 (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 ]) 
 16)  {*vsx_nfmsv2df4}
  (nil))


 In LRA reload sequential registers are not generated as r2572 is splled 
 and move to spill location
 in stack and subsequent uses loads from stack. Hence sequential registers 
 pairs are not generated.

 lxvp vsx0, 0(r1).

 It loads from from r1+0 into vsx0 and vsx1 and appropriate uses use 
 sequential register pairs.

 Without load fusion since 2 loads exists and 2 loads need not require 
 sequential registers
 hence it worked but with load fusion and using lxvp it requires sequential 
 register pairs.
>>>
>>> Do you mean that this is a performance regression?  I.e. the fact that
>>> lxvp requires sequential registers causes extra spilling, due to having
>>> less allocation freedom?
>>>
>>> Or is it a correctness problem?  If so, what is it?  Nothing in the rtl
>>> above looks wrong in principle (although I've no idea if the REG_EQUIV
>>> is correct in this 

[PATCH] tree-optimization/115395 - wrong-code with SLP reduction in epilog

2024-06-10 Thread Richard Biener
When we continue a non-SLP reduction from the main loop in the
epilog with a SLP reduction we currently fail to handle an
adjustment by the initial value because that's not a thing with SLP.
As long as we have the possibility to mix SLP and non-SLP we have
to handle it though.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/115395
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Handle STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT also for SLP
reductions of group_size one.

* gcc.dg/vect/pr115395.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr115395.c | 27 +++
 gcc/tree-vect-loop.cc| 27 ---
 2 files changed, 35 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115395.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr115395.c 
b/gcc/testsuite/gcc.dg/vect/pr115395.c
new file mode 100644
index 000..cd1cee9f3df
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115395.c
@@ -0,0 +1,27 @@
+/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
+
+#include "tree-vect.h"
+
+struct {
+  long header_size;
+  long start_offset;
+  long end_offset;
+} myrar_dbo[5] = {{0, 87, 6980}, {0, 7087, 13980}, {0, 14087, 0}};
+
+int i;
+long offset;
+
+int main()
+{
+  check_vect ();
+
+  offset += myrar_dbo[0].start_offset;
+  while (i < 2) {
+i++;
+offset += myrar_dbo[i].start_offset - myrar_dbo[i - 1].end_offset;
+  }
+  if (offset != 301)
+abort();
+
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index f598403df46..d894ac1c067 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6047,25 +6047,14 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 
   tree induc_val = NULL_TREE;
   tree adjustment_def = NULL;
-  if (slp_node)
-{
-  /* Optimize: for induction condition reduction, if we can't use zero
-for induc_val, use initial_def.  */
-  if (STMT_VINFO_REDUC_TYPE (reduc_info) == INTEGER_INDUC_COND_REDUCTION)
-   induc_val = STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL (reduc_info);
-  /* ???  Coverage for 'else' isn't clear.  */
-}
+  /* Optimize: for induction condition reduction, if we can't use zero
+ for induc_val, use initial_def.  */
+  if (STMT_VINFO_REDUC_TYPE (reduc_info) == INTEGER_INDUC_COND_REDUCTION)
+induc_val = STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL (reduc_info);
+  else if (double_reduc)
+;
   else
-{
-  /* Optimize: for induction condition reduction, if we can't use zero
- for induc_val, use initial_def.  */
-  if (STMT_VINFO_REDUC_TYPE (reduc_info) == INTEGER_INDUC_COND_REDUCTION)
-   induc_val = STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL (reduc_info);
-  else if (double_reduc)
-   ;
-  else
-   adjustment_def = STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info);
-}
+adjustment_def = STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info);
 
   stmt_vec_info single_live_out_stmt[] = { stmt_info };
   array_slice live_out_stmts = single_live_out_stmt;
@@ -6890,7 +6879,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 
   if (adjustment_def)
 {
-  gcc_assert (!slp_reduc);
+  gcc_assert (!slp_reduc || group_size == 1);
   gimple_seq stmts = NULL;
   if (double_reduc)
{
-- 
2.35.3


Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-10 Thread Richard Sandiford
Ajit Agarwal  writes:
> On 10/06/24 2:12 pm, Richard Sandiford wrote:
>> Ajit Agarwal  writes:
>>> +
>>> + rtx set = single_set (insn);
>>> + if (set == NULL_RTX)
>>> +   return false;
>>> +
>>> + rtx op0 = SET_SRC (set);
>>> + rtx_code code = GET_CODE (op0);
>>> +
>>> + // This check is added as register pairs are not generated
>>> + // by RA for neg:V2DF (fma: V2DF (reg1)
>>> + //  (reg2)
>>> + //  (neg:V2DF (reg3)))
>>> + if (GET_RTX_CLASS (code) == RTX_UNARY)
>>> +   return false;
>>
>> What's special about (neg (fma ...))?
>>
>
> I am not sure why register allocator fails allocating register pairs 
> with
> NEG Unary operation with fma operand. I have not debugged register 
> allocator why the NEG
> Unary operation with fma operand. 

>>>
>>> For neg (fma ...) cases because of subreg 128 bits from OOmode 256 bits 
>>> are
>>> set correctly. 
>>> IRA marked them spill candidates as spill priority is zero.
>>>
>>> Due to this LRA reload pass couldn't allocate register pairs.
>>
>> I think this is just restating the symptom though.  I suppose the same
>> kind of questions apply here too: what was the instruction before the
>> pass runs, what was the instruction after the pass runs, and why is
>> the rtl change incorrect (by the meaning above)?
>>
>
> Original case where we dont do load fusion, spill happens, in that
> case we dont require sequential register pairs to be generated for 2 loads
> for. Hence it worked.
>
> rtl change is correct and there is no error.
>
> for load fusion spill happens and we dont generate sequential register 
> pairs
> because pf spill candidate and lxvp gives incorrect results as sequential 
> register
> pairs are required for lxvp.

 Can you go into more detail?  How is the lxvp represented?  And how do
 we end up not getting a sequential register pair?  What does the rtl
 look like (before and after things have gone wrong)?

 It seems like either the rtl is not describing the result of the fusion
 correctly or there is some problem in the .md description of lxvp.

>>>
>>> After fusion pass:
>>>
>>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>>> [240])
>>> (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
>>> (const_int 16 [0x10])) [1 MEM  
>>> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 
>>> {vsx_movv2df_64bit}
>>>  (nil))
>>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>>> [240])
>>> (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM  
>>> [(real(kind=8) *)_4050]+16 ])
>>> (reg:V2DF 44 12 [3119])
>>> (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>>> [240]) {*vsx_nfmsv2df4}
>>>  (nil))
>>>
>>> In LRA reload.
>>>
>>> (insn 2472 2461 2412 161 (set (reg:OO 2572 [ vect__300.543_236 ])
>>> (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] [1285]) [1 MEM 
>>>  [(real(kind=8) *)_4188]+0 S16 A64])) 
>>> "shell_lam.fppized.f":238:72 2187 {*movoo}
>>>  (expr_list:REG_EQUIV (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] 
>>> [1285]) [1 MEM  [(real(kind=8) *)_4188]+0 S16 A64])
>>> (nil)))
>>> (insn 2412 2472 2477 161 (set (reg:V2DF 240 [ vect__302.545 ])
>>> (neg:V2DF (fma:V2DF (subreg:V2DF (reg:OO 2561 [ MEM >> real(kind=8)> [(real(kind=8) *)_4050] ]) 16)
>>> (reg:V2DF 4283 [3119])
>>> (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 ]) 
>>> 16)  {*vsx_nfmsv2df4}
>>>  (nil))
>>>
>>>
>>> In LRA reload sequential registers are not generated as r2572 is splled and 
>>> move to spill location
>>> in stack and subsequent uses loads from stack. Hence sequential registers 
>>> pairs are not generated.
>>>
>>> lxvp vsx0, 0(r1).
>>>
>>> It loads from from r1+0 into vsx0 and vsx1 and appropriate uses use 
>>> sequential register pairs.
>>>
>>> Without load fusion since 2 loads exists and 2 loads need not require 
>>> sequential registers
>>> hence it worked but with load fusion and using lxvp it requires sequential 
>>> register pairs.
>> 
>> Do you mean that this is a performance regression?  I.e. the fact that
>> lxvp requires sequential registers causes extra spilling, due to having
>> less allocation freedom?
>> 
>> Or is it a correctness problem?  If so, what is it?  Nothing in the rtl
>> above looks wrong in principle (although I've no idea if the REG_EQUIV
>> is correct in this context).  What does the allocated code look like,
>> and why is it wrong?
>> 
>> If (reg:OO 2561) is spilled and then one half of it used, only that half
>> needs to be loaded from the 

[PATCH v2 0/6] Add DLL import/export implementation to AArch64

2024-06-10 Thread Evgeny Karpov
The patch series has been successfully verified by patchwork,
after resolving the issue with the mailing client.
https://patchwork.sourceware.org/project/gcc/list/?series=34865

The x86_64-w64-mingw32 build has been tested, and no regressions 
have been detected after applying the patch series.
https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build/actions/runs/9417869213

The patch series has been approved by the x64 and mingw maintainers. 

Richard, could we proceed with merging the patch series? Thanks.

Regards,
Evgeny


[COMMITTED 29/30] ada: Storage_Error in indirect call to function returning limited type

2024-06-10 Thread Marc Poulhiès
From: Javier Miranda 

At runtime the code generated by the compiler reports the
exception Storage_Error in an indirect call through an
access-to-subprogram variable that references a function
returning a limited tagged type object.

gcc/ada/

* sem_ch6.adb (Might_Need_BIP_Task_Actuals): Add support
for access-to-subprogram parameter types.
* exp_ch6.adb (Add_Task_Actuals_To_Build_In_Place_Call):
Add dummy BIP parameters to access-to-subprogram types
that may reference a function that has BIP parameters.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch6.adb | 11 ---
 gcc/ada/sem_ch6.adb | 12 +++-
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
index b5c5865242d..005210ce6bd 100644
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -642,15 +642,20 @@ package body Exp_Ch6 is
   Master_Formal : Node_Id;
 
begin
+  pragma Assert (Ekind (Function_Id) in E_Function
+  | E_Subprogram_Type);
+
   --  No such extra parameters are needed if there are no tasks
 
   if not Needs_BIP_Task_Actuals (Function_Id) then
 
  --  However we must add dummy extra actuals if the function is
- --  a dispatching operation that inherited these extra formals.
+ --  a dispatching operation that inherited these extra formals
+ --  or an access-to-subprogram type that requires these extra
+ --  actuals.
 
- if Is_Dispatching_Operation (Function_Id)
-   and then Has_BIP_Extra_Formal (Function_Id, BIP_Task_Master)
+ if Has_BIP_Extra_Formal (Function_Id, BIP_Task_Master,
+  Must_Be_Frozen => False)
  then
 Master_Formal :=
   Build_In_Place_Formal (Function_Id, BIP_Task_Master);
diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
index ca40b5479e0..50dac5c4a51 100644
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -8663,9 +8663,12 @@ package body Sem_Ch6 is
   --  Determines if E has its extra formals
 
   function Might_Need_BIP_Task_Actuals (E : Entity_Id) return Boolean;
-  --  Determines if E is a dispatching primitive returning a limited tagged
-  --  type object since some descendant might return an object with tasks
-  --  (and therefore need the BIP task extra actuals).
+  --  Determines if E is a function or an access to a function returning a
+  --  limited tagged type object. On dispatching primitives this predicate
+  --  is used to determine if some descendant of the function might return
+  --  an object with tasks (and therefore need the BIP task extra actuals).
+  --  On access-to-subprogram types it is used to determine if the target
+  --  function might return an object with tasks.
 
   function Needs_Accessibility_Check_Extra
 (E  : Entity_Id;
@@ -8786,9 +8789,8 @@ package body Sem_Ch6 is
 
  Func_Typ := Root_Type (Underlying_Type (Etype (Subp_Id)));
 
- return Ekind (Subp_Id) = E_Function
+ return Ekind (Subp_Id) in E_Function | E_Subprogram_Type
and then not Has_Foreign_Convention (Func_Typ)
-   and then Is_Dispatching_Operation (Subp_Id)
and then Is_Tagged_Type (Func_Typ)
and then Is_Limited_Type (Func_Typ)
and then not Has_Aspect (Func_Typ, Aspect_No_Task_Parts);
-- 
2.45.1



[COMMITTED 25/30] ada: Resolve compilation issues with container aggregates in draft ACATS B tests

2024-06-10 Thread Marc Poulhiès
From: Gary Dismukes 

This change set addresses compilation problems encountered in the draft
versions of the following ACATS B tests for container aggregates:

B435001 (container aggregates with Assign_Indexed)
B435002 (container aggregates with Add_Unnamed)
B435003 (container aggregates with Add_Named)
B435004 (container aggregates with Assign_Indexed and Add_Unnamed)

gcc/ada/

* sem_aggr.adb (Resolve_Iterated_Association): In the case of
N_Iterated_Element_Associations that have a key expression, issue
an error if the aggregate type does not have an Add_Named
operation, and include a reference to RM22 4.3.5(24) in the error
message. In the case of an N_Component_Association with a
Defining_Identifer where the "choice" is given by a function call,
in the creation of the iterator_specification associate a copy of
Choice as its Name, and remove the call to
Analyze_Iterator_Specification, which was causing problems with
the reanalysis of function calls originally given in prefixed form
that were transformed into function calls in normal (infix) form.
The iterator_specification will be analyzed later in any case, so
that call should not be done here. Remove the with and use of
Sem_Ch5.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aggr.adb | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index 60738550ec1..51b88ab831f 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -49,7 +49,6 @@ with Sem_Aux;use Sem_Aux;
 with Sem_Case;   use Sem_Case;
 with Sem_Cat;use Sem_Cat;
 with Sem_Ch3;use Sem_Ch3;
-with Sem_Ch5;use Sem_Ch5;
 with Sem_Ch8;use Sem_Ch8;
 with Sem_Ch13;   use Sem_Ch13;
 with Sem_Dim;use Sem_Dim;
@@ -3381,7 +3380,15 @@ package body Sem_Aggr is
 
 Key_Expr := Key_Expression (Comp);
 if Present (Key_Expr) then
-   Preanalyze_And_Resolve (New_Copy_Tree (Key_Expr), Key_Type);
+   if not Present (Add_Named_Subp) then
+  Error_Msg_N
+("iterated_element_association with key_expression only "
+   & "allowed for container type with Add_Named operation "
+   & "(RM22 4.3.5(24))",
+ Comp);
+   else
+  Preanalyze_And_Resolve (New_Copy_Tree (Key_Expr), Key_Type);
+   end if;
 end if;
 End_Scope;
 
@@ -3414,6 +3421,16 @@ package body Sem_Aggr is
  else
 Choice := First (Discrete_Choices (Comp));
 
+--  A copy of Choice is made before it's analyzed, to preserve
+--  prefixed calls in their original form, because otherwise the
+--  analysis of Choice can transform such calls to normal form,
+--  and the later analysis of an iterator_specification created
+--  below in the case of a function-call choice may trigger an
+--  error on the call (in the case where the function is not
+--  directly visible).
+
+Copy := Copy_Separate_Tree (Choice);
+
 --  This is an N_Component_Association with a Defining_Identifier
 --  and Discrete_Choice_List, but the latter can only have a single
 --  choice, as it's a stand-in for a Loop_Parameter_Specification
@@ -3437,7 +3454,7 @@ package body Sem_Aggr is
 Make_Iterator_Specification (Sloc (N),
   Defining_Identifier =>
 Relocate_Node (Defining_Identifier (Comp)),
-  Name=> New_Copy_Tree (Choice),
+  Name=> Copy,
   Reverse_Present => False,
   Iterator_Filter => Empty,
   Subtype_Indication  => Empty);
@@ -3445,9 +3462,6 @@ package body Sem_Aggr is
   Set_Iterator_Specification (Comp, I_Spec);
   Set_Defining_Identifier (Comp, Empty);
 
-  Analyze_Iterator_Specification
-(Iterator_Specification (Comp));
-
   Resolve_Iterated_Association (Comp, Key_Type, Elmt_Type);
   --  Recursive call to expand association as iterator_spec
 
-- 
2.45.1



[COMMITTED 24/30] ada: Missing style check for extra parentheses in operators

2024-06-10 Thread Marc Poulhiès
From: Justin Squirek 

This patch fixes an issue in the compiler whereby wrapping an operand
of a boolean operator in parentheses resulted in a failure to detect
whether those parentheses were unnecessary for the -gnatyx style checks.
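
For instance (hypothetical example, not part of the change itself),
with -gnatyx the parentheses around a simple operand of a short-circuit
condition are now flagged, while parenthesized operands that are
themselves operators are still left alone; Flag_A, Flag_B and A are
assumed to be visible Boolean and Integer objects:

   if (Flag_A) and then Flag_B then   --  parentheses around Flag_A flagged
      null;
   end if;

   if (A > 0) and then Flag_B then    --  operand is itself an operator: kept
      null;
   end if;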

gcc/ada/

* ali.adb (Get_Nat): Remove unnecessary parentheses.
* exp_ch11.adb (Expand_Local_Exception_Handlers): Remove
unnecessary parentheses.
* freeze.adb (Freeze_Entity): Remove unnecessary parentheses.
* lib-list.adb (List): Remove unnecessary parentheses.
* par-ch5.adb (P_Condition): Add extra parentheses checks on
condition operands.
* sem_ch3.adb (Add_Interface_Tag_Components): Remove unnecessary
parentheses.
(Check_Delta_Expression): Remove unnecessary parenthesis.
(Check_Digits_Expression): Remove unnecessary parentheses.
* sem_ch12.adb (Validate_Array_Type_Instance): Remove unnecessary
parentheses.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/ali.adb  |  2 +-
 gcc/ada/exp_ch11.adb |  2 +-
 gcc/ada/freeze.adb   |  2 +-
 gcc/ada/lib-list.adb |  4 ++--
 gcc/ada/par-ch5.adb  | 25 +
 gcc/ada/sem_ch12.adb |  2 +-
 gcc/ada/sem_ch3.adb  |  6 +++---
 7 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/ali.adb b/gcc/ada/ali.adb
index 69a91bce5ab..7c7f790325b 100644
--- a/gcc/ada/ali.adb
+++ b/gcc/ada/ali.adb
@@ -1351,7 +1351,7 @@ package body ALI is
  --  Check if we are on a number. In the case of bad ALI files, this
  --  may not be true.
 
- if not (Nextc in '0' .. '9') then
+ if Nextc not in '0' .. '9' then
 Fatal_Error;
  end if;
 
diff --git a/gcc/ada/exp_ch11.adb b/gcc/ada/exp_ch11.adb
index 9a0f66ff440..678d76cf3eb 100644
--- a/gcc/ada/exp_ch11.adb
+++ b/gcc/ada/exp_ch11.adb
@@ -552,7 +552,7 @@ package body Exp_Ch11 is
 
  --  Nothing to do if no handlers requiring the goto transformation
 
- if not (Local_Expansion_Required) then
+ if not Local_Expansion_Required then
 return;
  end if;
 
diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index ea6106e6455..ea18f87a4ab 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -6963,7 +6963,7 @@ package body Freeze is
if Is_Type (Comp) then
   Freeze_And_Append (Comp, N, Result);
 
-   elsif (Ekind (Comp)) /= E_Function then
+   elsif Ekind (Comp) /= E_Function then
 
   --  The guard on the presence of the Etype seems to be needed
   --  for some CodePeer (-gnatcC) cases, but not clear why???
diff --git a/gcc/ada/lib-list.adb b/gcc/ada/lib-list.adb
index ecc29258e13..210827abf8e 100644
--- a/gcc/ada/lib-list.adb
+++ b/gcc/ada/lib-list.adb
@@ -80,7 +80,7 @@ begin
   else
  Write_Unit_Name (Unit_Name (Sorted_Units (R)));
 
- if Name_Len > (Unit_Length - 1) then
+ if Name_Len > Unit_Length - 1 then
 Write_Eol;
 Write_Str (Unit_Bln);
  else
@@ -91,7 +91,7 @@ begin
 
  Write_Name (Full_File_Name (Source_Index (Sorted_Units (R;
 
- if Name_Len > (File_Length - 1) then
+ if Name_Len > File_Length - 1 then
 Write_Eol;
 Write_Str (Unit_Bln);
 Write_Str (File_Bln);
diff --git a/gcc/ada/par-ch5.adb b/gcc/ada/par-ch5.adb
index d72ddffdece..68c3025e3a0 100644
--- a/gcc/ada/par-ch5.adb
+++ b/gcc/ada/par-ch5.adb
@@ -1360,6 +1360,31 @@ package body Ch5 is
   else
  if Style_Check then
 Style.Check_Xtra_Parens (Cond);
+
+--  When the condition is an operator then examine parentheses
+--  surrounding the condition's operands - taking care to avoid
+--  flagging operands which themselves are operators since they
+--  may be required for resolution or precedence.
+
+if Nkind (Cond) in N_Op
+ | N_Membership_Test
+ | N_Short_Circuit
+  and then Nkind (Right_Opnd (Cond)) not in N_Op
+  | N_Membership_Test
+  | N_Short_Circuit
+then
+   Style.Check_Xtra_Parens (Right_Opnd (Cond));
+end if;
+
+if Nkind (Cond) in N_Binary_Op
+ | N_Membership_Test
+ | N_Short_Circuit
+  and then Nkind (Left_Opnd (Cond)) not in N_Op
+ | N_Membership_Test
+ | N_Short_Circuit
+then
+   Style.Check_Xtra_Parens (Left_Opnd (Cond));
+end if;
  end if;
 
  --  And return the result
diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index 9919cda6340..7daa35f7fe1 100644
--- 

Re: [PATCH] vect: Merge loop mask and cond_op mask in fold-left, reduction.

2024-06-10 Thread Richard Sandiford
Robin Dapp  writes:
> Hi,
>
> currently we discard the cond-op mask when the loop is fully masked
> which causes wrong code in
> gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> when compiled with
> -O3 -march=cascadelake --param vect-partial-vector-usage=2.
>
> This patch ANDs both masks instead.
>
> Bootstrapped and regtested on x86, aarch64 and power10.
> Regtested on riscv64 and armv8.8-a+sve via qemu.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
>   * tree-vect-loop.cc (vectorize_fold_left_reduction): Merge loop
>   mask and cond-op mask.

OK, thanks.

Richard

> ---
>  gcc/tree-vect-loop.cc | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 028692614bb..f9bf6a45611 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7215,7 +7215,21 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>tree len = NULL_TREE;
>tree bias = NULL_TREE;
>if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> - mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, 
> i);
> + {
> +   tree mask_loop = vect_get_loop_mask (loop_vinfo, gsi, masks,
> +vec_num, vectype_in, i);
> +   if (is_cond_op)
> + {
> +   /* Merge the loop mask and the cond_op mask.  */
> +   mask = make_ssa_name (TREE_TYPE (mask_loop));
> +   gassign *and_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
> +mask_loop,
> +vec_opmask[i]);
> +   gsi_insert_before (gsi, and_stmt, GSI_SAME_STMT);
> + }
> +   else
> + mask = mask_loop;
> + }
>else if (is_cond_op)
>   mask = vec_opmask[i];
>if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))


[COMMITTED 21/30] ada: Unreferenced warning on abstract subprogram

2024-06-10 Thread Marc Poulhiès
From: Justin Squirek 

This patch modifies the unreferenced entity warning in the compiler to avoid
noisily warning about unreferenced abstract subprograms.
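
A minimal sketch of the kind of spec that no longer triggers the
warning under -gnatwu (the names are invented for illustration):

   package Shapes is
      type Shape is abstract tagged null record;
      function Area (S : Shape) return Float is abstract;
      --  previously reported as '"Area" is not referenced'
   end Shapes;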

gcc/ada/

* sem_warn.adb (Warn_On_Unreferenced_Entity): Add a condition to
ignore warnings on unreferenced abstract subprogram.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_warn.adb | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/sem_warn.adb b/gcc/ada/sem_warn.adb
index 2de3f8668b0..91a57d521d1 100644
--- a/gcc/ada/sem_warn.adb
+++ b/gcc/ada/sem_warn.adb
@@ -4452,12 +4452,16 @@ package body Sem_Warn is
  ("?u?literal & is not referenced!", E);
 
 when E_Function =>
-   Error_Msg_N -- CODEFIX
- ("?u?function & is not referenced!", E);
+   if not Is_Abstract_Subprogram (E) then
+  Error_Msg_N -- CODEFIX
+("?u?function & is not referenced!", E);
+   end if;
 
 when E_Procedure =>
-   Error_Msg_N -- CODEFIX
- ("?u?procedure & is not referenced!", E);
+   if not Is_Abstract_Subprogram (E) then
+  Error_Msg_N -- CODEFIX
+("?u?procedure & is not referenced!", E);
+   end if;
 
 when E_Package =>
Error_Msg_N -- CODEFIX
-- 
2.45.1



[COMMITTED 26/30] ada: For freezing, treat an extension or delta aggregate like a regular aggregate.

2024-06-10 Thread Marc Poulhiès
From: Steve Baird 

Extend existing special freezing rules for regular aggregates to also apply to
extension and delta aggregates.

gcc/ada/

* freeze.adb
(Should_Freeze_Type.Is_Dispatching_Call_Or_Aggregate): Treat an 
extension
aggregate or a delta aggregate like a regular aggregate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/freeze.adb | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index ea18f87a4ab..c872050dd35 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -222,7 +222,9 @@ package body Freeze is
   = Scope (Typ)
  then
 return Abandon;
- elsif Nkind (N) = N_Aggregate
+ elsif Nkind (N) in N_Aggregate
+  | N_Extension_Aggregate
+  | N_Delta_Aggregate
and then Base_Type (Etype (N)) = Base_Type (Typ)
  then
 return Abandon;
-- 
2.45.1



[COMMITTED 14/30] ada: Remove incorrect assertion in run-time

2024-06-10 Thread Marc Poulhiès
From: Ronan Desplanques 

There is a special case of file paths on Windows that are absolute
but don't start with a drive letter: UNC paths. This patch removes
an assertion in System.OS_Lib.Normalize_Pathname that failed to take
this case into account. It also renames a local subprogram of
Normalize_Pathname to make its purpose clearer.
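
For example (hypothetical path, assuming System.OS_Lib is use-visible),
a UNC name is absolute yet carries no drive letter, so the removed
assertion Path_Buffer (2) = ':' could not hold for it:

   Full : constant String :=
     Normalize_Pathname ("\\fileserver\gnat\obj\main.o");
   --  absolute Windows path without a drive letter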

gcc/ada/

* libgnat/s-os_lib.adb (Normalize_Pathname): Remove incorrect
assert statement.
(Missed_Drive_Letter): Rename into...
(Drive_Letter_Omitted): This.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-os_lib.adb | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/ada/libgnat/s-os_lib.adb b/gcc/ada/libgnat/s-os_lib.adb
index 20e109aaa0b..dd2156e1dcb 100644
--- a/gcc/ada/libgnat/s-os_lib.adb
+++ b/gcc/ada/libgnat/s-os_lib.adb
@@ -2089,8 +2089,10 @@ package body System.OS_Lib is
   --  Returns True only if the Name is including a drive
   --  letter at start.
 
-  function Missed_Drive_Letter (Name : String) return Boolean;
-  --  Missed drive letter at start of the normalized pathname
+  function Drive_Letter_Omitted (Name : String) return Boolean;
+  --  Name must be an absolute path. Returns True if and only if
+  --  Name doesn't start with a drive letter and Name is not a
+  --  UNC path.
 
   ---
   -- Is_With_Drive --
@@ -2104,11 +2106,11 @@ package body System.OS_Lib is
  or else Name (Name'First) in 'A' .. 'Z');
   end Is_With_Drive;
 
-  -
-  -- Missed_Drive_Letter --
-  -
+  --
+  -- Drive_Letter_Omitted --
+  --
 
-  function Missed_Drive_Letter (Name : String) return Boolean is
+  function Drive_Letter_Omitted (Name : String) return Boolean is
   begin
  return On_Windows
and then not Is_With_Drive (Name)
@@ -2117,7 +2119,7 @@ package body System.OS_Lib is
  /= Directory_Separator
  or else Name (Name'First + 1)
  /= Directory_Separator);
-  end Missed_Drive_Letter;
+  end Drive_Letter_Omitted;
 
   -
   -- Final_Value --
@@ -2174,7 +2176,7 @@ package body System.OS_Lib is
 
  elsif Directory = ""
or else not Is_Absolute_Path (Directory)
-   or else Missed_Drive_Letter (Directory)
+   or else Drive_Letter_Omitted (Directory)
  then
 --  Directory name not given or it is not absolute or without drive
 --  letter on Windows, get current directory.
@@ -2251,7 +2253,7 @@ package body System.OS_Lib is
   end if;
 
   if Is_Absolute_Path (Name) then
- if Missed_Drive_Letter (Name) then
+ if Drive_Letter_Omitted (Name) then
 Fill_Directory (Drive_Only => True);
 
 --  Take only drive letter part with colon
@@ -2286,8 +2288,6 @@ package body System.OS_Lib is
 
  --  Ensure drive letter is upper-case
 
- pragma Assert (Path_Buffer (2) = ':');
-
  if Path_Buffer (1) in 'a' .. 'z' then
 System.Case_Util.To_Upper (Path_Buffer (1 .. 1));
  end if;
-- 
2.45.1



Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-10 Thread Ajit Agarwal
Hello Richard:

On 10/06/24 2:12 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> +
>> +  rtx set = single_set (insn);
>> +  if (set == NULL_RTX)
>> +return false;
>> +
>> +  rtx op0 = SET_SRC (set);
>> +  rtx_code code = GET_CODE (op0);
>> +
>> +  // This check is added as register pairs are not generated
>> +  // by RA for neg:V2DF (fma: V2DF (reg1)
>> +  //  (reg2)
>> +  //  (neg:V2DF (reg3)))
>> +  if (GET_RTX_CLASS (code) == RTX_UNARY)
>> +return false;
>
> What's special about (neg (fma ...))?
>

 I am not sure why register allocator fails allocating register pairs 
 with
 NEG Unary operation with fma operand. I have not debugged register 
 allocator why the NEG
 Unary operation with fma operand. 
>>>
>>
>> For neg (fma ...) cases because of subreg 128 bits from OOmode 256 bits 
>> are
>> set correctly. 
>> IRA marked them spill candidates as spill priority is zero.
>>
>> Due to this LRA reload pass couldn't allocate register pairs.
>
> I think this is just restating the symptom though.  I suppose the same
> kind of questions apply here too: what was the instruction before the
> pass runs, what was the instruction after the pass runs, and why is
> the rtl change incorrect (by the meaning above)?
>

 Original case where we dont do load fusion, spill happens, in that
 case we dont require sequential register pairs to be generated for 2 loads
 for. Hence it worked.

 rtl change is correct and there is no error.

 for load fusion spill happens and we dont generate sequential register 
 pairs
 because of spill candidate and lxvp gives incorrect results as sequential 
 register
 pairs are required for lxvp.
>>>
>>> Can you go into more detail?  How is the lxvp represented?  And how do
>>> we end up not getting a sequential register pair?  What does the rtl
>>> look like (before and after things have gone wrong)?
>>>
>>> It seems like either the rtl is not describing the result of the fusion
>>> correctly or there is some problem in the .md description of lxvp.
>>>
>>
>> After fusion pass:
>>
>> (insn 9299 2472 2412 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>> [240])
>> (mem:V2DF (plus:DI (reg:DI 8 8 [orig:1285 ivtmp.886 ] [1285])
>> (const_int 16 [0x10])) [1 MEM  
>> [(real(kind=8) *)_4188]+16 S16 A64])) "shell_lam.fppized.f":238:72 1190 
>> {vsx_movv2df_64bit}
>>  (nil))
>> (insn 2412 9299 2477 187 (set (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>> [240])
>> (neg:V2DF (fma:V2DF (reg:V2DF 39 7 [ MEM  
>> [(real(kind=8) *)_4050]+16 ])
>> (reg:V2DF 44 12 [3119])
>> (neg:V2DF (reg:V2DF 51 19 [orig:240 vect__302.545 ] 
>> [240]) {*vsx_nfmsv2df4}
>>  (nil))
>>
>> In LRA reload.
>>
>> (insn 2472 2461 2412 161 (set (reg:OO 2572 [ vect__300.543_236 ])
>> (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] [1285]) [1 MEM 
>>  [(real(kind=8) *)_4188]+0 S16 A64])) 
>> "shell_lam.fppized.f":238:72 2187 {*movoo}
>>  (expr_list:REG_EQUIV (mem:OO (reg:DI 4260 [orig:1285 ivtmp.886 ] 
>> [1285]) [1 MEM  [(real(kind=8) *)_4188]+0 S16 A64])
>> (nil)))
>> (insn 2412 2472 2477 161 (set (reg:V2DF 240 [ vect__302.545 ])
>> (neg:V2DF (fma:V2DF (subreg:V2DF (reg:OO 2561 [ MEM > real(kind=8)> [(real(kind=8) *)_4050] ]) 16)
>> (reg:V2DF 4283 [3119])
>> (neg:V2DF (subreg:V2DF (reg:OO 2572 [ vect__300.543_236 ]) 
>> 16)  {*vsx_nfmsv2df4}
>>  (nil))
>>
>>
>> In LRA reload sequential registers are not generated as r2572 is spilled and 
>> move to spill location
>> in stack and subsequent uses loads from stack. Hence sequential registers 
>> pairs are not generated.
>>
>> lxvp vsx0, 0(r1).
>>
>> It loads from from r1+0 into vsx0 and vsx1 and appropriate uses use 
>> sequential register pairs.
>>
>> Without load fusion since 2 loads exists and 2 loads need not require 
>> sequential registers
>> hence it worked but with load fusion and using lxvp it requires sequential 
>> register pairs.
> 
> Do you mean that this is a performance regression?  I.e. the fact that
> lxvp requires sequential registers causes extra spilling, due to having
> less allocation freedom?
> 
> Or is it a correctness problem?  If so, what is it?  Nothing in the rtl
> above looks wrong in principle (although I've no idea if the REG_EQUIV
> is correct in this context).  What does the allocated code look like,
> and why is it wrong?
> 
> If (reg:OO 2561) is spilled and then one half of it used, only that half
> needs to be loaded from the spill slot.  E.g. if (reg:OO 2561) is reloaded
> for insn 2412 on its own, only the second half of the register needs 

[COMMITTED 28/30] ada: Derived type with convention C must override convention C_Pass_By_Copy

2024-06-10 Thread Marc Poulhiès
From: Gary Dismukes 

If a type DT is derived from a record type T with convention C_Pass_By_Copy
and explicitly specifies convention C (via aspect or pragma), then type DT
should not be treated as a type with convention C_Pass_By_Copy. Any parameters
of the derived type should be passed by reference rather than by copy. The
compiler was incorrectly inheriting convention C_Pass_By_Copy, by inheriting
the flag set on the parent type, but that flag needs to be unset in the case
where the convention is overridden.
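
A sketch of the pattern being fixed (type and component names are
invented, and Interfaces.C is assumed to be with'ed):

   type T is record
      A, B : Interfaces.C.int;
   end record
     with Convention => C_Pass_By_Copy;

   type DT is new T;
   pragma Convention (C, DT);
   --  DT no longer inherits C_Pass_By_Copy, so parameters of type DT
   --  are passed by reference, as plain convention C requires here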

gcc/ada/

* sem_prag.adb (Set_Convention_From_Pragma): If the specified 
convention on
a record type is not C_Pass_By_Copy, then force the C_Pass_By_Copy flag 
to
False, to ensure that it's overridden.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index 9ccf1b9cf65..671b2a542ea 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -8498,6 +8498,15 @@ package body Sem_Prag is
end if;
 end if;
 
+--  If the convention of a record type is changed (such as to C),
+--  this must override C_Pass_By_Copy if that flag was inherited
+--  from a parent type where the latter convention was specified,
+--  so we force the flag to False.
+
+if Cname /= Name_C_Pass_By_Copy and then Is_Record_Type (E) then
+   Set_C_Pass_By_Copy (Base_Type (E), False);
+end if;
+
 --  If the entity is a derived boolean type, check for the special
 --  case of convention C, C++, or Fortran, where we consider any
 --  nonzero value to represent true.
-- 
2.45.1



[COMMITTED 23/30] ada: Iterator filter ignored on formal loop

2024-06-10 Thread Marc Poulhiès
From: Justin Squirek 

This patch fixes an issue where iterator filters for formal container and
formal container element loops were silently ignored and remained unexpanded.
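
A small sketch of a now-honored filter, assuming V is an instance of an
Ada 2022 formal container holding Integer elements and Total is an
Integer object in scope (both invented for illustration):

   for E of V when E > 0 loop      --  the "when" filter was previously dropped
      Total := Total + E;
   end loop;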

gcc/ada/

* exp_ch5.adb (Expand_Formal_Container_Element_Loop): Add
expansion of filter condition.
(Expand_Formal_Container_Loop): Add expansion of filter condition.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch5.adb | 45 +
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/gcc/ada/exp_ch5.adb b/gcc/ada/exp_ch5.adb
index 2973658ce98..f397086d73a 100644
--- a/gcc/ada/exp_ch5.adb
+++ b/gcc/ada/exp_ch5.adb
@@ -4394,6 +4394,18 @@ package body Exp_Ch5 is
   Reinit_Field_To_Zero (Init_Name, F_SPARK_Pragma_Inherited);
   Mutate_Ekind (Init_Name, E_Loop_Parameter);
 
+  --  Wrap the block statements with the condition specified in the
+  --  iterator filter when one is present.
+
+  if Present (Iterator_Filter (I_Spec)) then
+ pragma Assert (Ada_Version >= Ada_2022);
+ Set_Statements (Handled_Statement_Sequence (N),
+New_List (Make_If_Statement (Loc,
+  Condition => Iterator_Filter (I_Spec),
+  Then_Statements =>
+Statements (Handled_Statement_Sequence (N);
+  end if;
+
   --  The cursor was marked as a loop parameter to prevent user assignments
   --  to it, however this renders the advancement step illegal as it is not
   --  possible to change the value of a constant. Flag the advancement step
@@ -4436,6 +4448,7 @@ package body Exp_Ch5 is
   Advance   : Node_Id;
   Init  : Node_Id;
   New_Loop  : Node_Id;
+  Block : Node_Id;
 
begin
   --  For an element iterator, the Element aspect must be present,
@@ -4456,7 +4469,6 @@ package body Exp_Ch5 is
 
   Build_Formal_Container_Iteration
 (N, Container, Cursor, Init, Advance, New_Loop);
-  Append_To (Stats, Advance);
 
   Mutate_Ekind (Cursor, E_Variable);
   Insert_Action (N, Init);
@@ -4481,13 +4493,30 @@ package body Exp_Ch5 is
 Convert_To_Iterable_Type (Container, Loc),
 New_Occurrence_Of (Cursor, Loc;
 
-  Set_Statements (New_Loop,
-New_List
-  (Make_Block_Statement (Loc,
- Declarations => New_List (Elmt_Decl),
- Handled_Statement_Sequence =>
-   Make_Handled_Sequence_Of_Statements (Loc,
- Statements => Stats;
+  Block :=
+Make_Block_Statement (Loc,
+  Declarations => New_List (Elmt_Decl),
+  Handled_Statement_Sequence =>
+Make_Handled_Sequence_Of_Statements (Loc,
+  Statements => Stats));
+
+  --  Wrap the block statements with the condition specified in the
+  --  iterator filter when one is present.
+
+  if Present (Iterator_Filter (I_Spec)) then
+ pragma Assert (Ada_Version >= Ada_2022);
+ Set_Statements (Handled_Statement_Sequence (Block),
+New_List (
+  Make_If_Statement (Loc,
+Condition   => Iterator_Filter (I_Spec),
+Then_Statements =>
+  Statements (Handled_Statement_Sequence (Block))),
+  Advance));
+  else
+ Append_To (Stats, Advance);
+  end if;
+
+  Set_Statements (New_Loop, New_List (Block));
 
   --  The element is only modified in expanded code, so it appears as
   --  unassigned to the warning machinery. We must suppress this spurious
-- 
2.45.1



[COMMITTED 12/30] ada: Cleanup repeated code in expansion of stream attributes

2024-06-10 Thread Marc Poulhiès
From: Piotr Trojanek 

In expansion of various attributes, in particular for the Input/Output
and Read/Write attributes, we can use constants that are already used
for expansion of many other attributes.

gcc/ada/

* exp_attr.adb (Expand_N_Attribute_Reference): Use constants
declared at the beginning of subprogram; tune layout.
* exp_ch3.adb (Predefined_Primitive_Bodies): Tune layout.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_attr.adb | 36 +++-
 gcc/ada/exp_ch3.adb  |  3 +--
 2 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb
index 69428142839..0349db28a1a 100644
--- a/gcc/ada/exp_attr.adb
+++ b/gcc/ada/exp_attr.adb
@@ -179,7 +179,6 @@ package body Exp_Attr is
--* Rec_Typ - the record type whose internals are to be validated
 
function Default_Streaming_Unavailable (Typ : Entity_Id) return Boolean;
-   --
--  In most cases, references to unavailable streaming attributes
--  are rejected at compile time. In some obscure cases involving
--  generics and formal derived types, the problem is dealt with at runtime.
@@ -4091,10 +4090,8 @@ package body Exp_Attr is
   --
 
   when Attribute_Has_Same_Storage => Has_Same_Storage : declare
- Loc : constant Source_Ptr := Sloc (N);
-
- X   : constant Node_Id := Prefix (N);
- Y   : constant Node_Id := First (Expressions (N));
+ X : constant Node_Id := Pref;
+ Y : constant Node_Id := First (Exprs);
  --  The arguments
 
  X_Addr : Node_Id;
@@ -4363,7 +4360,7 @@ package body Exp_Attr is
 
  if Restriction_Active (No_Streams) then
 Rewrite (N,
-  Make_Raise_Program_Error (Sloc (N),
+  Make_Raise_Program_Error (Loc,
 Reason => PE_Stream_Operation_Not_Allowed));
 Set_Etype (N, B_Type);
 return;
@@ -4415,7 +4412,7 @@ package body Exp_Attr is
--  case where a No_Streams restriction is active.
 
Rewrite (N,
- Make_Raise_Program_Error (Sloc (N),
+ Make_Raise_Program_Error (Loc,
Reason => PE_Stream_Operation_Not_Allowed));
Set_Etype (N, B_Type);
return;
@@ -5295,10 +5292,8 @@ package body Exp_Attr is
   --
 
   when Attribute_Overlaps_Storage => Overlaps_Storage : declare
- Loc : constant Source_Ptr := Sloc (N);
- X   : constant Node_Id:= Prefix (N);
- Y   : constant Node_Id:= First (Expressions (N));
-
+ X : constant Node_Id := Pref;
+ Y : constant Node_Id := First (Exprs);
  --  The arguments
 
  X_Addr, Y_Addr : Node_Id;
@@ -5451,7 +5446,7 @@ package body Exp_Attr is
 
  if Restriction_Active (No_Streams) then
 Rewrite (N,
-  Make_Raise_Program_Error (Sloc (N),
+  Make_Raise_Program_Error (Loc,
 Reason => PE_Stream_Operation_Not_Allowed));
 Set_Etype (N, Standard_Void_Type);
 return;
@@ -5505,7 +5500,7 @@ package body Exp_Attr is
--  case where a No_Streams restriction is active.
 
Rewrite (N,
- Make_Raise_Program_Error (Sloc (N),
+ Make_Raise_Program_Error (Loc,
Reason => PE_Stream_Operation_Not_Allowed));
Set_Etype (N, Standard_Void_Type);
return;
@@ -6180,10 +6175,9 @@ package body Exp_Attr is
 
   when Attribute_Reduce =>
  declare
-Loc : constant Source_Ptr := Sloc (N);
-E1  : constant Node_Id:= First (Expressions (N));
-E2  : constant Node_Id:= Next (E1);
-Bnn : constant Entity_Id  := Make_Temporary (Loc, 'B', N);
+E1  : constant Node_Id   := First (Exprs);
+E2  : constant Node_Id   := Next (E1);
+Bnn : constant Entity_Id := Make_Temporary (Loc, 'B', N);
 
 Accum_Typ : Entity_Id := Empty;
 New_Loop  : Node_Id;
@@ -6381,7 +6375,7 @@ package body Exp_Attr is
 
  if Restriction_Active (No_Streams) then
 Rewrite (N,
-  Make_Raise_Program_Error (Sloc (N),
+  Make_Raise_Program_Error (Loc,
 Reason => PE_Stream_Operation_Not_Allowed));
 Set_Etype (N, B_Type);
 return;
@@ -6453,7 +6447,7 @@ package body Exp_Attr is
--  case where a No_Streams restriction is active.
 
Rewrite (N,
- Make_Raise_Program_Error (Sloc (N),
+ Make_Raise_Program_Error (Loc,
Reason => PE_Stream_Operation_Not_Allowed));
Set_Etype (N, B_Type);
return;
@@ -8096,7 +8090,7 @@ package body Exp_Attr is
 
  if Restriction_Active (No_Streams) then

[COMMITTED 20/30] ada: Further refine 'Super attribute

2024-06-10 Thread Marc Poulhiès
From: Justin Squirek 

This patch adds a restriction on 'Super so that it cannot be applied to objects
whose parent type is an interface.

gcc/ada/

* sem_attr.adb (Analyze_Attribute): Add check for interface parent
types.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 4fd270aeae9..2fd95f36d65 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -6683,6 +6683,12 @@ package body Sem_Attr is
 elsif Depends_On_Private (P_Type) then
Error_Attr_P ("prefix type of % is a private extension");
 
+--  Disallow view conversions to interfaces in order to avoid
+--  depending on whether an interface type is used as a parent
+--  or progenitor type.
+
+elsif Is_Interface (Node (First_Elmt (Parents))) then
+   Error_Attr_P ("type of % cannot be an interface");
 end if;
 
 --  Generate a view conversion and analyze it
-- 
2.45.1



[COMMITTED 27/30] ada: Minor code adjustment to "not Present" test

2024-06-10 Thread Marc Poulhiès
From: Gary Dismukes 

This is just changing a "not Present (...)" test to "No (...)"
to address a CB complaint from gnatcheck.

gcc/ada/

* sem_aggr.adb (Resolve_Iterated_Association): Change "not Present"
to "No" in test of Add_Named_Subp.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aggr.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index 51b88ab831f..249350d21de 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -3380,7 +3380,7 @@ package body Sem_Aggr is
 
 Key_Expr := Key_Expression (Comp);
 if Present (Key_Expr) then
-   if not Present (Add_Named_Subp) then
+   if No (Add_Named_Subp) then
   Error_Msg_N
 ("iterated_element_association with key_expression only "
& "allowed for container type with Add_Named operation "
-- 
2.45.1



[COMMITTED 22/30] ada: Crash checking accessibility level on private type

2024-06-10 Thread Marc Poulhiès
From: Justin Squirek 

This patch fixes an issue in the compiler whereby calculating a static
accessibility level on a private type with an access discriminant resulted
in a compile time crash when No_Dynamic_Accessibility_Checks is enabled.

gcc/ada/

* accessibility.adb (Accessibility_Level): Use Get_Full_View to
avoid crashes when calculating scope.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/accessibility.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/accessibility.adb b/gcc/ada/accessibility.adb
index 33ce001718a..47b3a7af10a 100644
--- a/gcc/ada/accessibility.adb
+++ b/gcc/ada/accessibility.adb
@@ -2227,7 +2227,7 @@ package body Accessibility is
   --  that of the type.
 
   elsif Ekind (Def_Ent) = E_Discriminant then
- return Scope_Depth (Scope (Def_Ent));
+ return Scope_Depth (Get_Full_View (Scope (Def_Ent)));
   end if;
end if;
 
-- 
2.45.1



[COMMITTED 13/30] ada: Fix incorrect lower bound presumption in gnatlink

2024-06-10 Thread Marc Poulhiès
From: Ronan Desplanques 

This patch fixes a subprogram in gnatlink that incorrectly assumed
that the strings it is passed as arguments all have a lower bound of
1.
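
A sketch of why the old loop was unsafe when the string does not start
at index 1 (the values are illustrative only):

   declare
      Path : constant String := "c:\tools\setup.exe";
      FN   : constant String := Path (4 .. Path'Last);  --  FN'First = 4
   begin
      --  Indexing FN starting at 1, as the old loop did, raises
      --  Constraint_Error here; Ada.Strings.Fixed.Index accepts
      --  slices with any lower bound.
      null;
   end;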

gcc/ada/

* gnatlink.adb (Check_File_Name): Fix incorrect assumption.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gnatlink.adb | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/gnatlink.adb b/gcc/ada/gnatlink.adb
index d00fd9e5af7..1455412ef93 100644
--- a/gcc/ada/gnatlink.adb
+++ b/gcc/ada/gnatlink.adb
@@ -42,6 +42,7 @@ with Types;
 
 with Ada.Command_Line; use Ada.Command_Line;
 with Ada.Exceptions;   use Ada.Exceptions;
+with Ada.Strings.Fixed;
 
 with System.OS_Lib; use System.OS_Lib;
 with System.CRTL;
@@ -1697,15 +1698,13 @@ begin
 
   procedure Check_File_Name (S : String) is
   begin
- for J in 1 .. FN'Length - (S'Length - 1) loop
-if FN (J .. J + (S'Length - 1)) = S then
-   Error_Msg
- ("warning: executable file name """ & Output_File_Name.all
-  & """ contains substring """ & S & '"');
-   Error_Msg
- ("admin privileges may be required to run this file");
-end if;
- end loop;
+ if Ada.Strings.Fixed.Index (FN, S) /= 0 then
+Error_Msg
+  ("warning: executable file name """ & Output_File_Name.all
+   & """ contains substring """ & S & '"');
+Error_Msg
+  ("admin privileges may be required to run this file");
+ end if;
   end Check_File_Name;
 
--  Start of processing for Bad_File_Names_On_Windows
-- 
2.45.1



[COMMITTED 15/30] ada: Fix usage of SetThreadIdealProcessor

2024-06-10 Thread Marc Poulhiès
From: Ronan Desplanques 

This patch fixes the way the run-time library checks the return value
of SetThreadIdealProcessor, whose result on success is the thread's
previous ideal processor (not necessarily 1) and on failure is
(DWORD) -1.

gcc/ada/

* libgnarl/s-taprop__mingw.adb (Set_Task_Affinity): Fix usage
of SetThreadIdealProcessor.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnarl/s-taprop__mingw.adb | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/libgnarl/s-taprop__mingw.adb 
b/gcc/ada/libgnarl/s-taprop__mingw.adb
index 3a124ba78d0..38e281cb721 100644
--- a/gcc/ada/libgnarl/s-taprop__mingw.adb
+++ b/gcc/ada/libgnarl/s-taprop__mingw.adb
@@ -1308,7 +1308,13 @@ package body System.Task_Primitives.Operations is
  Result :=
SetThreadIdealProcessor
  (T.Common.LL.Thread, ProcessorId (T.Common.Base_CPU) - 1);
- pragma Assert (Result = 1);
+
+ --  The documentation for SetThreadIdealProcessor states:
+ --
+ --  If the function fails, the return value is (DWORD) - 1.
+ --
+ --  That should map to DWORD'Last in Ada.
+ pragma Assert (Result /= DWORD'Last);
 
   --  Task_Info
 
@@ -1317,7 +1323,10 @@ package body System.Task_Primitives.Operations is
 Result :=
   SetThreadIdealProcessor
 (T.Common.LL.Thread, T.Common.Task_Info.CPU);
-pragma Assert (Result = 1);
+
+--  See the comment above about the return value of
+--  SetThreadIdealProcessor.
+pragma Assert (Result /= DWORD'Last);
  end if;
 
   --  Dispatching domains
-- 
2.45.1



[COMMITTED 17/30] ada: Remove streaming facilities from generics for formal containers

2024-06-10 Thread Marc Poulhiès
From: Yannick Moy 

The dependency on Ada.Streams is problematic for light runtimes.
As these streaming facilities are in fact not used in formal containers,
remove the corresponding dead code.

gcc/ada/

* libgnat/a-chtgfo.adb (Generic_Read, Generic_Write): Remove.
* libgnat/a-chtgfo.ads: Same. Remove dependency on Ada.Streams.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/a-chtgfo.adb | 68 
 gcc/ada/libgnat/a-chtgfo.ads | 24 -
 2 files changed, 92 deletions(-)

diff --git a/gcc/ada/libgnat/a-chtgfo.adb b/gcc/ada/libgnat/a-chtgfo.adb
index c3fff336e9d..df7b554c050 100644
--- a/gcc/ada/libgnat/a-chtgfo.adb
+++ b/gcc/ada/libgnat/a-chtgfo.adb
@@ -359,74 +359,6 @@ package body 
Ada.Containers.Hash_Tables.Generic_Formal_Operations is
   end loop;
end Generic_Iteration;
 
-   --
-   -- Generic_Read --
-   --
-
-   procedure Generic_Read
- (Stream : not null access Root_Stream_Type'Class;
-  HT : out Hash_Table_Type)
-   is
-  N : Count_Type'Base;
-
-   begin
-  Clear (HT);
-
-  Count_Type'Base'Read (Stream, N);
-
-  if Checks and then N < 0 then
- raise Program_Error with "stream appears to be corrupt";
-  end if;
-
-  if N = 0 then
- return;
-  end if;
-
-  if Checks and then N > HT.Capacity then
- raise Capacity_Error with "too many elements in stream";
-  end if;
-
-  for J in 1 .. N loop
- declare
-Node : constant Count_Type := New_Node (Stream);
-Indx : constant Hash_Type := Index (HT, HT.Nodes (Node));
-B: Count_Type renames HT.Buckets (Indx);
- begin
-Set_Next (HT.Nodes (Node), Next => B);
-B := Node;
- end;
-
- HT.Length := HT.Length + 1;
-  end loop;
-   end Generic_Read;
-
-   ---
-   -- Generic_Write --
-   ---
-
-   procedure Generic_Write
- (Stream : not null access Root_Stream_Type'Class;
-  HT : Hash_Table_Type)
-   is
-  procedure Write (Node : Count_Type);
-  pragma Inline (Write);
-
-  procedure Write is new Generic_Iteration (Write);
-
-  ---
-  -- Write --
-  ---
-
-  procedure Write (Node : Count_Type) is
-  begin
- Write (Stream, HT.Nodes (Node));
-  end Write;
-
-   begin
-  Count_Type'Base'Write (Stream, HT.Length);
-  Write (HT);
-   end Generic_Write;
-
---
-- Index --
---
diff --git a/gcc/ada/libgnat/a-chtgfo.ads b/gcc/ada/libgnat/a-chtgfo.ads
index 76633d8da05..f4471bec3d2 100644
--- a/gcc/ada/libgnat/a-chtgfo.ads
+++ b/gcc/ada/libgnat/a-chtgfo.ads
@@ -30,8 +30,6 @@
 --  Hash_Table_Type is used to implement hashed containers. This package
 --  declares hash-table operations that do not depend on keys.
 
-with Ada.Streams;
-
 generic
with package HT_Types is
  new Generic_Formal_Hash_Table_Types (<>);
@@ -113,26 +111,4 @@ package 
Ada.Containers.Hash_Tables.Generic_Formal_Operations is
procedure Generic_Iteration (HT : Hash_Table_Type);
--  Calls Process for each node in hash table HT
 
-   generic
-  use Ada.Streams;
-  with procedure Write
-(Stream : not null access Root_Stream_Type'Class;
- Node   : Node_Type);
-   procedure Generic_Write
- (Stream : not null access Root_Stream_Type'Class;
-  HT : Hash_Table_Type);
-   --  Used to implement the streaming attribute for hashed containers. It
-   --  calls Write for each node to write its value into Stream.
-
-   generic
-  use Ada.Streams;
-  with function New_Node (Stream : not null access Root_Stream_Type'Class)
- return Count_Type;
-   procedure Generic_Read
- (Stream : not null access Root_Stream_Type'Class;
-  HT : out Hash_Table_Type);
-   --  Used to implement the streaming attribute for hashed containers. It
-   --  first clears hash table HT, then populates the hash table by calling
-   --  New_Node for each item in Stream.
-
 end Ada.Containers.Hash_Tables.Generic_Formal_Operations;
-- 
2.45.1



[COMMITTED 08/30] ada: Enable inlining for subprograms with multiple return statements

2024-06-10 Thread Marc Poulhiès
From: Piotr Trojanek 

With the support for forward GOTO statements in the GNATprove backend,
we can now inline subprograms with multiple return statements in the
frontend.

Also, fix inconsistent source locations in the inlined code, which were
now triggering assertion violations in the code for GNATprove
counterexamples.
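
A minimal sketch of a subprogram that GNATprove can now inline for
proof, where it was previously rejected with "cannot inline &
(multiple returns)":

   function Sign (X : Integer) return Integer is
   begin
      if X < 0 then
         return -1;   --  early return: no longer blocks front-end inlining
      end if;
      return 1;
   end Sign;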

gcc/ada/

* inline.adb (Has_Single_Return_In_GNATprove_Mode): Remove.
(Process_Formals): When rewriting an occurrence of a formal
parameter, use location of the occurrence, not of the inlined
call.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/inline.adb | 91 --
 1 file changed, 8 insertions(+), 83 deletions(-)

diff --git a/gcc/ada/inline.adb b/gcc/ada/inline.adb
index 17b3099e6a6..04cf1194009 100644
--- a/gcc/ada/inline.adb
+++ b/gcc/ada/inline.adb
@@ -1090,14 +1090,6 @@ package body Inline is
   --  conflict with subsequent inlinings, so that it is unsafe to try to
   --  inline in such a case.
 
-  function Has_Single_Return_In_GNATprove_Mode return Boolean;
-  --  This function is called only in GNATprove mode, and it returns
-  --  True if the subprogram has no return statement or a single return
-  --  statement as last statement. It returns False for subprogram with
-  --  a single return as last statement inside one or more blocks, as
-  --  inlining would generate gotos in that case as well (although the
-  --  goto is useless in that case).
-
   function Uses_Secondary_Stack (Bod : Node_Id) return Boolean;
   --  If the body of the subprogram includes a call that returns an
   --  unconstrained type, the secondary stack is involved, and it is
@@ -1173,64 +1165,6 @@ package body Inline is
  return False;
   end Has_Pending_Instantiation;
 
-  -
-  -- Has_Single_Return_In_GNATprove_Mode --
-  -
-
-  function Has_Single_Return_In_GNATprove_Mode return Boolean is
- Body_To_Inline : constant Node_Id := N;
- Last_Statement : Node_Id := Empty;
-
- function Check_Return (N : Node_Id) return Traverse_Result;
- --  Returns OK on node N if this is not a return statement different
- --  from the last statement in the subprogram.
-
- --
- -- Check_Return --
- --
-
- function Check_Return (N : Node_Id) return Traverse_Result is
- begin
-case Nkind (N) is
-   when N_Extended_Return_Statement
-  | N_Simple_Return_Statement
-   =>
-  if N = Last_Statement then
- return OK;
-  else
- return Abandon;
-  end if;
-
-   --  Skip locally declared subprogram bodies inside the body to
-   --  inline, as the return statements inside those do not count.
-
-   when N_Subprogram_Body =>
-  if N = Body_To_Inline then
- return OK;
-  else
- return Skip;
-  end if;
-
-   when others =>
-  return OK;
-end case;
- end Check_Return;
-
- function Check_All_Returns is new Traverse_Func (Check_Return);
-
-  --  Start of processing for Has_Single_Return_In_GNATprove_Mode
-
-  begin
- --  Retrieve the last statement
-
- Last_Statement := Last (Statements (Handled_Statement_Sequence (N)));
-
- --  Check that the last statement is the only possible return
- --  statement in the subprogram.
-
- return Check_All_Returns (N) = OK;
-  end Has_Single_Return_In_GNATprove_Mode;
-
   --
   -- Uses_Secondary_Stack --
   --
@@ -1275,16 +1209,6 @@ package body Inline is
   then
  return;
 
-  --  Subprograms that have return statements in the middle of the body are
-  --  inlined with gotos. GNATprove does not currently support gotos, so
-  --  we prevent such inlining.
-
-  elsif GNATprove_Mode
-and then not Has_Single_Return_In_GNATprove_Mode
-  then
- Cannot_Inline ("cannot inline & (multiple returns)?", N, Spec_Id);
- return;
-
   --  Functions that return controlled types cannot currently be inlined
   --  because they require secondary stack handling; controlled actions
   --  may also interfere in complex ways with inlining.
@@ -3518,6 +3442,7 @@ package body Inline is
   -
 
   function Process_Formals (N : Node_Id) return Traverse_Result is
+ Loc : constant Source_Ptr := Sloc (N);
  A   : Entity_Id;
  E   : Entity_Id;
  Ret : Node_Id;
@@ -3544,13 +3469,13 @@ package body Inline is
 
if 

[COMMITTED 19/30] ada: Fix references to Ada RM in comments

2024-06-10 Thread Marc Poulhiès
From: Piotr Trojanek 

We seem to have a convention of using "RM" in the GNAT comments, not
"Ada RM". Also, the paragraph references by convention should appear
in parentheses, e.g. "8.3(12.3/2)", not "8.3 12.3/2".

gcc/ada/

* einfo.ads, exp_attr.adb, exp_ch4.adb, exp_ch7.adb,
lib-writ.adb, libgnat/a-stbuut.ads, sem_ch13.adb, sem_ch3.adb,
sem_ch7.adb: Use "RM" in comments.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/einfo.ads| 2 +-
 gcc/ada/exp_attr.adb | 4 ++--
 gcc/ada/exp_ch4.adb  | 2 +-
 gcc/ada/exp_ch7.adb  | 2 +-
 gcc/ada/lib-writ.adb | 3 +--
 gcc/ada/libgnat/a-stbuut.ads | 2 +-
 gcc/ada/sem_ch13.adb | 4 ++--
 gcc/ada/sem_ch3.adb  | 2 +-
 gcc/ada/sem_ch7.adb  | 2 +-
 9 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/gcc/ada/einfo.ads b/gcc/ada/einfo.ads
index e5110f51670..0b0529a39cf 100644
--- a/gcc/ada/einfo.ads
+++ b/gcc/ada/einfo.ads
@@ -2728,7 +2728,7 @@ package Einfo is
 --   Defined in all entities. Set for implicitly declared subprograms
 --   that require overriding or are null procedures, and are hidden by
 --   a non-fully conformant homograph with the same characteristics
---   (Ada RM 8.3 12.3/2).
+--   (RM 8.3(12.3/2)).
 
 --Is_Hidden_Open_Scope
 --   Defined in all entities. Set for a scope that contains the
diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb
index 0349db28a1a..1396007a2d1 100644
--- a/gcc/ada/exp_attr.adb
+++ b/gcc/ada/exp_attr.adb
@@ -2173,8 +2173,8 @@ package body Exp_Attr is
   --  for the arguments of a 'Read attribute reference (since the
   --  scalar argument is an OUT scalar) and for the arguments of a
   --  'Has_Same_Storage or 'Overlaps_Storage attribute reference (which not
-  --  considered to be reads of their prefixes and expressions, see Ada RM
-  --  13.3(73.10/3)).
+  --  considered to be reads of their prefixes and expressions, see
+  --  RM 13.3(73.10/3)).
 
   if Validity_Checks_On and then Validity_Check_Operands
 and then Id /= Attribute_Asm_Output
diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 6ceffdf8302..95b7765b173 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -8512,7 +8512,7 @@ package body Exp_Ch4 is
 
  --  For small negative exponents, we return the reciprocal of
  --  the folding of the exponentiation for the opposite (positive)
- --  exponent, as required by Ada RM 4.5.6(11/3).
+ --  exponent, as required by RM 4.5.6(11/3).
 
  if abs Expv <= 4 then
 
diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 993c13c7318..fd1d9db0654 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -7419,7 +7419,7 @@ package body Exp_Ch7 is
  --  non-POC components are finalized before the
  --  non-POC extension components. This violates the
  --  usual "finalize in reverse declaration order"
- --  principle, but that's ok (see Ada RM 7.6.1(9)).
+ --  principle, but that's ok (see RM 7.6.1(9)).
  --
  --  Last_POC_Call should be non-empty if the extension
  --  has at least one POC. Interactions with variant
diff --git a/gcc/ada/lib-writ.adb b/gcc/ada/lib-writ.adb
index 697b2f2b797..0755b92e4db 100644
--- a/gcc/ada/lib-writ.adb
+++ b/gcc/ada/lib-writ.adb
@@ -298,8 +298,7 @@ package body Lib.Writ is
  function Is_Implicit_With_Clause (Clause : Node_Id) return Boolean is
  begin
 --  With clauses created for ancestor units are marked as internal,
---  however, they emulate the semantics in Ada RM 10.1.2 (6/2),
---  where
+--  however, they emulate the semantics in RM 10.1.2 (6/2), where
 --
 --with A.B;
 --
diff --git a/gcc/ada/libgnat/a-stbuut.ads b/gcc/ada/libgnat/a-stbuut.ads
index dadfe5f0010..2a8b08bca57 100644
--- a/gcc/ada/libgnat/a-stbuut.ads
+++ b/gcc/ada/libgnat/a-stbuut.ads
@@ -33,7 +33,7 @@ with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
 
 package Ada.Strings.Text_Buffers.Utils with Pure is
 
-   --  Ada.Strings.Text_Buffers is a predefined unit (see Ada RM A.4.12).
+   --  Ada.Strings.Text_Buffers is a predefined unit (see RM A.4.12).
--  This is a GNAT-defined child unit of that parent.
 
subtype Character_7 is
diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index c0a5b6c2c37..f84ca2c75d7 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -12860,7 +12860,7 @@ package body Sem_Ch13 is
   procedure Hide_Non_Overridden_Subprograms (Typ : Entity_Id);
   --  Inspect the primitive operations of type Typ and hide all pairs of
   --  implicitly declared non-overridden non-fully conformant homographs
-  --  (Ada RM 8.3 12.3/2).
+  --  (RM 8.3(12.3/2)).
 

[COMMITTED 30/30] ada: Add support for No_Implicit_Conditionals to nonbinary modular types

2024-06-10 Thread Marc Poulhiès
From: Eric Botcazou 

The expansion of additive operations for nonbinary modular types implemented
in the front-end and its counterpart in code generators may create branches,
which is not allowed when restriction No_Implicit_Conditionals is in effect.

This changes it to use an explicit Mod operation when the restriction is in
effect, which is assumed not to create such branches.
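
As an illustration (not taken from the patch), for a nonbinary modulus
the addition below is now reduced with an explicit Mod when the
restriction is in effect, instead of a compare-and-subtract sequence:

   type Sum_Check is mod 255;   --  nonbinary modulus

   function Add (A, B : Sum_Check) return Sum_Check is
   begin
      --  Under No_Implicit_Conditionals this is expanded roughly as
      --  Sum_Check ((U (A) + U (B)) mod 255) for a suitably sized
      --  unsigned type U, with no implicit branch.
      return A + B;
   end Add;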

gcc/ada/

* exp_ch4.adb (Expand_Nonbinary_Modular_Op): Create an explicit Mod
for additive operations if No_Implicit_Conditionals is in effect.
(Expand_Modular_Addition): Likewise.
(Expand_Modular_Subtraction): Likewise.
(Expand_Modular_Op): Always use an unsigned type obtained by calling
Small_Integer_Type_For on the required size.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch4.adb | 132 ++--
 1 file changed, 77 insertions(+), 55 deletions(-)

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 95b7765b173..bf90b46249a 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -139,9 +139,10 @@ package body Exp_Ch4 is
--  case of array type arguments.
 
procedure Expand_Nonbinary_Modular_Op (N : Node_Id);
-   --  When generating C code, convert nonbinary modular arithmetic operations
-   --  into code that relies on the front-end expansion of operator Mod. No
-   --  expansion is performed if N is not a nonbinary modular operand.
+   --  When generating C code or if restriction No_Implicit_Conditionals is in
+   --  effect, convert most nonbinary modular arithmetic operations into code
+   --  that relies on the expansion of an explicit Mod operator. No expansion
+   --  is performed if N is not a nonbinary modular operation.
 
procedure Expand_Short_Circuit_Operator (N : Node_Id);
--  Common expansion processing for short-circuit boolean operators
@@ -3899,10 +3900,13 @@ package body Exp_Ch4 is
 
   procedure Expand_Modular_Addition is
   begin
- --  If this is not the addition of a constant then compute it using
- --  the general rule: (lhs + rhs) mod Modulus
+ --  If this is not the addition of a constant or else restriction
+ --  No_Implicit_Conditionals is in effect, then compute it using
+ --  the general rule: (lhs + rhs) mod Modulus.
 
- if Nkind (Right_Opnd (N)) /= N_Integer_Literal then
+ if Nkind (Right_Opnd (N)) /= N_Integer_Literal
+   or else Restriction_Active (No_Implicit_Conditionals)
+ then
 Expand_Modular_Op;
 
  --  If this is an addition of a constant, convert it to a subtraction
@@ -3921,6 +3925,7 @@ package body Exp_Ch4 is
Cond_Expr : Node_Id;
Then_Expr : Node_Id;
Else_Expr : Node_Id;
+
 begin
--  To prevent spurious visibility issues, convert all
--  operands to Standard.Unsigned.
@@ -3966,12 +3971,12 @@ package body Exp_Ch4 is
  --   We will convert to another type (not a nonbinary-modulus modular
  --   type), evaluate the op in that representation, reduce the result,
  --   and convert back to the original type. This means that the
- --   backend does not have to deal with nonbinary-modulus ops.
-
- Op_Expr  : constant Node_Id := New_Op_Node (Nkind (N), Loc);
- Mod_Expr : Node_Id;
+ --   back end does not have to deal with nonbinary-modulus ops.
 
+ Mod_Expr: Node_Id;
+ Op_Expr : Node_Id;
  Target_Type : Entity_Id;
+
   begin
  --  Select a target type that is large enough to avoid spurious
  --  intermediate overflow on pre-reduction computation (for
@@ -3979,22 +3984,15 @@ package body Exp_Ch4 is
 
  declare
 Required_Size : Uint := RM_Size (Etype (N));
-Use_Unsigned  : Boolean := True;
+
  begin
 case Nkind (N) is
-   when N_Op_Add =>
+   when N_Op_Add | N_Op_Subtract =>
   --  For example, if modulus is 255 then RM_Size will be 8
   --  and the range of possible values (before reduction) will
   --  be 0 .. 508; that range requires 9 bits.
   Required_Size := Required_Size + 1;
 
-   when N_Op_Subtract =>
-  --  For example, if modulus is 255 then RM_Size will be 8
-  --  and the range of possible values (before reduction) will
-  --  be -254 .. 254; that range requires 9 bits, signed.
-  Use_Unsigned := False;
-  Required_Size := Required_Size + 1;
-
when N_Op_Multiply =>
   --  For example, if modulus is 255 then RM_Size will be 8
   --  and the range of possible values (before reduction) will
@@ -4005,37 +4003,15 @@ package body Exp_Ch4 is
   null;
 end 

[COMMITTED 09/30] ada: Simplify check for type without stream operations

2024-06-10 Thread Marc Poulhiès
From: Piotr Trojanek 

Recursive routine Type_Without_Stream_Operation was checking restriction
No_Default_Stream_Attributes at every call, which was confusing and
inefficient.

This routine is only called from two places: Check_Stream_Attribute,
which already checks if this restriction is active, and
Stream_Operation_OK, where we add such a check.

Cleanup related to extending the use of No_Streams restriction.

gcc/ada/

* exp_ch3.adb (Stream_Operation_OK): Check restriction
No_Default_Stream_Attributes before call to
Type_Without_Stream_Operation.
* sem_util.adb (Type_Without_Stream_Operation): Remove static
condition from recursive routine

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb  | 4 +++-
 gcc/ada/sem_util.adb | 4 
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index 8ddae1eb1be..f9dd0914111 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -12912,7 +12912,9 @@ package body Exp_Ch3 is
 and then No (No_Tagged_Streams_Pragma (Typ))
 and then not No_Run_Time_Mode
 and then RTE_Available (RE_Tag)
-and then No (Type_Without_Stream_Operation (Typ))
+and then
+  (not Restriction_Active (No_Default_Stream_Attributes)
+ or else No (Type_Without_Stream_Operation (Typ)))
 and then RTE_Available (RE_Root_Stream_Type);
end Stream_Operation_OK;
 
diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 15994b4d1e9..241be3d2957 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -28557,10 +28557,6 @@ package body Sem_Util is
   Op_Missing : Boolean;
 
begin
-  if not Restriction_Active (No_Default_Stream_Attributes) then
- return Empty;
-  end if;
-
   if Is_Elementary_Type (T) then
  if Op = TSS_Null then
 Op_Missing :=
-- 
2.45.1



[COMMITTED 04/30] ada: Fix handling of aspects CPU and Interrupt_Priority

2024-06-10 Thread Marc Poulhiès
From: Piotr Trojanek 

When resolving aspect expressions, aspects CPU and Interrupt_Priority
should be handled like the aspect Priority; in particular, all these
expressions can reference discriminants of the annotated task type.
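
A spec-only sketch of the intended usage (package and discriminant
names are invented; the task body is omitted):

   with System;
   with System.Multiprocessors; use System.Multiprocessors;

   package Workers is
      task type Worker (Prio : System.Priority; Core : CPU_Range) with
         Priority => Prio,   --  already worked: discriminant is visible
         CPU      => Core;   --  now resolved with discriminants visible too
   end Workers;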

gcc/ada/

* sem_ch13.adb (Check_Aspect_At_End_Of_Declarations): Make
discriminants visible when analyzing aspect Interrupt_Priority.
(Freeze_Entity_Checks): Likewise.
(Resolve_Aspect_Expressions): Likewise for both aspects CPU and
Interrupt_Priority.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch13.adb | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index 4cf6fc9a645..c0a5b6c2c37 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -11107,6 +11107,7 @@ package body Sem_Ch13 is
  elsif A_Id in Aspect_CPU
  | Aspect_Dynamic_Predicate
  | Aspect_Ghost_Predicate
+ | Aspect_Interrupt_Priority
  | Aspect_Predicate
  | Aspect_Priority
  | Aspect_Static_Predicate
@@ -13366,6 +13367,7 @@ package body Sem_Ch13 is
   if Get_Aspect_Id (Ritem) in Aspect_CPU
 | Aspect_Dynamic_Predicate
 | Aspect_Ghost_Predicate
+| Aspect_Interrupt_Priority
 | Aspect_Predicate
 | Aspect_Static_Predicate
 | Aspect_Priority
@@ -15881,7 +15883,10 @@ package body Sem_Ch13 is
  Set_Must_Not_Freeze (Expr);
  Preanalyze_Spec_Expression (Expr, E);
 
-  when Aspect_Priority =>
+  when Aspect_CPU
+ | Aspect_Interrupt_Priority
+ | Aspect_Priority
+  =>
  Push_Type (E);
  Preanalyze_Spec_Expression (Expr, Any_Integer);
  Pop_Type (E);
-- 
2.45.1



[COMMITTED 18/30] ada: Tune code related to potentially unevaluated expressions

2024-06-10 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup; semantics is unaffected.

gcc/ada/

* sem_util.adb
(Immediate_Context_Implies_Is_Potentially_Unevaluated): Use
collective subtypes in membership tests.
(Is_Known_On_Entry): Require all alternatives in a case statement
to return; this change could prevent a recently fixed glitch,
where one of the alternatives relied on the return statement
afterwards (also, the new code is shorter).
* sem_util.ads (Is_Potentially_Unevaluated): Clarify that this
routine applies to Ada 2012.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 8 +++-
 gcc/ada/sem_util.ads | 2 +-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 241be3d2957..5bea088c44e 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -19485,10 +19485,10 @@ package body Sem_Util is
  elsif Nkind (Par) = N_Case_Expression then
 return Expr /= Expression (Par);
 
- elsif Nkind (Par) in N_And_Then | N_Or_Else then
+ elsif Nkind (Par) in N_Short_Circuit then
 return Expr = Right_Opnd (Par);
 
- elsif Nkind (Par) in N_In | N_Not_In then
+ elsif Nkind (Par) in N_Membership_Test then
 
 --  If the membership includes several alternatives, only the first
 --  is definitely evaluated.
@@ -30880,10 +30880,8 @@ package body Sem_Util is
   return True;
 
when others =>
-  null;
+  return False;
 end case;
-
-return False;
  end Is_Known_On_Entry;
 
   end Conditional_Evaluation;
diff --git a/gcc/ada/sem_util.ads b/gcc/ada/sem_util.ads
index 4fef8966380..f282d1fad99 100644
--- a/gcc/ada/sem_util.ads
+++ b/gcc/ada/sem_util.ads
@@ -2219,7 +2219,7 @@ package Sem_Util is
--  type be partially initialized.
 
function Is_Potentially_Unevaluated (N : Node_Id) return Boolean;
-   --  Predicate to implement definition given in RM 6.1.1 (20/3)
+   --  Predicate to implement definition given in RM 2012 6.1.1 (20/3)
 
function Is_Potentially_Persistent_Type (T : Entity_Id) return Boolean;
--  Determines if type T is a potentially persistent type. A potentially
-- 
2.45.1


