Re: [PATCH] i386: Modify testcases failed under -DDEBUG

2024-01-24 Thread Hongtao Liu
On Mon, Jan 22, 2024 at 10:31 AM Haochen Jiang  wrote:
>
> Hi all,
>
> Recently, I ran i386.exp under -DDEBUG and found some failures.
>
> This patch aims to fix that. Ok for trunk?
OK.
>
> Thx,
> Haochen
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/adx-check.h: Include stdio.h when DEBUG
> is defined.
> * gcc.target/i386/avx512fp16-vscalefph-1b.c: Do not define
> DEBUG.
> * gcc.target/i386/avx512fp16vl-vaddph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vcmpph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vdivph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vfpclassph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vgetexpph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vgetmantph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vmaxph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vminph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vmulph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vrcpph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vreduceph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vscalefph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vsqrtph-1b.c: Ditto.
> * gcc.target/i386/avx512fp16vl-vsubph-1b.c: Ditto.
> * gcc.target/i386/readeflags-1.c: Include stdio.h when DEBUG
> is defined.
> * gcc.target/i386/rtm-check.h: Ditto.
> * gcc.target/i386/sha-check.h: Ditto.
> * gcc.target/i386/writeeflags-1.c: Ditto.
> ---
>  gcc/testsuite/gcc.target/i386/adx-check.h   | 3 +++
>  gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c | 3 ---
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c  | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c  | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c  | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1b.c  | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1b.c   | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1b.c  | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1b.c  | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1b.c  | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1b.c  | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1b.c  | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1b.c   | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c| 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1b.c   | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1b.c | 1 -
>  gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1b.c  | 1 -
>  gcc/testsuite/gcc.target/i386/readeflags-1.c| 3 +++
>  gcc/testsuite/gcc.target/i386/rtm-check.h   | 3 +++
>  gcc/testsuite/gcc.target/i386/sha-check.h   | 3 +++
>  gcc/testsuite/gcc.target/i386/writeeflags-1.c   | 3 +++
>  22 files changed, 15 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/adx-check.h b/gcc/testsuite/gcc.target/i386/adx-check.h
> index cfed1a38483..45435b91d0e 100644
> --- a/gcc/testsuite/gcc.target/i386/adx-check.h
> +++ b/gcc/testsuite/gcc.target/i386/adx-check.h
> @@ -1,5 +1,8 @@
>  #include 
>  #include "cpuid.h"
> +#ifdef DEBUG
> +#include <stdio.h>
> +#endif
>
>  static void adx_test (void);
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c
> index 7c7288d6eb3..0ba9ec57f37 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c
> @@ -1,9 +1,6 @@
>  /* { dg-do run { target avx512fp16 } } */
>  /* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
>
> -
> -#define DEBUG
> -
>  #define AVX512FP16
>  #include "avx512fp16-helper.h"
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c
> index fcf6a9058f5..1db7c565262 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c
> @@ -1,7 +1,6 @@
>  /* { dg-do run { target avx512fp16 } } */
>  /* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
>
> -#define DEBUG
>  #define AVX512VL
>  #define AVX512F_LEN 256
>  #define AVX512F_LEN_HALF 128
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c
> index c201a9258bf..bbd366a5d29 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c
> @@ -1,7 +1,6 @@
>  /* { dg-do run { target avx512f
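
For context, here is a minimal sketch of the pattern the header changes address: a check helper that prints only under DEBUG has to pull in stdio.h under the same guard, otherwise printf has no declaration once -DDEBUG is passed. The helper name check_value and its messages are hypothetical and are not taken from the testsuite.

```c
#ifdef DEBUG
/* Only needed when the diagnostic printf calls below are compiled in.  */
#include <stdio.h>
#endif

/* Hypothetical helper: returns 0 on a match and 1 on a mismatch.  */
static int
check_value (int got, int expected)
{
  if (got != expected)
    {
#ifdef DEBUG
      printf ("FAILED: got %d, expected %d\n", got, expected);
#endif
      return 1;
    }
#ifdef DEBUG
  printf ("PASSED\n");
#endif
  return 0;
}
```

Built without -DDEBUG the helper is silent and needs no stdio declarations; built with -DDEBUG the guarded include supplies them.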

Re: [Fortran] half-cycle trig functions and atan[d] fixes

2024-01-24 Thread FX Coudert
Hi,

> Hopefully, FX sees this as my emails to gmail bounce.

I am seeing this email.


> Now, if
> the OS adds cospi() to libm and it's in libm's symbol map, then the
> cospi() used by gfortran depends on the search order of the loaded
> libraries.

We only include the fallback math functions in libgfortran when they are not 
available on the system. configure detects what is present in the libc being 
targeted, and conditionally compiles the necessary fallback functions (and only 
them).

FX



[PATCH] aarch64: Re-enable ldp/stp fusion pass

2024-01-24 Thread Alex Coplan
Hi,

Since, to the best of my knowledge, all reported regressions related to
the ldp/stp fusion pass have now been fixed, and PGO+LTO bootstrap with
--enable-languages=all is working again with the passes enabled, this
patch turns the passes back on by default, as agreed with Jakub here:

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642478.html

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

* config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default
to 1.
(-mlate-ldp-fusion): Likewise.
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index c495cb34fbf..ceed5cdb201 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -290,12 +290,12 @@ Target Var(aarch64_track_speculation)
 Generate code to track when the CPU might be speculating incorrectly.
 
 mearly-ldp-fusion
-Target Var(flag_aarch64_early_ldp_fusion) Optimization Init(0)
+Target Var(flag_aarch64_early_ldp_fusion) Optimization Init(1)
 Enable the copy of the AArch64 load/store pair fusion pass that runs before
 register allocation.
 
 mlate-ldp-fusion
-Target Var(flag_aarch64_late_ldp_fusion) Optimization Init(0)
+Target Var(flag_aarch64_late_ldp_fusion) Optimization Init(1)
 Enable the copy of the AArch64 load/store pair fusion pass that runs after
 register allocation.
 


[PATCH] testsuite: i386: Fix gcc.target/i386/pr70321.c on 32-bit Solaris/x86

2024-01-24 Thread Rainer Orth
gcc.target/i386/pr70321.c FAILs on 32-bit Solaris/x86 since its
introduction in

commit 43201f2c2173894bf7c423cad6da1c21567e06c0
Author: Roger Sayle 
Date:   Mon May 30 21:20:09 2022 +0100

PR target/70321: Split double word equality/inequality after STV on x86.

FAIL: gcc.target/i386/pr70321.c scan-assembler-times mov 1

The failure happens because 32-bit Solaris/x86 defaults to
-fno-omit-frame-pointer.

Fixed by specifying -fomit-frame-pointer explicitly.

Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-01-23  Rainer Orth  

gcc/testsuite:
* gcc.target/i386/pr70321.c: Add -fomit-frame-pointer to
dg-options.

# HG changeset patch
# Parent  229bf3d228cb30f17ea645f6f4b2e8f48d2cfa75
testsuite: i386: Fix gcc.target/i386/pr70321.c on 32-bit Solaris/x86

diff --git a/gcc/testsuite/gcc.target/i386/pr70321.c b/gcc/testsuite/gcc.target/i386/pr70321.c
--- a/gcc/testsuite/gcc.target/i386/pr70321.c
+++ b/gcc/testsuite/gcc.target/i386/pr70321.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target ia32 } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fomit-frame-pointer" } */
 
 void foo (long long ixi)
 {


[PATCH] testsuite: i386: Fix gcc.target/i386/pr80833-1.c on 32-bit Solaris/x86

2024-01-24 Thread Rainer Orth
gcc.target/i386/pr80833-1.c FAILs on 32-bit Solaris/x86 since 20220609:

FAIL: gcc.target/i386/pr80833-1.c scan-assembler pextrd

Unlike e.g. Linux/i686, 32-bit Solaris/x86 defaults to -mstackrealign,
so this patch overrides that to match.

Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-01-23  Rainer Orth  

gcc/testsuite:
* gcc.target/i386/pr80833-1.c: Add -mno-stackrealign to dg-options.

# HG changeset patch
# Parent  0f0cde35c23eb9b92159347bfc0ae5f075be65b3
testsuite: i386: Fix gcc.target/i386/pr80833-1.c on 32-bit Solaris/x86

diff --git a/gcc/testsuite/gcc.target/i386/pr80833-1.c b/gcc/testsuite/gcc.target/i386/pr80833-1.c
--- a/gcc/testsuite/gcc.target/i386/pr80833-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr80833-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -msse4.1 -mtune=intel -mregparm=2" } */
+/* { dg-options "-O2 -msse4.1 -mtune=intel -mregparm=2 -mno-stackrealign" } */
 /* { dg-require-effective-target ia32 } */
 
 long long test (long long a)


Re: [PATCH] testsuite: i386: Fix gcc.target/i386/pr80833-1.c on 32-bit Solaris/x86

2024-01-24 Thread Uros Bizjak
On Wed, Jan 24, 2024 at 10:07 AM Rainer Orth wrote:
>
> gcc.target/i386/pr80833-1.c FAILs on 32-bit Solaris/x86 since 20220609:
>
> FAIL: gcc.target/i386/pr80833-1.c scan-assembler pextrd
>
> Unlike e.g. Linux/i686, 32-bit Solaris/x86 defaults to -mstackrealign,
> so this patch overrides that to match.
>
> Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.
>
> Ok for trunk?
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-01-23  Rainer Orth  
>
> gcc/testsuite:
> * gcc.target/i386/pr80833-1.c: Add -mno-stackrealign to dg-options.

OK.

Thanks,
Uros.


Re: [Fortran] half-cycle trig functions and atan[d] fixes

2024-01-24 Thread Janne Blomqvist
On Wed, Jan 24, 2024 at 10:28 AM FX Coudert  wrote:
> > Now, if
> > the OS adds cospi() to libm and it's in libm's symbol map, then the
> > cospi() used by gfortran depends on the search order of the loaded
> > libraries.
>
> We only include the fallback math functions in libgfortran when they are not 
> available on the system. configure detects what is present in the libc being 
> targeted, and conditionally compiles the necessary fallback functions (and 
> only them).

Exactly. However, there is the (corner?) case where libgfortran was
compiled when cospi() was not found, so the fallback implementation
was included, and libc is later updated to a version that does provide
cospi(). I believe that in that case, which version gets used comes
down to the library search order (i.e. the order in which
"ldd /path/to/binary" prints the libs): the first symbol found is
used.  Also, it's not necessary to do any ifdef tricks with
gfortran.map: if a symbol listed there isn't found in the library,
it's just ignored. So the *pi() trig functions can be added there
unconditionally, and whether they end up in the exported symbol list
then depends on whether the target libm includes them.

It's possible to override this to look for specific symbol versions
etc., but that probably goes deep into the weeds of target-specific
stuff (e.g. are we looking for cospi@FBSD_1.7, cospi@GLIBC_X.Y.Z, or
something else?). I'm sure you don't wanna go there.


-- 
Janne Blomqvist


[PATCH] testsuite: i386: Fix gcc.target/i386/avx512vl-stv-rotatedi-1.c on 32-bit Solaris/x86

2024-01-24 Thread Rainer Orth
gcc.target/i386/avx512vl-stv-rotatedi-1.c FAILs on 32-bit Solaris/x86
since its introduction in

commit 4814b63c3c2326cb5d7baa63882da60ac011bd97
Author: Roger Sayle 
Date:   Mon Jul 10 09:04:29 2023 +0100

i386: Add AVX512 support for STV of SI/DImode rotation by constant.

FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times vpro[lr]q 29

While the test depends on -mstv, 32-bit Solaris/x86 defaults to
-mstackrealign, which is incompatible with it.

The patch thus specifies -mstv -mno-stackrealign explicitly.

Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-01-23  Rainer Orth  

gcc/testsuite:
* gcc.target/i386/avx512vl-stv-rotatedi-1.c: Add -mstv
-mno-stackrealign to dg-options.

# HG changeset patch
# Parent  b787665f90dfbd5be1b00d411675d6fd896f7a9a
testsuite: i386: Fix gcc.target/i386/avx512vl-stv-rotatedi-1.c on 32-bit Solaris/x86

diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-stv-rotatedi-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-stv-rotatedi-1.c
--- a/gcc/testsuite/gcc.target/i386/avx512vl-stv-rotatedi-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-stv-rotatedi-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target ia32 } } */
-/* { dg-options "-O2 -mavx512vl" } */
+/* { dg-options "-O2 -mavx512vl -mstv -mno-stackrealign" } */
 
 unsigned long long rot1(unsigned long long x) { return (x>>1) | (x<<63); }
 unsigned long long rot2(unsigned long long x) { return (x>>2) | (x<<62); }


RE: [PATCH] aarch64: Re-enable ldp/stp fusion pass

2024-01-24 Thread Kyrylo Tkachov
Hi Alex,

> -----Original Message-----
> From: Alex Coplan
> Sent: Wednesday, January 24, 2024 8:34 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw; Richard Sandiford; Kyrylo Tkachov;
> Jakub Jelinek
> Subject: [PATCH] aarch64: Re-enable ldp/stp fusion pass
> 
> Hi,
> 
> Since, to the best of my knowledge, all reported regressions related to
> the ldp/stp fusion pass have now been fixed, and PGO+LTO bootstrap with
> --enable-languages=all is working again with the passes enabled, this
> patch turns the passes back on by default, as agreed with Jakub here:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642478.html
> 
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> 

If we were super-pedantic about the GCC rules we could say that this is a 
revert of 8ed77a2356c3562f96c64f968e7529065c128c6a and therefore:
"Similarly, no outside approval is needed to revert a patch that you checked 
in." 😊
But that would go against the spirit of the rule.
Anyway, this is ok. Thanks for working through the regressions so diligently.
Kyrill

> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default
>   to 1.
>   (-mlate-ldp-fusion): Likewise.


[PATCH] LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

2024-01-24 Thread Jiahao Xu
gcc/ChangeLog:

* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.

diff --git a/gcc/config/loongarch/larchintrin.h b/gcc/config/loongarch/larchintrin.h
index 7692415e04d..ff2c9f460ac 100644
--- a/gcc/config/loongarch/larchintrin.h
+++ b/gcc/config/loongarch/larchintrin.h
@@ -336,38 +336,38 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2)
 #ifdef __loongarch_frecipe
 /* Assembly instruction format: fd, fj.  */
 /* Data types in instruction templates:  SF, SF.  */
-extern __inline void
+extern __inline float
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 __frecipe_s (float _1)
 {
-  __builtin_loongarch_frecipe_s ((float) _1);
+  return (float) __builtin_loongarch_frecipe_s ((float) _1);
 }
 
 /* Assembly instruction format: fd, fj.  */
 /* Data types in instruction templates:  DF, DF.  */
-extern __inline void
+extern __inline double
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 __frecipe_d (double _1)
 {
-  __builtin_loongarch_frecipe_d ((double) _1);
+  return (double) __builtin_loongarch_frecipe_d ((double) _1);
 }
 
 /* Assembly instruction format: fd, fj.  */
 /* Data types in instruction templates:  SF, SF.  */
-extern __inline void
+extern __inline float
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 __frsqrte_s (float _1)
 {
-  __builtin_loongarch_frsqrte_s ((float) _1);
+  return (float) __builtin_loongarch_frsqrte_s ((float) _1);
 }
 
 /* Assembly instruction format: fd, fj.  */
 /* Data types in instruction templates:  DF, DF.  */
-extern __inline void
+extern __inline double
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 __frsqrte_d (double _1)
 {
-  __builtin_loongarch_frsqrte_d ((double) _1);
+  return (double) __builtin_loongarch_frsqrte_d ((double) _1);
 }
 #endif
 
diff --git a/gcc/testsuite/gcc.target/loongarch/larch-frecipe-intrinsic.c b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-intrinsic.c
new file mode 100644
index 000..6ce2bde0acf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-intrinsic.c
@@ -0,0 +1,30 @@
+/* Test intrinsics for frecipe.{s/d} and frsqrte.{s/d} instructions */
+/* { dg-do compile } */
+/* { dg-options "-mfrecipe -O2" } */
+/* { dg-final { scan-assembler-times "test_frecipe_s:.*frecipe\\.s.*test_frecipe_s" 1 } } */
+/* { dg-final { scan-assembler-times "test_frecipe_d:.*frecipe\\.d.*test_frecipe_d" 1 } } */
+/* { dg-final { scan-assembler-times "test_frsqrte_s:.*frsqrte\\.s.*test_frsqrte_s" 1 } } */
+/* { dg-final { scan-assembler-times "test_frsqrte_d:.*frsqrte\\.d.*test_frsqrte_d" 1 } } */
+
+#include <larchintrin.h>
+
+float
+test_frecipe_s (float _1)
+{
+  return __frecipe_s (_1);
+}
+double
+test_frecipe_d (double _1)
+{
+  return __frecipe_d (_1);
+}
+float
+test_frsqrte_s (float _1)
+{
+  return __frsqrte_s (_1);
+}
+double
+test_frsqrte_d (double _1)
+{
+  return __frsqrte_d (_1);
+}
-- 
2.20.1
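
The class of bug fixed here can be sketched in portable C. In the sketch below, approx_recip is a hypothetical stand-in for __builtin_loongarch_frecipe_s (it simply computes 1/x exactly) so that the snippet builds on any target; only the shape of the two wrappers mirrors the patch.

```c
/* Hypothetical stand-in for the target builtin.  */
static float
approx_recip (float x)
{
  return 1.0f / x;
}

/* Shape of the old wrapper: the result is computed and then dropped,
   so callers cannot use the intrinsic's value.  */
static inline void
recip_broken (float x)
{
  (void) approx_recip (x);
}

/* Shape of the fixed wrapper: the result reaches the caller.  */
static inline float
recip_fixed (float x)
{
  return approx_recip (x);
}
```

With the void form, an expression such as `float r = recip_broken (x);` does not compile at all, which is presumably how the problem surfaces for users of the header.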



[PATCH] Loongarch: Remove vec_concatz pattern

2024-01-24 Thread Jiahao Xu
It is incorrect to use vld/vori to implement vec_concatz, because when
an LSX instruction is used to update the value of a vector register, the
upper 128 bits of that register are not zeroed.

gcc/ChangeLog:

* config/loongarch/lasx.md (@vec_concatz): Remove this 
define_insn pattern.
* config/loongarch/loongarch.cc (loongarch_expand_vector_group_init): 
Use vec_concat.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 90f66ee4d24..e2115ffb884 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -582,21 +582,6 @@ (define_insn "lasx_xvinsgr2vr_"
   [(set_attr "type" "simd_insert")
(set_attr "mode" "")])
 
-(define_insn "@vec_concatz"
-  [(set (match_operand:LASX 0 "register_operand" "=f")
-(vec_concat:LASX
-  (match_operand: 1 "nonimmediate_operand")
-  (match_operand: 2 "const_0_operand")))]
-  "ISA_HAS_LASX"
-{
-  if (MEM_P (operands[1]))
-return "vld\t%w0,%1";
-  else
-return "vori.b\t%w0,%w1,0";
-}
-  [(set_attr "type" "simd_splat")
-   (set_attr "mode" "")])
-
 (define_insn "vec_concat"
   [(set (match_operand:LASX 0 "register_operand" "=f")
(vec_concat:LASX
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 072c68d97e3..cd335827570 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -9917,17 +9917,12 @@ loongarch_expand_vector_group_init (rtx target, rtx vals)
   gcc_unreachable ();
 }
 
-  if (high == CONST0_RTX (half_mode))
-emit_insn (gen_vec_concatz (vmode, target, low, high));
-  else
-{
-  if (!register_operand (low, half_mode))
-   low = force_reg (half_mode, low);
-  if (!register_operand (high, half_mode))
-   high = force_reg (half_mode, high);
-  emit_insn (gen_rtx_SET (target,
- gen_rtx_VEC_CONCAT (vmode, low, high)));
-}
+  if (!register_operand (low, half_mode))
+low = force_reg (half_mode, low);
+  if (!register_operand (high, half_mode))
+high = force_reg (half_mode, high);
+  emit_insn (gen_rtx_SET (target,
+ gen_rtx_VEC_CONCAT (vmode, low, high)));
 }
 
 /* Expand initialization of a vector which has all same elements.  */
-- 
2.20.1
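
The reasoning in the commit message above can be illustrated with a toy model. This is only a sketch under the behaviour the message describes (an LSX write leaves the upper 128 bits untouched); struct vreg256 and both helper names are hypothetical and do not correspond to anything in the LoongArch backend.

```c
#include <stdint.h>

/* Toy model of a 256-bit LASX register as four 64-bit lanes.  */
struct vreg256
{
  uint64_t lane[4];
};

/* Model of a 128-bit LSX write such as vld or vori.b: only the low
   two lanes change, the upper two keep their previous contents.  */
static void
lsx_write_low (struct vreg256 *dst, const uint64_t src[2])
{
  dst->lane[0] = src[0];
  dst->lane[1] = src[1];
}

/* What concatenating the low half with zero is supposed to produce:
   the upper two lanes are explicitly cleared.  */
static void
concat_with_zero (struct vreg256 *dst, const uint64_t src[2])
{
  dst->lane[0] = src[0];
  dst->lane[1] = src[1];
  dst->lane[2] = 0;
  dst->lane[3] = 0;
}
```

If the destination held stale data beforehand, lsx_write_low leaves that data in the upper lanes, whereas the semantics of a concatenation with zero require them to be cleared.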



[PATCH]AArch64: Fix expansion of Advanced SIMD div and mul using SVE [PR109636]

2024-01-24 Thread Tamar Christina
Hi All,

As suggested in the ticket, this replaces the expansion that converts the
Advanced SIMD types to SVE types with one that simply prints out an SVE
register for these instructions.

This fixes the subreg issues, since there are no subregs involved anymore.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/109636
* config/aarch64/aarch64-simd.md (div3,
mulv2di3): Remove.
* config/aarch64/iterators.md (VQDIV): Remove.
(SVE_FULL_SDI_SIMD, SVE_FULL_SDI_SIMD_DI, SVE_FULL_HSDI_SIMD_DI,
SVE_I_SIMD_DI): New.
(VPRED, sve_lane_con): Add V4SI and V2DI.
* config/aarch64/aarch64-sve.md (3,
@aarch64_pred_): Support Advanced SIMD types.
(mul3): New, split from 3.
(@aarch64_pred_, *post_ra_3): New.
* config/aarch64/aarch64-sve2.md (@aarch64_mul_lane_,
*aarch64_mul_unpredicated_): Change SVE_FULL_HSDI to
SVE_FULL_HSDI_SIMD_DI.

gcc/testsuite/ChangeLog:

PR target/109636
* gcc.target/aarch64/sve/pr109636_1.c: New test.
* gcc.target/aarch64/sve/pr109636_2.c: New test.
* gcc.target/aarch64/sve2/pr109636_1.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6f48b4d5f21da9f96a376cd6b34110c2a39deb33..556d0cf359fedf2c28dfe1e0a75e1c12321be68a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -389,26 +389,6 @@ (define_insn "mul3"
   [(set_attr "type" "neon_mul_")]
 )
 
-;; Advanced SIMD does not support vector DImode MUL, but SVE does.
-;; Make use of the overlap between Z and V registers to implement the V2DI
-;; optab for TARGET_SVE.  The mulvnx2di3 expander can
-;; handle the TARGET_SVE2 case transparently.
-(define_expand "mulv2di3"
-  [(set (match_operand:V2DI 0 "register_operand")
-(mult:V2DI (match_operand:V2DI 1 "register_operand")
-  (match_operand:V2DI 2 "aarch64_sve_vsm_operand")))]
-  "TARGET_SVE"
-  {
-machine_mode sve_mode = VNx2DImode;
-rtx sve_op0 = simplify_gen_subreg (sve_mode, operands[0], V2DImode, 0);
-rtx sve_op1 = simplify_gen_subreg (sve_mode, operands[1], V2DImode, 0);
-rtx sve_op2 = simplify_gen_subreg (sve_mode, operands[2], V2DImode, 0);
-
-emit_insn (gen_mulvnx2di3 (sve_op0, sve_op1, sve_op2));
-DONE;
-  }
-)
-
 (define_insn "bswap2"
   [(set (match_operand:VDQHSD 0 "register_operand" "=w")
 (bswap:VDQHSD (match_operand:VDQHSD 1 "register_operand" "w")))]
@@ -2678,27 +2658,6 @@ (define_insn "*div3"
   [(set_attr "type" "neon_fp_div_")]
 )
 
-;; SVE has vector integer divisions, unlike Advanced SIMD.
-;; We can use it with Advanced SIMD modes to expose the V2DI and V4SI
-;; optabs to the midend.
-(define_expand "div3"
-  [(set (match_operand:VQDIV 0 "register_operand")
-   (ANY_DIV:VQDIV
- (match_operand:VQDIV 1 "register_operand")
- (match_operand:VQDIV 2 "register_operand")))]
-  "TARGET_SVE"
-  {
-machine_mode sve_mode
-  = aarch64_full_sve_mode (GET_MODE_INNER (mode)).require ();
-rtx sve_op0 = simplify_gen_subreg (sve_mode, operands[0], mode, 0);
-rtx sve_op1 = simplify_gen_subreg (sve_mode, operands[1], mode, 0);
-rtx sve_op2 = simplify_gen_subreg (sve_mode, operands[2], mode, 0);
-
-emit_insn (gen_div3 (sve_op0, sve_op1, sve_op2));
-DONE;
-  }
-)
-
 (define_insn "neg2"
  [(set (match_operand:VHSDF 0 "register_operand" "=w")
(neg:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index e1e3c1bd0b7d12eefe43dc95a10716c24e3a48de..eca8623e587af944927a9459e29d5f8af170d347 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3789,16 +3789,35 @@ (define_expand "3"
   [(set (match_operand:SVE_I 0 "register_operand")
(unspec:SVE_I
  [(match_dup 3)
-  (SVE_INT_BINARY_IMM:SVE_I
+  (SVE_INT_BINARY_MULTI:SVE_I
 (match_operand:SVE_I 1 "register_operand")
 (match_operand:SVE_I 2 "aarch64_sve__operand"))]
  UNSPEC_PRED_X))]
   "TARGET_SVE"
+  {
+operands[3] = aarch64_ptrue_reg (mode);
+  }
+)
+
+;; Unpredicated integer binary operations that have an immediate form.
+;; Advanced SIMD does not support vector DImode MUL, but SVE does.
+;; Make use of the overlap between Z and V registers to implement the V2DI
+;; optab for TARGET_SVE.  The mulvnx2di3 expander can
+;; handle the TARGET_SVE2 case transparently.
+(define_expand "mul3"
+  [(set (match_operand:SVE_I_SIMD_DI 0 "register_operand")
+   (unspec:SVE_I_SIMD_DI
+ [(match_dup 3)
+  (mult:SVE_I_SIMD_DI
+(match_operand:SVE_I_SIMD_DI 1 "register_operand")
+(match_operand:SVE_I_SIMD_DI 2 "aarch64_sve_vsm_operand"))]
+ UNSPEC_PRED_X))]
+  "TARGET_SVE"
   {
 /* SVE2 supports th

[PATCH]AArch64: Do not allow SIMD clones with simdlen 1 [PR113552]

2024-01-24 Thread Tamar Christina
Hi All,

The AArch64 vector PCS does not allow simd calls with simdlen 1;
however, due to a bug, we currently do allow it for num == 0.

This causes us to emit a symbol that doesn't exist, and we fail to link.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

OK for master, and for backport to GCC 13, 12 and 11?

Thanks,
Tamar



gcc/ChangeLog:

PR tree-optimization/113552
* config/aarch64/aarch64.cc
(aarch64_simd_clone_compute_vecsize_and_simdlen): Block simdlen 1.

gcc/testsuite/ChangeLog:

PR tree-optimization/113552
* gcc.target/aarch64/pr113552.c: New test.
* gcc.target/aarch64/simd_pcs_attribute-3.c: Remove bogus check.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e6bd3fd0bb42c70603d5335402b89c9deeaf48d8..a2fc1a5d9d27e9d837e4d616e3feaf38f7272b4f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28620,7 +28620,8 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   if (known_eq (clonei->simdlen, 0U))
 {
   simdlen = exact_div (poly_uint64 (64), nds_elt_bits);
-  simdlens.safe_push (simdlen);
+  if (known_ne (simdlen, 1U))
+   simdlens.safe_push (simdlen);
   simdlens.safe_push (simdlen * 2);
 }
   else
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113552.c b/gcc/testsuite/gcc.target/aarch64/pr113552.c
new file mode 100644
index ..9c96b061ed2b4fcc57e58925277f74d14f79c51f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113552.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=armv8-a" } */
+
+__attribute__ ((__simd__ ("notinbranch"), const))
+double cos (double);
+
+void foo (float *a, double *b)
+{
+for (int i = 0; i < 12; i+=3)
+  {
+b[i] = cos (5.0 * a[i]);
+b[i+1] = cos (5.0 * a[i+1]);
+b[i+2] = cos (5.0 * a[i+2]);
+  }
+}
+
+/* { dg-final { scan-assembler-times {bl\t_ZGVnN2v_cos} 6 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
index 95f6a6803e889c02177ef10972962ed62d2095eb..661764b3d4a89e08951a7a3c0495d5b7ba7f0871 100644
 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
@@ -18,7 +18,5 @@ double foo(double x)
 }
 
 /* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
-/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM1v_foo} 1 } } */
 /* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM2v_foo} 1 } } */
-/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN1v_foo} 1 } } */
 /* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_foo} 1 } } */
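
The effect of the one-line change in aarch64.cc can be sketched with plain integers. This is a simplified model of only the `clonei->simdlen == 0` branch shown in the hunk: the function name candidate_simdlens and the output array are hypothetical, and unsigned integers stand in for GCC's poly_uint64.

```c
/* Toy model of the candidate simdlen list for the simdlen == 0 case.
   Returns how many candidates were written to OUT.  */
static int
candidate_simdlens (unsigned elt_bits, unsigned out[2])
{
  unsigned simdlen = 64 / elt_bits;  /* lanes in a 64-bit vector */
  int n = 0;

  if (simdlen != 1)                  /* the check added by the patch */
    out[n++] = simdlen;
  out[n++] = simdlen * 2;            /* lanes in a 128-bit vector */
  return n;
}
```

For a 64-bit element type such as double, the first candidate would be 1 and is now skipped, leaving only simdlen 2, which matches the _ZGVnN2v_cos calls the new test scans for. For a 32-bit element type both candidates (2 and 4) are kept.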









Re: [PATCH]AArch64: Do not allow SIMD clones with simdlen 1 [PR113552]

2024-01-24 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> The AArch64 vector PCS does not allow simd calls with simdlen 1;
> however, due to a bug, we currently do allow it for num == 0.
>
> This causes us to emit a symbol that doesn't exist, and we fail to link.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> OK for master, and for backport to GCC 13, 12 and 11?
>
> Thanks,
> Tamar
>
>
>
> gcc/ChangeLog:
>
>   PR tree-optimization/113552
>   * config/aarch64/aarch64.cc
>   (aarch64_simd_clone_compute_vecsize_and_simdlen): Block simdlen 1.
>
> gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/113552
>   * gcc.target/aarch64/pr113552.c: New test.
>   * gcc.target/aarch64/simd_pcs_attribute-3.c: Remove bogus check.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e6bd3fd0bb42c70603d5335402b89c9deeaf48d8..a2fc1a5d9d27e9d837e4d616e3feaf38f7272b4f 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -28620,7 +28620,8 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>if (known_eq (clonei->simdlen, 0U))
>  {
>simdlen = exact_div (poly_uint64 (64), nds_elt_bits);
> -  simdlens.safe_push (simdlen);
> +  if (known_ne (simdlen, 1U))

maybe_ne (i.e. !known_eq) is more canonical.

> + simdlens.safe_push (simdlen);
>simdlens.safe_push (simdlen * 2);
>  }
>else
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr113552.c b/gcc/testsuite/gcc.target/aarch64/pr113552.c
> new file mode 100644
> index ..9c96b061ed2b4fcc57e58925277f74d14f79c51f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr113552.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -march=armv8-a" } */
> +
> +__attribute__ ((__simd__ ("notinbranch"), const))
> +double cos (double);
> +
> +void foo (float *a, double *b)
> +{
> +for (int i = 0; i < 12; i+=3)
> +  {
> +b[i] = cos (5.0 * a[i]);
> +b[i+1] = cos (5.0 * a[i+1]);
> +b[i+2] = cos (5.0 * a[i+2]);
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-times {bl\t_ZGVnN2v_cos} 6 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
> index 95f6a6803e889c02177ef10972962ed62d2095eb..661764b3d4a89e08951a7a3c0495d5b7ba7f0871 100644
> --- a/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
> +++ b/gcc/testsuite/gcc.target/aarch64/simd_pcs_attribute-3.c
> @@ -18,7 +18,5 @@ double foo(double x)
>  }
>  
>  /* { dg-final { scan-assembler-not {\.variant_pcs\tfoo} } } */
> -/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM1v_foo} 1 } } */
>  /* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnM2v_foo} 1 } } */
> -/* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN1v_foo} 1 } } */
>  /* { dg-final { scan-assembler-times {\.variant_pcs\t_ZGVnN2v_foo} 1 } } */

Think it'd be a bit safer to turn these into scan-assembler-nots,
rather than remove them.

OK for trunk and branches with those changes, thanks.

Richard


Re: [PATCH] aarch64: Re-enable ldp/stp fusion pass

2024-01-24 Thread Alex Coplan
On 24/01/2024 09:15, Kyrylo Tkachov wrote:
> Hi Alex,
> 
> > -Original Message-
> > From: Alex Coplan 
> > Sent: Wednesday, January 24, 2024 8:34 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Earnshaw ; Richard Sandiford
> > ; Kyrylo Tkachov ;
> > Jakub Jelinek 
> > Subject: [PATCH] aarch64: Re-enable ldp/stp fusion pass
> > 
> > Hi,
> > 
> > Since, to the best of my knowledge, all reported regressions related to
> > the ldp/stp fusion pass have now been fixed, and PGO+LTO bootstrap with
> > --enable-languages=all is working again with the passes enabled, this
> > patch turns the passes back on by default, as agreed with Jakub here:
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642478.html
> > 
> > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> > 
> 
> If we were super-pedantic about the GCC rules we could say that this is a 
> revert of 8ed77a2356c3562f96c64f968e7529065c128c6a and therefore:
> "Similarly, no outside approval is needed to revert a patch that you checked 
> in." 😊
> But that would go against the spirit of the rule.

Heh, definitely seems against the spirit of the rule.

> Anyway, this is ok. Thanks for working through the regressions so diligently.

Thanks! Pushed as g:da9647e98aa289ba3aba41cf5bbe14d0f5f27e77.

I'll keep an eye on gcc-bugs for any further fallout.

Alex

> Kyrill
> 
> > Thanks,
> > Alex
> > 
> > gcc/ChangeLog:
> > 
> > * config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default
> > to 1.
> > (-mlate-ldp-fusion): Likewise.


[PATCH v1] LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.

2024-01-24 Thread Li Wei
We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r
failed to vectorize effectively. For this reason, we adjust the cost of
128-bit vector_stmts that match the multiply-add pattern to facilitate 128-bit
vectorization.
Experimental results show that after the modification, 549.fotonik3d_r
performance improves by 9.77% under the 128-bit vectorization option.
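
To make the pattern concrete, here is a minimal sketch in Python of the kind
of check the patch performs: an add or subtract counts as multiply-add when
one of its operands is defined by a multiply. The toy statement dictionaries
and the name is_multiply_add are illustrative assumptions, not GCC data
structures; the real check is loongarch_multiply_add_p in the patch below.

```python
def is_multiply_add(stmt, defs):
    """Return True if `stmt` is an add/sub with at least one operand
    produced by a multiply.  `defs` maps a name to its defining statement
    (a toy stand-in for looking up an SSA definition)."""
    if stmt["op"] not in ("+", "-"):
        return False
    for operand in stmt["args"]:
        producer = defs.get(operand)
        if producer is not None and producer["op"] == "*":
            return True
    return False


# Toy statements: t1 = a * b; t2 = t1 + c; t3 = a + c.
defs = {
    "t1": {"op": "*", "args": ("a", "b")},
    "t2": {"op": "+", "args": ("t1", "c")},
    "t3": {"op": "+", "args": ("a", "c")},
}

for name in ("t1", "t2", "t3"):
    print(name, is_multiply_add(defs[name], defs))
```

Only t2 qualifies here, since it adds the result of a multiply; in the patch
such a statement gets a cost of 0 for 16-byte vector modes when it is not a
reduction.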

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_multiply_add_p): New.
(loongarch_vector_costs::add_stmt_cost): Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/vect/vect-10.f90: New test.
---
 gcc/config/loongarch/loongarch.cc  | 42 +
 gcc/testsuite/gfortran.dg/vect/vect-10.f90 | 71 ++
 2 files changed, 113 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/vect/vect-10.f90

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 072c68d97e3..32a0b6f43e8 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4096,6 +4096,36 @@ 
loongarch_vector_costs::determine_suggested_unroll_factor (loop_vec_info loop_vi
   return 1 << ceil_log2 (uf);
 }
 
+static bool
+loongarch_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info)
+{
+  gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!assign)
+return false;
+  tree_code code = gimple_assign_rhs_code (assign);
+  if (code != PLUS_EXPR && code != MINUS_EXPR)
+return false;
+
+  auto is_mul_result = [&](int i)
+{
+  tree rhs = gimple_op (assign, i);
+  if (TREE_CODE (rhs) != SSA_NAME)
+   return false;
+
+  stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
+  if (!def_stmt_info
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) != vect_internal_def)
+   return false;
+  gassign *rhs_assign = dyn_cast <gassign *> (def_stmt_info->stmt);
+  if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
+   return false;
+
+  return true;
+};
+
+  return is_mul_result (1) || is_mul_result (2);
+}
+
 unsigned
 loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   stmt_vec_info stmt_info, slp_tree,
@@ -4108,6 +4138,18 @@ loongarch_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
 {
   int stmt_cost = loongarch_builtin_vectorization_cost (kind, vectype,
misalign);
+  if (vectype && stmt_info)
+   {
+ gassign *assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info));
+ machine_mode mode = TYPE_MODE (vectype);
+ if (kind == vector_stmt && GET_MODE_SIZE (mode) == 16 && assign)
+   {
+ if (!vect_is_reduction (stmt_info)
+ && loongarch_multiply_add_p (m_vinfo, stmt_info))
+   stmt_cost = 0;
+   }
+   }
+
   retval = adjust_cost_for_freq (stmt_info, where, count * stmt_cost);
   m_costs[where] += retval;
 
diff --git a/gcc/testsuite/gfortran.dg/vect/vect-10.f90 
b/gcc/testsuite/gfortran.dg/vect/vect-10.f90
new file mode 100644
index 000..b85bc2702a3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/vect/vect-10.f90
@@ -0,0 +1,71 @@
+! { dg-do compile }
+! { dg-additional-options "-Ofast -mlsx -fvect-cost-model=dynamic" { target 
loongarch64*-*-* } }
+
+MODULE material_mod
+
+IMPLICIT NONE
+
+integer, parameter :: dfp = selected_real_kind (13, 99)
+integer, parameter :: rfp = dfp
+
+PUBLIC Mat_updateE, iepx, iepy, iepz
+
+PRIVATE
+
+integer, dimension (:, :, :), allocatable :: iepx, iepy, iepz
+real (kind = rfp), dimension (:), allocatable :: Dbdx, Dbdy, Dbdz
+integer :: imin, jmin, kmin
+integer, dimension (6) :: Exsize
+integer, dimension (6) :: Eysize
+integer, dimension (6) :: Ezsize
+integer, dimension (6) :: Hxsize
+integer, dimension (6) :: Hysize
+integer, dimension (6) :: Hzsize
+
+CONTAINS
+
+SUBROUTINE mat_updateE (nx, ny, nz, Hx, Hy, Hz, Ex, Ey, Ez)
+
+integer, intent (in) :: nx, ny, nz
+
+real (kind = rfp), intent (inout), &
+  dimension (Exsize (1) : Exsize (2), Exsize (3) : Exsize (4), Exsize (5) : 
Exsize (6)) :: Ex
+real (kind = rfp), intent (inout), &
+  dimension (Eysize (1) : Eysize (2), Eysize (3) : Eysize (4), Eysize (5) : 
Eysize (6)) :: Ey
+real (kind = rfp), intent (inout), &
+  dimension (Ezsize (1) : Ezsize (2), Ezsize (3) : Ezsize (4), Ezsize (5) : 
Ezsize (6)) :: Ez
+real (kind = rfp), intent (in),&
+  dimension (Hxsize (1) : Hxsize (2), Hxsize (3) : Hxsize (4), Hxsize (5) : 
Hxsize (6)) :: Hx
+real (kind = rfp), intent (in),&
+  dimension (Hysize (1) : Hysize (2), Hysize (3) : Hysize (4), Hysize (5) : 
Hysize (6)) :: Hy
+real (kind = rfp), intent (in),&
+  dimension (Hzsize (

[PATCH] testsuite: i386: Don't restrict gcc.dg/vect/vect-simd-clone-16c.c etc. to i686 [PR113556]

2024-01-24 Thread Rainer Orth
A couple of gcc.dg/vect/vect-simd-clone-1*.c tests FAIL on 32-bit
Solaris/x86 since 20230222:

FAIL: gcc.dg/vect/vect-simd-clone-16c.c scan-tree-dump-times vect "[nr] 
[^n]* = foo.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-16d.c scan-tree-dump-times vect "[nr] 
[^n]* = foo.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17c.c scan-tree-dump-times vect "[nr] 
[^n]* = foo.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17d.c scan-tree-dump-times vect "[nr] 
[^n]* = foo.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18c.c scan-tree-dump-times vect "[nr] 
[^n]* = foo.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18d.c scan-tree-dump-times vect "[nr] 
[^n]* = foo.simdclone" 2

The problem is that the 32-bit Solaris/x86 triple still uses i386,
although gcc defaults to -mpentium4.  However, the tests only handle
x86_64* and i686*, even though they don't seem to require any
specific ISA extension not covered by vect_simd_clones.

To fix this, the tests now allow generic i?86.  At the same time, I've
removed the wildcards from x86_64* and i686* since DejaGnu uses the
canonical forms.
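
As a rough illustration of the glob change, the sketch below uses Python's
fnmatch as a stand-in for the glob matching DejaGnu applies to target triples
(an assumption; the testsuite itself is Tcl). The helper name matches_any is
hypothetical.

```python
from fnmatch import fnmatchcase


def matches_any(triple, patterns):
    """Return True if the target triple matches any of the glob patterns."""
    return any(fnmatchcase(triple, p) for p in patterns)


old_patterns = ["x86_64*-*-*", "i686*-*-*"]
new_patterns = ["x86_64-*-*", "i?86-*-*"]

for triple in ("i386-pc-solaris2.11", "i686-pc-linux-gnu",
               "x86_64-pc-linux-gnu"):
    print(triple,
          "old:", matches_any(triple, old_patterns),
          "new:", matches_any(triple, new_patterns))
```

The i386 Solaris triple is the only one of the three that the old patterns
miss and the new ones accept.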

Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-01-24  Rainer Orth  

gcc/testsuite:
PR target/113556
* gcc.dg/vect/vect-simd-clone-16c.c: Don't wildcard x86_64 in
target specs.  Allow any i?86 target instead of i686 only.
* gcc.dg/vect/vect-simd-clone-16d.c: Likewise.
* gcc.dg/vect/vect-simd-clone-17c.c: Likewise.
* gcc.dg/vect/vect-simd-clone-17d.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18c.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18d.c: Likewise.

# HG changeset patch
# Parent  fdb3dfc0639b142e4a092ef5f9c68954d3e5bf74
testsuite: i386: Don't restrict gcc.dg/vect/vect-simd-clone-16c.c etc. to i686 [PR113556]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
@@ -7,11 +7,11 @@
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
Some targets use another call for the epilogue loops.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { x86_64*-*-* || { i686*-*-* || aarch64*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { x86_64-*-* || { i?86-*-* || aarch64*-*-* } } } } } } */
 /* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { aarch64*-*-* } } } } */
 
 /* x86_64 fails to use in-branch clones for TYPE=short.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 0 "vect" { target x86_64*-*-* i686*-*-* } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 0 "vect" { target x86_64-*-* i?86-*-* } } } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
@@ -7,11 +7,11 @@
 
 /* Ensure the the in-branch simd clones are used on targets that support them.
Some targets use another call for the epilogue loops.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { x86_64*-*-* || { i686*-*-* || aarch64*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { x86_64-*-* || { i?86-*-* || aarch64*-*-* } } } } } } */
 /* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { aarch64*-*-* } } } } */
 
 /* x86_64 fails to use in-branch clones for TYPE=char.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 0 "vect" { target x86_64*-*-* i686*-*-* } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 0 "vect" { target x86_64-*-* i?86-*-* } } } */
 
 /* The LTO test produces two dump files and we scan the wrong one.  */
 /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
@@ -7,11 +7,11 @@
  
 /* Ensure the the in-branch simd clones are used on targets that support them.
Some targets use another call for the epilogue loops.  */
-/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { ta

Re: [PATCH]AArch64: Fix expansion of Advanced SIMD div and mul using SVE [PR109636]

2024-01-24 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> As suggested in the ticket this replaces the expansion by converting the
> Advanced SIMD types to SVE types by simply printing out an SVE register for
> these instructions.
>
> This fixes the subreg issues since there are no subregs involved anymore.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/109636
>   * config/aarch64/aarch64-simd.md (div3,
>   mulv2di3): Remove.
>   * config/aarch64/iterators.md (VQDIV): Remove.
>   (SVE_FULL_SDI_SIMD, SVE_FULL_SDI_SIMD_DI, SVE_FULL_HSDI_SIMD_DI,
>   SVE_I_SIMD_DI): New.
>   (VPRED, sve_lane_con): Add V4SI and V2DI.
>   * config/aarch64/aarch64-sve.md (3,
>   @aarch64_pred_): Support Advanced SIMD types.
>   (mul3): New, split from 3.
>   (@aarch64_pred_, *post_ra_3): New.
>   * config/aarch64/aarch64-sve2.md (@aarch64_mul_lane_,
>   *aarch64_mul_unpredicated_): Change SVE_FULL_HSDI to
>   SVE_FULL_HSDI_SIMD_DI.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/109636
>   * gcc.target/aarch64/sve/pr109636_1.c: New test.
>   * gcc.target/aarch64/sve/pr109636_2.c: New test.
>   * gcc.target/aarch64/sve2/pr109636_1.c: New test.
>
> --- inline copy of patch -- 
> [...]
> @@ -550,6 +559,13 @@ (define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI
>VNx4SI VNx2SI
>VNx2DI])
>  
> +;; All SVE integer vector modes and Advanced SIMD 64-bit vector
> +;; element modes
> +(define_mode_iterator SVE_I_SIMD_DI [VNx16QI VNx8QI VNx4QI VNx2QI
> +  VNx8HI VNx4HI VNx2HI
> +  VNx4SI VNx2SI
> +  VNx2DI V2DI])
> +

IMO this would be more robust as:

(define_mode_iterator SVE_I_SIMD_DI [SVE_I V2DI])

Your call on whether that's better or worse for the others.

OK with that change, thanks.  I suppose at some point we should extend
the division patterns to V2SI, but that's clearly not stage 4 material.

Richard

>  ;; SVE integer vector modes whose elements are 16 bits or wider.
>  (define_mode_iterator SVE_HSDI [VNx8HI VNx4HI VNx2HI
>   VNx4SI VNx2SI
> @@ -2268,7 +2284,8 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI 
> "VNx8BI")
>(VNx32HI "VNx8BI") (VNx32HF "VNx8BI")
>(VNx32BF "VNx8BI")
>(VNx16SI "VNx4BI") (VNx16SF "VNx4BI")
> -  (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")])
> +  (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
> +  (V4SI "VNx4BI") (V2DI "VNx2BI")])
>  
>  ;; ...and again in lower case.
>  (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi")
> @@ -2370,6 +2387,7 @@ (define_mode_attr narrower_mask [(VNx8HI "0x81") 
> (VNx4HI "0x41")
>  
>  ;; The constraint to use for an SVE [SU]DOT, FMUL, FMLA or FMLS lane index.
>  (define_mode_attr sve_lane_con [(VNx8HI "y") (VNx4SI "y") (VNx2DI "x")
> +   (V2DI "x")
>   (VNx8HF "y") (VNx4SF "y") (VNx2DF "x")])
>  
>  ;; The constraint to use for an SVE FCMLA lane index.
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr109636_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pr109636_1.c
> new file mode 100644
> index 
> ..5b37ddd2770bcbbec37b9563644da0ba061d3789
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr109636_1.c
> @@ -0,0 +1,13 @@
> +/* { dg-additional-options "-O -mtune=a64fx" } */
> +
> +typedef unsigned long long __attribute__((__vector_size__ (16))) V;
> +typedef unsigned long long __attribute__((__vector_size__ (32))) W;
> +
> +extern void bar (V v);
> +
> +void foo (V v, W w)
> +{
> +  bar (__builtin_shuffle (v, __builtin_shufflevector ((V){}, w, 4, 5) / v));
> +}
> +
> +/* { dg-final { scan-assembler {udiv\tz[0-9]+.d, p[0-9]+/m, z[0-9]+.d, 
> z[0-9]+.d} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr109636_2.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pr109636_2.c
> new file mode 100644
> index 
> ..6d39dc8e590a04a486a300de10c5480d9c33afba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr109636_2.c
> @@ -0,0 +1,13 @@
> +/* { dg-additional-options "-O -mcpu=a64fx" } */
> +
> +typedef unsigned long long __attribute__((__vector_size__ (16))) V;
> +typedef unsigned long long __attribute__((__vector_size__ (32))) W;
> +
> +extern void bar (V v);
> +
> +void foom (V v, W w)
> +{
> +  bar (__builtin_shuffle (v, __builtin_shufflevector ((V){}, w, 4, 5) * v));
> +}
> +
> +/* { dg-final { scan-assembler {mul\tz[0-9]+.d, p[0-9]+/m, z[0-9]+.d, 
> z[0-9]+.d} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr109636_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve2/pr109636_1.c
> new file mode 100644
> index 
> 0

[PATCH v1] LoongArch: Optimize implementation of single-precision floating-point approximate division.

2024-01-24 Thread Li Wei
We found that in the spec17 521.wrf program, some loop-invariant code generated
for the single-precision floating-point approximate division sequence was not
hoisted out of the loop. This is because the pseudo-register that stores the
intermediate temporary result is overwritten in the implementation of
single-precision floating-point approximate division, so the loop2_invariant
pass cannot identify the computation as invariant. To fix this, the
intermediate temporary results are now stored in new pseudo-registers, which
preserves the read-write dependencies while allowing them to be recognized as
loop invariants by the loop2_invariant pass.
After this optimization, the instruction count of 521.wrf is reduced by 0.18%
compared with before (1716612948501 -> 1713471771364).
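
The arithmetic can be sketched in Python as follows. This is only an
illustration of the sequence in the patch (x0, e0, e1, res); the helper
rough_reciprocal is a hypothetical stand-in for the hardware reciprocal
estimate, and the function names are assumptions. The point is that x0 and e0
depend only on the divisor, so once x0 is no longer overwritten they can be
computed once outside a loop.

```python
import math


def rough_reciprocal(b, bits=14):
    """Hypothetical stand-in for a hardware reciprocal estimate:
    1/b with its mantissa rounded to `bits` bits."""
    mantissa, exponent = math.frexp(1.0 / b)
    scale = 1 << bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def recip_terms(b):
    """The parts of the sequence that depend only on the divisor b."""
    x0 = rough_reciprocal(b)   # x0 = 1/b estimate
    e0 = 2.0 - b * x0          # e0 = 2.0 - b * x0
    return x0, e0


def approx_div(a, x0, e0):
    """Finish a / b without overwriting x0."""
    e1 = a * x0                # e1 = a * x0
    return e0 * e1             # res = e0 * e1


def scale_all(values, b):
    """Divide every value by b, computing the invariant terms once."""
    x0, e0 = recip_terms(b)
    return [approx_div(a, x0, e0) for a in values]


print(scale_all([1.0, 3.0, 10.0], 7.0))
```

If the estimate has relative error d, the result has relative error of about
d squared, which is why a single refinement step is enough here.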

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_emit_swdivsf): Adjust.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/invariant-recip.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 19 +++
 .../gcc.target/loongarch/invariant-recip.c| 33 +++
 2 files changed, 46 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/invariant-recip.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 32a0b6f43e8..1b88147fd8c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -10894,16 +10894,23 @@ void loongarch_emit_swdivsf (rtx res, rtx a, rtx b, 
machine_mode mode)
   /* x0 = 1./b estimate.  */
   emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
  unspec)));
-  /* 2.0 - b * x0  */
+  /* e0 = 2.0 - b * x0.  */
   emit_insn (gen_rtx_SET (e0, gen_rtx_FMA (mode,
   gen_rtx_NEG (mode, b), x0, mtwo)));
 
-  /* x0 = a * x0  */
   if (a != CONST1_RTX (mode))
-emit_insn (gen_rtx_SET (x0, gen_rtx_MULT (mode, a, x0)));
-
-  /* res = e0 * x0  */
-  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0)));
+{
+  rtx e1 = gen_reg_rtx (mode);
+  /* e1 = a * x0.  */
+  emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, a, x0)));
+  /* res = e0 * e1.  */
+  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, e1)));
+}
+  else
+{
+  /* res = e0 * x0.  */
+  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0)));
+}
 }
 
 static bool
diff --git a/gcc/testsuite/gcc.target/loongarch/invariant-recip.c 
b/gcc/testsuite/gcc.target/loongarch/invariant-recip.c
new file mode 100644
index 000..2f64f6ed5e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/invariant-recip.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=loongarch64 -mabi=lp64d -mrecip -mfrecipe 
-fdump-rtl-loop2_invariant " } */
+/* { dg-final { scan-rtl-dump "Decided to move dependent invariant" 
"loop2_invariant" } } */
+
+void
+nislfv_rain_plm (int im, int km, float dzl[im][km], float rql[im][km],
+ float dt)
+{
+  int i, k;
+  float con1, decfl;
+  float dz[km], qn[km], wi[km + 1];
+
+  for (i = 0; i < im; i++)
+{
+  for (k = 0; k < km; k++)
+{
+  dz[k] = dzl[i][k];
+}
+  con1 = 0.05;
+  for (k = km - 1; k >= 0; k--)
+{
+  decfl = (wi[k + 1] - wi[k]) * dt / dz[k];
+  if (decfl > con1)
+{
+  wi[k] = wi[k + 1] - con1 * dz[k] / dt;
+}
+}
+  for (k = 0; k < km; k++)
+{
+  rql[i][k] = qn[k];
+}
+}
+}
-- 
2.39.3



Re: [PATCH] LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

2024-01-24 Thread Xi Ruoyao
On Wed, 2024-01-24 at 17:19 +0800, Jiahao Xu wrote:
> gcc/ChangeLog:
> 
>   * config/loongarch/larchintrin.h
>   (__frecipe_s): Update function return type.
>   (__frecipe_d): Ditto.
>   (__frsqrte_s): Ditto.
>   (__frsqrte_d): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.
> 
> diff --git a/gcc/config/loongarch/larchintrin.h 
> b/gcc/config/loongarch/larchintrin.h
> index 7692415e04d..ff2c9f460ac 100644
> --- a/gcc/config/loongarch/larchintrin.h
> +++ b/gcc/config/loongarch/larchintrin.h
> @@ -336,38 +336,38 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2)
>  #ifdef __loongarch_frecipe
>  /* Assembly instruction format: fd, fj.  */
>  /* Data types in instruction templates:  SF, SF.  */
> -extern __inline void
> +extern __inline float
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  __frecipe_s (float _1)
>  {
> -  __builtin_loongarch_frecipe_s ((float) _1);
> +  return (float) __builtin_loongarch_frecipe_s ((float) _1);

I don't think the (float) conversion is needed.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

2024-01-24 Thread Jiahao Xu



On 2024/1/24 at 5:48 PM, Xi Ruoyao wrote:

> On Wed, 2024-01-24 at 17:19 +0800, Jiahao Xu wrote:
> > gcc/ChangeLog:
> > 
> > * config/loongarch/larchintrin.h
> > (__frecipe_s): Update function return type.
> > (__frecipe_d): Ditto.
> > (__frsqrte_s): Ditto.
> > (__frsqrte_d): Ditto.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.
> > 
> > diff --git a/gcc/config/loongarch/larchintrin.h 
> > b/gcc/config/loongarch/larchintrin.h
> > index 7692415e04d..ff2c9f460ac 100644
> > --- a/gcc/config/loongarch/larchintrin.h
> > +++ b/gcc/config/loongarch/larchintrin.h
> > @@ -336,38 +336,38 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2)
> >  #ifdef __loongarch_frecipe
> >  /* Assembly instruction format: fd, fj.  */
> >  /* Data types in instruction templates:  SF, SF.  */
> > -extern __inline void
> > +extern __inline float
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> >  __frecipe_s (float _1)
> >  {
> > -  __builtin_loongarch_frecipe_s ((float) _1);
> > +  return (float) __builtin_loongarch_frecipe_s ((float) _1);
> 
> I don't think the (float) conversion is needed.


Indeed, this float conversion is unnecessary; I simply included it to 
align with the definitions of other intrinsic functions.




Re: [PATCH] testsuite: i386: Don't restrict gcc.dg/vect/vect-simd-clone-16c.c etc. to i686 [PR113556]

2024-01-24 Thread Jakub Jelinek
On Wed, Jan 24, 2024 at 10:37:47AM +0100, Rainer Orth wrote:
> A couple of gcc.dg/vect/vect-simd-clone-1*.c tests FAIL on 32-bit
> Solaris/x86 since 20230222:
> 
> FAIL: gcc.dg/vect/vect-simd-clone-16c.c scan-tree-dump-times vect 
> "[nr] [^n]* = foo.simdclone" 2
> FAIL: gcc.dg/vect/vect-simd-clone-16d.c scan-tree-dump-times vect 
> "[nr] [^n]* = foo.simdclone" 2
> FAIL: gcc.dg/vect/vect-simd-clone-17c.c scan-tree-dump-times vect 
> "[nr] [^n]* = foo.simdclone" 2
> FAIL: gcc.dg/vect/vect-simd-clone-17d.c scan-tree-dump-times vect 
> "[nr] [^n]* = foo.simdclone" 2
> FAIL: gcc.dg/vect/vect-simd-clone-18c.c scan-tree-dump-times vect 
> "[nr] [^n]* = foo.simdclone" 2
> FAIL: gcc.dg/vect/vect-simd-clone-18d.c scan-tree-dump-times vect 
> "[nr] [^n]* = foo.simdclone" 2
> 
> The problem is that the 32-bit Solaris/x86 triple still uses i386,
> although gcc defaults to -mpentium4.  However, the tests only handle
> x86_64* and i686*, although the tests don't seem to require some
> specific ISA extension not covered by vect_simd_clones.
> 
> To fix this, the tests now allow generic i?86.  At the same time, I've
> removed the wildcards from x86_64* and i686* since DejaGnu uses the
> canonical forms.
> 
> Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.
> 
> Ok for trunk?

Ok, thanks.

> 2024-01-24  Rainer Orth  
> 
>   gcc/testsuite:
>   PR target/113556
>   * gcc.dg/vect/vect-simd-clone-16c.c: Don't wildcard x86_64 in
>   target specs.  Allow any i?86 target instead of i686 only.
>   * gcc.dg/vect/vect-simd-clone-16d.c: Likewise.
>   * gcc.dg/vect/vect-simd-clone-17c.c: Likewise.
>   * gcc.dg/vect/vect-simd-clone-17d.c: Likewise.
>   * gcc.dg/vect/vect-simd-clone-18c.c: Likewise.
>   * gcc.dg/vect/vect-simd-clone-18d.c: Likewise.

Jakub



Re: [PATCH] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-01-24 Thread chenxiaolong
At 20:09 +0800 on Tuesday, 2024-01-23, Xi Ruoyao wrote:
> The vect_int_mod target selector is evaluated with the options in
> DEFAULT_VECTCFLAGS in effect, but these options are not automatically
> passed to tests outside the vect directories.  So this test fails on
> targets where the integer vector modulo operation is supported but
> requires an option to enable, for example LoongArch.
> 
> In this test case, the only expected optimization that did not happen
> originally is in corge, because it needs forward propagation.  So we can
> scan the forwprop2 dump (where the vector operation is not expanded to
> scalars yet) instead of optimized, and then we don't need to consider
> whether vect_int_mod holds or not.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR testsuite/113418
>   * gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
>   instead of -fdump-tree-optimized.
>   (dg-final): Scan forwprop2 dump instead of optimized, and
> remove
>   the use of vect_int_mod.
> ---
> 
> This fixes the test failure on loongarch64-linux-gnu, and I've also
> tested it on x86_64-linux-gnu.  Ok for trunk?
> 
>  gcc/testsuite/gcc.dg/pr104992.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/pr104992.c
> b/gcc/testsuite/gcc.dg/pr104992.c
> index 82f8c75559c..6fd513d34b2 100644
> --- a/gcc/testsuite/gcc.dg/pr104992.c
> +++ b/gcc/testsuite/gcc.dg/pr104992.c
> @@ -1,6 +1,6 @@
>  /* PR tree-optimization/104992 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */
>  
> #define vector __attribute__((vector_size(4*sizeof(int))))
>  
> @@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned x,
> unsigned y, unsigned z) {
>  return x / y * z == x;
>  }
>  
> -/* { dg-final { scan-tree-dump-times " % " 9 "optimized" { target {
> ! vect_int_mod } } } } */
> -/* { dg-final { scan-tree-dump-times " % " 6 "optimized" { target
> vect_int_mod } } } */
> +/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */

Hello, currently the vect_int_mod effective-target check only supports
the ppc, amd, riscv and LoongArch architectures. When
-fdump-tree-forwprop2 is used instead of -fdump-tree-optimized, the
check_effective_target_vect_int_mod procedure defined in
target-supports.exp will never be called, because pr104992.c is its only
user. Should we consider supporting other architectures?




[PATCH] [testsuite] Fix pretty printers regexps for GDB output

2024-01-24 Thread Christophe Lyon
GDB emits end of lines as \r\n, we currently match the reverse \n\r,
possibly leading to mismatches under racy conditions.

I noticed this while running the GCC testsuite using the equivalent of
GDB's READ1 feature [1] which helps detect buffering issues.

Adjusting the first regexp to match the right order implied fixing the
second one, to skip the empty lines.
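
One way to picture the race, sketched in Python under the assumption that GDB
output reaches the matcher one character at a time (as with READ1). The
simulate helper below is a hypothetical stand-in for the expect loop, and
Python's re module stands in for Tcl regexps, so this is an illustration
rather than a reproduction. With the old patterns, the first match can fire on
the \r alone, and the leftover \n lets the "skipping" pattern swallow the next
value; the new patterns wait for the full \r\n.

```python
import re

# Patterns before and after the patch, tried in this order.
OLD = [("got", re.compile(r"^(type|\$([0-9]+)) = ([^\n\r]*)[\n\r]+")),
       ("skipping", re.compile(r"^[^$][^\n\r]*[\n\r]+"))]
NEW = [("got", re.compile(r"^(type|\$([0-9]+)) = ([^\n\r]*)\r\n")),
       ("skipping", re.compile(r"^[\r\n]*[^$][^\n\r]*\r\n"))]


def simulate(patterns, stream):
    """Feed `stream` one character at a time; after each character try
    every pattern against the accumulated buffer, first match wins, and
    the matched prefix is consumed."""
    buffer, events = "", []
    for ch in stream:
        buffer += ch
        for label, pattern in patterns:
            m = pattern.match(buffer)
            if m:
                events.append((label, m.group(0)))
                buffer = buffer[m.end():]
                break
    return events


stream = "$1 = 42\r\n$2 = 7\r\n"
print(simulate(OLD, stream))
print(simulate(NEW, stream))
```

For this two-line stream the old patterns produce one "got" event followed by
one "skipping" event (the second value is lost), while the new patterns
produce two "got" events.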

Tested on aarch64-linux-gnu.

[1] https://github.com/bminor/binutils-gdb/blob/master/gdb/testsuite/README#L269

2024-01-24  Christophe Lyon  

libstdc++-v3/
* testsuite/lib/gdb-test.exp (gdb-test): Fix regexps.
---
 libstdc++-v3/testsuite/lib/gdb-test.exp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/testsuite/lib/gdb-test.exp 
b/libstdc++-v3/testsuite/lib/gdb-test.exp
index 31206f2fc32..0de8d9ee153 100644
--- a/libstdc++-v3/testsuite/lib/gdb-test.exp
+++ b/libstdc++-v3/testsuite/lib/gdb-test.exp
@@ -194,7 +194,7 @@ proc gdb-test { marker {selector {}} {load_xmethods 0} } {
 
 set test_counter 0
 remote_expect target [timeout_value] {
-   -re {^(type|\$([0-9]+)) = ([^\n\r]*)[\n\r]+} {
+   -re {^(type|\$([0-9]+)) = ([^\n\r]*)\r\n} {
send_log "got: $expect_out(buffer)"
 
incr test_counter
@@ -250,7 +250,7 @@ proc gdb-test { marker {selector {}} {load_xmethods 0} } {
return
}
 
-   -re {^[^$][^\n\r]*[\n\r]+} {
+   -re {^[\r\n]*[^$][^\n\r]*\r\n} {
send_log "skipping: $expect_out(buffer)"
exp_continue
}
-- 
2.34.1



Re: [PATCH] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-01-24 Thread Xi Ruoyao
On Wed, 2024-01-24 at 18:32 +0800, chenxiaolong wrote:
> At 20:09 +0800 on Tuesday, 2024-01-23, Xi Ruoyao wrote:
> > The vect_int_mod target selector is evaluated with the options in
> > DEFAULT_VECTCFLAGS in effect, but these options are not automatically
> > passed to tests outside the vect directories.  So this test fails on
> > targets where the integer vector modulo operation is supported but
> > requires an option to enable, for example LoongArch.
> > 
> > In this test case, the only expected optimization that did not happen
> > originally is in corge, because it needs forward propagation.  So we can
> > scan the forwprop2 dump (where the vector operation is not expanded to
> > scalars yet) instead of optimized, and then we don't need to consider
> > whether vect_int_mod holds or not.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR testsuite/113418
> > * gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
> > instead of -fdump-tree-optimized.
> > (dg-final): Scan forwprop2 dump instead of optimized, and
> > remove
> > the use of vect_int_mod.
> > ---
> > 
> > This fixes the test failure on loongarch64-linux-gnu, and I've also
> > tested it on x86_64-linux-gnu.  Ok for trunk?
> > 
> >  gcc/testsuite/gcc.dg/pr104992.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/testsuite/gcc.dg/pr104992.c
> > b/gcc/testsuite/gcc.dg/pr104992.c
> > index 82f8c75559c..6fd513d34b2 100644
> > --- a/gcc/testsuite/gcc.dg/pr104992.c
> > +++ b/gcc/testsuite/gcc.dg/pr104992.c
> > @@ -1,6 +1,6 @@
> >  /* PR tree-optimization/104992 */
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
> > +/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */
> >  
> > #define vector __attribute__((vector_size(4*sizeof(int))))
> >  
> > @@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned x,
> > unsigned y, unsigned z) {
> >  return x / y * z == x;
> >  }
> >  
> > -/* { dg-final { scan-tree-dump-times " % " 9 "optimized" { target {
> > ! vect_int_mod } } } } */
> > -/* { dg-final { scan-tree-dump-times " % " 6 "optimized" { target
> > vect_int_mod } } } */
> > +/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */
> 
> Hello, currently the vect_int_mod effective-target check only supports
> the ppc, amd, riscv and LoongArch architectures. When
> -fdump-tree-forwprop2 is used instead of -fdump-tree-optimized, the
> check_effective_target_vect_int_mod procedure defined in
> target-supports.exp will never be called, because pr104992.c is its only
> user. Should we consider supporting other architectures?

Hmm, then we should remove check_effective_target_vect_int_mod.

If we want to keep -fdump-tree-optimized for this test case and also
make it correct, we'll at least have to move it into vect/, and write
something like

{ dg-final { scan-tree-dump-times " % " 9 "optimized" { target { ! vect_int_mod 
} } } }
{ dg-final { scan-tree-dump-times " % " 6 "optimized" { target { vect_int_mod 
&& vect128 } } } }
{ dg-final { scan-tree-dump-times " % " 7 "optimized" { target { vect_int_mod 
&& vect64 && !vect128 } } } }

and how about vect256 etc?  This would be very nasty and deviating from
the original purpose of this test case (against PR104992, which is a
missed-optimization issue unrelated to vectors).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] [testsuite] Fix pretty printers regexps for GDB output

2024-01-24 Thread Jonathan Wakely
On Wed, 24 Jan 2024 at 10:48, Christophe Lyon wrote:
>
> GDB emits end of lines as \r\n, we currently match the reverse \n\r,

We currently match [\n\r]+ which should match any of \n, \r, \n\r or \r\n


> possibly leading to mismatches under racy conditions.

What do we incorrectly match? Is the problem that a \r\n sequence
might be incompletely printed, due to buffering, and so the regex only
sees (and matches) the \r which then leaves an unwanted \n in the
stream, which then interferes with the next match? I don't understand
why that problem wouldn't just result in a failed match with your new
regex though.


>
> I noticed this while running the GCC testsuite using the equivalent of
> GDB's READ1 feature [1] which helps detecting bufferization issues.
>
> Adjusting the first regexp to match the right order implied fixing the
> second one, to skip the empty lines.

At the very least, this part of the description is misleading. The
existing regex matches "the right order" already. The change is to
match *exactly* \r\n instead of any mix of CR and LF characters.
That's not about matching "the right order", it's being more precise
in what we match.
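
To make the difference between the two patterns concrete, here is a
minimal C++ sketch (not part of the patch) that uses std::regex as a
stand-in for Expect. The helper name consumed and the sample buffers are
made up for illustration. It shows that the old pattern already accepts
\r\n, and that it also matches when only the \r has arrived, which is one
possible reading of the buffering scenario asked about above.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Old and new forms of the first pattern from gdb-test.exp.  The CR and LF
// inside the brackets are real control characters produced by the C++
// string literal, so no regex-level escapes are needed for them.
static const std::regex old_re ("^(type|\\$([0-9]+)) = ([^\n\r]*)[\n\r]+");
static const std::regex new_re ("^(type|\\$([0-9]+)) = ([^\n\r]*)\r\n");

// Return how many characters the pattern consumes at the very start of
// buf, or 0 if it does not match there.
static std::size_t consumed (const std::regex &re, const std::string &buf)
{
  std::smatch m;
  if (std::regex_search (buf, m, re,
                         std::regex_constants::match_continuous))
    return static_cast<std::size_t> (m.length (0));
  return 0;
}
```

With these definitions both patterns consume all 9 characters of
"$1 = 42\r\n". On the partial buffer "$1 = 42\r" the old pattern consumes
8 characters and leaves the \n for the next match attempt, while the new
pattern does not match until the \n arrives.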

But I'm still confused about what the failure scenario is and how the
change fixes it.

>
> Tested on aarch64-linux-gnu.
>
> [1] 
> https://github.com/bminor/binutils-gdb/blob/master/gdb/testsuite/README#L269
>
> 2024-01-24  Christophe Lyon  
>
> libstdc++-v3/
> * testsuite/lib/gdb-test.exp (gdb-test): Fix regexps.
> ---
>  libstdc++-v3/testsuite/lib/gdb-test.exp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/testsuite/lib/gdb-test.exp 
> b/libstdc++-v3/testsuite/lib/gdb-test.exp
> index 31206f2fc32..0de8d9ee153 100644
> --- a/libstdc++-v3/testsuite/lib/gdb-test.exp
> +++ b/libstdc++-v3/testsuite/lib/gdb-test.exp
> @@ -194,7 +194,7 @@ proc gdb-test { marker {selector {}} {load_xmethods 0} } {
>
>  set test_counter 0
>  remote_expect target [timeout_value] {
> -   -re {^(type|\$([0-9]+)) = ([^\n\r]*)[\n\r]+} {
> +   -re {^(type|\$([0-9]+)) = ([^\n\r]*)\r\n} {
> send_log "got: $expect_out(buffer)"
>
> incr test_counter
> @@ -250,7 +250,7 @@ proc gdb-test { marker {selector {}} {load_xmethods 0} } {
> return
> }
>
> -   -re {^[^$][^\n\r]*[\n\r]+} {
> +   -re {^[\r\n]*[^$][^\n\r]*\r\n} {
> send_log "skipping: $expect_out(buffer)"
> exp_continue
> }
> --
> 2.34.1
>



[PATCH v1 1/4] Improve must tail in RTL backend

2024-01-24 Thread Andi Kleen
- Give error messages for all causes of non sibling call generation
- Don't override choices of other non sibling call checks with
must tail. This causes ICEs. The must tail attribute now only
overrides flag_optimize_sibling_calls locally.
- Error out when tree-tailcall failed to mark a must-tail call as a
sibcall. In this case it doesn't know the true reason and only gives
a vague message (this could be improved, but it's already useful without
that). tree-tailcall usually fails without optimization, so the
existing must-tail plugin test must be adjusted to specify -O2.
---
 gcc/calls.cc  | 31 +--
 .../gcc.dg/plugin/must-tail-call-1.c  |  1 +
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 01f447347437..3115807b7788 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -2650,7 +2650,9 @@ expand_call (tree exp, rtx target, int ignore)
   /* The type of the function being called.  */
   tree fntype;
   bool try_tail_call = CALL_EXPR_TAILCALL (exp);
-  bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp);
+  /* tree-tailcall decided not to do tail calls. Error for the musttail case.  */
+  if (!try_tail_call)
+  maybe_complain_about_tail_call (exp, "cannot tail-call: other reasons");
   int pass;
 
   /* Register in which non-BLKmode value will be returned,
@@ -3021,10 +3023,22 @@ expand_call (tree exp, rtx target, int ignore)
  pushed these optimizations into -O2.  Don't try if we're already
  expanding a call, as that means we're an argument.  Don't try if
  there's cleanups, as we know there's code to follow the call.  */
-  if (currently_expanding_call++ != 0
-  || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp))
-  || args_size.var
-  || dbg_cnt (tail_call) == false)
+  if (currently_expanding_call++ != 0)
+{
+  maybe_complain_about_tail_call (exp, "cannot tail-call: inside another call");
+  try_tail_call = 0;
+}
+  if (!flag_optimize_sibling_calls
+   && !CALL_FROM_THUNK_P (exp)
+   && !CALL_EXPR_MUST_TAIL_CALL (exp))
+try_tail_call = 0;
+  if (args_size.var)
+{
+  /* ??? correct message?  */
+  maybe_complain_about_tail_call (exp, "cannot tail-call: stack space needed");
+  try_tail_call = 0;
+}
+  if (dbg_cnt (tail_call) == false)
 try_tail_call = 0;
 
   /* Workaround buggy C/C++ wrappers around Fortran routines with
@@ -3045,15 +3059,12 @@ expand_call (tree exp, rtx target, int ignore)
if (MEM_P (*iter))
  {
try_tail_call = 0;
+   maybe_complain_about_tail_call (exp,
+   "cannot tail-call: hidden string length argument");
break;
  }
}
 
-  /* If the user has marked the function as requiring tail-call
- optimization, attempt it.  */
-  if (must_tail_call)
-try_tail_call = 1;
-
   /*  Rest of purposes for tail call optimizations to fail.  */
   if (try_tail_call)
 try_tail_call = can_implement_as_sibling_call_p (exp,
diff --git a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
index 3a6d4cceaba7..44af361e2925 100644
--- a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile { target tail_call } } */
+/* { dg-options "-O2" } */
 /* { dg-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 extern void abort (void);
-- 
2.43.0



[PATCH v1 3/4] Add tests for C++ musttail attribute

2024-01-24 Thread Andi Kleen
Mostly adapted from the existing C musttail plugin tests.
---
 gcc/testsuite/g++.dg/musttail1.C | 15 
 gcc/testsuite/g++.dg/musttail2.C | 35 ++
 gcc/testsuite/g++.dg/musttail3.C | 42 
 gcc/testsuite/g++.dg/musttail4.C | 19 +++
 4 files changed, 111 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/musttail1.C
 create mode 100644 gcc/testsuite/g++.dg/musttail2.C
 create mode 100644 gcc/testsuite/g++.dg/musttail3.C
 create mode 100644 gcc/testsuite/g++.dg/musttail4.C

diff --git a/gcc/testsuite/g++.dg/musttail1.C b/gcc/testsuite/g++.dg/musttail1.C
new file mode 100644
index ..c9276e0ae86a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/musttail1.C
@@ -0,0 +1,15 @@
+/* { dg-do compile { target tail_call } } */
+/* { dg-options "-std=c++11 -O2" } */
+/* { dg-options "-fdelayed-branch" { target sparc*-*-* } } */
+
+int __attribute__((noinline,noclone))
+callee (int i)
+{
+  return i * i;
+}
+
+int __attribute__((noinline,noclone))
+caller (int i)
+{
+  [[gnu::musttail]] return callee (i + 1);
+}
diff --git a/gcc/testsuite/g++.dg/musttail2.C b/gcc/testsuite/g++.dg/musttail2.C
new file mode 100644
index ..d9151d25f517
--- /dev/null
+++ b/gcc/testsuite/g++.dg/musttail2.C
@@ -0,0 +1,35 @@
+/* { dg-do compile { target tail_call } } */
+/* { dg-options "-std=c++11" } */
+
+struct box { char field[256]; int i; };
+
+int __attribute__((noinline,noclone))
+test_2_callee (int i, struct box b)
+{
+  if (b.field[0])
+return 5;
+  return i * i;
+}
+
+int __attribute__((noinline,noclone))
+test_2_caller (int i)
+{
+  struct box b;
+  [[gnu::musttail]] return test_2_callee (i + 1, b); /* { dg-error "cannot tail-call: " } */
+}
+
+extern void setjmp (void);
+void
+test_3 (void)
+{
+  [[gnu::musttail]] return setjmp (); /* { dg-error "cannot tail-call: " } */
+}
+
+typedef void (fn_ptr_t) (void);
+volatile fn_ptr_t fn_ptr;
+
+void
+test_5 (void)
+{
+  [[gnu::musttail]] return fn_ptr (); /* { dg-error "cannot tail-call: " } */
+}
diff --git a/gcc/testsuite/g++.dg/musttail3.C b/gcc/testsuite/g++.dg/musttail3.C
new file mode 100644
index ..7d55f44124ee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/musttail3.C
@@ -0,0 +1,42 @@
+/* { dg-do compile { target tail_call } } */
+/* { dg-options "-std=c++11" } */
+
+extern int foo2 (int x, ...);
+
+struct str
+{
+  int a, b;
+};
+
+str
+cstruct (int x)
+{
+  if (x < 10)
+[[clang::musttail]] return cstruct (x + 1);
+  return { x, 0 };
+}
+
+int
+cstruct2 (int x, str & ref)
+{
+  if (x < 10)
+{
+  str r = { };
+  [[clang::musttail]] return cstruct2 (x + 1, r);
+}
+  return x + 1;
+}
+
+
+int
+foo (int x)
+{
+  if (x < 10)
+[[clang::musttail]] return foo2 (x, 29);
+  if (x < 100)
+{
+  int k = foo (x + 1);
+  [[clang::musttail]] return k;/* { dg-error "cannot tail-call: " } */
+}
+  return x;
+}
diff --git a/gcc/testsuite/g++.dg/musttail4.C b/gcc/testsuite/g++.dg/musttail4.C
new file mode 100644
index ..3122acfb1719
--- /dev/null
+++ b/gcc/testsuite/g++.dg/musttail4.C
@@ -0,0 +1,19 @@
+/* { dg-do compile { target tail_call } } */
+/* Allow nested functions.  */
+/* { dg-options "-Wno-pedantic -std=c++11" } */
+
+struct box { char field[64]; int i; };
+
+struct box __attribute__((noinline,noclone))
+returns_struct (int i)
+{
+  struct box b;
+  b.i = i * i;
+  return b;
+}
+
+int __attribute__((noinline,noclone))
+test_1 (int i)
+{
+  [[gnu::musttail]] return returns_struct (i * 5).i; /* { dg-error "return value must be a call" } */
+}
-- 
2.43.0



Re: [PATCH] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-01-24 Thread chenxiaolong
At 19:00 +0800 on Wednesday, 2024-01-24, Xi Ruoyao wrote:
> On Wed, 2024-01-24 at 18:32 +0800, chenxiaolong wrote:
> > On 20:09 +0800 on Tuesday, 2024-01-23, Xi Ruoyao wrote:
> > > The vect_int_mod target selector is evaluated with the options in
> > > DEFAULT_VECTCFLAGS in effect, but these options are not
> > > automatically
> > > passed to tests out of the vect directories.  So this test fails
> > > on
> > > targets where integer vector modulo operation is supported but
> > > requiring
> > > an option to enable, for example LoongArch.
> > > 
> > > In this test case, the only expected optimization not happened in
> > > > original is in corge because it needs forward propagation.  So we
> > > can
> > > scan the forwprop2 dump (where the vector operation is not
> > > expanded
> > > to
> > > scalars yet) instead of optimized, then we don't need to consider
> > > vect_int_mod or not.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   PR testsuite/113418
> > >   * gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
> > >   instead of -fdump-tree-optimized.
> > >   (dg-final): Scan forwprop2 dump instead of optimized, and
> > > remove
> > >   the use of vect_int_mod.
> > > ---
> > > 
> > > This fixes the test failure on loongarch64-linux-gnu, and I've
> > > also
> > > tested it on x86_64-linux-gnu.  Ok for trunk?
> > > 
> > >  gcc/testsuite/gcc.dg/pr104992.c | 5 ++---
> > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/gcc/testsuite/gcc.dg/pr104992.c
> > > b/gcc/testsuite/gcc.dg/pr104992.c
> > > index 82f8c75559c..6fd513d34b2 100644
> > > --- a/gcc/testsuite/gcc.dg/pr104992.c
> > > +++ b/gcc/testsuite/gcc.dg/pr104992.c
> > > @@ -1,6 +1,6 @@
> > >  /* PR tree-optimization/104992 */
> > >  /* { dg-do compile } */
> > > -/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
> > > +/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */
> > >  
> > >  #define vector __attribute__((vector_size(4*sizeof(int
> > >  
> > > @@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned
> > > x,
> > > unsigned y, unsigned z) {
> > >  return x / y * z == x;
> > >  }
> > >  
> > > -/* { dg-final { scan-tree-dump-times " % " 9 "optimized" {
> > > target {
> > > ! vect_int_mod } } } } */
> > > -/* { dg-final { scan-tree-dump-times " % " 6 "optimized" {
> > > target
> > > vect_int_mod } } } */
> > > +/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */
> > 
> > > Hello, currently the vect_int_mod effective-target check is only
> > > supported on the ppc, amd, riscv and LoongArch architectures. When
> > > -fdump-tree-forwprop2 is used instead of -fdump-tree-optimized, the
> > > check_effective_target_vect_int_mod procedure defined in
> > > target-supports.exp will never be called. It is only called for
> > > pr104992.c; should we consider supporting other architectures?
> 
> Hmm, then we should remove check_effective_target_vect_int_mod.
> 
> If we want to keep -fdump-tree-optimized for this test case and also
> make it correct, we'll at least have to move it into vect/, and write
> something like
> 
> { dg-final { scan-tree-dump-times " % " 9 "optimized" { target { !
> vect_int_mod } } } }
> { dg-final { scan-tree-dump-times " % " 6 "optimized" { target {
> vect_int_mod && vect128 } } } }
> { dg-final { scan-tree-dump-times " % " 7 "optimized" { target {
> vect_int_mod && vect64 && !vect128 } } } }
> 
> and how about vect256 etc?  This would be very nasty and deviating
> from
> the original purpose of this test case (against PR104992, which is a
> missed-optimization issue unrelated to vectors).
> 
Ok, let me think about how to make the pr104992.c test case more
reasonable.



[PATCH v1 4/4] Add documentation for musttail attribute

2024-01-24 Thread Andi Kleen
---
 gcc/doc/extend.texi | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0bc586d120e7..444b68f5d071 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -27890,7 +27890,8 @@ Predefined Macros,cpp,The GNU C Preprocessor}).
 each needed template instantiation is emitted.
 * Bound member functions:: You can extract a function pointer to the
 method denoted by a @samp{->*} or @samp{.*} expression.
-* C++ Attributes::  Variable, function, and type attributes for C++ only.
+* C++ Attributes::  Variable, function, statement, and type attributes
+   for C++ only.
 * Function Multiversioning::   Declaring multiple function versions.
 * Type Traits:: Compiler support for type traits.
 * C++ Concepts::Improved support for generic programming.
@@ -28458,6 +28459,22 @@ precedence and the @code{hot} attribute is not propagated.
 For the effects of the @code{hot} attribute on functions, see
 @ref{Common Function Attributes}.
 
+@cindex @code{musttail} statement attribute
+@item musttail
+
+The @code{gnu::musttail} or @code{clang::musttail} attribute
+can be applied to a return statement that returns the value
+of a call to indicate that the call must be a tail call
+that does not allocate extra stack space.
+
+@smallexample
+[[gnu::musttail]] return foo();
+@end smallexample
+
+If the compiler cannot generate a tail call it will generate
+an error. Tail calls generally require enabling optimization.
+On some targets they may not be supported.
+
 @end table
 
 @node Function Multiversioning
-- 
2.43.0



[PATCH v1 2/4] C++: Support clang compatible [[musttail]]

2024-01-24 Thread Andi Kleen
This patch implements a clang compatible [[musttail]] attribute for
returns.

musttail is useful as an alternative to computed goto for interpreters.
With computed goto the interpreter function usually ends up very big
which causes problems with register allocation and other per function
optimizations not scaling. With musttail the interpreter can be instead
written as a sequence of smaller functions that call each other. To
avoid unbounded stack growth this requires forcing a sibling call, which
this attribute does. It guarantees an error if the call cannot be tail
called which allows the programmer to fix it instead of risking a stack
overflow. Unlike computed goto it is also type-safe.
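
As a rough illustration of the interpreter style described above, here is
a small self-contained sketch (not taken from the patch). The opcode set,
the Insn layout, the MUSTTAIL macro and the opt-in USE_MUSTTAIL switch are
all made up for this example. The attribute is only applied when
USE_MUSTTAIL is defined and the compiler reports support for it; otherwise
the calls are ordinary calls that the optimizer may or may not turn into
sibling calls.

```cpp
#include <cassert>

// Hypothetical helper macro: expands to a musttail attribute only when
// USE_MUSTTAIL is defined and the compiler reports support for it.
#if defined(USE_MUSTTAIL) && defined(__has_cpp_attribute)
# if __has_cpp_attribute(clang::musttail)
#  define MUSTTAIL [[clang::musttail]]
# elif __has_cpp_attribute(gnu::musttail)
#  define MUSTTAIL [[gnu::musttail]]
# endif
#endif
#ifndef MUSTTAIL
# define MUSTTAIL
#endif

enum Opcode { OP_ADD, OP_MUL, OP_HALT };

struct Insn
{
  Opcode op;
  long arg;
};

// Every handler has the same signature, so each one can hand over to the
// next with a sibling call instead of returning to a central loop.
using Handler = long (*) (const Insn *pc, long acc);

static long op_add (const Insn *pc, long acc);
static long op_mul (const Insn *pc, long acc);
static long op_halt (const Insn *pc, long acc);

// Indexed by Opcode.
static const Handler dispatch[] = { op_add, op_mul, op_halt };

static long op_add (const Insn *pc, long acc)
{
  acc += pc->arg;
  ++pc;
  MUSTTAIL return dispatch[pc->op] (pc, acc);
}

static long op_mul (const Insn *pc, long acc)
{
  acc *= pc->arg;
  ++pc;
  MUSTTAIL return dispatch[pc->op] (pc, acc);
}

static long op_halt (const Insn *, long acc)
{
  return acc;
}

// Run a program that must end with OP_HALT.
static long run (const Insn *program, long start)
{
  return dispatch[program->op] (program, start);
}
```

For example, running the program ADD 2, MUL 5, ADD 1, HALT from a start
value of 1 yields 16. The intent of the attribute is that each handler's
frame is reused by the next one, so stack use should not grow with the
length of the program.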

It turns out that David Malcolm had already implemented middle/backend
support for a musttail attribute back in 2016, but it wasn't exposed
to any frontend other than a special plugin.

This patch adds a [[gnu::musttail]] attribute for C++ that can be added
to return statements. The return statement must be a direct call
(it does not follow dependencies), which is similar to what clang
implements. It then uses the existing must tail infrastructure.

For compatibility it also detects clang::musttail

One problem is that tree-tailcall usually fails when optimization
is disabled, which implies the attribute only really works with
optimization on. But that seems to be a reasonable limitation.

The attribute is only supported for C++, since the C-parser
has no support for statement attributes for non empty statements.
It could be added there with __attribute__ too but would need
some minor grammar adjustments.

Passes bootstrap and full test
---
 gcc/c-family/c-attribs.cc | 25 +
 gcc/cp/cp-tree.h  |  4 ++--
 gcc/cp/parser.cc  | 28 +++-
 gcc/cp/semantics.cc   |  6 +++---
 gcc/cp/typeck.cc  | 20 ++--
 5 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 40a0cf90295d..f31c62e76665 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -54,6 +54,7 @@ static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
 static tree handle_common_attribute (tree *, tree, tree, int, bool *);
 static tree handle_hot_attribute (tree *, tree, tree, int, bool *);
 static tree handle_cold_attribute (tree *, tree, tree, int, bool *);
+static tree handle_musttail_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
  int, bool *);
@@ -499,6 +500,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
   { "hot",   0, 0, false,  false, false, false,
  handle_hot_attribute,
  attr_cold_hot_exclusions },
+  { "musttail",  0, 0, false,  false, false, false,
+ handle_musttail_attribute, NULL },
   { "no_address_safety_analysis",
  0, 0, true, false, false, false,
  handle_no_address_safety_analysis_attribute,
@@ -1290,6 +1293,28 @@ handle_hot_attribute (tree *node, tree name, tree ARG_UNUSED (args),
   return NULL_TREE;
 }
 
+/* Handle a "musttail" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_musttail_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) == FUNCTION_DECL
+  || TREE_CODE (*node) == LABEL_DECL)
+{
+  /* Attribute musttail processing is done later with lookup_attribute.  */
+}
+  else
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
+
 /* Handle a "cold" and attribute; arguments as in
struct attribute_spec.handler.  */
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 60e6dafc5494..bed52e860a00 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7763,7 +7763,7 @@ extern void finish_while_stmt (tree);
 extern tree begin_do_stmt  (void);
 extern void finish_do_body (tree);
 extern void finish_do_stmt (tree, tree, bool, tree, bool);
-extern tree finish_return_stmt (tree);
+extern tree finish_return_stmt (tree, bool = false);
 extern tree begin_for_scope(tree *);
 extern tree begin_for_stmt (tree, tree);
 extern void finish_init_stmt   (tree);
@@ -8275,7 +8275,7 @@ extern tree composite_pointer_type(const op_location_t &,
 tsubst_flags_t);
 extern tree merge_types(tree, tree

MAINTAINERS: Update my work email address

2024-01-24 Thread Thomas Schwinge
Hi!

Pushed to master branch commit 7fcdb501366632fbf98a1eff275d76b9eea91aa1
"MAINTAINERS: Update my work email address", see attached.

(Happy to talk, of course!)


| Excited to announce that Sourcery Services are now available via BayLibre, 
!  🧙‍♂️
| 
| GCC, GNU Toolchain, HPC, embedded -- and more to come!
| 
| (Please allow us some time to regroup.)


Copyright assignment for BayLibre is in progress.


Grüße
 Thomas


From 7fcdb501366632fbf98a1eff275d76b9eea91aa1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 24 Jan 2024 12:03:03 +0100
Subject: [PATCH] MAINTAINERS: Update my work email address

	* MAINTAINERS: Update my work email address.
---
 MAINTAINERS | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ade5c9f0181f..7d3b78d276eb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -102,7 +102,7 @@ nds32 port		Shiva Chen		
 nios2 port		Chung-Lin Tang		
 nios2 port		Sandra Loosemore	
 nvptx port		Tom de Vries		
-nvptx port		Thomas Schwinge		
+nvptx port		Thomas Schwinge		
 or1k port		Stafford Horne		
 pdp11 port		Paul Koning		
 powerpcspe port		Andrew Jenner		
@@ -181,7 +181,7 @@ libgcc			Ian Lance Taylor	
 libgo			Ian Lance Taylor	
 libgomp			Jakub Jelinek		
 libgomp			Tobias Burnus		
-libgomp (OpenACC)	Thomas Schwinge		
+libgomp (OpenACC)	Thomas Schwinge		
 libgrust		All Rust front end maintainers
 libiberty		Ian Lance Taylor	
 libitm			Torvald Riegel		
@@ -253,7 +253,7 @@ auto-vectorizer		Zdenek Dvorak		
 loop infrastructure	Zdenek Dvorak		
 loop ivopts		Bin Cheng		
 loop optimizer		Bin Cheng		
-OpenACC			Thomas Schwinge		
+OpenACC			Thomas Schwinge		
 OpenACC			Tobias Burnus		
 OpenMP			Jakub Jelinek		
 OpenMP			Tobias Burnus		
-- 
2.40.1



Re: [PATCH v1 2/4] C++: Support clang compatible [[musttail]]

2024-01-24 Thread Sam James


Andi Kleen  writes:

> This patch implements a clang compatible [[musttail]] attribute for
> returns.

This is PR83324. See also PR52067 and PR110899.

>
> musttail is useful as an alternative to computed goto for interpreters.
> With computed goto the interpreter function usually ends up very big
> which causes problems with register allocation and other per function
> optimizations not scaling. With musttail the interpreter can be instead
> written as a sequence of smaller functions that call each other. To
> avoid unbounded stack growth this requires forcing a sibling call, which
> this attribute does. It guarantees an error if the call cannot be tail
> called which allows the programmer to fix it instead of risking a stack
> overflow. Unlike computed goto it is also type-safe.
>

Yeah, CPython is going to require this for its new JIT.

> The attribute is only supported for C++, since the C-parser
> has no support for statement attributes for non empty statements.
> It could be added there with __attribute__ too but would need
> some minor grammar adjustments.

... although it'll need C there.



[PATCH 0/2] RISC-V/testsuite: Add RTL if-conversion testcases

2024-01-24 Thread Maciej W. Rozycki
Hi,

 As discussed previously here are a bunch of RTL if-conversion testcases 
corresponding to and produced from existing pr105314.c and cset-sext.c C 
testcases.  Verified with RV64 and RV32.  OK to apply?

  Maciej


[PATCH 1/2] RISC-V/testsuite: Add RTL pr105314.c testcase variants

2024-01-24 Thread Maciej W. Rozycki
Add a pair of RTL tests, for RV64 and RV32 respectively, corresponding 
to the existing pr105314.c test.  They have been produced from RTL code 
as at the entry of the "ce1" pass for pr105314.c compiled at -O3.
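
For readers who would rather not decode the RTL, the sketch below is a C++
reconstruction of what the insn chain in these tests computes. It is read
off the RTL in this patch, not copied from pr105314.c, so treat it as an
assumption. It is shown next to a branchless mask form of the kind the
noce_try_store_flag_mask message in the dg-final scan suggests. The
function names are invented for this sketch.

```cpp
#include <cassert>

// Reconstructed from the RTL: block 4 stores 0 when c != 0, block 3
// copies a otherwise.  The parameter b is unused, as in the RTL.
static long pr105314_branchy (long a, long /* b */, long c)
{
  if (c != 0)
    return 0;
  return a;
}

// Branchless equivalent: build an all-ones mask when c == 0 and an
// all-zeros mask otherwise, then AND it with a.
static long pr105314_masked (long a, long /* b */, long c)
{
  long mask = -static_cast<long> (c == 0);
  return a & mask;
}
```

Both functions return a when c is zero and 0 otherwise, which is why the
tests can expect no beq or bne in the generated assembly.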

gcc/testsuite/
* gcc.target/riscv/pr105314-rtl.c: New file.
* gcc.target/riscv/pr105314-rtl32.c: New file.
---
 gcc/testsuite/gcc.target/riscv/pr105314-rtl.c   |   78 
 gcc/testsuite/gcc.target/riscv/pr105314-rtl32.c |   78 
 2 files changed, 156 insertions(+)

gcc-test-riscv-pr105314-rtl.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/pr105314-rtl.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/pr105314-rtl.c
@@ -0,0 +1,78 @@
+/* PR rtl-optimization/105314 */
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-fdump-rtl-ce1" } */
+
+long __RTL (startwith ("ce1"))
+foo (long a, long b, long c)
+{
+(function "foo"
+  (param "a"
+(DECL_RTL (reg/v:DI <1> [ a ]))
+(DECL_RTL_INCOMING (reg:DI a0 [ a ])))
+  (param "b"
+(DECL_RTL (reg/v:DI <2> [ b ]))
+(DECL_RTL_INCOMING (reg:DI a1 [ b ])))
+  (param "c"
+(DECL_RTL (reg/v:DI <3> [ c ]))
+(DECL_RTL_INCOMING (reg:DI a2 [ c ])))
+  (insn-chain
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 8 [bb 2] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 2 (set (reg/v:DI <1> [ a ])
+(reg:DI a0 [ a ])) "pr105314.c":8:1
+   (expr_list:REG_DEAD (reg:DI a0 [ a ])))
+  (cinsn 4 (set (reg/v:DI <3> [ c ])
+(reg:DI a2 [ c ])) "pr105314.c":8:1
+   (expr_list:REG_DEAD (reg:DI a2 [ c ])))
+  (cnote 5 NOTE_INSN_FUNCTION_BEG)
+  (cjump_insn 10 (set (pc)
+  (if_then_else (ne (reg/v:DI <3> [ c ])
+(const_int 0))
+(label_ref:DI 23)
+(pc))) "pr105314.c":9:6
+ (expr_list:REG_DEAD (reg/v:DI <3> [ c ])
+ (int_list:REG_BR_PROB 536870916)))
+  (edge-to 4)
+  (edge-to 3 (flags "FALLTHRU"))
+) ;; block 2
+(block 3
+  (edge-from 2 (flags "FALLTHRU"))
+  (cnote 11 [bb 3] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 6 (set (reg/v:DI <0> [  ])
+(reg/v:DI <1> [ a ])) "pr105314.c":9:6
+   (expr_list:REG_DEAD (reg/v:DI <1> [ a ])))
+  (edge-to 5 (flags "FALLTHRU"))
+) ;; block 3
+(block 4
+  (edge-from 2)
+  (clabel 23 3)
+  (cnote 22 [bb 4] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 7 (set (reg/v:DI <0> [  ])
+(const_int 0)) "pr105314.c":10:7)
+  (edge-to 5 (flags "FALLTHRU"))
+) ;; block 4
+(block 5
+  (edge-from 4 (flags "FALLTHRU"))
+  (edge-from 3 (flags "FALLTHRU"))
+  (clabel 16 1)
+  (cnote 19 [bb 5] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 17 (set (reg/i:DI a0)
+ (reg/v:DI <0> [  ])) "pr105314.c":12:1
+(expr_list:REG_DEAD (reg/v:DI <0> [  ])))
+  (cinsn 18 (use (reg/i:DI a0)) "pr105314.c":12:1)
+  (edge-to exit (flags "FALLTHRU"))
+) ;; block 5
+  ) ;; insn-chain
+  (crtl
+(return_rtx
+  (reg/i:DI a0)
+) ;; return_rtx
+  ) ;; crtl
+) ;; function "foo"
+}
+
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_store_flag_mask" 1 "ce1" } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/pr105314-rtl32.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/pr105314-rtl32.c
@@ -0,0 +1,78 @@
+/* PR rtl-optimization/105314 */
+/* { dg-do compile } */
+/* { dg-require-effective-target rv32 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-fdump-rtl-ce1" } */
+
+long __RTL (startwith ("ce1"))
+foo (long a, long b, long c)
+{
+(function "foo"
+  (param "a"
+(DECL_RTL (reg/v:SI <1> [ a ]))
+(DECL_RTL_INCOMING (reg:SI a0 [ a ])))
+  (param "b"
+(DECL_RTL (reg/v:SI <2> [ b ]))
+(DECL_RTL_INCOMING (reg:SI a1 [ b ])))
+  (param "c"
+(DECL_RTL (reg/v:SI <3> [ c ]))
+(DECL_RTL_INCOMING (reg:SI a2 [ c ])))
+  (insn-chain
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 8 [bb 2] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 2 (set (reg/v:SI <1> [ a ])
+(reg:SI a0 [ a ])) "pr105314.c":8:1
+   (expr_list:REG_DEAD (reg:SI a0 [ a ])))
+  (cinsn 4 (set (reg/v:SI <3> [ c ])
+(reg:SI a2 [ c ])) "pr105314.c":8:1
+   (expr_list:REG_DEAD (reg:SI a2 [ c ])))
+  (cnote 5 NOTE_INSN_FUNCTION_BEG)
+  (cjump_insn 10 (set (pc)
+  (if_then_else (ne (reg/v:SI <3> 

[PATCH 2/2] RISC-V/testsuite: Add RTL cset-sext.c testcase variants

2024-01-24 Thread Maciej W. Rozycki
Add RTL tests, for RV64 and RV32 where appropriate, corresponding to the 
existing cset-sext.c tests.  They have been produced from RTL code as at 
the entry of the "ce1" pass for the respective cset-sext.c tests built 
at -O3.
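
As with the pr105314 variants, the sketch below is a C++ reconstruction of
what the insn chain in cset-sext-rtl.c computes. It is read off the RTL in
this patch, not copied from cset-sext.c, so treat it as an assumption. The
branchless form mirrors the snez, neg, snez, and sequence quoted in the
test's comment. The function names are invented for this sketch.

```cpp
#include <cassert>

// Reconstructed from the RTL: block 4 stores 0 when b == 0, block 3
// stores the sign-extended result of (a != 0) otherwise.
static int cset_branchy (long a, long b)
{
  if (b == 0)
    return 0;
  return a != 0;
}

// Branchless equivalent, one statement per expected instruction.
static int cset_branchless (long a, long b)
{
  long b_flag = static_cast<long> (b != 0);  // snez a1,a1
  long b_mask = -b_flag;                     // neg  a1,a1
  long a_flag = static_cast<long> (a != 0);  // snez a0,a0
  return static_cast<int> (b_mask & a_flag); // and  a0,a1,a0
}
```

Both functions return 1 only when a and b are both non-zero.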

gcc/testsuite/
* gcc.target/riscv/cset-sext-rtl.c: New file.
* gcc.target/riscv/cset-sext-rtl32.c: New file.
* gcc.target/riscv/cset-sext-sfb-rtl.c: New file.
* gcc.target/riscv/cset-sext-sfb-rtl32.c: New file.
* gcc.target/riscv/cset-sext-thead-rtl.c: New file.
* gcc.target/riscv/cset-sext-ventana-rtl.c: New file.
* gcc.target/riscv/cset-sext-zicond-rtl.c: New file.
* gcc.target/riscv/cset-sext-zicond-rtl32.c: New file.
---
 gcc/testsuite/gcc.target/riscv/cset-sext-rtl.c  |   87 +++
 gcc/testsuite/gcc.target/riscv/cset-sext-rtl32.c|   84 +++
 gcc/testsuite/gcc.target/riscv/cset-sext-sfb-rtl.c  |   88 
 gcc/testsuite/gcc.target/riscv/cset-sext-sfb-rtl32.c|   85 +++
 gcc/testsuite/gcc.target/riscv/cset-sext-thead-rtl.c|   86 +++
 gcc/testsuite/gcc.target/riscv/cset-sext-ventana-rtl.c  |   86 +++
 gcc/testsuite/gcc.target/riscv/cset-sext-zicond-rtl.c   |   86 +++
 gcc/testsuite/gcc.target/riscv/cset-sext-zicond-rtl32.c |   83 +++
 8 files changed, 685 insertions(+)

gcc-test-riscv-cset-sext-rtl.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/cset-sext-rtl.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/cset-sext-rtl.c
@@ -0,0 +1,87 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=6 -mmovcc -fdump-rtl-ce1" } */
+
+int __RTL (startwith ("ce1"))
+foo (long a, long b)
+{
+(function "foo"
+  (param "a"
+(DECL_RTL (reg/v:DI <2> [ a ]))
+(DECL_RTL_INCOMING (reg:DI a0 [ a ])))
+  (param "b"
+(DECL_RTL (reg/v:DI <3> [ b ]))
+(DECL_RTL_INCOMING (reg:DI a1 [ b ])))
+  (insn-chain
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 6 [bb 2] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 2 (set (reg/v:DI <2> [ a ])
+(reg:DI a0 [ a ])) "cset-sext.c":8:1
+   (expr_list:REG_DEAD (reg:DI a0 [ a ])))
+  (cinsn 3 (set (reg/v:DI <3> [ b ])
+(reg:DI a1 [ b ])) "cset-sext.c":8:1
+   (expr_list:REG_DEAD (reg:DI a1 [ b ])))
+  (cnote 4 NOTE_INSN_FUNCTION_BEG)
+  (cjump_insn 8 (set (pc)
+ (if_then_else (eq (reg/v:DI <3> [ b ])
+   (const_int 0))
+   (label_ref:DI 24)
+   (pc))) "cset-sext.c":9:6
+(expr_list:REG_DEAD (reg/v:DI <3> [ b ])
+(int_list:REG_BR_PROB 365072228)))
+  (edge-to 4)
+  (edge-to 3 (flags "FALLTHRU"))
+) ;; block 2
+(block 3
+  (edge-from 2 (flags "FALLTHRU"))
+  (cnote 9 [bb 3] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 10 (set (reg:SI <5>)
+ (ne:SI (reg/v:DI <2> [ a ])
+(const_int 0))) "cset-sext.c":11:11
+(expr_list:REG_DEAD (reg/v:DI <2> [ a ])))
+  (cinsn 11 (set (reg:DI <1> [  ])
+ (sign_extend:DI (reg:SI <5>))) "cset-sext.c":11:11
+(expr_list:REG_DEAD (reg:SI <5>)))
+  (edge-to 5 (flags "FALLTHRU"))
+) ;; block 3
+(block 4
+  (edge-from 2)
+  (clabel 24 3)
+  (cnote 23 [bb 4] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 5 (set (reg:DI <1> [  ])
+(const_int 0)) "cset-sext.c":10:12)
+  (edge-to 5 (flags "FALLTHRU"))
+) ;; block 4
+(block 5
+  (edge-from 4 (flags "FALLTHRU"))
+  (edge-from 3 (flags "FALLTHRU"))
+  (clabel 12 2)
+  (cnote 13 [bb 5] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 18 (set (reg/i:DI a0)
+ (reg:DI <1> [  ])) "cset-sext.c":15:1
+(expr_list:REG_DEAD (reg:DI <1> [  ])))
+  (cinsn 19 (use (reg/i:DI a0)) "cset-sext.c":15:1)
+  (edge-to exit (flags "FALLTHRU"))
+) ;; block 5
+  ) ;; insn-chain
+  (crtl
+(return_rtx
+  (reg/i:DI a0)
+) ;; return_rtx
+  ) ;; crtl
+) ;; function "foo"
+}
+
+/* Expect branchless assembly like:
+
+   snez a1,a1
+   neg a1,a1
+   snez a0,a0
+   and a0,a1,a0
+ */
+
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove_arith" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\ssnez\\s" 2 } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/cset-sext-rtl32.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gc

Re: [PATCH 2/2] RISC-V/testsuite: Also verify if-conversion runs for pr105314.c

2024-01-24 Thread Maciej W. Rozycki
On Tue, 16 Jan 2024, Maciej W. Rozycki wrote:

> > I don't have a strong opinion on this.  I certainly see Andrew's point, but
> > it's also the case that if some work earlier in the RTL or gimple pipeline
> > comes along and compromises the test, then we'd see the failure and deal 
> > with
> > it.  It's pretty standard procedure.
> 
>  I'll be happy to add an RTL test case, also for my recent complementary 
> cset-sext.c addition and maybe other if-conversion pieces recently added.  
> I think that does not preclude arming pr105314.c with RTL scanning though.

 I have made a bunch of testcases as we discussed at the meeting last week 
and the RTL parser did not blow up, so I have now submitted them.  See: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643802.html
 
and the next two messages (threading broke with this submission for some 
reason, probably due to a glitch in my mail client I've seen from time to 
time; I guess it's not worth it to get the patch series resubmitted as 
they are independent from each other really and can be applied in any 
order).

 I haven't heard back from Andrew beyond his initial message, so it's not 
clear to me whether he maintains his objection in spite of the arguments 
given.  Andrew?

 Do we have consensus now to move forward with this change as posted?  I'd 
like to get these patches ticked off ASAP.

  Maciej


Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].

2024-01-24 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, 15 Jan 2024, Robin Dapp wrote:
>
>> I gave it another shot now by introducing a separate function as
>> Richard suggested.  It's probably not at the location he intended.
>> 
>> The way I read the discussion there hasn't been any consensus
>> on how (or rather where) to properly tackle the problem.  Any
>> other ideas still?
>
> I'm happy enough with the patch, esp. at this stage.  OK if
> Richard S. doesn't disagree.

Yeah, no objection from me.

Richard


Re: [PATCH] AArch64: aarch64_class_max_nregs mishandles 64-bit structure modes [PR112577]

2024-01-24 Thread Richard Sandiford
Tejas Belagod  writes:
> The target hook aarch64_class_max_nregs returns the incorrect result for 
> 64-bit
> structure modes like V31DImode or V41DFmode etc.  The calculation of the nregs
> is based on the size of AdvSIMD vector register for 64-bit modes which ought 
> to
> be UNITS_PER_VREG / 2.  This patch fixes the register size.
>
> Existing tests like gcc.target/aarch64/advsimd-intrinsics/vld1x3.c cover this 
> change.
>
> Regression tested on aarch64-linux. Bootstrapped on aarch64-linux.
>
> OK for trunk?
>
> gcc/ChangeLog:
>
>   PR target/112577
>   * config/aarch64/aarch64.cc (aarch64_class_max_nregs): Handle 64-bit
>   vector structure modes correctly.
> ---
>  gcc/config/aarch64/aarch64.cc | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index a5a6b52730d..b9f00bdce3b 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -12914,10 +12914,12 @@ aarch64_class_max_nregs (reg_class_t regclass, 
> machine_mode mode)
> && constant_multiple_p (GET_MODE_SIZE (mode),
> aarch64_vl_bytes (mode, vec_flags), &nregs))
>   return nregs;
> -  return (vec_flags & VEC_ADVSIMD
> -   ? CEIL (lowest_size, UNITS_PER_VREG)
> -   : CEIL (lowest_size, UNITS_PER_WORD));
> -
> +  if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL))
> + return GET_MODE_SIZE (mode).to_constant () / 8;
> +  else
> + return (vec_flags & VEC_ADVSIMD
> + ? CEIL (lowest_size, UNITS_PER_VREG)
> + : CEIL (lowest_size, UNITS_PER_WORD));

Very minor, sorry, but I think it would be more usual style to add the
new condition as an early-out and so not add an "else", especially since
there's already an early-out for SVE above:

  if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL))
    return GET_MODE_SIZE (mode).to_constant () / 8;
  return (vec_flags & VEC_ADVSIMD
          ? CEIL (lowest_size, UNITS_PER_VREG)
          : CEIL (lowest_size, UNITS_PER_WORD));

I think it's also worth keeping the blank line between this and the
following block of cases.
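
For illustration only (not part of the patch): a small arithmetic sketch of why the old formula undercounts for a structure of 64-bit vectors.  It assumes AdvSIMD registers are 16 bytes wide, which is what UNITS_PER_VREG stands for here; the helper names are made up for this sketch.

```cpp
#include <cassert>

// Assumption: AdvSIMD vector registers are 16 bytes wide (GCC's
// UNITS_PER_VREG on AArch64).  All names below are local to this sketch.
constexpr unsigned kUnitsPerVreg = 16;

constexpr unsigned ceil_div (unsigned a, unsigned b)
{
  return (a + b - 1) / b;
}

// Old behaviour: size of the whole mode divided by a full 128-bit register.
constexpr unsigned nregs_old (unsigned mode_bytes)
{
  return ceil_div (mode_bytes, kUnitsPerVreg);
}

// Behaviour with the early-out: one register per 64-bit (8-byte) vector.
constexpr unsigned nregs_new (unsigned mode_bytes)
{
  return mode_bytes / 8;
}

// A structure of three 64-bit vectors is 24 bytes in total.
static_assert (nregs_old (24) == 2, "old formula undercounts");
static_assert (nregs_new (24) == 3, "one register per 64-bit vector");
```

So a 24-byte structure mode would be reported as needing two registers by the old code, while each of its three 64-bit vectors really lives in its own register.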

OK with that change, thanks.

Richard


>  case PR_REGS:
>  case PR_LO_REGS:
>  case PR_HI_REGS:


Re: [PATCH] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-01-24 Thread Xi Ruoyao
On Wed, 2024-01-24 at 19:08 +0800, chenxiaolong wrote:
> At 19:00 +0800 on Wednesday, 2024-01-24, Xi Ruoyao wrote:
> > On Wed, 2024-01-24 at 18:32 +0800, chenxiaolong wrote:
> > > On 20:09 +0800 on Tuesday, 2024-01-23, Xi Ruoyao wrote:
> > > > The vect_int_mod target selector is evaluated with the options in
> > > > DEFAULT_VECTCFLAGS in effect, but these options are not
> > > > automatically
> > > > passed to tests out of the vect directories.  So this test fails
> > > > on
> > > > targets where integer vector modulo operation is supported but
> > > > requiring
> > > > an option to enable, for example LoongArch.
> > > > 
> > > > In this test case, the only expected optimization that has not happened
> > > > in original is in corge, because it needs forward propagation.  So we
> > > > can scan the forwprop2 dump (where the vector operation is not expanded
> > > > to scalars yet) instead of optimized, and then we don't need to consider
> > > > vect_int_mod or not.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > PR testsuite/113418
> > > > * gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
> > > > instead of -fdump-tree-optimized.
> > > > (dg-final): Scan forwprop2 dump instead of optimized, and
> > > > remove
> > > > the use of vect_int_mod.
> > > > ---
> > > > 
> > > > This fixes the test failure on loongarch64-linux-gnu, and I've
> > > > also
> > > > tested it on x86_64-linux-gnu.  Ok for trunk?
> > > > 
> > > >  gcc/testsuite/gcc.dg/pr104992.c | 5 ++---
> > > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/gcc/testsuite/gcc.dg/pr104992.c
> > > > b/gcc/testsuite/gcc.dg/pr104992.c
> > > > index 82f8c75559c..6fd513d34b2 100644
> > > > --- a/gcc/testsuite/gcc.dg/pr104992.c
> > > > +++ b/gcc/testsuite/gcc.dg/pr104992.c
> > > > @@ -1,6 +1,6 @@
> > > >  /* PR tree-optimization/104992 */
> > > >  /* { dg-do compile } */
> > > > -/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
> > > > +/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */
> > > >  
> > > > >  #define vector __attribute__((vector_size(4*sizeof(int))))
> > > >  
> > > > @@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned
> > > > x,
> > > > unsigned y, unsigned z) {
> > > >  return x / y * z == x;
> > > >  }
> > > >  
> > > > -/* { dg-final { scan-tree-dump-times " % " 9 "optimized" {
> > > > target {
> > > > ! vect_int_mod } } } } */
> > > > -/* { dg-final { scan-tree-dump-times " % " 6 "optimized" {
> > > > target
> > > > vect_int_mod } } } */
> > > > +/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */
> > > 
> > > Hello, currently the vect_int_mod effective-target check is only
> > > supported on the ppc, amd, riscv, and LoongArch architectures.  When
> > > -fdump-tree-forwprop2 is used instead of -fdump-tree-optimized, the
> > > check_effective_target_vect_int_mod procedure defined in
> > > target-supports.exp will never be called, because pr104992.c is its
> > > only user.  Should we consider supporting other architectures?
> > 
> > Hmm, then we should remove check_effective_target_vect_int_mod.
> > 
> > If we want to keep -fdump-tree-optimized for this test case and also
> > make it correct, we'll at least have to move it into vect/, and write
> > something like
> > 
> > { dg-final { scan-tree-dump-times " % " 9 "optimized" { target { !
> > vect_int_mod } } } }
> > { dg-final { scan-tree-dump-times " % " 6 "optimized" { target {
> > vect_int_mod && vect128 } } } }
> > { dg-final { scan-tree-dump-times " % " 7 "optimized" { target {
> > vect_int_mod && vect64 && !vect128 } } } }
> > 
> > and how about vect256 etc?  This would be very nasty and deviating
> > from
> > the original purpose of this test case (against PR104992, which is a
> > missed-optimization issue unrelated to vectors).
> > 
> Ok, let me think about how to make the pr104992.c test case more
> reasonable.

It *is* reasonable with -fdump-tree-forwprop2.  Its purpose is to test the
a / b * b -> a - a % b simplification, not vector operations.
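
As a quick sanity check of that identity, here is a minimal scalar sketch (the function names are made up for illustration and are not taken from pr104992.c).  It relies only on the language guarantee that (a / b) * b + a % b == a for nonzero b when a / b is representable.

```cpp
#include <cassert>

// Both sides of the simplification; b must be nonzero and a / b must
// not overflow.
int before_fold (int a, int b) { return a / b * b; }
int after_fold (int a, int b) { return a - a % b; }

// Returns true if the two forms agree over a small grid of operands,
// negative values included.
bool fold_is_equivalent ()
{
  for (int a = -50; a <= 50; ++a)
    for (int b = -7; b <= 7; ++b)
      {
        if (b == 0)
          continue;
        if (before_fold (a, b) != after_fold (a, b))
          return false;
      }
  return true;
}
```

None of this involves vectors, which is the point: the fold can be checked on scalars alone.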

If we need a test to test vector int modulo operations we should write a
new test in vect/, like

/* ... */

for (int i = 0; i < 4; i++)
  x[i] %= y[i];

/* ... */

/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { vect_int_mod } } } } */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] aarch64: Check the ldp/stp policy model correctly when mem ops are reversed.

2024-01-24 Thread Richard Sandiford
Manos Anagnostakis  writes:
> The current ldp/stp policy framework implementation was missing cases, where
> the memory operands were reversed. Therefore the call to the framework 
> function
> is moved after the lower mem check with the suitable parameters. Also removes
> the mode of aarch64_operands_ok_for_ldpstp, which becomes unused and triggers
> a warning on bootstrap.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-ldpstp.md: Remove unused mode.
> * config/aarch64/aarch64-protos.h (aarch64_operands_ok_for_ldpstp):
>   Likewise.
> * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
>   Call on framework moved later.

OK, thanks.  The policy infrastructure is new to GCC 14 and so I think
the change qualifies for stage 4.

Richard

> Signed-off-by: Manos Anagnostakis 
> Co-Authored-By: Manolis Tsamis 
> ---
>  gcc/config/aarch64/aarch64-ldpstp.md | 22 +++---
>  gcc/config/aarch64/aarch64-protos.h  |  2 +-
>  gcc/config/aarch64/aarch64.cc| 18 +-
>  3 files changed, 21 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-ldpstp.md 
> b/gcc/config/aarch64/aarch64-ldpstp.md
> index b668fa8e2a6..b7c0bf05cd1 100644
> --- a/gcc/config/aarch64/aarch64-ldpstp.md
> +++ b/gcc/config/aarch64/aarch64-ldpstp.md
> @@ -23,7 +23,7 @@
>   (match_operand:GPI 1 "memory_operand" ""))
> (set (match_operand:GPI 2 "register_operand" "")
>   (match_operand:GPI 3 "memory_operand" ""))]
> -  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
> +  "aarch64_operands_ok_for_ldpstp (operands, true)"
>[(const_int 0)]
>  {
>aarch64_finish_ldpstp_peephole (operands, true);
> @@ -35,7 +35,7 @@
>   (match_operand:GPI 1 "aarch64_reg_or_zero" ""))
> (set (match_operand:GPI 2 "memory_operand" "")
>   (match_operand:GPI 3 "aarch64_reg_or_zero" ""))]
> -  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
> +  "aarch64_operands_ok_for_ldpstp (operands, false)"
>[(const_int 0)]
>  {
>aarch64_finish_ldpstp_peephole (operands, false);
> @@ -47,7 +47,7 @@
>   (match_operand:GPF 1 "memory_operand" ""))
> (set (match_operand:GPF 2 "register_operand" "")
>   (match_operand:GPF 3 "memory_operand" ""))]
> -  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
> +  "aarch64_operands_ok_for_ldpstp (operands, true)"
>[(const_int 0)]
>  {
>aarch64_finish_ldpstp_peephole (operands, true);
> @@ -59,7 +59,7 @@
>   (match_operand:GPF 1 "aarch64_reg_or_fp_zero" ""))
> (set (match_operand:GPF 2 "memory_operand" "")
>   (match_operand:GPF 3 "aarch64_reg_or_fp_zero" ""))]
> -  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
> +  "aarch64_operands_ok_for_ldpstp (operands, false)"
>[(const_int 0)]
>  {
>aarch64_finish_ldpstp_peephole (operands, false);
> @@ -71,7 +71,7 @@
>   (match_operand:DREG 1 "memory_operand" ""))
> (set (match_operand:DREG2 2 "register_operand" "")
>   (match_operand:DREG2 3 "memory_operand" ""))]
> -  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
> +  "aarch64_operands_ok_for_ldpstp (operands, true)"
>[(const_int 0)]
>  {
>aarch64_finish_ldpstp_peephole (operands, true);
> @@ -83,7 +83,7 @@
>   (match_operand:DREG 1 "register_operand" ""))
> (set (match_operand:DREG2 2 "memory_operand" "")
>   (match_operand:DREG2 3 "register_operand" ""))]
> -  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
> +  "aarch64_operands_ok_for_ldpstp (operands, false)"
>[(const_int 0)]
>  {
>aarch64_finish_ldpstp_peephole (operands, false);
> @@ -96,7 +96,7 @@
> (set (match_operand:VQ2 2 "register_operand" "")
>   (match_operand:VQ2 3 "memory_operand" ""))]
>"TARGET_FLOAT
> -   && aarch64_operands_ok_for_ldpstp (operands, true, mode)
> +   && aarch64_operands_ok_for_ldpstp (operands, true)
> && (aarch64_tune_params.extra_tuning_flags
>   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
>[(const_int 0)]
> @@ -111,7 +111,7 @@
> (set (match_operand:VQ2 2 "memory_operand" "")
>   (match_operand:VQ2 3 "register_operand" ""))]
>"TARGET_FLOAT
> -   && aarch64_operands_ok_for_ldpstp (operands, false, mode)
> +   && aarch64_operands_ok_for_ldpstp (operands, false)
> && (aarch64_tune_params.extra_tuning_flags
>   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
>[(const_int 0)]
> @@ -128,7 +128,7 @@
>   (sign_extend:DI (match_operand:SI 1 "memory_operand" "")))
> (set (match_operand:DI 2 "register_operand" "")
>   (sign_extend:DI (match_operand:SI 3 "memory_operand" "")))]
> -  "aarch64_operands_ok_for_ldpstp (operands, true, SImode)"
> +  "aarch64_operands_ok_for_ldpstp (operands, true)"
>[(const_int 0)]
>  {
>aarch64_finish_ldpstp_peephole (operands, true, SIGN_EXTEND);
> @@ -140,7 +140,7 @@
>   (zero_extend:DI (match_operand:SI 1 "memory_operand" "")))
> (set (match_operand:DI 2 "register_operand" "")
>   (zero_extend:DI (match_operand:S

Re: [PATCH v1 2/4] C++: Support clang compatible [[musttail]]

2024-01-24 Thread Richard Sandiford
Thanks for doing this.  I'm not qualified to review the patch properly,
but was just curious...

Andi Kleen  writes:
> This patch implements a clang compatible [[musttail]] attribute for
> returns.
>
> musttail is useful as an alternative to computed goto for interpreters.
> With computed goto the interpreter function usually ends up very big
> which causes problems with register allocation and other per function
> optimizations not scaling. With musttail the interpreter can be instead
> written as a sequence of smaller functions that call each other. To
> avoid unbounded stack growth this requires forcing a sibling call, which
> this attribute does. It guarantees an error if the call cannot be tail
> called which allows the programmer to fix it instead of risking a stack
> overflow. Unlike computed goto it is also type-safe.
>
> It turns out that David Malcolm had already implemented middle/backend
> support for a musttail attribute back in 2016, but it wasn't exposed
> to any frontend other than a special plugin.
>
> This patch adds a [[gnu::musttail]] attribute for C++ that can be added
> to return statements. The return statement must be a direct call
> (it does not follow dependencies), which is similar to what clang
> implements. It then uses the existing must tail infrastructure.
>
> For compatibility it also detects clang::musttail
>
> One problem is that tree-tailcall usually fails when optimization
> is disabled, which implies the attribute only really works with
> optimization on. But that seems to be a reasonable limitation.
>
> The attribute is only supported for C++, since the C-parser
> has no support for statement attributes for non empty statements.
> It could be added there with __attribute__ too but would need
> some minor grammar adjustments.
>
> Passes bootstrap and full test
> ---
>  gcc/c-family/c-attribs.cc | 25 +
>  gcc/cp/cp-tree.h  |  4 ++--
>  gcc/cp/parser.cc  | 28 +++-
>  gcc/cp/semantics.cc   |  6 +++---
>  gcc/cp/typeck.cc  | 20 ++--
>  5 files changed, 71 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index 40a0cf90295d..f31c62e76665 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -54,6 +54,7 @@ static tree handle_nocommon_attribute (tree *, tree, tree, 
> int, bool *);
>  static tree handle_common_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_hot_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_cold_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_musttail_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
> int, bool *);
> @@ -499,6 +500,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
>{ "hot", 0, 0, false,  false, false, false,
> handle_hot_attribute,
> attr_cold_hot_exclusions },
> +  { "musttail",0, 0, false,  false, false, false,
> +   handle_musttail_attribute, NULL },
>{ "no_address_safety_analysis",
> 0, 0, true, false, false, false,
> handle_no_address_safety_analysis_attribute,
> @@ -1290,6 +1293,28 @@ handle_hot_attribute (tree *node, tree name, tree 
> ARG_UNUSED (args),
>return NULL_TREE;
>  }
>  
> +/* Handle a "musttail" and attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_musttail_attribute (tree *node, tree name, tree ARG_UNUSED (args),
> +int ARG_UNUSED (flags), bool *no_add_attrs)
> +{
> +  if (TREE_CODE (*node) == FUNCTION_DECL
> +  || TREE_CODE (*node) == LABEL_DECL)
> +{
> +  /* Attribute musttail processing is done later with lookup_attribute.  
> */
> +}
> +  else
> +{
> +  warning (OPT_Wattributes, "%qE attribute ignored", name);
> +  *no_add_attrs = true;
> +}
> +
> +  return NULL_TREE;
> +}
> +
> +

...are the three hunks above needed?  The reason for asking is that,
if they were needed, I'd have expected that we'd also need a table
entry for clang::musttail (which is possible to add).  But trying it
locally, the patch seemed to work without this.

Also, including the table entry and accepting FUNCTION_DECL means that:

[[gnu::musttail]] void f();
[[gnu::musttail]] void g() { return f(); }

is silently accepted but seems to have no effect.
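
For contrast, a minimal sketch of the statement-level use that the patch description is about: a tiny interpreter whose handlers tail-call each other.  The MUSTTAIL macro, the VM struct and the handler names are all made up for illustration; the macro expands to [[clang::musttail]] only where the compiler reports support for it, and to nothing otherwise (in which case the calls are ordinary calls and the stack can grow).

```cpp
#include <cassert>

// Hypothetical helper macro: use the attribute only if the compiler
// advertises it, otherwise fall back to a plain return.
#if defined(__has_cpp_attribute)
# if __has_cpp_attribute(clang::musttail)
#  define MUSTTAIL [[clang::musttail]]
# endif
#endif
#ifndef MUSTTAIL
# define MUSTTAIL
#endif

// Made-up interpreter state: a program counter and one accumulator.
struct VM { const unsigned char *pc; long acc; };
typedef long (*Handler) (VM *);

static long op_inc (VM *);
static long op_dbl (VM *);
static long op_halt (VM *);

// Opcode 0 increments, opcode 1 doubles, opcode 2 stops.
static const Handler handlers[] = { op_inc, op_dbl, op_halt };

// Every handler has the same signature, so each hop can be a sibling call.
static long dispatch (VM *vm)
{
  MUSTTAIL return handlers[*vm->pc] (vm);
}

static long op_inc (VM *vm)
{
  vm->acc += 1;
  ++vm->pc;
  MUSTTAIL return dispatch (vm);
}

static long op_dbl (VM *vm)
{
  vm->acc *= 2;
  ++vm->pc;
  MUSTTAIL return dispatch (vm);
}

static long op_halt (VM *vm)
{
  return vm->acc;
}

// Runs a byte-code program that must end with opcode 2.
long run_program (const unsigned char *code)
{
  VM vm = { code, 0 };
  return dispatch (&vm);
}
```

Here the attribute sits on each return statement, not on the function declarations as in the f/g example above.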

Thanks,
Richard


>  /* Handle a "cold" and attribute; arguments as in
> struct attribute_spec.handler.  */
>  
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 60e6dafc5494..bed52e860a00 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7763,7 +7763,7 @@ extern void f

Re: [PATCH 2/2] libstdc++: Implement P2165R4 changes to std::pair/tuple/etc

2024-01-24 Thread Jonathan Wakely
On Tue, 23 Jan 2024 at 23:54, Patrick Palka wrote:
> diff --git a/libstdc++-v3/include/bits/stl_pair.h 
> b/libstdc++-v3/include/bits/stl_pair.h
> index b81b479ad43..a9b20fbe7ca 100644
> --- a/libstdc++-v3/include/bits/stl_pair.h
> +++ b/libstdc++-v3/include/bits/stl_pair.h
> @@ -85,12 +85,70 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>/// @cond undocumented
>
>// Forward declarations.
> +  template
> +struct pair;

We have a compiler bug where a forward declaration without template
parameter names causes bad diagnostics later. The compiler seems to
try to use the parameter names from the first decl it sees, so we end
up with things like  even when there's a name
available at the site of the actual error. So I think we should name
these _T1 and _T2 here.
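
A minimal sketch of the style being suggested, using a made-up namespace and non-reserved parameter names rather than the real header: the first declaration the compiler sees already carries the names.

```cpp
#include <cassert>

namespace demo  // hypothetical namespace, not libstdc++
{
  // Forward declaration with named parameters, so any diagnostic that
  // picks up names from the first declaration has something to print.
  template<typename T1, typename T2>
    struct pair;

  // The definition repeats the same names.
  template<typename T1, typename T2>
    struct pair
    {
      T1 first;
      T2 second;
    };
}
```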

> +
>template
>  class tuple;
>
> +  // Declarations of std::array and its std::get overloads, so that
> +  // std::tuple_cat can use them if  is included before .
> +  // We also declare the other std::get overloads here so that they're
> +  // visible to the P2165R4 tuple-like constructors of pair and tuple.
> +  template
> +struct array;
> +
>template
>  struct _Index_tuple;
>
> +  template
> +constexpr typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&
> +get(pair<_Tp1, _Tp2>& __in) noexcept;
> +
> +  template
> +constexpr typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&&
> +get(pair<_Tp1, _Tp2>&& __in) noexcept;
> +
> +  template
> +constexpr const typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&
> +get(const pair<_Tp1, _Tp2>& __in) noexcept;
> +
> +  template
> +constexpr const typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&&
> +get(const pair<_Tp1, _Tp2>&& __in) noexcept;
> +
> +  template
> +constexpr __tuple_element_t<__i, tuple<_Elements...>>&
> +get(tuple<_Elements...>& __t) noexcept;
> +
> +  template
> +constexpr const __tuple_element_t<__i, tuple<_Elements...>>&
> +get(const tuple<_Elements...>& __t) noexcept;
> +
> +  template
> +constexpr __tuple_element_t<__i, tuple<_Elements...>>&&
> +get(tuple<_Elements...>&& __t) noexcept;
> +
> +  template
> +constexpr const __tuple_element_t<__i, tuple<_Elements...>>&&
> +get(const tuple<_Elements...>&& __t) noexcept;
> +
> +  template
> +constexpr _Tp&
> +get(array<_Tp, _Nm>&) noexcept;
> +
> +  template
> +constexpr _Tp&&
> +get(array<_Tp, _Nm>&&) noexcept;
> +
> +  template
> +constexpr const _Tp&
> +get(const array<_Tp, _Nm>&) noexcept;
> +
> +  template
> +constexpr const _Tp&&
> +get(const array<_Tp, _Nm>&&) noexcept;
> +
>  #if ! __cpp_lib_concepts
>// Concept utility functions, reused in conditionally-explicit
>// constructors.
> @@ -159,6 +217,46 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #endif // lib concepts
>  #endif // C++11
>
> +#if __glibcxx_tuple_like // >= C++23
> +  template
> +inline constexpr bool __is_tuple_v = false;
> +
> +  template
> +inline constexpr bool __is_tuple_v> = true;
> +
> +  // TODO: Reuse __is_tuple_like from ?
> +  template
> +inline constexpr bool __is_tuple_like_v = false;
> +
> +  template
> +inline constexpr bool __is_tuple_like_v> = true;
> +
> +  template
> +inline constexpr bool __is_tuple_like_v> = true;
> +
> +  template
> +inline constexpr bool __is_tuple_like_v> = true;
> +
> +  // __is_tuple_like_v is defined in .
> +
> +  template
> +concept __tuple_like = __is_tuple_like_v>;
> +
> +  template
> +concept __pair_like = __tuple_like<_Tp> && 
> tuple_size_v> == 2;
> +
> +  template
> +concept __eligible_tuple_like
> +  = __detail::__different_from<_Tp, _Tuple> && __tuple_like<_Tp>
> +   && (tuple_size_v> == tuple_size_v<_Tuple>)
> +   && !ranges::__detail::__is_subrange>;
> +
> +  template
> +concept __eligible_pair_like
> +  = __detail::__different_from<_Tp, _Pair> && __pair_like<_Tp>
> +   && !ranges::__detail::__is_subrange>;
> +#endif // C++23
> +
>template class __pair_base
>{
>  #if __cplusplus >= 201103L && ! __cpp_lib_concepts
> @@ -295,6 +393,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   return false;
>  #endif
> }
> +
> +#if __glibcxx_tuple_like // >= C++23
> +  template
> +   static constexpr bool
> +   _S_constructible_from_pair_like()
> +   {
> + return 
> _S_constructible(std::declval<_UPair>())),
> + 
> decltype(std::get<1>(std::declval<_UPair>()))>();
> +   }
> +
> +  template
> +   static constexpr bool
> +   _S_convertible_from_pair_like()
> +   {
> + return _S_convertible(std::declval<_UPair>())),
> +   
> decltype(std::get<1>(std::declval<_UPair>()))>();
> +   }
> +#endif // C++23
>/// @endcond
>
>  public:
> @@ -393,6 +509,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> pair(const pair<_U1, _U2>&&) = delete;
>  #endif // C++23
>
> +#if __glibcxx_tuple_like // >= C++23
> +  templa

Re: [PATCH v2] RISC-V: Add split pattern to generate SFB instructions. [PR113095]

2024-01-24 Thread Monk Chiang
Thank you for your help. I will update the test case.
I tested with CoreMark and saw a 5% improvement on the SiFive CPU.

On Tue, Jan 23, 2024 at 12:24 PM Jeff Law  wrote:

>
>
> On 1/21/24 23:12, Monk Chiang wrote:
> > Since match.pd transforms (zero_one == 0) ? y : z <op> y
> > into ((typeof(y))zero_one * z) <op> y, add splitters to recognize
> > this expression and generate SFB instructions.
> >
> > gcc/ChangeLog:
> >   PR target/113095
> >   * config/riscv/sfb.md: New splitters to rewrite single bit
> >   sign extension as the condition to SFB instructions.
> >
> > gcc/testsuite/ChangeLog:
> >  * gcc.target/riscv/sfb.c: New test.
> >   * gcc.target/riscv/pr113095.c: New test.
> So the 113095 test is going to fail to link on rv64 causing a testsuite
> failure.  I would suggest it have these dg-options lines instead of the
> one you provided:
>
> /* { dg-options "-O2 -march=rv32gc -mabi=ilp32d -mtune=sifive-7-series"
> { target { rv32 } } } */
> /* { dg-options "-O2 -march=rv64gc -mabi=lp64d -mtune=sifive-7-series" {
> target { rv64 } } } */
>
>
> A similar change is not strictly needed for the new sfb.c test since it
> only does a compile (but not a link) test.
>
> You still didn't indicate what testing was done for this patch.
> Standard practice is to build the compiler and run the testsuite with
> and without your change and verify there are no regressions.  Ideally
> new tests should pass as well.
>
> I made the change above locally to pr113095.c to fix those failures on
> rv64.   So this is OK with the adjustment to the dg-options line in the
> new pr113095 test.
>
> Jeff
>
>


Re: [PATCH 2/2] libstdc++: Implement P2165R4 changes to std::pair/tuple/etc

2024-01-24 Thread Jonathan Wakely
On Wed, 24 Jan 2024 at 12:01, Jonathan Wakely wrote:
>
> On Tue, 23 Jan 2024 at 23:54, Patrick Palka wrote:
> > diff --git a/libstdc++-v3/include/bits/stl_pair.h 
> > b/libstdc++-v3/include/bits/stl_pair.h
> > index b81b479ad43..a9b20fbe7ca 100644
> > --- a/libstdc++-v3/include/bits/stl_pair.h
> > +++ b/libstdc++-v3/include/bits/stl_pair.h
> > @@ -85,12 +85,70 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >/// @cond undocumented
> >
> >// Forward declarations.
> > +  template
> > +struct pair;
>
> We have a compiler bug where a forward declaration without template
> parameter names causes bad diagnostics later. The compiler seems to
> try to use the parameter names from the first decl it sees, so we end
> up with things like  even when there's a name
> available at the site of the actual error. So I think we should name
> these _T1 and _T2 here.

The bug I'm thinking of is https://gcc.gnu.org/PR54948



Re: [PATCH] x86: Update PR 35513 tests

2024-01-24 Thread Thomas Schwinge
Hi!

On 2022-02-10T05:55:15-0800, "H.J. Lu via Gcc-patches" 
 wrote:
> 1. Require linker with GNU_PROPERTY_1_NEEDED support for PR 35513
> run-time tests.

Moving my x86_64-pc-linux-gnu testing from an old to a newish system
(Ubuntu 20.04), I notice:

[-PASS: g++.target/i386/pr35513-1.C  -std=gnu++98 (test for excess errors)-]
[-PASS:-]{+UNSUPPORTED:+} g++.target/i386/pr35513-1.C  
-std=gnu++98[-execution test-]

Etc.

[-PASS: g++.target/i386/pr35513-2.C  -std=gnu++98 (test for excess errors)-]
[-PASS:-]{+UNSUPPORTED:+} g++.target/i386/pr35513-2.C  
-std=gnu++98[-execution test-]

Etc.

..., due to the 'property_1_needed' effective-target check now
diagnosing:

/usr/bin/ld: warning: /tmp/ccFNkvfI.o: unsupported GNU_PROPERTY_TYPE (5) type: 0xb0008000

..., with:

$ /usr/bin/ld --version | head -n 1
GNU ld (GNU Binutils for Ubuntu) 2.34

I'm not familiar with these properties, but I wonder if really some
support has been removed (so that this indeed is now UNSUPPORTED), or if
something's wrong somewhere (so that this should still PASS).

For reference:

> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp

> +proc check_effective_target_property_1_needed { } {
> +  return [check_no_compiler_messages_nocache property_1_needed executable {
> +/* Assembly code */
> +#ifdef __LP64__
> +# define __PROPERTY_ALIGN 3
> +#else
> +# define __PROPERTY_ALIGN 2
> +#endif
> +
> + .section ".note.gnu.property", "a"
> + .p2align __PROPERTY_ALIGN
> + .long 1f - 0f   /* name length.  */
> + .long 4f - 1f   /* data length.  */
> + /* NT_GNU_PROPERTY_TYPE_0.   */
> + .long 5 /* note type.  */
> +0:
> + .asciz "GNU"/* vendor name.  */
> +1:
> + .p2align __PROPERTY_ALIGN
> + /* GNU_PROPERTY_1_NEEDED.  */
> + .long 0xb0008000/* pr_type.  */
> + .long 3f - 2f   /* pr_datasz.  */
> +2:
> + /* GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS.  */
> + .long 1
> +3:
> + .p2align __PROPERTY_ALIGN
> +4:
> + .text
> + .globl main
> +main:
> + .byte 0
> +  } ""]
> +}


Grüße
 Thomas


Re: [PATCH v1 2/4] C++: Support clang compatible [[musttail]]

2024-01-24 Thread Andi Kleen
> ...are the three hunks above needed?  The reason for asking is that,
> if they were needed, I'd have expected that we'd also need a table
> entry for clang::musttail (which is possible to add).  But trying it
> locally, the patch seemed to work without this.

Interesting, thanks. I think I copied this from likely/unlikely.

But I suppose it would be needed for the later C implementation,
unless we want this to only work with the C23 attribute syntax.
But I'll drop it for now. Perhaps it should be dropped for likely/unlikely too.


> Also, including the table entry and accepting FUNCTION_DECL means that:
> 
> [[gnu::musttail]] void f();
> [[gnu::musttail]] void g() { return f(); }
> 
> is silently accepted but seems to have no effect.

Yes that is indeed not intended.


-Andi


Re: [PATCH v1 2/4] C++: Support clang compatible [[musttail]]

2024-01-24 Thread Andi Kleen
On Wed, Jan 24, 2024 at 11:13:44AM +, Sam James wrote:
> 
> Andi Kleen  writes:
> 
> > This patch implements a clang compatible [[musttail]] attribute for
> > returns.
> 
> This is PR83324. See also PR52067 and PR110899.

Thanks for the references. I'll add it there.
> 
> > The attribute is only supported for C++, since the C-parser
> > has no support for statement attributes for non empty statements.
> > It could be added there with __attribute__ too but would need
> > some minor grammar adjustments.
> 
> ... although it'll need C there.

Okay I will look into it (although I suppose that file could be also
built as C++)

-Andi



[PATCH] amdgcn: additional gfx1100 support

2024-01-24 Thread Andrew Stubbs
This is enough to get gfx1100 working for most purposes, on top of the
patch that Tobias committed a week or so ago; there are still some test
failures to investigate, and probably some tuning to do.

It might get gfx1030 working too. @Richi, could you test it,
please?

I can't test the other multilibs right now. @PA, can you test it please?

I can self-approve the patch, but I'll hold off the commit until the
test results come back.

Andrew

gcc/ChangeLog:

* config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
* config/gcn/gcn-valu.md (all_convert): New iterator.
(2): New
define_expand, and rename the old one to ...
(*_sdwa): ... this.
(extend2): Likewise, to ...
(extend_sdwa): .. this.
(*_shift): New.
* config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
(gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
* config/gcn/gcn.md (mulhisi3): Disable on RDNA3.
(mulqihi3_scalar): Likewise.

libgcc/ChangeLog:

* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.

libgomp/ChangeLog:

* config/gcn/time.c (RTC_TICKS): Configure RDNA3.
(omp_get_wtime): Add RDNA3-compatible variant.
* plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.

Signed-off-by:  Andrew Stubbs 
---
 gcc/config/gcn/gcn-opts.h |  2 +-
 gcc/config/gcn/gcn-valu.md| 41 ---
 gcc/config/gcn/gcn.cc | 31 ---
 gcc/config/gcn/gcn.md |  4 +--
 libgcc/config/gcn/amdgcn_veclib.h |  2 +-
 libgomp/config/gcn/time.c | 10 
 libgomp/plugin/plugin-gcn.c   |  6 +++--
 7 files changed, 77 insertions(+), 19 deletions(-)

diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index 79fbda3ab25..6be2c9204fa 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -62,7 +62,7 @@ extern enum gcn_isa {
 
 
 #define TARGET_M0_LDS_LIMIT (TARGET_GCN3)
-#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS)
+#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3)
 
 #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF)
 
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 3d5b6271ee6..cd027f8b369 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -3555,30 +3555,63 @@
 ;; }}}
 ;; {{{ Int/int conversions
 
+(define_code_iterator all_convert [truncate zero_extend sign_extend])
 (define_code_iterator zero_convert [truncate zero_extend])
 (define_code_attr convop [
(sign_extend "extend")
(zero_extend "zero_extend")
(truncate "trunc")])
 
-(define_insn "2"
+(define_expand "2"
+  [(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
+(all_convert:V_INT_1REG
+ (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
+  "")
+
+(define_insn "*_sdwa"
   [(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
 (zero_convert:V_INT_1REG
  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
-  ""
+  "!TARGET_RDNA3"
   "v_mov_b32_sdwa\t%0, %1 dst_sel: dst_unused:UNUSED_PAD 
src0_sel:"
   [(set_attr "type" "vop_sdwa")
(set_attr "length" "8")])
 
-(define_insn "extend2"
+(define_insn "extend_sdwa"
   [(set (match_operand:V_INT_1REG 0 "register_operand" "=v")
 (sign_extend:V_INT_1REG
  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
-  ""
+  "!TARGET_RDNA3"
   "v_mov_b32_sdwa\t%0, sext(%1) src0_sel:"
   [(set_attr "type" "vop_sdwa")
(set_attr "length" "8")])
 
+(define_insn "*_shift"
+  [(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
+(all_convert:V_INT_1REG
+ (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
+  "TARGET_RDNA3"
+  {
+enum {extend, zero_extend, trunc};
+rtx shiftwidth = (mode == QImode
+ || mode == QImode
+ ? GEN_INT (24)
+ : mode == HImode
+   || mode == HImode
+ ? GEN_INT (16)
+ : NULL);
+operands[2] = shiftwidth;
+
+if (!shiftwidth)
+  return "v_mov_b32 %0, %1";
+else if ( == extend ||  == trunc)
+  return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0";
+else
+  return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0";
+  }
+  [(set_attr "type" "mult")
+   (set_attr "length" "8")])
+
 ;; GCC can already do these for scalar types, but not for vector types.
 ;; Unfortunately you can't just do SUBREG on a vector to select the low part,
 ;; so there must be a few tricks here.
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index e668ce7c69e..e80de2ce056 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -1597,8 +1597,8 @@ gcn_global_address_p (rtx addr)
   rtx offset = XEXP (addr, 1);
   int offsetbits = (TARGET_RDNA2_PLUS ? 11 : 

[PATCH v3] RISC-V: Add split pattern to generate SFB instructions. [PR113095]

2024-01-24 Thread Monk Chiang
Since match.pd transforms (zero_one == 0) ? y : z <op> y
into ((typeof(y))zero_one * z) <op> y, add splitters to recognize
this expression and generate SFB instructions.
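
As a rough illustration of why the rewritten form is equivalent, here is a
minimal C sketch for the XOR case only. The function names are made up for
this note and do not appear in the patch; zero_one is assumed to hold 0 or 1.
The third helper models the pattern the splitters match: a 1-bit sign
extract is either 0 or all-ones, so ANDing it with z gives 0 or z.

```c
#include <assert.h>
#include <stdint.h>

/* Source form: (zero_one == 0) ? y : z ^ y, where zero_one is 0 or 1.  */
static uint32_t
select_form (uint32_t zero_one, uint32_t y, uint32_t z)
{
  return (zero_one == 0) ? y : (z ^ y);
}

/* Rewritten form: (zero_one * z) ^ y.  */
static uint32_t
multiply_form (uint32_t zero_one, uint32_t y, uint32_t z)
{
  return (zero_one * z) ^ y;
}

/* Mask form: a 1-bit sign extract of bit POS of X is either 0 or
   all-ones, so ANDing it with Z yields 0 or Z.  */
static uint32_t
mask_form (uint32_t x, unsigned int pos, uint32_t y, uint32_t z)
{
  uint32_t mask = (uint32_t) 0 - ((x >> pos) & 1u);
  return (mask & z) ^ y;
}
```

All three helpers should agree whenever zero_one equals bit POS of X, which
is what lets the split replace the AND with a conditional move of z.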

gcc/ChangeLog:
PR target/113095
* config/riscv/sfb.md: New splitters to rewrite single bit
sign extension as the condition to SFB instructions.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/sfb.c: New test.
* gcc.target/riscv/pr113095.c: New test.
---
 gcc/config/riscv/sfb.md   | 32 +++
 gcc/testsuite/gcc.target/riscv/pr113095.c | 21 +++
 gcc/testsuite/gcc.target/riscv/sfb.c  | 24 +
 3 files changed, 77 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr113095.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sfb.c

diff --git a/gcc/config/riscv/sfb.md b/gcc/config/riscv/sfb.md
index 52af4b17d46..5a510fe9f09 100644
--- a/gcc/config/riscv/sfb.md
+++ b/gcc/config/riscv/sfb.md
@@ -35,3 +35,35 @@
   [(set_attr "length" "8")
(set_attr "type" "sfb_alu")
(set_attr "mode" "")])
+
+;; Combine creates this form ((typeof(y))zero_one * z) <op> y
+;; for SiFive short forward branches.
+
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+  (const_int 1)
+  (match_operand 2 "immediate_operand"))
+  (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_SFB_ALU"
+  [(set (match_dup 4) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (if_then_else:X (ne (match_dup 4) (const_int 0))
+ (match_dup 3)
+ (const_int 0)))])
+
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+  (const_int 1)
+  (match_operand 2 "immediate_operand"))
+  (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_SFB_ALU && (UINTVAL (operands[2]) < 11)"
+  [(set (match_dup 4) (and:X (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (if_then_else:X (ne (match_dup 4) (const_int 0))
+ (match_dup 3)
+ (const_int 0)))]
+{
+  operands[2] = GEN_INT (1 << UINTVAL(operands[2]));
+})
diff --git a/gcc/testsuite/gcc.target/riscv/pr113095.c 
b/gcc/testsuite/gcc.target/riscv/pr113095.c
new file mode 100644
index 000..04321b58dc7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr113095.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -march=rv32gc -mabi=ilp32d -mtune=sifive-7-series" { 
target { rv32 } } } */
+/* { dg-options "-O2 -march=rv64gc -mabi=lp64d -mtune=sifive-7-series" {target 
{ rv64 } } } */
+
+extern void abort (void);
+extern void exit (int);
+
+unsigned short __attribute__ ((noinline, noclone))
+foo (unsigned short x) {
+  if (x == 1)
+x ^= 0x4002;
+
+  return x;
+}
+
+int main () {
+  if (foo(1) != 0x4003)
+abort ();
+
+  exit(0);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/sfb.c 
b/gcc/testsuite/gcc.target/riscv/sfb.c
new file mode 100644
index 000..22f164051f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sfb.c
@@ -0,0 +1,24 @@
+//* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32gc -mabi=ilp32d -mtune=sifive-7-series" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}
+/* { dg-final { scan-assembler-times "bne" 4 } } */
+/* { dg-final { scan-assembler-times "movcc" 4 } } */
-- 
2.40.1



Re: [PATCH] arm: Fix missing bti instruction for virtual thunks

2024-01-24 Thread Richard Earnshaw (lists)
On 23/01/2024 15:53, Richard Ball wrote:
> Adds missing bti instruction at the beginning of a virtual
> thunk, when bti is enabled.
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm.cc (arm_output_mi_thunk): Emit
>   insn for bti_c when bti is enabled.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.target/arm/bti_thunk.C: New test.


diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 
e5a944486d7bd583627b0e22dfe8f95862e975bb..91eee8be7c1a59118fbf443557561fb3e0689d61
 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -29257,6 +29257,8 @@ arm_output_mi_thunk (FILE *file, tree thunk, 
HOST_WIDE_INT delta,
   const char *fnname = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (thunk));
 
   assemble_start_function (thunk, fnname);
+  if (aarch_bti_enabled ())
+emit_insn (aarch_gen_bti_c());

Missing space between ...bti_c and the parenthesis.

   if (TARGET_32BIT)
 arm32_output_mi_thunk (file, thunk, delta, vcall_offset, function);
   else

diff --git a/gcc/testsuite/g++.target/arm/bti_thunk.C 
b/gcc/testsuite/g++.target/arm/bti_thunk.C
new file mode 100644
index 
..5c4a8e5a8d74581eca2b877c000a5b34ddca0e9b
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/bti_thunk.C
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8.1-m.main+pacbti -O1 -mbranch-protection=bti 
--save-temps" } */

You can't just add options like this; they may not work with other options 
passed by the testsuite framework.  Instead, you should add a suitable entry to 
lib/target-supports.exp in the table starting "foreach { armfunc armflag 
armdefs } {" that tests whether the options can be safely added, and then use 
dg-require-effective-target and dg-add-options for your new set of options.

\ No newline at end of file

Please add one :)

R.


[PATCH] testsuite/vect: Add target checks to refined patterns [PR113558]

2024-01-24 Thread Robin Dapp
Hi,

on Solaris/SPARC several vector tests appeared to be regressing.  They
were never vectorized but the checks before r14-3612-ge40edf64995769
would match regardless if a loop was actually vectorized or not.
The refined checks only match a successful vectorization attempt
but are run unconditionally.  This patch adds target checks to them.

Bootstrapped (unnecessarily) and regtested on x86, aarch64 and
power10.  Regtested on riscv and (the previous version that 
missed vect-reduc-pattern-2a.c) on Solaris/SPARC by Rainer Orth.

Is this OK if Rainer's second run is successful?

Regards
 Robin

gcc/testsuite/ChangeLog:

PR testsuite/113558

* gcc.dg/vect/no-scevccp-outer-7.c: Add target check.
* gcc.dg/vect/vect-outer-4c-big-array.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-s16a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-s8a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-s8b.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u16b.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u8a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u8b.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1a.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1b-big-array.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1c-big-array.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-2a.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-2b-big-array.c: Ditto.
* gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: Ditto.
---
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c  | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-outer-4c-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c  | 4 ++--
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c  | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16b.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8a.c  | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8b.c  | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1a.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1b-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1c-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2a.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2b-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c| 4 ++--
 14 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c 
b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
index 058d1d2db2d..87048422013 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
@@ -77,4 +77,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
target vect_widen_mult_hi_to_si } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: 
detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: 
detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" { target 
vect_widen_mult_hi_to_si } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4c-big-array.c 
b/gcc/testsuite/gcc.dg/vect/vect-outer-4c-big-array.c
index 5c3eea95476..4aaf2932006 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4c-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4c-big-array.c
@@ -24,4 +24,4 @@ foo (){
 }
 
 /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { target 
{ vect_short_mult && { ! vect_no_align } } } } } */
-/* { dg-final { scan-tree-dump-times "zero step in outer 
loop.(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "zero step in outer 
loop.(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" { target { 
vect_short_mult && { ! vect_no_align } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
index d826828e3d6..86fdcf37df8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
@@ -51,7 +51,7 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: 
detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: 
detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" { target { 
vect_sdot_hi || vect_widen_mult_hi_to_si } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
vect_sdot_hi } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
vect_widen_mult_hi_to_si } } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
index 4e1e0b234f4..99c53d0ff02 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c

[PATCH] tree-optimization/113576 - non-empty latch and may_be_zero vectorization

2024-01-24 Thread Richard Biener
We can't support niters with may_be_zero when we end up with a
non-empty latch due to early exit peeling.  At least not in
the simplistic way the vectorizer handles this now.  Disallow
it again for exits that are not the last one.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/113576
* tree-vect-loop.cc (vec_init_loop_exit_info): Only allow
exits with may_be_zero niters when it's the last one.

* gcc.dg/vect/pr113576.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr113576.c | 157 +++
 gcc/tree-vect-loop.cc|   9 +-
 2 files changed, 164 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113576.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr113576.c 
b/gcc/testsuite/gcc.dg/vect/pr113576.c
new file mode 100644
index 000..da5ddb09e33
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113576.c
@@ -0,0 +1,157 @@
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+/* { dg-additional-options "-march=skylake-avx512" } */
+
+#include "tree-vect.h"
+
+#include
+#include
+#include
+#include
+
+#define SBITMAP_ELT_BITS ((unsigned) 64)
+#define SBITMAP_ELT_TYPE unsigned long long
+#define SBITMAP_SIZE_BYTES(BITMAP) ((BITMAP)->size * sizeof (SBITMAP_ELT_TYPE))
+#define do_popcount(x) __builtin_popcountll(x)
+
+typedef struct simple_bitmap_def
+{
+  unsigned char *popcount;  /* Population count.  */
+  unsigned int n_bits; /* Number of bits.  */
+  unsigned int size;   /* Size in elements.  */
+  SBITMAP_ELT_TYPE elms[1];/* The elements.  */
+} *sbitmap;
+typedef const struct simple_bitmap_def *const_sbitmap;
+
+/* The iterator for sbitmap.  */
+typedef struct {
+  /* The pointer to the first word of the bitmap.  */
+  const SBITMAP_ELT_TYPE *ptr;
+
+  /* The size of the bitmap.  */
+  unsigned int size;
+
+  /* The current word index.  */
+  unsigned int word_num;
+
+  /* The current bit index (not modulo SBITMAP_ELT_BITS).  */
+  unsigned int bit_num;
+
+  /* The words currently visited.  */
+  SBITMAP_ELT_TYPE word;
+} sbitmap_iterator;
+
+static inline void
+sbitmap_iter_init (sbitmap_iterator *i, const_sbitmap bmp, unsigned int min)
+{
+  i->word_num = min / (unsigned int) SBITMAP_ELT_BITS;
+  i->bit_num = min;
+  i->size = bmp->size;
+  i->ptr = bmp->elms;
+
+  if (i->word_num >= i->size)
+i->word = 0;
+  else
+i->word = (i->ptr[i->word_num]
+  >> (i->bit_num % (unsigned int) SBITMAP_ELT_BITS));
+}
+
+/* Return true if we have more bits to visit, in which case *N is set
+   to the index of the bit to be visited.  Otherwise, return
+   false.  */
+
+static inline bool
+sbitmap_iter_cond (sbitmap_iterator *i, unsigned int *n)
+{
+  /* Skip words that are zeros.  */
+  for (; i->word == 0; i->word = i->ptr[i->word_num])
+{
+  i->word_num++;
+
+  /* If we have reached the end, break.  */
+  if (i->word_num >= i->size)
+   return false;
+
+  i->bit_num = i->word_num * SBITMAP_ELT_BITS;
+}
+
+  /* Skip bits that are zero.  */
+  for (; (i->word & 1) == 0; i->word >>= 1)
+i->bit_num++;
+
+  *n = i->bit_num;
+
+  return true;
+}
+
+/* Advance to the next bit.  */
+
+static inline void
+sbitmap_iter_next (sbitmap_iterator *i)
+{
+  i->word >>= 1;
+  i->bit_num++;
+}
+
+#define SBITMAP_SET_SIZE(N) (((N) + SBITMAP_ELT_BITS - 1) / SBITMAP_ELT_BITS)
+/* Allocate a simple bitmap of N_ELMS bits.  */
+
+sbitmap
+sbitmap_alloc (unsigned int n_elms)
+{
+  unsigned int bytes, size, amt;
+  sbitmap bmap;
+
+  size = SBITMAP_SET_SIZE (n_elms);
+  bytes = size * sizeof (SBITMAP_ELT_TYPE);
+  amt = (sizeof (struct simple_bitmap_def)
++ bytes - sizeof (SBITMAP_ELT_TYPE));
+  bmap = (sbitmap) malloc (amt);
+  bmap->n_bits = n_elms;
+  bmap->size = size;
+  bmap->popcount = NULL;
+  return bmap;
+}
+
+#define sbitmap_free(MAP)  (free((MAP)->popcount), free((MAP)))
+/* Loop over all elements of SBITMAP, starting with MIN.  In each
+   iteration, N is set to the index of the bit being visited.  ITER is
+   an instance of sbitmap_iterator used to iterate the bitmap.  */
+
+#define EXECUTE_IF_SET_IN_SBITMAP(SBITMAP, MIN, N, ITER)   \
+  for (sbitmap_iter_init (&(ITER), (SBITMAP), (MIN));  \
+   sbitmap_iter_cond (&(ITER), &(N));  \
+   sbitmap_iter_next (&(ITER)))
+
+int
+__attribute__((noinline))
+sbitmap_first_set_bit (const_sbitmap bmap)
+{
+  unsigned int n = 0;
+  sbitmap_iterator sbi;
+
+  EXECUTE_IF_SET_IN_SBITMAP (bmap, 0, n, sbi)
+return n;
+  return -1;
+}
+
+void
+sbitmap_zero (sbitmap bmap)
+{
+  memset (bmap->elms, 0, SBITMAP_SIZE_BYTES (bmap));
+  if (bmap->popcount)
+memset (bmap->popcount, 0, bmap->size * sizeof (unsigned char));
+}
+
+int main ()
+{
+  check_vect ();
+
+  sbitmap tmp = sbitmap_alloc(1856);
+  sbitmap_zero (tmp);
+  int res = sbitmap_first_set_bit (tmp);
+  if (res != -1)
+abort ();
+  sbitmap_free (tmp);
+  return 0;
+}
diff --git a/gc

Re: [PATCH] libgccjit: Add support for creating temporary variables

2024-01-24 Thread David Malcolm
On Fri, 2024-01-19 at 16:54 -0500, Antoni Boucher wrote:
> Hi.
> This patch adds a new way to create a local variable that won't
> generate debug info: it is to be used for compiler-generated variables.
> Thanks for the review.

Thanks for the patch.

> diff --git a/gcc/jit/docs/topics/compatibility.rst 
> b/gcc/jit/docs/topics/compatibility.rst
> index cbf5b414d8c..5d62e264a00 100644
> --- a/gcc/jit/docs/topics/compatibility.rst
> +++ b/gcc/jit/docs/topics/compatibility.rst
> @@ -390,3 +390,12 @@ on functions and variables:
>* :func:`gcc_jit_function_add_string_attribute`
>* :func:`gcc_jit_function_add_integer_array_attribute`
>* :func:`gcc_jit_lvalue_add_string_attribute`
> +
> +.. _LIBGCCJIT_ABI_27:
> +
> +``LIBGCCJIT_ABI_27``
> +
> +``LIBGCCJIT_ABI_27`` covers the addition of a functions to create a new

"functions" -> "function"

> +temporary variable:
> +
> +  * :func:`gcc_jit_function_new_temp`
> diff --git a/gcc/jit/docs/topics/functions.rst 
> b/gcc/jit/docs/topics/functions.rst
> index 804605ea939..230caf42466 100644
> --- a/gcc/jit/docs/topics/functions.rst
> +++ b/gcc/jit/docs/topics/functions.rst
> @@ -171,6 +171,26 @@ Functions
> underlying string, so it is valid to pass in a pointer to an on-stack
> buffer.
>  
> +.. function:: gcc_jit_lvalue *\
> +  gcc_jit_function_new_temp (gcc_jit_function *func,\
> + gcc_jit_location *loc,\
> + gcc_jit_type *type)
> +
> +   Create a new local variable within the function, of the given type.
> +   This function is similar to :func:`gcc_jit_function_new_local`, but
> +   it is to be used for compiler-generated variables (as opposed to
> +   user-defined variables in the language to be compiled) and these
> +   variables won't show up in the debug info.
> +
> +   The parameter ``type`` must be non-`void`.
> +
> +   This entrypoint was added in :ref:`LIBGCCJIT_ABI_26`; you can test
> +   for its presence using

The ABI number is inconsistent here (it's 27 above and in the .map
file), but obviously you can fix this when you eventually commit this
based on what the ABI number actually is.

[...snip...]

> diff --git a/gcc/jit/jit-playback.cc b/gcc/jit/jit-playback.cc
> index 84df6c100e6..cb6b2f66276 100644
> --- a/gcc/jit/jit-playback.cc
> +++ b/gcc/jit/jit-playback.cc
> @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "toplev.h"
>  #include "tree-cfg.h"
>  #include "convert.h"
> +#include "gimple-expr.h"
>  #include "stor-layout.h"
>  #include "print-tree.h"
>  #include "gimplify.h"
> @@ -1950,13 +1951,27 @@ new_local (location *loc,
>  type *type,
>  const char *name,
>  const std::vector -std::string>> &attributes)
> +std::string>> &attributes,
> +bool is_temp)
>  {
>gcc_assert (type);
> -  gcc_assert (name);
> -  tree inner = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +  tree inner;
> +  if (is_temp)
> +  {
> +inner = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> + create_tmp_var_name ("JITTMP"),
> + type->as_tree ());
> +DECL_ARTIFICIAL (inner) = 1;
> +DECL_IGNORED_P (inner) = 1;
> +DECL_NAMELESS (inner) = 1;

We could assert that "name" is null in the is_temp branch.

An alternative approach might be to drop "is_temp", and instead make
"name" being null signify that it's a temporary, if you prefer that
approach.  Would client code ever want to specify a name prefix for a
temporary?


> +  }
> +  else
> +  {
> +gcc_assert (name);
> +inner = build_decl (UNKNOWN_LOCATION, VAR_DECL,
>  get_identifier (name),
>  type->as_tree ());
> +  }
>DECL_CONTEXT (inner) = this->m_inner_fndecl;
>  
>/* Prepend to BIND_EXPR_VARS: */

[...snip...]

Thanks again for the patch.  Looks good to me as-is (apart from the
grammar and ABI number nits), but what do you think of eliminating
"is_temp" in favor of the "name" ptr being null?  I think it's your
call.

Dave



Re: [PATCH] libgccjit: Add gcc_jit_global_set_readonly

2024-01-24 Thread David Malcolm
On Fri, 2024-01-19 at 16:57 -0500, Antoni Boucher wrote:
> Hi.
> This patch adds a new API gcc_jit_global_set_readonly: it's equivalent
> to having a const global variable, but it is useful in the case of
> complex compilers where it is not convenient to use const.
> Thanks for the review.

Hi Antoni; thanks for the patch.

Can you give an example of where/why this might be used?
Presumably this is motivated by a use case you had inside the rustc
backend?

Thanks
Dave



[pushed] analyzer kernel plugin: implement __check_object_size [PR112927]

2024-01-24 Thread David Malcolm
PR analyzer/112927 reports a false positive from -Wanalyzer-tainted-size
seen on the Linux kernel's drivers/char/ipmi/ipmi_devintf.c with the
analyzer kernel plugin.

The issue is that in:

(A):
  if (msg->data_len > 272) {
return -90;
  }

(B):
  n = msg->data_len;
  __check_object_size(to, n);
  n = copy_from_user(to, from, n);

the analyzer is treating __check_object_size as having arbitrary side
effects, and, in particular could modify msg->data_len.  Hence the
sanitization that occurs at (A) above is treated as being for a
different value than the size obtained at (B), hence the bogus warning
at the call to copy_from_user.

Fixed by extending the analyzer kernel plugin to "teach" it that
__check_object_size has no side effects.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-8390-gb6e537571c21d8.

gcc/testsuite/ChangeLog:
PR analyzer/112927
* gcc.dg/plugin/analyzer_kernel_plugin.c
(class known_function___check_object_size): New.
(kernel_analyzer_init_cb): Register it.
* gcc.dg/plugin/plugin.exp: Add taint-pr112927.c.
* gcc.dg/plugin/taint-pr112927.c: New test.

Signed-off-by: David Malcolm 
---
 .../gcc.dg/plugin/analyzer_kernel_plugin.c| 18 +++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|  3 +-
 gcc/testsuite/gcc.dg/plugin/taint-pr112927.c  | 49 +++
 3 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/taint-pr112927.c

diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_kernel_plugin.c 
b/gcc/testsuite/gcc.dg/plugin/analyzer_kernel_plugin.c
index 02dba7a3234..5a32f8cc620 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_kernel_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_kernel_plugin.c
@@ -209,6 +209,22 @@ public:
   }
 };
 
+/* Implementation of "__check_object_size".  */
+  
+class known_function___check_object_size : public known_function
+{
+ public:
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+return cd.num_args () == 2;
+  }
+
+  void impl_call_pre (const call_details &) const final override
+  {
+/* No-op.  */
+  }
+};
+
 /* Callback handler for the PLUGIN_ANALYZER_INIT event.  */
 
 static void
@@ -224,6 +240,8 @@ kernel_analyzer_init_cb (void *gcc_data, void 
*/*user_data*/)
  make_unique ());
   iface->register_known_function ("copy_to_user",
  make_unique ());
+  iface->register_known_function ("__check_object_size",
+ 
make_unique ());
 }
 
 } // namespace ana
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp 
b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index b3782f9c575..a5a72daac1a 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -169,7 +169,8 @@ set plugin_test_list [list \
  taint-pr112850.c \
  taint-pr112850-precise.c \
  taint-pr112850-too-complex.c \
- taint-pr112850-unsanitized.c } \
+ taint-pr112850-unsanitized.c \
+ taint-pr112927.c } \
 { analyzer_cpython_plugin.c \
  cpython-plugin-test-no-Python-h.c \
  cpython-plugin-test-PyList_Append.c \
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-pr112927.c 
b/gcc/testsuite/gcc.dg/plugin/taint-pr112927.c
new file mode 100644
index 000..9c3f7ab6708
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/taint-pr112927.c
@@ -0,0 +1,49 @@
+/* Reduced from false positive in Linux kernel
+   in drivers/char/ipmi/ipmi_devintf.c.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fanalyzer -O2 -Wno-attributes" } */
+/* { dg-require-effective-target analyzer } */
+
+typedef __SIZE_TYPE__ size_t;
+extern void
+__check_object_size(const void* ptr, unsigned long n);
+
+extern unsigned long
+copy_from_user(void*, const void*, unsigned long);
+
+__attribute__((__always_inline__)) unsigned long
+call_copy_from_user(void* to, const void* from, unsigned long n)
+{
+  __check_object_size(to, n);
+  n = copy_from_user(to, from, n); /* { dg-bogus "use of attacker-controlled 
value as size without upper-bounds checking" } */
+  return n;
+}
+struct ipmi_msg
+{
+  unsigned short data_len;
+  unsigned char* data;
+};
+
+static int
+handle_send_req(struct ipmi_msg* msg)
+{
+  char buf[273];
+  if (msg->data_len > 272) {
+return -90;
+  }
+  if (call_copy_from_user(buf, msg->data, msg->data_len)) {
+return -14;
+  }
+  return 0;
+}
+long
+ipmi_ioctl(void* arg)
+{
+  struct ipmi_msg msg;
+  if (call_copy_from_user(&msg, arg, sizeof(msg))) {
+return -14;
+  }
+
+  return handle_send_req(&msg);
+}
-- 
2.26.3



[pushed] analyzer: fix taint false +ve due to overzealous state purging [PR112977]

2024-01-24 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-8391-ge503f9aca91926.

gcc/analyzer/ChangeLog:
PR analyzer/112977
* engine.cc (impl_region_model_context::on_liveness_change): Pass
m_ext_state to sm_state_map::on_liveness_change.
* program-state.cc (sm_state_map::on_svalue_leak): Guard removal
of map entry based on can_purge_p.
(sm_state_map::on_liveness_change): Add ext_state param.  Add
workaround for bad interaction between state purging and
alt-inherited sm-state.
* program-state.h (sm_state_map::on_liveness_change): Add
ext_state param.
* sm-taint.cc
(taint_state_machine::has_alt_get_inherited_state_p): New.
(taint_state_machine::can_purge_p): Return false for "has_lb" and
"has_ub".
* sm.h (state_machine::has_alt_get_inherited_state_p): New vfunc.

gcc/testsuite/ChangeLog:
PR analyzer/112977
* gcc.dg/plugin/plugin.exp: Add taint-pr112977.c.
* gcc.dg/plugin/taint-pr112977.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/engine.cc   |  2 +-
 gcc/analyzer/program-state.cc| 65 +++-
 gcc/analyzer/program-state.h |  1 +
 gcc/analyzer/sm-taint.cc |  9 +++
 gcc/analyzer/sm.h|  6 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp   |  3 +-
 gcc/testsuite/gcc.dg/plugin/taint-pr112977.c | 44 +
 7 files changed, 126 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/taint-pr112977.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index fde8412bc15..44ff20cf9af 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -179,7 +179,7 @@ on_liveness_change (const svalue_set &live_svalues,
const region_model *model)
 {
   for (sm_state_map *smap : m_new_state->m_checker_states)
-smap->on_liveness_change (live_svalues, model, this);
+smap->on_liveness_change (live_svalues, model, m_ext_state, this);
 }
 
 void
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index 888f2a9c40b..55dd6ca7166 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -560,9 +560,10 @@ sm_state_map::on_svalue_leak (const svalue *sval,
 {
   if (state_machine::state_t state = get_state (sval, ctxt->m_ext_state))
 {
-  if (!m_sm.can_purge_p (state))
+  if (m_sm.can_purge_p (state))
+   m_map.remove (sval);
+  else
ctxt->on_state_leak (m_sm, sval, state);
-  m_map.remove (sval);
 }
 }
 
@@ -572,6 +573,7 @@ sm_state_map::on_svalue_leak (const svalue *sval,
 void
 sm_state_map::on_liveness_change (const svalue_set &live_svalues,
  const region_model *model,
+ const extrinsic_state &ext_state,
  impl_region_model_context *ctxt)
 {
   svalue_set svals_to_unset;
@@ -605,9 +607,68 @@ sm_state_map::on_liveness_change (const svalue_set 
&live_svalues,
   ctxt->on_state_leak (m_sm, sval, e.m_state);
 }
 
+  sm_state_map old_sm_map = *this;
+
   for (svalue_set::iterator iter = svals_to_unset.begin ();
iter != svals_to_unset.end (); ++iter)
 m_map.remove (*iter);
+
+  /* For state machines like "taint" where states can be
+ alt-inherited from other svalues, ensure that state purging doesn't
+ make us lose sm-state.
+
+ Consider e.g.:
+
+ make_tainted(foo);
+ if (foo.field > 128)
+   return;
+ arr[foo.field].f1 = v1;
+
+ where the last line is:
+
+ (A): _t1 = foo.field;
+ (B): _t2 = _t1 * sizeof(arr[0]);
+ (C): [arr + _t2].f1 = val;
+
+ At (A), foo is 'tainted' and foo.field is 'has_ub'.
+ After (B), foo.field's value (in _t1) is no longer directly
+ within LIVE_SVALUES, so with state purging enabled, we would
+ erroneously purge the "has_ub" state from the svalue.
+
+ Given that _t2's value's state comes from _t1's value's state,
+ we need to preserve that information.
+
+ Hence for all svalues that have had their explicit sm-state unset,
+ having their sm-state being unset, determine if doing so has changed
+ their effective state, and if so, explicitly set their state.
+
+ For example, in the above, unsetting the "has_ub" for _t1's value means
+ that _t2's effective value changes from "has_ub" (from alt-inherited
+ from _t1's value) to "tainted" (inherited from "foo"'s value).
+
+ For such cases, preserve the effective state by explicitly setting the
+ new state.  In the above example, this means explicitly setting _t2's
+ value to the value ("has_ub") it was previously alt-inheriting from _t1's
+ value.  */
+  if (m_sm.has_alt_get_inherited_state_p ())
+{
+  auto_vec svalues_needing

Ping: Re: [PATCH] libgcc: fix SEH C++ rethrow semantics [PR113337]

2024-01-24 Thread Matteo Italia
Ping! That's a one-line fix, and you can find all the details in the 
bugzilla entry. Also, I can provide executables built with the affected 
toolchains, demonstrating the problem and the fix.


Thanks,
Matteo

On 17/01/24 12:51, Matteo Italia wrote:

SEH _Unwind_Resume_or_Rethrow invokes abort directly if
_Unwind_RaiseException doesn't manage to find a handler for the rethrown
exception; this is incorrect, as in this case std::terminate should be
invoked, allowing an application-provided terminate handler to handle
the situation instead of straight crashing the application through
abort.

The bug can be demonstrated with this simple test case:
===
#include <stdio.h>
#include <string.h>
#include <cstdlib>
#include <exception>

static void custom_terminate_handler() {
 fprintf(stderr, "custom_terminate_handler invoked\n");
 std::exit(1);
}

int main(int argc, char *argv[]) {
 std::set_terminate(&custom_terminate_handler);
 if (argc < 2) return 1;
 const char *mode = argv[1];
 fprintf(stderr, "%s\n", mode);
 if (strcmp(mode, "throw") == 0) {
 throw std::exception();
 } else if (strcmp(mode, "rethrow") == 0) {
 try {
 throw std::exception();
 } catch (...) {
 throw;
 }
 } else {
 return 1;
 }
 return 0;
}
===

On all gcc builds with non-SEH exceptions, this prints
"custom_terminate_handler invoked" whether launched as ./a.out throw or
as ./a.out rethrow. On SEH builds, it works as expected only
with ./a.exe throw, and crashes with the "built-in" abort message
with ./a.exe rethrow.

This patch fixes the problem by forwarding the error code back to the
caller (__cxa_rethrow), which calls std::terminate if
_Unwind_Resume_or_Rethrow returns.

The change makes the code path coherent with SEH _Unwind_RaiseException,
and with the generic _Unwind_Resume_or_Rethrow from libgcc/unwind.inc
(used for the SjLj and Dw2 exception backends).
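
To make the control-flow difference concrete, here is a toy C model. It is
only a sketch: every name is invented for this note, the real abort and
std::terminate are replaced by flag-setting stubs, and the forced-unwind
branch is left out. It models only the case where no handler is found, so
the raise step returns an error code to its caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reason code standing in for "no handler was found".  */
enum { MODEL_END_OF_STACK = 5 };

static bool aborted;
static bool terminate_called;

/* Stubs that record the call instead of ending the process.  */
static void model_abort (void) { aborted = true; }
static void model_terminate (void) { terminate_called = true; }

/* Stand-in for raising an exception with no handler: it returns.  */
static int model_raise (void) { return MODEL_END_OF_STACK; }

/* Before the patch: the result is discarded and abort is reached.  */
static int
rethrow_before (void)
{
  model_raise ();
  model_abort ();
  return MODEL_END_OF_STACK;
}

/* After the patch: the result is handed back to the caller.  */
static int
rethrow_after (void)
{
  return model_raise ();
}

/* Stand-in for the C++ runtime caller: if the rethrow returns without
   having aborted, the terminate handler runs.  */
static void
model_cxa_rethrow (int (*rethrow) (void))
{
  aborted = false;
  terminate_called = false;
  rethrow ();
  if (!aborted)
    model_terminate ();
}
```

With rethrow_before the model ends in the abort stub and the terminate stub
is never reached; with rethrow_after the terminate stub runs, which is the
point where an application-provided handler would get control.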

libgcc/ChangeLog:

 * unwind-seh.c (_Unwind_Resume_or_Rethrow): forward
 _Unwind_RaiseException return code back to caller instead of
 calling abort, allowing __cxa_rethrow to invoke std::terminate
 in case of uncaught rethrown exception
---
  libgcc/unwind-seh.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgcc/unwind-seh.c b/libgcc/unwind-seh.c
index 8ef0257b616..f1b8f5a8519 100644
--- a/libgcc/unwind-seh.c
+++ b/libgcc/unwind-seh.c
@@ -395,9 +395,9 @@ _Unwind_Reason_Code
  _Unwind_Resume_or_Rethrow (struct _Unwind_Exception *exc)
  {
if (exc->private_[0] == 0)
-_Unwind_RaiseException (exc);
-  else
-_Unwind_ForcedUnwind_Phase2 (exc);
+return _Unwind_RaiseException (exc);
+
+  _Unwind_ForcedUnwind_Phase2 (exc);
abort ();
  }
  


Re: [PATCH] ipa-cp: Fix check for exceeding param_ipa_cp_value_list_size (PR 113490)

2024-01-24 Thread Martin Jambor
Hi,

On Mon, Jan 22 2024, Jan Hubicka wrote:
>> Hi,
>> 
>> When the check for exceeding param_ipa_cp_value_list_size limit was
>> modified to be ignored for generating values from self-recursive
>> calls, it should have been changed from equal to, to equals to or is
>> greater than.  This omission manifests itself as PR 113490.
>> 
>> When I examined the condition I also noticed that the parameter should
>> come from the callee rather than the caller, since the value list is
>> associated with the former and not the latter.  In practice the limit
>> is of course very likely to be the same, but I fixed this aspect of
>> the condition too.  I briefly audited all other uses of opt_for_fn in
>> ipa-cp.cc and all the others looked OK.
>> 
>> Bootstrapped and tested on x86_64-linux.  OK for master?
>> 
>> Thanks,
>> 
>> Martin
>> 
>> 
>> gcc/ChangeLog:
>> 
>> 2024-01-19  Martin Jambor  
>> 
>>  PR ipa/113490
>>  * ipa-cp.cc (ipcp_lattice::add_value): Bail out if value
>>  count is equal or greater than the limit.  Use the limit from the
>>  callee.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 2024-01-19  Martin Jambor  
>> 
>>  PR ipa/113490
>>  * gcc.dg/ipa/pr113490.c: New test.
> OK,
> thanks!

thank you, I have pushed the following, which has a tweak in the added
test so that it is only run on targets which support the required vectors.

Martin




When the check for exceeding param_ipa_cp_value_list_size limit was
modified to be ignored for generating values from self-recursive
calls, it should have been changed from equal to, to equals to or is
greater than.  This omission manifests itself as PR 113490.

When I examined the condition I also noticed that the parameter should
come from the callee rather than the caller, since the value list is
associated with the former and not the latter.  In practice the limit
is of course very likely to be the same, but I fixed this aspect of
the condition too.  I briefly audited all other uses of opt_for_fn in
ipa-cp.cc and all the others looked OK.
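
As an illustration of why the comparison matters, here is a simplified
model (not the ipa-cp code; ValueList and its members are made-up names).
Once some insertions are allowed to bypass the limit, the count can step
past it, and an equality check never fires again.

```cpp
#include <vector>

// Hypothetical model of a value list with a size limit that some
// insertions (standing in for self-recursive values) may bypass.
struct ValueList {
  std::vector<int> values;
  int limit;
  bool use_greater_equal;  // false: old "==" check, true: fixed ">=" check

  // Returns false when the list is considered full and the value is dropped.
  bool add(int v, bool bypass_limit) {
    int count = static_cast<int>(values.size());
    bool full = use_greater_equal ? count >= limit : count == limit;
    if (!bypass_limit && full)
      return false;
    values.push_back(v);
    return true;
  }
};

// Fill to the limit, push one bypassing value past it, then try a normal
// insertion, which ought to be rejected.
bool normal_add_accepted_past_limit(bool use_greater_equal) {
  ValueList l{{}, 2, use_greater_equal};
  l.add(1, false);
  l.add(2, false);
  l.add(3, true);          // count is now limit + 1
  return l.add(4, false);  // accepted only if the check is missed
}
```

With the "==" check the last insertion slips through because the count
is already one past the limit; with ">=" it is rejected.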

gcc/ChangeLog:

2024-01-19  Martin Jambor  

PR ipa/113490
* ipa-cp.cc (ipcp_lattice::add_value): Bail out if value
count is equal or greater than the limit.  Use the limit from the
callee.

gcc/testsuite/ChangeLog:

2024-01-22  Martin Jambor  

PR ipa/113490
* gcc.dg/ipa/pr113490.c: New test.
---
 gcc/ipa-cp.cc   |  2 +-
 gcc/testsuite/gcc.dg/ipa/pr113490.c | 31 +
 2 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr113490.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index b1e2a3a829a..e85477df32d 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -2298,7 +2298,7 @@ ipcp_lattice::add_value (valtype newval, 
cgraph_edge *cs,
return false;
   }
 
-  if (!same_lat_gen_level && values_count == opt_for_fn (cs->caller->decl,
+  if (!same_lat_gen_level && values_count >= opt_for_fn (cs->callee->decl,
param_ipa_cp_value_list_size))
 {
   /* We can only free sources, not the values themselves, because sources
diff --git a/gcc/testsuite/gcc.dg/ipa/pr113490.c 
b/gcc/testsuite/gcc.dg/ipa/pr113490.c
new file mode 100644
index 000..526e22b3787
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr113490.c
@@ -0,0 +1,31 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -Wno-psabi"  } */
+
+typedef char A __attribute__((vector_size (64)));
+typedef short B __attribute__((vector_size (64)));
+typedef unsigned C __attribute__((vector_size (64)));
+typedef long D __attribute__((vector_size (64)));
+typedef __int128 E __attribute__((vector_size (64)));
+
+D bar1_D_0;
+E bar4 (A, D);
+
+E
+bar1 (C C_0)
+{
+  C_0 >>= 1;
+  bar4 ((A) C_0, bar1_D_0);
+  bar4 ((A) (E) {~0 }, (D) (A){ ~0 });
+  bar4 ((A) (B) { ~0 }, (D) (C) { ~0 });
+  bar1 ((C) (D){ 0, ~0});
+  bar4 ((A) C_0, bar1_D_0);
+  (A) { bar1 ((C) { 7})[5] - C_0[63], bar4 ((A) (D) {~0}, (D) (C) { 0, 
~0})[3]};
+}
+
+E
+bar4 (A A_0, D D_0)
+{
+  bar1 ((C) A_0);
+  bar1 ((C) {5});
+  bar1 ((C) D_0);
+}
-- 
2.43.0



Re: [PATCH 2/2] libstdc++: Implement P2165R4 changes to std::pair/tuple/etc

2024-01-24 Thread Patrick Palka
On Wed, 24 Jan 2024, Jonathan Wakely wrote:

> On Tue, 23 Jan 2024 at 23:54, Patrick Palka wrote:
> > diff --git a/libstdc++-v3/include/bits/stl_pair.h 
> > b/libstdc++-v3/include/bits/stl_pair.h
> > index b81b479ad43..a9b20fbe7ca 100644
> > --- a/libstdc++-v3/include/bits/stl_pair.h
> > +++ b/libstdc++-v3/include/bits/stl_pair.h
> > @@ -85,12 +85,70 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >/// @cond undocumented
> >
> >// Forward declarations.
> > +  template
> > +struct pair;
> 
> We have a compiler bug where a forward declaration without template
> parameter names causes bad diagnostics later. The compiler seems to
> try to use the parameter names from the first decl it sees, so we end
> up with things like  even when there's a name
> available at the site of the actual error. So I think we should name
> these _T1 and _T2 here.

Will fix.
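
For reference, the shape of the suggested change is roughly the following.
This is only a sketch in a made-up namespace with non-reserved names (T1,
T2), not the actual stl_pair.h text.

```cpp
namespace sketch {
  // Forward declaration that names its template parameters, so that any
  // later diagnostic can refer to them by name.
  template<typename T1, typename T2>
    struct pair;

  // The definition repeats the same parameter names.
  template<typename T1, typename T2>
    struct pair
    {
      T1 first;
      T2 second;
    };
}
```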

> 
> > +
> >template
> >  class tuple;
> >
> > +  // Declarations of std::array and its std::get overloads, so that
> > +  // std::tuple_cat can use them if  is included before .
> > +  // We also declare the other std::get overloads here so that they're
> > +  // visible to the P2165R4 tuple-like constructors of pair and tuple.
> > +  template
> > +struct array;
> > +
> >template
> >  struct _Index_tuple;
> >
> > +  template
> > +constexpr typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&
> > +get(pair<_Tp1, _Tp2>& __in) noexcept;
> > +
> > +  template
> > +constexpr typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&&
> > +get(pair<_Tp1, _Tp2>&& __in) noexcept;
> > +
> > +  template
> > +constexpr const typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&
> > +get(const pair<_Tp1, _Tp2>& __in) noexcept;
> > +
> > +  template
> > +constexpr const typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&&
> > +get(const pair<_Tp1, _Tp2>&& __in) noexcept;
> > +
> > +  template
> > +constexpr __tuple_element_t<__i, tuple<_Elements...>>&
> > +get(tuple<_Elements...>& __t) noexcept;
> > +
> > +  template
> > +constexpr const __tuple_element_t<__i, tuple<_Elements...>>&
> > +get(const tuple<_Elements...>& __t) noexcept;
> > +
> > +  template
> > +constexpr __tuple_element_t<__i, tuple<_Elements...>>&&
> > +get(tuple<_Elements...>&& __t) noexcept;
> > +
> > +  template
> > +constexpr const __tuple_element_t<__i, tuple<_Elements...>>&&
> > +get(const tuple<_Elements...>&& __t) noexcept;
> > +
> > +  template
> > +constexpr _Tp&
> > +get(array<_Tp, _Nm>&) noexcept;
> > +
> > +  template
> > +constexpr _Tp&&
> > +get(array<_Tp, _Nm>&&) noexcept;
> > +
> > +  template
> > +constexpr const _Tp&
> > +get(const array<_Tp, _Nm>&) noexcept;
> > +
> > +  template
> > +constexpr const _Tp&&
> > +get(const array<_Tp, _Nm>&&) noexcept;
> > +
> >  #if ! __cpp_lib_concepts
> >// Concept utility functions, reused in conditionally-explicit
> >// constructors.
> > @@ -159,6 +217,46 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  #endif // lib concepts
> >  #endif // C++11
> >
> > +#if __glibcxx_tuple_like // >= C++23
> > +  template
> > +inline constexpr bool __is_tuple_v = false;
> > +
> > +  template
> > +inline constexpr bool __is_tuple_v> = true;
> > +
> > +  // TODO: Reuse __is_tuple_like from ?
> > +  template
> > +inline constexpr bool __is_tuple_like_v = false;
> > +
> > +  template
> > +inline constexpr bool __is_tuple_like_v> = true;
> > +
> > +  template
> > +inline constexpr bool __is_tuple_like_v> = true;
> > +
> > +  template
> > +inline constexpr bool __is_tuple_like_v> = true;
> > +
> > +  // __is_tuple_like_v is defined in .
> > +
> > +  template
> > +concept __tuple_like = __is_tuple_like_v>;
> > +
> > +  template
> > +concept __pair_like = __tuple_like<_Tp> && 
> > tuple_size_v> == 2;
> > +
> > +  template
> > +concept __eligible_tuple_like
> > +  = __detail::__different_from<_Tp, _Tuple> && __tuple_like<_Tp>
> > +   && (tuple_size_v> == tuple_size_v<_Tuple>)
> > +   && !ranges::__detail::__is_subrange>;
> > +
> > +  template
> > +concept __eligible_pair_like
> > +  = __detail::__different_from<_Tp, _Pair> && __pair_like<_Tp>
> > +   && !ranges::__detail::__is_subrange>;
> > +#endif // C++23
> > +
> >template class __pair_base
> >{
> >  #if __cplusplus >= 201103L && ! __cpp_lib_concepts
> > @@ -295,6 +393,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >   return false;
> >  #endif
> > }
> > +
> > +#if __glibcxx_tuple_like // >= C++23
> > +  template
> > +   static constexpr bool
> > +   _S_constructible_from_pair_like()
> > +   {
> > + return 
> > _S_constructible(std::declval<_UPair>())),
> > + 
> > decltype(std::get<1>(std::declval<_UPair>()))>();
> > +   }
> > +
> > +  template
> > +   static constexpr bool
> > +   _S_convertible_from_pair_like()
> > +   {
> > + return 
> > _S_conver

Re: [PATCH] libgccjit: Add gcc_jit_global_set_readonly

2024-01-24 Thread Antoni Boucher
Yes, it is for a use case inside of rustc_codegen_gcc.
The compiler is structured in a way where we don't know if a global
variable might be constant when it is created.

On Wed, 2024-01-24 at 10:09 -0500, David Malcolm wrote:
> On Fri, 2024-01-19 at 16:57 -0500, Antoni Boucher wrote:
> > Hi.
> > This patch adds a new API gcc_jit_global_set_readonly: it's
> > equivalent
> > to having a const global variable, but it is useful in the case of
> > complex compilers where it is not convenient to use const.
> > Thanks for the review.
> 
> Hi Antoni; thanks for the patch.
> 
> Can you give an example of where/why this might be used?
> Presumably this is motivated by a use case you had inside the rustc
> backend?
> 
> Thanks
> Dave
> 



Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-01-24 Thread Georg-Johann Lay




On 22.01.24 at 08:45, Richard Biener wrote:

On Fri, Jan 19, 2024 at 5:06 PM Georg-Johann Lay  wrote:




On 18.01.24 at 20:54, Roger Sayle wrote:


This patch tweaks RTL expansion of multi-word shifts and rotates to use
PLUS rather than IOR for disjunctive operations.  During expansion of
these operations, the middle-end creates RTL like (X<<C1) | (X>>C2)
where the constants C1 and C2 guarantee that bits don't overlap.
Hence the IOR can be performed by any any_or_plus operation, such as
IOR, XOR or PLUS; for word-size operations where carry chains aren't
an issue these should all be equally fast (single-cycle) instructions.
The benefit of this change is that targets with shift-and-add insns,
like x86's lea, can benefit from the LSHIFT-ADD form.
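
The equivalence can be checked with a small stand-alone sketch. It models a
double-word left shift built from two 32-bit words; shl_two_words and its
parameters are illustrative names, not code from expmed.cc or optabs.cc.

```cpp
#include <cstdint>

// Shift a 64-bit value, held as two 32-bit words, left by n (0 < n < 32).
// combine_with_plus selects PLUS instead of IOR to merge the high word.
std::uint64_t shl_two_words(std::uint32_t lo, std::uint32_t hi, unsigned n,
                            bool combine_with_plus) {
  std::uint32_t carried = lo >> (32 - n);  // only the low n bits can be set
  std::uint32_t shifted = hi << n;         // the low n bits are always zero
  // The two operands have no set bits in common, so no carry can occur
  // and addition gives the same result as inclusive or.
  std::uint32_t new_hi = combine_with_plus ? shifted + carried
                                           : (shifted | carried);
  std::uint32_t new_lo = lo << n;
  return (static_cast<std::uint64_t>(new_hi) << 32) | new_lo;
}
```

Both variants give the same value as a native 64-bit shift, which is what
lets a shift-and-add instruction stand in for the shift plus or.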

An example of a backend that benefits is ARC, which is demonstrated
by these two simple functions:


But there are also back-ends where this is bad.

The reason is that with ORI, the back-end needs only to operate on
those sub-words where the sub-mask is non-zero.  But for PLUS this
is not the case because the back-end does not know that the intermediate
carry will be zero.  Hence, with PLUS, more instructions are needed.
An example is AVR, but maybe many more targets with multi-word operations
are affected in a bad way.

Take for example the case with 2 words and a value of 1.

LO |= 1
HI |= 0

can be optimized to

LO |= 1

but for addition this is not the case:

LO += 1
HI +=c 0 ;; Does not know that always carry = 0.
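
A rough way to count the difference is sketched below. This is a toy cost
model for a target with 8-bit registers, under the assumption that the
back-end knows nothing about the carry; the function names are made up.

```cpp
#include <cstddef>
#include <cstdint>

// Byte-wide instructions needed for "value |= mask" on an nbytes-wide
// value (nbytes <= 4): bytes where the mask is zero need no instruction.
std::size_t ior_byte_ops(std::uint32_t mask, std::size_t nbytes) {
  std::size_t ops = 0;
  for (std::size_t i = 0; i < nbytes; ++i)
    if ((mask >> (8 * i)) & 0xff)
      ++ops;
  return ops;
}

// Byte-wide instructions needed for "value += addend": once the lowest
// non-zero byte has been added, every higher byte needs an add-with-carry
// because the carry is not known to be zero.
std::size_t plus_byte_ops(std::uint32_t addend, std::size_t nbytes) {
  std::size_t first = 0;
  while (first < nbytes && ((addend >> (8 * first)) & 0xff) == 0)
    ++first;
  return nbytes - first;
}
```

For the two-byte example above with a value of 1, the model gives one
instruction for the or and two for the addition.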


I wonder if the PLUS can be done on the lowpart only to make this
detail obvious?


For AVR, word_mode is HImode, but the hardware has only 8-bit registers.

Moreover splitting insns is not wanted or not possible (due to CCmode).

Johann


unsigned long long foo(unsigned long long x) { return x<<2; }

which with -O2 is currently compiled to:

foo:    lsr     r2,r0,30
        asl_s   r1,r1,2
        asl_s   r0,r0,2
        j_s.d   [blink]
        or_s    r1,r1,r2

with this patch becomes:

foo:    lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        asl_s   r0,r0,2

unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }

which with -O2 is currently compiled to 6 insns + return:

bar:    lsr     r12,r0,30
        asl_s   r3,r1,2
        asl_s   r0,r0,2
        lsr_s   r1,r1,30
        or_s    r0,r0,r1
        j_s.d   [blink]
        or      r1,r12,r3

with this patch becomes 4 insns + return:

bar:    lsr     r3,r1,30
        lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        add2    r0,r3,r0


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-01-18  Roger Sayle  

gcc/ChangeLog
  * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
  to generate PLUS instead of IOR when unioning disjoint bitfields.
  * optabs.cc (expand_subword_shift): Likewise.
  (expand_binop): Likewise for double-word rotate.


Thanks in advance,
Roger


Re: [PATCH] libstdc++: atomic: Add missing clear_padding in __atomic_float constructor

2024-01-24 Thread xndcn
Hi, is it OK for trunk? I do not have access to the repo, can you
please help me submit the patch? Thanks.
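
For anyone new to the thread, the underlying issue can be sketched without
long double. This is an illustration only: Padded and the helpers are
made-up names, and the struct is merely assumed to have padding between
its members, as it does on most ABIs. Two objects with equal member
values can still differ bytewise, which is what makes a bytewise
compare-exchange fail.

```cpp
#include <cstddef>
#include <cstring>

// A type that has padding bytes between c and i on most ABIs.
struct Padded { char c; int i; };

// Bytewise comparison of the whole object representation.
bool bytes_equal(const Padded& a, const Padded& b) {
  return std::memcmp(&a, &b, sizeof(Padded)) == 0;
}

// Overwrite the padding bytes between the two members with a fill value,
// leaving the members themselves untouched.
void fill_padding(Padded& p, unsigned char fill) {
  unsigned char* bytes = reinterpret_cast<unsigned char*>(&p);
  for (std::size_t k = offsetof(Padded, c) + sizeof(char);
       k < offsetof(Padded, i); ++k)
    bytes[k] = fill;
}
```

Filling the padding of two value-equal objects differently makes them
compare unequal bytewise; clearing it to the same value in both restores
bytewise equality, which is what the constructor change aims for.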

On Wed, Jan 17, 2024 at 00:16, xndcn wrote:
>
> Sorry about the mangled content...
> So I add a new add-options for libatomic_16b:
>
> ---
> libstdc++-v3/ChangeLog:
>
> * include/bits/atomic_base.h: add __builtin_clear_padding in
> __atomic_float constructor.
> * testsuite/lib/dg-options.exp: add new add-options for libatomic_16b.
> * testsuite/29_atomics/atomic_float/compare_exchange_padding.cc: New test.
> ---
>  libstdc++-v3/include/bits/atomic_base.h   |  7 ++-
>  .../atomic_float/compare_exchange_padding.cc  | 50 +++
>  libstdc++-v3/testsuite/lib/dg-options.exp | 22 
>  3 files changed, 78 insertions(+), 1 deletion(-)
>  create mode 100644
> libstdc++-v3/testsuite/29_atomics/atomic_float/compare_exchange_padding.cc
>
> diff --git a/libstdc++-v3/include/bits/atomic_base.h
> b/libstdc++-v3/include/bits/atomic_base.h
> index f4ce0fa53..104ddfdbe 100644
> --- a/libstdc++-v3/include/bits/atomic_base.h
> +++ b/libstdc++-v3/include/bits/atomic_base.h
> @@ -1283,7 +1283,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>constexpr
>__atomic_float(_Fp __t) : _M_fp(__t)
> -  { }
> +  {
> +#if __cplusplus >= 201402L && __has_builtin(__builtin_clear_padding)
> + if _GLIBCXX17_CONSTEXPR (__atomic_impl::__maybe_has_padding<_Fp>())
> +   __builtin_clear_padding(std::__addressof(_M_fp));
> +#endif
> +  }
>
>__atomic_float(const __atomic_float&) = delete;
>__atomic_float& operator=(const __atomic_float&) = delete;
> diff --git 
> a/libstdc++-v3/testsuite/29_atomics/atomic_float/compare_exchange_padding.cc
> b/libstdc++-v3/testsuite/29_atomics/atomic_float/compare_exchange_padding.cc
> new file mode 100644
> index 0..d538b3d55
> --- /dev/null
> +++ 
> b/libstdc++-v3/testsuite/29_atomics/atomic_float/compare_exchange_padding.cc
> @@ -0,0 +1,50 @@
> +// { dg-do run { target c++20 } }
> +// { dg-options "-O0" }
> +// { dg-timeout 10 }
> +// { dg-additional-options "-mlong-double-80" { target x86_64-*-* } }
> +// { dg-do run { target { ia32 || x86_64-*-* } } }
> +// { dg-add-options libatomic_16b }
> +
> +#include 
> +#include 
> +
> +template
> +void __attribute__((noinline,noipa))
> +fill_padding(T& f)
> +{
> +  T mask;
> +  __builtin_memset(&mask, 0xff, sizeof(T));
> +  __builtin_clear_padding(&mask);
> +  unsigned char* ptr_f = (unsigned char*)&f;
> +  unsigned char* ptr_mask = (unsigned char*)&mask;
> +  for (int i = 0; i < sizeof(T); i++)
> +  {
> +if (ptr_mask[i] == 0x00)
> +{
> +  ptr_f[i] = 0xff;
> +}
> +  }
> +}
> +
> +void
> +test01()
> +{
> +  long double f = 0.5f; // long double may contains padding on X86
> +  fill_padding(f);
> +  std::atomic as{ f }; // padding cleared on constructor
> +  long double t = 1.5;
> +
> +  as.fetch_add(t);
> +  long double s = f + t;
> +  t = as.load();
> +  VERIFY(s == t); // padding ignored on float comparing
> +  fill_padding(s);
> +  VERIFY(as.compare_exchange_weak(s, f)); // padding cleared on cmpexchg
> +  fill_padding(f);
> +  VERIFY(as.compare_exchange_strong(f, t)); // padding cleared on cmpexchg
> +}
> +
> +int main()
> +{
> +  test01();
> +}
> diff --git a/libstdc++-v3/testsuite/lib/dg-options.exp
> b/libstdc++-v3/testsuite/lib/dg-options.exp
> index 850442b6b..25da20d58 100644
> --- a/libstdc++-v3/testsuite/lib/dg-options.exp
> +++ b/libstdc++-v3/testsuite/lib/dg-options.exp
> @@ -356,6 +356,28 @@ proc add_options_for_libatomic { flags } {
>  return $flags
>  }
>
> +# Add option to link to libatomic, if required for atomics on 16-byte 
> (128-bit)
> +proc add_options_for_libatomic_16b { flags } {
> +if { ([istarget i?86-*-*] || [istarget x86_64-*-*])
> +   } {
> + global TOOL_OPTIONS
> +
> + set link_flags ""
> + if ![is_remote host] {
> + if [info exists TOOL_OPTIONS] {
> + set link_flags "[atomic_link_flags [get_multilibs ${TOOL_OPTIONS}]]"
> + } else {
> + set link_flags "[atomic_link_flags [get_multilibs]]"
> + }
> + }
> +
> + append link_flags " -latomic "
> +
> + return "$flags $link_flags"
> +}
> +return $flags
> +}
> +
>  # Add options to enable use of deprecated features.
>  proc add_options_for_using-deprecated { flags } {
>  return "$flags -U_GLIBCXX_USE_DEPRECATED -D_GLIBCXX_USE_DEPRECATED=1"
> --
> 2.25.1
>
>
> On Tue, Jan 16, 2024 at 18:12, Xi Ruoyao wrote:
> >
> > On Tue, 2024-01-16 at 17:53 +0800, xndcn wrote:
> > > Thanks, so I add a test: atomic_float/compare_exchange_padding.cc,
> > > which will fail due to timeout without the patch.
> >
> > Please resend in plain text instead of HTML.  Sending in HTML causes the
> > patch mangled.
> >
> > And libstdc++ patches should CC libstd...@gcc.gnu.org, in addition to
> > gcc-patches@gcc.gnu.org.
> >
> > > ---
> > > libstdc++-v3/ChangeLog:
> > >
> > >  * include/bits/atomic_base.h: add __builtin_clear_padding in 
> > > __atomic_float constructor.
> > >  * testsuite/lib/dg-options.exp: enable libatomic for 

Re: [PATCH, V2] PR target/112886, Add %S to print_operand for vector pair support.

2024-01-24 Thread Peter Bergner
On 1/24/24 12:04 AM, Kewen.Lin wrote:
> on 2024/1/24 11:11, Peter Bergner wrote:
>> But not with this.  The -mdejagnu-cpu=power10 option already enables -mvsx.
>> If the user explicitly forces -mno-vsx via RUNTESTFLAGS, then let them.
>> The options set in RUNTESTFLAGS come after the options in the dg-options
>> line, so even adding -mvsx like the above won't help the test case PASS
> 
> But this is NOT true, at least on one of the internal Power10 machines
> (ltcden2-lp1).
> 
> With the command below:
>   
>   make check-gcc-c RUNTESTFLAGS="--target_board=unix/-mno-vsx 
> powerpc.exp=pr112886.c"
> 
> this test case fails without the explicit -mvsx in dg-options.
> 
> From the verbose dumping, the compilation command looks like:
> 
> /home/linkw/gcc/build/gcc-test-debug/gcc/xgcc 
> -B/home/linkw/gcc/build/gcc-test-debug/gcc/
> /home/linkw/gcc/gcc-test/gcc/testsuite/gcc.target/powerpc/pr112886.c  
> -mno-vsx 
> -fdiagnostics-plain-output  -mdejagnu-cpu=power10 -O2 -ffat-lto-objects 
> -fno-ident -S
> -o pr112886.s
> 
> "-mno-vsx" comes **before** "-mdejagnu-cpu=power10 -O2" rather than **after**.
> 
> I guess it might be due to different behaviors of different versions of 
> runtest framework?

That is confusing, unless as you say, the behavior changed.  The whole reason
we added -mdejagnu-cpu= (and the dg-skip usage before that) was due to
encountering problems when the test case wanted a specific -mcpu= value and
the user overrode it in their RUNTESTFLAGS, and that can only happen when its
options come last on the command line.

Then again, why didn't the powerpc_vsx_ok test save us here?



> So there can be two cases with user explicitly specified -mno-vsx:
> 
> 1) RUNTESTFLAGS comes after dg-options (assuming same order for -mvsx in 
> powerpc_vsx_ok)
> 
>   powerpc_vsx_ok test failed, so UNSUPPORTED
> 
>   // with explicit -mvsx does nothing as you said.
> 
> 2) RUNTESTFLAGS comes before dg-options
> 
>   powerpc_vsx_ok test succeeds, but FAIL.
>   
>  // with suggested -mvsx, make it match the powerpc_vsx_ok checking and the 
> case not fail.
> 
> As above I think we still need to append the "-mvsx" explicitly.  As 
> tested/verified, it
> does help the case not to fail on ltcden2-lp1.

I'd like to verify that the behavior did change before we enforce adding that
option.  The problem is, there are a HUGE number of older test cases that
would need updating to "fix" them too...and not just adding -mvsx, but
-maltivec and basically any -mfoo option where the test case is expecting the
feature foo to be used/tested.
It would be a huge mess.
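
To spell out the ordering question, here is a toy model of the two command
lines discussed above. It is an assumption about the behaviour, not GCC's
actual option handling: an explicit -mvsx/-mno-vsx overrides the default
implied by the CPU, and among explicit options the last one wins.

```cpp
#include <string>
#include <vector>

// Simplified model: start from the CPU's default and let each explicit
// -mvsx / -mno-vsx on the command line override it in order.
bool vsx_enabled(const std::vector<std::string>& args, bool cpu_default) {
  bool enabled = cpu_default;
  for (const std::string& a : args) {
    if (a == "-mvsx")
      enabled = true;
    else if (a == "-mno-vsx")
      enabled = false;
  }
  return enabled;
}
```

Under this model, RUNTESTFLAGS placed before the dg-options leaves VSX off
unless the test adds -mvsx itself, while RUNTESTFLAGS placed after them
keeps VSX off regardless.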

Peter



[patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs (was: [PATCH] amdgcn: additional gfx1100 support)

2024-01-24 Thread Tobias Burnus
This patch obviously depends on Andrew's; he wrote in the previous email 
of this thread regarding his patch:


Andrew Stubbs wrote:

This is enough to get gfx1100 working for most purposes, on top of the
patch that Tobias committed a week or so ago; there are still some test
failures to investigate, and probably some tuning to do.

It might also get gfx1030 working too. @Richi, could you test it,
please?

I can't test the other multilibs right now. @PA, can you test it please?

I can self-approve the patch, but I'll hold off the commit until the
test results come back.


Okay to enable gfx1100 multilib building and to document gfx1100 in the 
manual?
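
For clarity, the effect of the config.gcc line on the multilib list is
roughly the following. This is a sketch with a made-up function name; it
filters whole list entries, whereas the sed expression in the patch removes
the first textual match of the default arch name.

```cpp
#include <sstream>
#include <string>

// Return the comma-separated multilib list with the default ISA
// (the --with-arch value) removed, since it is built anyway.
std::string multilib_list(const std::string& all,
                          const std::string& with_arch) {
  std::istringstream in(all);
  std::string item;
  std::string out;
  while (std::getline(in, item, ',')) {
    if (item == with_arch)
      continue;  // skip the default ISA
    if (!out.empty())
      out += ',';
    out += item;
  }
  return out;
}
```

So with --with-arch=gfx1100 the multilibs would be the four older ISAs,
and with any other default gfx1100 stays in the list.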


(I mean, obviously, only after Andrew has committed his patch. For gfx1030,
we might eventually also enable multilib support; if Richard confirms that
the patch collaterally fixes gfx1030, we probably should, and depending on
the number/kinds of testsuite failures, we could then document it or not,
I guess.)

Tobias

amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs

gcc/ChangeLog:

	* config.gcc (amdgcn-*-*): Add gfx1100 to TM_MULTILIB_CONFIG.
	* doc/install.texi (Configuration amdgcn-*-*): Mention gfx1100.
	* doc/invoke.texi (AMD GCN Options): Add gfx1100 to -march/-mtune.

libgomp/ChangeLog:

	* testsuite/libgomp.c/declare-variant-4.h: Add variant functions
	for gfx1030 and gfx1100.
	* testsuite/libgomp.c/declare-variant-4-gfx1100.c: New test.

Signed-off-by: Tobias Burnus 

 gcc/config.gcc  |  2 +-
 gcc/doc/install.texi| 12 ++--
 gcc/doc/invoke.texi |  3 +++
 libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c |  8 
 libgomp/testsuite/libgomp.c/declare-variant-4.h | 16 
 5 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b2d7d7dd475..2343e98ebe6 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4564,7 +4564,7 @@ case "${target}" in
 			TM_MULTILIB_CONFIG=
 			;;
 		xdefault | xyes)
-			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a" | sed "s/${with_arch},\?//;s/,$//"`
+			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1100" | sed "s/${with_arch},\?//;s/,$//"`
 			;;
 		*)
 			TM_MULTILIB_CONFIG="${with_multilib_list}"
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 71593919389..5304ebd36a9 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1258,12 +1258,12 @@ default set of libraries is selected based on the value of
 
 @item amdgcn*-*-*
 @var{list} is a comma separated list of ISA names (allowed values: @code{fiji},
-@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not
-include the name of the default ISA, specified via @option{--with-arch}.  If
-@var{list} is empty, then there will be no multilibs and only the default
-run-time library will be built.  If @var{list} is @code{default} or
-@option{--with-multilib-list=} is not specified, then the default set of
-libraries is selected.
+@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1100}).
+It ought not include the name of the default ISA, specified
+via @option{--with-arch}.  If @var{list} is empty, then there will be no
+multilibs and only the default run-time library will be built.  If @var{list}
+is @code{default} or @option{--with-multilib-list=} is not specified, then
+the default set of libraries is selected.
 
 @item arm*-*-*
 @var{list} is a comma separated list of @code{aprofile} and
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5f904cf1ef2..d1b2c284e2b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21723,6 +21723,9 @@ Compile for CDNA1 Instinct MI100 series devices (gfx908).
 @item gfx90a
 Compile for CDNA2 Instinct MI200 series devices (gfx90a).
 
+@item gfx1100
+Compile for RDNA3 gfx1100 devices (GFX11 series).
+
 @end table
 
 @opindex msram-ecc
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c
new file mode 100644
index 000..6ade35224cc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c
@@ -0,0 +1,8 @@
+/* { dg-do link { target { offload_target_amdgcn } } } */
+/* { dg-additional-options -foffload=amdgcn-amdhsa } */
+/* { dg-additional-options -foffload=-march=gfx1100 } */
+/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
+
+#include "declare-variant-4.h"
+
+/* { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "= gfx1100 \\(\\);" "optimized" } } */
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4.h b/libgomp/testsuite/libgomp.c/declare-variant-4.h
index a70352430c2..393a5e295cc 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-4.h
+++ b/libgomp/testsuite/libgomp.c/declare-variant-4.h
@@ -35,6 +35,20 @@ gfx90a (void)
   return 0x90a;
 }
 
+__attribute__ ((noipa))
+int
+gfx1030 (void)
+{
+  return 0

Re: [PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-01-24 Thread Alex Coplan
Hi Ajit,

On 21/01/2024 19:57, Ajit Agarwal wrote:
> 
> Hello All:
> 
> New pass to replace adjacent memory addresses lxv with lxvp.
> Added common infrastructure for load store fusion for
> different targets.

Thanks for this, it would be nice to see the load/store pair pass
generalized to multiple targets.

I assume you are targeting GCC 15 for this, as we are in stage 4 at
the moment?

> 
> Common routines are refactored in fusion-common.h.
> 
> AARCH64 load/store fusion pass is not changed with the 
> common infrastructure.

I think any patch to generalize the load/store pair fusion pass should
update the aarch64 code at the same time to use the generic
infrastructure, instead of duplicating the code.

As a general comment, I think we should move as much of the code as
possible to target-independent code, with only the bits that are truly
target-specific (e.g. deciding which modes to allow for a load/store
pair operand) in target code.

In terms of structuring the interface between generic code and target
code, I think it would be pragmatic to use a class with (in some cases,
pure) virtual functions that can be overridden by targets to implement
any target-specific behaviour.

IMO the generic class should be implemented in its own .cc instead of
using a header-only approach.  The target code would then define a
derived class which overrides the virtual functions (where necessary)
declared in the generic class, and then instantiate the derived class to
create a target-customized instance of the pass.
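
To make that concrete, something along these lines is what I have in mind.
This is only a sketch: the class names, the hooks, and the access sizes are
all hypothetical, and the real interface would need many more hooks.

```cpp
// Generic pass logic, which would live in its own target-independent .cc.
class pair_fusion {
public:
  virtual ~pair_fusion() = default;

  // Target-specific: is this access size (in bytes) valid for a pair
  // operand?  Pure virtual, so every target must answer it.
  virtual bool pair_operand_size_ok(unsigned bytes) const = 0;

  // Target-specific with a default: does the target want stores fused?
  virtual bool handle_stores() const { return true; }

  // Generic: two accesses are candidates when the target accepts them and
  // they are exactly one access apart in memory.
  bool can_fuse(long off1, long off2, unsigned bytes, bool is_store) const {
    if (is_store && !handle_stores())
      return false;
    if (!pair_operand_size_ok(bytes))
      return false;
    long delta = off2 - off1;
    return delta == static_cast<long>(bytes)
           || delta == -static_cast<long>(bytes);
  }
};

// Hypothetical target that pairs several sizes, loads and stores alike.
class aarch64_like_fusion : public pair_fusion {
public:
  bool pair_operand_size_ok(unsigned bytes) const override {
    return bytes == 4 || bytes == 8 || bytes == 16;
  }
};

// Hypothetical target that only pairs 16-byte vector loads.
class rs6000_like_fusion : public pair_fusion {
public:
  bool pair_operand_size_ok(unsigned bytes) const override {
    return bytes == 16;
  }
  bool handle_stores() const override { return false; }
};
```

Each target would then instantiate its derived class when creating the
pass, and the generic code never needs to know which target it runs on.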

A more traditional GCC approach would be to use optabs and target hooks
to customize the behaviour of the pass to handle target-specific
aspects, but:
 - Target hooks are quite heavyweight, and we'd potentially have to add
   quite a few hooks just for one pass that (at least initially) will
   only be used by a couple of targets.
 - Using classes allows both sides to easily maintain their own state
   and share that state where appropriate.

Nit on naming: I understand you want to move away from ldp_fusion, but
how about pair_fusion or mem_pair_fusion instead of just "fusion" as a
base name?  IMO just "fusion" isn't very clear as to what the pass is
trying to achieve.

In general the code could do with a lot more commentary to explain the
rationale for various things / explain the high-level intent of the
code.

Unfortunately I'm not familiar with the DF framework (I've only really
worked with RTL-SSA for the aarch64 pass), so I haven't commented on the
use of that framework, but it would be nice if what you're trying to do
could be done using RTL-SSA instead of using DF directly.

Hopefully Richard S can chime in on those aspects.

My main concerns with the patch at the moment (apart from the code
duplication) are that it looks like:

 - The patch removes alias analysis from try_fuse_pair, which is unsafe.
 - The patch tries to make its own RTL changes inside
   rs6000_gen_load_pair, but it should let fuse_pair make those changes
   using RTL-SSA instead.

I've left some more specific (but still mostly high-level) comments below.

> 
> For AARCH64 architectures just include "fusion-common.h"
> and target dependent code can be added to that.
> 
> 
> Alex/Richard:
> 
> If you would like me to add for AARCH64 I can do that for AARCH64.
> 
> If you would like to do that is fine with me.
> 
> Bootstrapped and regtested with powerpc64-linux-gnu.
> 
> Improvement in performance is seen with Spec 2017 spec FP benchmarks.
> 
> Thanks & Regards
> Ajit
> 
> rs6000: New  pass for replacement of adjacent lxv with lxvp.

Are you looking to handle stores eventually, out of interest?  Looking
at rs6000-vecload-opt.cc:fusion_bb it looks like you're just handling
loads at the moment.

> 
> New pass to replace adjacent memory addresses lxv with lxvp.
> Added common infrastructure for load store fusion for
> different targets.
> 
> Common routines are refactored in fusion-common.h.

I've just done a very quick scan through this file as it mostly just
looks to be identical to existing code in aarch64-ldp-fusion.cc.

> 
> 2024-01-21  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-passes.def: New vecload pass
>   before pass_early_remat.
>   * config/rs6000/rs6000-vecload-opt.cc: Add new pass.
>   * config.gcc: Add new executable.
>   * config/rs6000/rs6000-protos.h: Add new prototype for vecload
>   pass.
>   * config/rs6000/rs6000.cc: Add new prototype for vecload pass.
>   * config/rs6000/t-rs6000: Add new rule.
>   * fusion-common.h: Add common infrastructure for load store
>   fusion that can be shared across different architectures.
>   * emit-rtl.cc: Modify assert code.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/powerpc/vecload.C: New test.
>   * g++.target/powerpc/vecload1.C: New test.
>   * gcc.target/powerpc/mma-builtin-1.c: Modify test.
> ---
>  gcc/config.gcc|4 +-
>  gcc/config/rs6000/rs6000-passe

Re: [PATCH 1/2] RISC-V/testsuite: Add RTL pr105314.c testcase variants

2024-01-24 Thread Jeff Law




On 1/24/24 04:16, Maciej W. Rozycki wrote:

Add a pair of RTL tests, for RV64 and RV32 respectively, corresponding
to the existing pr105314.c test.  They have been produced from RTL code
as at the entry of the "ce1" pass for pr105314.c compiled at -O3.

gcc/testsuite/
* gcc.target/riscv/pr105314-rtl.c: New file.
* gcc.target/riscv/pr105314-rtl32.c: New file.

OK
jeff



Re: [PATCH 2/2] RISC-V/testsuite: Add RTL cset-sext.c testcase variants

2024-01-24 Thread Jeff Law




On 1/24/24 04:16, Maciej W. Rozycki wrote:

Add RTL tests, for RV64 and RV32 where appropriate, corresponding to the
existing cset-sext.c tests.  They have been produced from RTL code as at
the entry of the "ce1" pass for the respective cset-sext.c tests built
at -O3.

gcc/testsuite/
* gcc.target/riscv/cset-sext-rtl.c: New file.
* gcc.target/riscv/cset-sext-rtl32.c: New file.
* gcc.target/riscv/cset-sext-sfb-rtl.c: New file.
* gcc.target/riscv/cset-sext-sfb-rtl32.c: New file.
* gcc.target/riscv/cset-sext-thead-rtl.c: New file.
* gcc.target/riscv/cset-sext-ventana-rtl.c: New file.
* gcc.target/riscv/cset-sext-zicond-rtl.c: New file.
* gcc.target/riscv/cset-sext-zicond-rtl32.c: New file.

OK
jeff


Re: [PATCH] libgccjit: Allow comparing array types

2024-01-24 Thread David Malcolm
On Fri, 2024-01-19 at 16:55 -0500, Antoni Boucher wrote:
> Hi.
> This patch allows comparing different instances of array types as
> equal.
> Thanks for the review.

Thanks; the patch looks good to me.

Dave



Re: [PATCH] libgccjit: Allow comparing aligned int types

2024-01-24 Thread David Malcolm
On Thu, 2023-12-21 at 08:33 -0500, Antoni Boucher wrote:
> Hi.
> This patch allows comparing aligned integer types as equal.
> There's a TODO in the code about whether we should check that the
> alignment is equal.
> What are your thoughts on this?

I think we should check for equal alignment.

[...snip...]

> diff --git a/gcc/testsuite/jit.dg/test-types.c 
> b/gcc/testsuite/jit.dg/test-types.c
> index a01944e35fa..c2f4d2bcb3d 100644
> --- a/gcc/testsuite/jit.dg/test-types.c
> +++ b/gcc/testsuite/jit.dg/test-types.c
> @@ -485,11 +485,15 @@ verify_code (gcc_jit_context *ctxt, gcc_jit_result 
> *result)
>  
>CHECK_VALUE (z.m_FILE_ptr, stderr);
>  
> +  gcc_jit_type *long_type = gcc_jit_context_get_type (ctxt, 
> GCC_JIT_TYPE_LONG);
> +  gcc_jit_type *int64_type = gcc_jit_context_get_type (ctxt, 
> GCC_JIT_TYPE_INT64_T);
>if (sizeof(long) == 8)
> -CHECK (gcc_jit_compatible_types (
> -  gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_LONG),
> -  gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT64_T)));
> +CHECK (gcc_jit_compatible_types (long_type, int64_type));
>  
>CHECK_VALUE (gcc_jit_type_get_size (gcc_jit_context_get_type (ctxt, 
> GCC_JIT_TYPE_FLOAT)), sizeof (float));
>CHECK_VALUE (gcc_jit_type_get_size (gcc_jit_context_get_type (ctxt, 
> GCC_JIT_TYPE_DOUBLE)), sizeof (double));
> +
> +  gcc_jit_type *aligned_long = gcc_jit_type_get_aligned (long_type, 4);
> +  gcc_jit_type *aligned_int64 = gcc_jit_type_get_aligned (int64_type, 4);
> +  CHECK (gcc_jit_compatible_types (aligned_long, aligned_int64));

This CHECK should be guarded on sizeof(long) == 8 like the check above.


Dave



[PATCH v4 0/4] Libatomic: Add LSE128 atomics support for AArch64

2024-01-24 Thread Victor Do Nascimento
v4 updates

  1. Make use of HWCAP2_LSE128, as defined in the Linux kernel v6.7
  for feature check.  This has required adding a new patch to the
  series, enabling ifunc resolvers to read a second arg of type
  `__ifunc_arg_t *', from which the `_hwcap2' member can be queried
  for LSE128 support.  HWCAP2_LSE128, HWCAP_ATOMICS and __ifunc_arg_t
  are conditionally defined in the `host-config.h' file to allow
  backwards compatibility with older versions of glibc which lack
  definitions for these.

  2. Run configure test LIBAT_TEST_FEAT_LSE128 unconditionally,
  renaming it to LIBAT_TEST_FEAT_AARCH64_LSE128.  While it may seem
  counter-intuitive to run an aarch64 test on non-aarch64 targets, the
  Automake manual makes it clear:

"Note that you must arrange for every AM_CONDITIONAL to be
 invoked every time configure is run. If AM_CONDITIONAL is
 run conditionally (e.g., in a shell if statement), then
 the result will confuse automake."

  Failure to do so has been found to result in Libatomic build
  failures on arm and x86_64 targets.

  3. Minor changes in the implementations of {ENTRY|END}_FEAT and
  ALIAS macros used in `config/linux/aarch64/atomic_16.S'

  4. Improve commit message in PATCH 2/3 documenting design choice
  around merging REL and ACQ_REL memory orderings in LSE128 atomic
  functions.

Regression-tested on aarch64-none-linux-gnu on Cortex-A72 and
LSE128-enabled Armv-A Base RevC AEM FVP.

---

Building upon Wilco Dijkstra's work on AArch64 128-bit atomics for
Libatomic, namely the patches from [1] and [2], this patch series
extends the library's capabilities to dynamically select and emit
Armv9.4-a LSE128 implementations of atomic operations via ifuncs at
run-time whenever architectural support is present.

Regression tested on aarch64-linux-gnu target with LSE128-support.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html

Victor Do Nascimento (4):
  libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
  libatomic: Add support for __ifunc_arg_t arg in ifunc resolver
  libatomic: Enable LSE128 128-bit atomics for armv9.4-a
  aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 ++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 247 ---
 libatomic/config/linux/aarch64/host-config.h |  60 -
 libatomic/configure  |  61 -
 libatomic/configure.ac   |   3 +
 libatomic/configure.tgt  |   2 +-
 9 files changed, 358 insertions(+), 41 deletions(-)

-- 
2.42.0



[PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver

2024-01-24 Thread Victor Do Nascimento
With support for new atomic features in Armv9.4-a being indicated by
HWCAP2 bits, Libatomic's ifunc resolver must now query its second
argument, of type __ifunc_arg_t*.

We therefore make this argument known to libatomic, allowing us to
query hwcap2 bits in the following manner:

  bool
  resolver (unsigned long hwcap, const __ifunc_arg_t *features)
  {
    return (features->_hwcap2 & HWCAP2_);
  }
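
As a minimal, portable sketch of the query described above (not the
libatomic code itself): the struct below mirrors the fallback
`__ifunc_arg_t' layout from the patch under the hypothetical name
`demo_ifunc_arg_t', and `DEMO_HWCAP2_FEATURE' is an invented bit value
used purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the glibc type; same three fields as the fallback
   definition in the patch, renamed to avoid clashing with <sys/ifunc.h>.  */
typedef struct demo_ifunc_arg_t {
  unsigned long _size;
  unsigned long _hwcap;
  unsigned long _hwcap2;
} demo_ifunc_arg_t;

/* Hypothetical feature bit, chosen only for this example.  */
#define DEMO_HWCAP2_FEATURE (1UL << 5)

/* Returns true when the hypothetical feature bit is set in hwcap2.  */
static bool
demo_resolver (unsigned long hwcap, const demo_ifunc_arg_t *features)
{
  (void) hwcap;               /* Not needed for a pure hwcap2 query.  */
  assert (features != NULL);  /* The sketch assumes a valid second argument.  */
  return (features->_hwcap2 & DEMO_HWCAP2_FEATURE) != 0;
}
```

Calling `demo_resolver' with a struct whose `_hwcap2' has the bit set
yields true; with the bit clear it yields false.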

libatomic/ChangeLog:

* config/linux/aarch64/host-config.h (__ifunc_arg_t):
Conditionally-defined if `sys/ifunc.h' not found.
(_IFUNC_ARG_HWCAP): Likewise.
(IFUNC_COND_1): Pass __ifunc_arg_t argument to ifunc.
(ifunc1): Modify function signature to accept __ifunc_arg_t
argument.
* configure.tgt: Add second `const __ifunc_arg_t *features'
argument to IFUNC_RESOLVER_ARGS.
---
 libatomic/config/linux/aarch64/host-config.h | 15 +--
 libatomic/configure.tgt  |  2 +-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/libatomic/config/linux/aarch64/host-config.h 
b/libatomic/config/linux/aarch64/host-config.h
index 4200293c4e3..8fd4fe3321a 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -24,9 +24,20 @@
 #if HAVE_IFUNC
 #include 
 
+#if __has_include(<sys/ifunc.h>)
+# include <sys/ifunc.h>
+#else
+typedef struct __ifunc_arg_t {
+  unsigned long _size;
+  unsigned long _hwcap;
+  unsigned long _hwcap2;
+} __ifunc_arg_t;
+# define _IFUNC_ARG_HWCAP (1ULL << 62)
+#endif
+
 #ifdef HWCAP_USCAT
 # if N == 16
-#  define IFUNC_COND_1 ifunc1 (hwcap)
+#  define IFUNC_COND_1 ifunc1 (hwcap, features)
 # else
 #  define IFUNC_COND_1 (hwcap & HWCAP_ATOMICS)
 # endif
@@ -48,7 +59,7 @@
 #define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff)
 
 static inline bool
-ifunc1 (unsigned long hwcap)
+ifunc1 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
   if (hwcap & HWCAP_USCAT)
 return true;
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index b7609132c58..67a5f2dff80 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -194,7 +194,7 @@ esac
 # The type may be different on different architectures.
 case "${target}" in
   aarch64*-*-*)
-   IFUNC_RESOLVER_ARGS="uint64_t hwcap"
+   IFUNC_RESOLVER_ARGS="uint64_t hwcap, const __ifunc_arg_t *features"
;;
   *)
IFUNC_RESOLVER_ARGS="void"
-- 
2.42.0



[PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-24 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:

  * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
  value held in a pair of registers, with original data loaded into
  the same 2 registers.
  * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
  in a pair of registers, with original data loaded into the same 2
  registers.
  * SWPP - Atomic swap of one 128-bit value with 128-bit value held
  in a pair of registers.

It is worth noting that, in keeping with existing 128-bit atomic
operations in `atomic_16.S', we have chosen to merge certain
less-restrictive orderings into more restrictive ones.  This is done
to minimize the number of branches in the atomic functions, reducing
both the likelihood of branch mispredictions and, by keeping the code
small, the need for extra fetch cycles.

Past benchmarking has revealed that acquire is typically slightly
faster than release (5-10%), such that for the most frequently used
atomics (CAS and SWP) it makes sense to add support for acquire, as
well as release.

Likewise, it was identified that combining acquire and release typically
results in little to no penalty, such that it is of negligible benefit
to distinguish between release and acquire-release, making the
merging of release/acq_rel/seq_cst a worthwhile design choice.
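
The ordering merge described above could be sketched in C roughly as
follows.  This is an illustration only: the real selection happens in
assembly in `atomic_16.S', the `demo_' names are hypothetical, and
folding consume into the acquire variant is an assumption of this
sketch rather than something stated in this message.

```c
#include <stdatomic.h>

/* The three code paths left after merging orderings.  */
enum demo_variant { DEMO_RELAXED, DEMO_ACQUIRE, DEMO_ACQ_REL };

/* Map a C11 memory order onto one of the three implemented variants.
   Release, acq_rel and seq_cst all share the strongest path.  */
static enum demo_variant
demo_pick_variant (memory_order model)
{
  switch (model)
    {
    case memory_order_relaxed:
      return DEMO_RELAXED;
    case memory_order_consume:   /* Assumption: treated like acquire.  */
    case memory_order_acquire:
      return DEMO_ACQUIRE;
    default:                     /* release, acq_rel, seq_cst.  */
      return DEMO_ACQ_REL;
    }
}
```

With this mapping only two comparisons on the memory model are needed
per call, which is the branch-count saving the paragraph above refers to.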

This patch adds the logic required to make use of these when the
architectural feature is present and a suitable assembler available.

In order to do this, the following changes are made:

  1. Add a configure-time check to check for LSE128 support in the
  assembler.
  2. Edit host-config.h so that when N == 16, nifunc = 2.
  3. Where available due to LSE128, implement the second ifunc, making
  use of the novel instructions.
  4. For atomic functions unable to make use of these new
  instructions, define a new alias which causes the _i1 function
  variant to point ahead to the corresponding _i2 implementation.

libatomic/ChangeLog:

* Makefile.am (AM_CPPFLAGS): add conditional setting of
-DHAVE_FEAT_LSE128.
* acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LSE128): New.
* config/linux/aarch64/atomic_16.S (LSE128): New macro
definition.
(libat_exchange_16): New LSE128 variant.
(libat_fetch_or_16): Likewise.
(libat_or_fetch_16): Likewise.
(libat_fetch_and_16): Likewise.
(libat_and_fetch_16): Likewise.
* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
(IFUNC_NCOND): Add operand size checking.
(has_lse2): Renamed from `ifunc1`.
(has_lse128): New.
(HWCAP2_LSE128): Likewise.
* libatomic/configure.ac: Add call to
LIBAT_TEST_FEAT_AARCH64_LSE128.
* configure (ac_subst_vars): Regenerated via autoreconf.
* libatomic/Makefile.in: Likewise.
* libatomic/auto-config.h.in: Likewise.
---
 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 +++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 170 ++-
 libatomic/config/linux/aarch64/host-config.h |  42 -
 libatomic/configure  |  61 ++-
 libatomic/configure.ac   |   3 +
 8 files changed, 293 insertions(+), 9 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index cfad90124f9..0623a0bf2d1 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -130,6 +130,9 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix 
_$(s)_.lo,$(SIZEOBJS)))
 ## On a target-specific basis, include alternates to be selected by IFUNC.
 if HAVE_IFUNC
 if ARCH_AARCH64_LINUX
+if ARCH_AARCH64_HAVE_LSE128
+AM_CPPFLAGS = -DHAVE_FEAT_LSE128
+endif
 IFUNC_OPTIONS   = -march=armv8-a+lse
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix 
_$(s)_1_.lo,$(SIZEOBJS)))
 libatomic_la_SOURCES += atomic_16.S
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index dc2330b91fd..cd48fa21334 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -452,6 +452,7 @@ M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files)))
 libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \
_$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \
$(am__append_4) $(am__append_5)
+@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@AM_CPPFLAGS
 = -DHAVE_FEAT_LSE128
 @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv8-a+lse
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv7-a+fp 
-DHAVE_KERNEL64
 @ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=i586
diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index f35ab5b60a5..d4f13174e2c 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILT

[PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

2024-01-24 Thread Victor Do Nascimento
At present, evaluation of both `has_lse2(hwcap)' and
`has_lse128(hwcap)' may require issuing an `mrs' instruction to query
a system register.  This instruction, when issued from user-space,
results in a trap by the kernel, which then returns the value read
from the system register.  Given the computational expense of the
context switch, it is important to implement mechanisms to forgo the
operation wherever possible.

In light of this, given how other architectural requirements serving
as prerequisites have long been assigned HWCAP bits by the kernel, we
can inexpensively query for their availability before attempting to
read any system registers.  Where one of these early tests fails, we
can assert that the main feature of interest (be it LSE2 or LSE128)
cannot be present, allowing us to return from the function early and
skip the unnecessary expensive kernel-mediated access to system
registers.
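
The early-exit idea can be sketched portably as below.  This is a
simulation under stated assumptions: the `DEMO_HWCAP_*' bit values are
invented, and `demo_read_sysreg' merely counts calls in place of the
trapping `mrs' read, so the cost being avoided is visible in a counter.

```c
#include <stdbool.h>

/* Hypothetical capability bits; the real values come from the kernel.  */
#define DEMO_HWCAP_ATOMICS (1UL << 0)
#define DEMO_HWCAP_USCAT   (1UL << 1)
#define DEMO_HWCAP_CPUID   (1UL << 2)

/* Counts how often the simulated expensive read was performed.  */
static unsigned demo_sysreg_reads;

/* Stand-in for the kernel-mediated system register read.  */
static unsigned long
demo_read_sysreg (void)
{
  demo_sysreg_reads++;
  return 3;  /* Pretend the feature field reports support.  */
}

/* Check cheap prerequisite bits first; only read the register when
   every prerequisite is met.  */
static bool
demo_has_feature (unsigned long hwcap)
{
  if (!(hwcap & DEMO_HWCAP_CPUID) || !(hwcap & DEMO_HWCAP_USCAT))
    return false;
  return demo_read_sysreg () >= 3;
}
```

When either prerequisite bit is missing, the function returns false
without incrementing `demo_sysreg_reads'.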

libatomic/ChangeLog:

* config/linux/aarch64/host-config.h (has_lse2): Add test for LSE.
(has_lse128): Add test for LSE2.
---
 libatomic/config/linux/aarch64/host-config.h | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/libatomic/config/linux/aarch64/host-config.h 
b/libatomic/config/linux/aarch64/host-config.h
index 1bc7d839232..4e354124063 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -64,8 +64,13 @@ typedef struct __ifunc_arg_t {
 static inline bool
 has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
+  /* Check for LSE2.  */
   if (hwcap & HWCAP_USCAT)
 return true;
+  /* No point checking further for atomic 128-bit load/store if LSE
+ prerequisite not met.  */
+  if (!(hwcap & HWCAP_ATOMICS))
+return false;
   if (!(hwcap & HWCAP_CPUID))
 return false;
 
@@ -99,9 +104,11 @@ has_lse128 (unsigned long hwcap, const __ifunc_arg_t 
*features)
  support in older kernels as it is of CPU feature absence.  Try fallback
  method to guarantee LSE128 is not implemented.
 
- In the absence of HWCAP_CPUID, we are unable to check for LSE128.  */
-  if (!(hwcap & HWCAP_CPUID))
-return false;
+ In the absence of HWCAP_CPUID, we are unable to check for LSE128.
+ If feature check available, check LSE2 prerequisite before proceeding.  */
+  if (!(hwcap & HWCAP_CPUID) || !(hwcap & HWCAP_USCAT))
+ return false;
+
   unsigned long isar0;
   asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
   if (AT_FEAT_FIELD (isar0) >= 3)
-- 
2.42.0



[PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-24 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with.  It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their suffixes possibly altered.

This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:

  #define LSE(NAME) NAME##_i1

Where we wish to generate ifunc names with the pre-processor's token
concatenation feature, we add a level of indirection to previous macro
calls.  If before we would have had `MACRO(_i)', we now have
`MACRO_FEAT(name, feature)'.  Where we wish to refer to base
functionality (i.e., functions where ifunc suffixes are absent), the
original `MACRO()' may be used to bypass suffixing.

Consequently, for base functionality, where the ifunc suffix is
absent, the macro interface remains the same.  For example, the entry
and endpoints of `libat_store_16' remain defined by:

  ENTRY (libat_store_16)

and

  END (libat_store_16)

For the LSE2 implementation of the same 16-byte atomic store, we now
have:

  ENTRY_FEAT (libat_store_16, LSE2)

and

  END_FEAT (libat_store_16, LSE2)

For the aliasing of function names, we define the following new
implementation of the ALIAS macro:

  ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX)

Defining the `CORE(NAME)' macro to be the identity operator, it
returns the base function name unaltered and allows us to alias
target-specific ifuncs to the corresponding base implementation.
For example, we'd alias the LSE2 `libat_exchange_16' to its base
implementation with:

  ALIAS (libat_exchange_16, LSE2, CORE)
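
The feature-macro scheme above can be mimicked in plain C to see how
the names expand.  This is only an analogy: the `DEMO_' macros and
`demo_' functions are hypothetical, and a forwarding wrapper function
stands in for the assembler's `.set' directive.

```c
/* Feature macros map a base name to its suffixed variant.  */
#define DEMO_CORE(NAME) NAME
#define DEMO_LSE2(NAME) NAME##_i1

/* Worker macro: by the time it is expanded, NAME has already been
   fully macro-expanded (e.g. to demo_store_16_i1).  */
#define DEMO_DEFINE1(NAME, VALUE) \
  static int NAME (void) { return VALUE; }

#define DEMO_DEFINE(NAME, VALUE) DEMO_DEFINE1 (NAME, VALUE)
#define DEMO_DEFINE_FEAT(NAME, FEAT, VALUE) \
  DEMO_DEFINE1 (FEAT (NAME), VALUE)

/* FROM (NAME) forwards to TO (NAME); a C stand-in for `.set'.  */
#define DEMO_ALIAS1(ALIAS, NAME) \
  static int ALIAS (void) { return NAME (); }
#define DEMO_ALIAS(NAME, FROM, TO) \
  DEMO_ALIAS1 (FROM (NAME), TO (NAME))

DEMO_DEFINE (demo_store_16, 0)                  /* demo_store_16 */
DEMO_DEFINE_FEAT (demo_store_16, DEMO_LSE2, 1)  /* demo_store_16_i1 */
DEMO_DEFINE (demo_exchange_16, 7)               /* demo_exchange_16 */
DEMO_ALIAS (demo_exchange_16, DEMO_LSE2, DEMO_CORE)
```

Here the last line defines `demo_exchange_16_i1' as a forwarder to
`demo_exchange_16', so the suffix only ever appears in the definition
of `DEMO_LSE2'.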

libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (CORE): New macro.
(LSE2): Likewise.
(ENTRY_FEAT): Likewise.
(ENTRY_FEAT1): Likewise.
(END_FEAT): Likewise.
(END_FEAT1): Likewise.
(ALIAS): Modify macro to take in `arch' arguments.
(ALIAS1): New.
---
 libatomic/config/linux/aarch64/atomic_16.S | 79 +-
 1 file changed, 47 insertions(+), 32 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index ad14f8f2e6e..16a42925903 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,22 +40,38 @@
 
.arch   armv8-a+lse
 
-#define ENTRY(name)\
-   .global name;   \
-   .hidden name;   \
-   .type name,%function;   \
+#define LSE2(NAME) NAME##_i1
+#define CORE(NAME) NAME
+
+#define ENTRY(NAME) ENTRY_FEAT1 (NAME)
+
+#define ENTRY_FEAT(NAME, FEAT)  \
+   ENTRY_FEAT1 (FEAT (NAME))
+
+#define ENTRY_FEAT1(NAME)  \
+   .global NAME;   \
+   .hidden NAME;   \
+   .type NAME,%function;   \
.p2align 4; \
-name:  \
-   .cfi_startproc; \
+NAME:  \
+   .cfi_startproc; \
hint34  // bti c
 
-#define END(name)  \
+#define END(NAME) END_FEAT1 (NAME)
+
+#define END_FEAT(NAME, FEAT)   \
+   END_FEAT1 (FEAT (NAME))
+
+#define END_FEAT1(NAME)\
.cfi_endproc;   \
-   .size name, .-name;
+   .size NAME, .-NAME;
+
+#define ALIAS(NAME, FROM, TO)  \
+   ALIAS1 (FROM (NAME),TO (NAME))
 
-#define ALIAS(alias,name)  \
-   .global alias;  \
-   .set alias, name;
+#define ALIAS1(ALIAS, NAME)\
+   .global ALIAS;  \
+   .set ALIAS, NAME;
 
 #define res0 x0
 #define res1 x1
@@ -108,7 +124,7 @@ ENTRY (libat_load_16)
 END (libat_load_16)
 
 
-ENTRY (libat_load_16_i1)
+ENTRY_FEAT (libat_load_16, LSE2)
cbnzw1, 1f
 
/* RELAXED.  */
@@ -128,7 +144,7 @@ ENTRY (libat_load_16_i1)
ldp res0, res1, [x0]
dmb ishld
ret
-END (libat_load_16_i1)
+END_FEAT (libat_load_16, LSE2)
 
 
 ENTRY (libat_store_16)
@@ -148,7 +164,7 @@ ENTRY (libat_store_16)
 END (libat_store_16)
 
 
-ENTRY (libat_store_16_i1)
+ENTRY_FEAT (libat_store_16, LSE2)
cbnzw4, 1f
 
/* RELAXED.  */
@@ -160,7 +176,7 @@ ENTRY (libat_store_16_i1)
stlxp   w4, in0, in1, [x0]
cbnzw4, 1b
ret
-END (libat_store_16_i1)
+END_FEAT (libat_store_16, LSE2)
 
 
 ENTRY (libat_exchange_16)
@@ -237,7 +253,7 @@ ENTRY (libat_compare_exchange_16)
 END (libat_compare_exchange_16)
 
 
-ENTRY (libat_compare_exchange_16_i1)
+ENTRY_FEAT (libat_compare_exchange_16, LSE2)
ldp exp0, exp1, [x1]
mov tmp0, exp0
mov tmp1, exp1
@@ -270,7 +286,7 @@ ENTRY (libat_compare_exchange_16_i1)
/* ACQ_REL/SEQ_CST.  */
 4: caspal  exp0, exp1, in0, in1, [x0]
b   0b
-END (libat_compare_exchange_16_i1)
+END_FEAT (libat_compare_exchange_16, LSE2)
 
 
 ENTRY (libat_fetch_add_16)
@@ -556,21 +572,20 @@ END (libat_test_and_set_

Re: [PATCH v3] RISC-V: Add split pattern to generate SFB instructions. [PR113095]

2024-01-24 Thread Jeff Law




On 1/24/24 05:54, Monk Chiang wrote:

Since match.pd transforms (zero_one == 0) ? y : z <op> y
into ((typeof(y))zero_one * z) <op> y, add splitters to recognize
this expression and generate SFB instructions.

gcc/ChangeLog:
PR target/113095
* config/riscv/sfb.md: New splitters to rewrite single bit
sign extension as the condition to SFB instructions.

gcc/testsuite/ChangeLog:
 * gcc.target/riscv/sfb.c: New test.
* gcc.target/riscv/pr113095.c: New test.

Thanks.  I pushed this to the trunk.
jeff


Re: [Fortran] half-cycle trig functions and atan[d] fixes

2024-01-24 Thread Harald Anlauf

Am 24.01.24 um 10:13 schrieb Janne Blomqvist:

On Wed, Jan 24, 2024 at 10:28 AM FX Coudert  wrote:

Now, if
the OS adds cospi() to libm and it's in libm's symbol map, then the
cospi() used by gfortran depends on the search order of the loaded
libraries.


We only include the fallback math functions in libgfortran when they are not 
available on the system. configure detects what is present in the libc being 
targeted, and conditionally compiles the necessary fallback functions (and only 
them).


Exactly. However, there is the (corner?) case when libgfortran has
been compiled, and cospi() not found and thus the fallback
implementation is included, and then later libc is updated to a
version that does provide cospi(). I believe in that case which
version gets used is down to the library search order (i.e. the order
that "ldd /path/to/binary" prints the libs), it will use the first
symbol it finds.  Also, it's not necessary to do some ifdef tricks
with gfortran.map, if a symbol listed there isn't found in the library
it's just ignored. So the *pi() trig functions can be unconditionally
added there, and then depending on whether the target libm includes
those or not they are then included in the exported symbol list.

It's possible to override this to look for specific symbol versions
etc., but that probably goes deep into the weeds of target-specific
stuff (e.g. are we looking for cospi@FBSD_1.7, cospi@GLIBC_X.Y.Z, or
something else?). I'm sure you don't wanna go there.



Isn't this something that could be addressed by
__attribute__(__weakref__)?




Re: [PATCH 2/2] RISC-V/testsuite: Also verify if-conversion runs for pr105314.c

2024-01-24 Thread Jeff Law




On 1/24/24 04:26, Maciej W. Rozycki wrote:

On Tue, 16 Jan 2024, Maciej W. Rozycki wrote:


I don't have a strong opinion on this.  I certainly see Andrew's point, but
it's also the case that if some work earlier in the RTL or gimple pipeline
comes along and compromises the test, then we'd see the failure and deal with
it.  It's pretty standard procedure.


  I'll be happy to add an RTL test case, also for my recent complementary
cset-sext.c addition and maybe other if-conversion pieces recently added.
I think that does not preclude arming pr105314.c with RTL scanning though.


  I have made a bunch of testcases as we discussed at the meeting last week
and the RTL parser did not blow up, so I have now submitted them.  See:
<https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643802.html>
and the next two messages (threading broke with this submission for some
reason, probably due to a glitch in my mail client I've seen from time to
time; I guess it's not worth it to get the patch series resubmitted as
they are independent from each other really and can be applied in any
order).

  I haven't heard back from Andrew beyond his initial message, so it's not
clear to me whether he maintains his objection in spite of the arguments
given.  Andrew?

  Do we have consensus now to move forward with this change as posted?  I'd
like to get these patches ticked off ASAP.
I think it should move forward.  I think having the RTL tests deals with 
Andrew's concern and the testcase adjustment has value as well.


I ACK'd the RTL tests a few minutes ago and we should consider the 1/2 
and 2/2 of the original OK now as well.


Thanks,
Jeff


Re: [PATCH 2/2] libstdc++: Implement P2165R4 changes to std::pair/tuple/etc

2024-01-24 Thread Jonathan Wakely
On Wed, 24 Jan 2024 at 15:24, Patrick Palka  wrote:
>
> On Wed, 24 Jan 2024, Jonathan Wakely wrote:
>
> > On Tue, 23 Jan 2024 at 23:54, Patrick Palka wrote:
> > > diff --git a/libstdc++-v3/include/bits/stl_pair.h 
> > > b/libstdc++-v3/include/bits/stl_pair.h
> > > index b81b479ad43..a9b20fbe7ca 100644
> > > --- a/libstdc++-v3/include/bits/stl_pair.h
> > > +++ b/libstdc++-v3/include/bits/stl_pair.h
> > > @@ -85,12 +85,70 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >/// @cond undocumented
> > >
> > >// Forward declarations.
> > > +  template
> > > +struct pair;
> >
> > We have a compiler bug where a forward declaration without template
> > parameter names causes bad diagnostics later. The compiler seems to
> > try to use the parameter names from the first decl it sees, so we end
> > up with things like  even when there's a name
> > available at the site of the actual error. So I think we should name
> > these _T1 and _T2 here.
>
> Will fix.
>
> >
> > > +
> > >template
> > >  class tuple;
> > >
> > > +  // Declarations of std::array and its std::get overloads, so that
> > > +  // std::tuple_cat can use them if  is included before .
> > > +  // We also declare the other std::get overloads here so that they're
> > > +  // visible to the P2165R4 tuple-like constructors of pair and tuple.
> > > +  template
> > > +struct array;
> > > +
> > >template
> > >  struct _Index_tuple;
> > >
> > > +  template
> > > +constexpr typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&
> > > +get(pair<_Tp1, _Tp2>& __in) noexcept;
> > > +
> > > +  template
> > > +constexpr typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&&
> > > +get(pair<_Tp1, _Tp2>&& __in) noexcept;
> > > +
> > > +  template
> > > +constexpr const typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&
> > > +get(const pair<_Tp1, _Tp2>& __in) noexcept;
> > > +
> > > +  template
> > > +constexpr const typename tuple_element<_Int, pair<_Tp1, 
> > > _Tp2>>::type&&
> > > +get(const pair<_Tp1, _Tp2>&& __in) noexcept;
> > > +
> > > +  template
> > > +constexpr __tuple_element_t<__i, tuple<_Elements...>>&
> > > +get(tuple<_Elements...>& __t) noexcept;
> > > +
> > > +  template
> > > +constexpr const __tuple_element_t<__i, tuple<_Elements...>>&
> > > +get(const tuple<_Elements...>& __t) noexcept;
> > > +
> > > +  template
> > > +constexpr __tuple_element_t<__i, tuple<_Elements...>>&&
> > > +get(tuple<_Elements...>&& __t) noexcept;
> > > +
> > > +  template
> > > +constexpr const __tuple_element_t<__i, tuple<_Elements...>>&&
> > > +get(const tuple<_Elements...>&& __t) noexcept;
> > > +
> > > +  template
> > > +constexpr _Tp&
> > > +get(array<_Tp, _Nm>&) noexcept;
> > > +
> > > +  template
> > > +constexpr _Tp&&
> > > +get(array<_Tp, _Nm>&&) noexcept;
> > > +
> > > +  template
> > > +constexpr const _Tp&
> > > +get(const array<_Tp, _Nm>&) noexcept;
> > > +
> > > +  template
> > > +constexpr const _Tp&&
> > > +get(const array<_Tp, _Nm>&&) noexcept;
> > > +
> > >  #if ! __cpp_lib_concepts
> > >// Concept utility functions, reused in conditionally-explicit
> > >// constructors.
> > > @@ -159,6 +217,46 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >  #endif // lib concepts
> > >  #endif // C++11
> > >
> > > +#if __glibcxx_tuple_like // >= C++23
> > > +  template
> > > +inline constexpr bool __is_tuple_v = false;
> > > +
> > > +  template
> > > +inline constexpr bool __is_tuple_v> = true;
> > > +
> > > +  // TODO: Reuse __is_tuple_like from ?
> > > +  template
> > > +inline constexpr bool __is_tuple_like_v = false;
> > > +
> > > +  template
> > > +inline constexpr bool __is_tuple_like_v> = true;
> > > +
> > > +  template
> > > +inline constexpr bool __is_tuple_like_v> = true;
> > > +
> > > +  template
> > > +inline constexpr bool __is_tuple_like_v> = true;
> > > +
> > > +  // __is_tuple_like_v is defined in .
> > > +
> > > +  template
> > > +concept __tuple_like = __is_tuple_like_v>;
> > > +
> > > +  template
> > > +concept __pair_like = __tuple_like<_Tp> && 
> > > tuple_size_v> == 2;
> > > +
> > > +  template
> > > +concept __eligible_tuple_like
> > > +  = __detail::__different_from<_Tp, _Tuple> && __tuple_like<_Tp>
> > > +   && (tuple_size_v> == tuple_size_v<_Tuple>)
> > > +   && !ranges::__detail::__is_subrange>;
> > > +
> > > +  template
> > > +concept __eligible_pair_like
> > > +  = __detail::__different_from<_Tp, _Pair> && __pair_like<_Tp>
> > > +   && !ranges::__detail::__is_subrange>;
> > > +#endif // C++23
> > > +
> > >template class __pair_base
> > >{
> > >  #if __cplusplus >= 201103L && ! __cpp_lib_concepts
> > > @@ -295,6 +393,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >   return false;
> > >  #endif
> > > }
> > > +
> > > +#if __glibcxx_tuple_like // >= C++23
> > > +  template
> > > +   static constexpr bool
> > > +   _S_constructible_from_pair_

[PATCH v2 0/2] libatomic: AArch64 rcpc3 128-bit atomic operation enablement

2024-01-24 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing both the Load-Acquire RCpc Pair
Ordered, and Store-Release Pair Ordered operations in the form of
LDIAPP and STILP.

In light of this, continuing on from previously-proposed Libatomic
enablement work [1], this patch series therefore makes the following
changes to Libatomic:

  1. Extend the number of allowed ifunc alternatives to 4.
  2. Add LDIAPP and STILP instructions to 16-byte atomic operations.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643841.html

Victor Do Nascimento (2):
  libatomic: Increase max IFUNC_NCOND(N) from 3 to 4.
  libatomic: Add rcpc3 128-bit atomic operations for AArch64

 libatomic/Makefile.am|   6 +-
 libatomic/Makefile.in|  22 ++--
 libatomic/acinclude.m4   |  19 
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 102 ++-
 libatomic/config/linux/aarch64/host-config.h |  33 +-
 libatomic/configure  |  59 ++-
 libatomic/configure.ac   |   1 +
 libatomic/libatomic_i.h  |  18 
 9 files changed, 243 insertions(+), 20 deletions(-)

-- 
2.42.0



[PATCH v2 1/2] libatomic: Increase max IFUNC_NCOND(N) from 3 to 4.

2024-01-24 Thread Victor Do Nascimento
libatomic/ChangeLog:
* libatomic_i.h: Add GEN_SELECTOR implementation for
IFUNC_NCOND(N) == 4.
---
 libatomic/libatomic_i.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/libatomic/libatomic_i.h b/libatomic/libatomic_i.h
index 861a22da152..0a854fd908c 100644
--- a/libatomic/libatomic_i.h
+++ b/libatomic/libatomic_i.h
@@ -275,6 +275,24 @@ bool libat_is_lock_free (size_t, void *) MAN(is_lock_free);
return C3(libat_,X,_i3);\
  return C2(libat_,X);  \
}
+# elif IFUNC_NCOND(N) == 4
+#  define GEN_SELECTOR(X)  \
+   extern typeof(C2(libat_,X)) C3(libat_,X,_i1) HIDDEN;\
+   extern typeof(C2(libat_,X)) C3(libat_,X,_i2) HIDDEN;\
+   extern typeof(C2(libat_,X)) C3(libat_,X,_i3) HIDDEN;\
+   extern typeof(C2(libat_,X)) C3(libat_,X,_i4) HIDDEN;\
+   static typeof(C2(libat_,X)) * C2(select_,X) (IFUNC_RESOLVER_ARGS) \
+   {   \
+ if (IFUNC_COND_1) \
+   return C3(libat_,X,_i1);\
+ if (IFUNC_COND_2) \
+   return C3(libat_,X,_i2);\
+ if (IFUNC_COND_3) \
+   return C3(libat_,X,_i3);\
+ if (IFUNC_COND_4) \
+   return C3(libat_,X,_i4);\
+ return C2(libat_,X);  \
+   }
 # else
 #  error "Unsupported number of ifunc alternatives."
 # endif
-- 
2.42.0
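
For readers unfamiliar with GEN_SELECTOR, the four-way cascade added in
this patch behaves roughly like the sketch below.  The `demo_' names
are hypothetical and the conditions are passed in as plain booleans
rather than being derived from IFUNC_RESOLVER_ARGS.

```c
#include <stdbool.h>

typedef int (*demo_fn) (void);

/* Base implementation and four alternatives, each returning its index.  */
static int demo_base (void) { return 0; }
static int demo_i1 (void) { return 1; }
static int demo_i2 (void) { return 2; }
static int demo_i3 (void) { return 3; }
static int demo_i4 (void) { return 4; }

/* Return the first alternative whose condition holds, falling back to
   the base implementation when none does.  */
static demo_fn
demo_select (bool cond1, bool cond2, bool cond3, bool cond4)
{
  if (cond1)
    return demo_i1;
  if (cond2)
    return demo_i2;
  if (cond3)
    return demo_i3;
  if (cond4)
    return demo_i4;
  return demo_base;
}
```

Because the conditions are tested in order, an earlier alternative wins
whenever several conditions hold at once.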



[PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-24 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing the Load-Acquire RCpc Pair Ordered, and
Store-Release Pair Ordered operations in the form of LDIAPP and STILP.

These operations are single-copy atomic on cores which also implement
LSE2 and, as such, support for these operations is added to Libatomic
and employed accordingly when the LSE2 and RCPC3 features are detected
in a given core at runtime.

libatomic/ChangeLog:

  * configure.ac: Add call to LIBAT_TEST_FEAT_LRCPC3() test.
  * configure: Regenerate.
  * config/linux/aarch64/host-config.h (HAS_LRCPC3): New.
  (has_rcpc3): Likewise.
  (HWCAP2_LRCPC3): Likewise.
  * config/linux/aarch64/atomic_16.S (libat_load_16): Add
  LRCPC3 variant.
  (libat_store_16): Likewise.
  * acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LRCPC3): New.
  (HAVE_FEAT_LRCPC3): Likewise
  (ARCH_AARCH64_HAVE_LRCPC3): Likewise.
  * Makefile.am (AM_CPPFLAGS): Conditionally append
  -DHAVE_FEAT_LRCPC3 flag.
---
 libatomic/Makefile.am|   6 +-
 libatomic/Makefile.in|  22 ++--
 libatomic/acinclude.m4   |  19 
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 102 ++-
 libatomic/config/linux/aarch64/host-config.h |  33 +-
 libatomic/configure  |  59 ++-
 libatomic/configure.ac   |   1 +
 8 files changed, 225 insertions(+), 20 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index 0623a0bf2d1..1e5481fa580 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -130,8 +130,12 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix _$(s)_.lo,$(SIZEOBJS)))
 ## On a target-specific basis, include alternates to be selected by IFUNC.
 if HAVE_IFUNC
 if ARCH_AARCH64_LINUX
+AM_CPPFLAGS  =
 if ARCH_AARCH64_HAVE_LSE128
-AM_CPPFLAGS = -DHAVE_FEAT_LSE128
+AM_CPPFLAGS += -DHAVE_FEAT_LSE128
+endif
+if ARCH_AARCH64_HAVE_LRCPC3
+AM_CPPFLAGS+= -DHAVE_FEAT_LRCPC3
 endif
 IFUNC_OPTIONS   = -march=armv8-a+lse
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index cd48fa21334..8e87d12907a 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -89,15 +89,17 @@ POST_UNINSTALL = :
 build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
-@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_1 = $(foreach s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
-@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2 = atomic_16.S
-@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(foreach \
+@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_1 = -DHAVE_FEAT_LSE128
+@ARCH_AARCH64_HAVE_LRCPC3_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2 = -DHAVE_FEAT_LRCPC3
+@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(foreach s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
+@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = atomic_16.S
+@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_5 = $(foreach \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ s,$(SIZES),$(addsuffix \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ _$(s)_1_.lo,$(SIZEOBJS))) \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ $(addsuffix \
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@ _8_2_.lo,$(SIZEOBJS))
-@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = $(addsuffix _8_1_.lo,$(SIZEOBJS))
-@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_5 = $(addsuffix _16_1_.lo,$(SIZEOBJS)) \
+@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_6 = $(addsuffix _8_1_.lo,$(SIZEOBJS))
+@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_7 = $(addsuffix _16_1_.lo,$(SIZEOBJS)) \
 @ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@   $(addsuffix _16_2_.lo,$(SIZEOBJS))
 
 subdir = .
@@ -424,7 +426,7 @@ libatomic_la_LDFLAGS = $(libatomic_version_info) $(libatomic_version_script) \
$(lt_host_flags) $(libatomic_darwin_rpath)
 
 libatomic_la_SOURCES = gload.c gstore.c gcas.c gexch.c glfree.c lock.c \
-   init.c fenv.c fence.c flag.c $(am__append_2)
+   init.c fenv.c fence.c flag.c $(am__append_4)
 SIZEOBJS = load store cas exch fadd fsub fand fior fxor fnand tas
 EXTRA_libatomic_la_SOURCES = $(addsuffix _n.c,$(SIZEOBJS))
 libatomic_la_DEPENDENCIES = $(libatomic_la_LIBADD) $(libatomic_version_dep)
@@ -450,9 +452,11 @@ all_c_files := $(foreach dir,$(search_path),$(wildcard $(dir)/*.c))
 # Then sort through them to find the one we want, and select the first.
 M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files)))
 libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \
-   _$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \
-   $(am__append_4) $

Re: [PATCH] libgccjit: Add convert vector

2024-01-24 Thread David Malcolm
On Thu, 2023-12-21 at 16:01 -0500, Antoni Boucher wrote:
> Hi.
> This patch adds the support for the convert vector internal function.

Thanks for the patch.

> I'll need to double-check that making the decl a register is
> necessary.

I confess I don't know anything about this aspect of the patch, but
presumably you have this working within the rustc backend as well as in
the testcase.

Am I right in thinking that this is an elementwise conversion?  Might
be good to specify this in the docs and header file.

Otherwise looks good to me (with usual comment about updating the ABI
numbering as necessary)

Thanks
Dave



[PATCH] Fix vect_long_mult for aarch64 [PR109705]

2024-01-24 Thread Andrew Pinski
On aarch64, vectorization of `long` multiply can be done if SVE is enabled
or if long is 32-bit (ILP32). It can also be done for constants, but there
is no effective target test for that just yet.
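
For context, the kind of loop this effective target is about looks like the sketch below. `mul_longs` is a hypothetical name, and whether the loop is actually vectorized depends on the target and options as described above: Advanced SIMD has no vector multiply for 64-bit elements, which is why LP64 AArch64 without SVE does not qualify.

```c
#include <stddef.h>

/* Element-wise multiply of `long` arrays.  On LP64 AArch64 each
   element is 64 bits wide; with ILP32, `long` is 32 bits wide.  */
void
mul_longs (long *restrict dst, const long *restrict a,
           const long *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] * b[i];
}
```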

Built and tested on aarch64-linux-gnu with no regressions (also tested with
SVE enabled).

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_vect_long_mult):
Fix aarch64*-*-* checks.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/lib/target-supports.exp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 9ca8355b3e1..178d1a73064 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9090,7 +9090,9 @@ proc check_effective_target_vect_long_mult { } {
  && [check_effective_target_has_arch_pwr10])
 || [is-effective-target arm_neon]
 || ([istarget sparc*-*-*] && [check_effective_target_ilp32])
-|| [istarget aarch64*-*-*]
+|| ([istarget aarch64*-*-*]
+&& ([check_effective_target_ilp32]
+|| [check_effective_target_aarch64_sve]))
 || ([istarget mips*-*-*]
  && [et-is-effective-target mips_msa])
 || ([istarget riscv*-*-*]
-- 
2.39.3



Re: [PATCH] aarch64: Fix eh_return for -mtrack-speculation [PR112987]

2024-01-24 Thread Richard Sandiford
Szabolcs Nagy  writes:
> Recent commit introduced a conditional branch in eh_return epilogues
> that is not compatible with speculation tracking:
>
>   commit 426fddcbdad6746fe70e031f707fb07f55dfb405
>   Author: Szabolcs Nagy 
>   CommitDate: 2023-11-27 15:52:48 +
>
>   aarch64: Use br instead of ret for eh_return
>
> gcc/ChangeLog:
>
>   PR target/112987
>   * config/aarch64/aarch64.cc (aarch64_expand_epilogue): Use
>   explicit compare and separate jump with speculation tracking.
> ---
>  gcc/config/aarch64/aarch64.cc | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e6bd3fd0bb4..e6de62dc02a 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -9879,7 +9879,17 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall)
>is just as correct as retaining the CFA from the body
>of the function.  Therefore, do nothing special.  */
>rtx label = gen_label_rtx ();
> -  rtx x = gen_rtx_EQ (VOIDmode, EH_RETURN_TAKEN_RTX, const0_rtx);
> +  rtx x;
> +  if (aarch64_track_speculation)
> + {
> +   /* Emit an explicit compare, so cc can be tracked.  */
> +   rtx cc_reg = aarch64_gen_compare_reg (EQ,
> + EH_RETURN_TAKEN_RTX,
> + const0_rtx);
> +   x = gen_rtx_EQ (GET_MODE (cc_reg), cc_reg, const0_rtx);
> + }
> +  else
> + x = gen_rtx_EQ (VOIDmode, EH_RETURN_TAKEN_RTX, const0_rtx);

It looks from a quick scan like we already have 3 instances of
this kind of construct.  Would you mind factoring them out into
a helper?

E.g. (strawman):

static rtx
aarch64_gen_compare_zero_and_branch (rtx_code code, rtx x, rtx_label *label)
{
}

that returns the SET pattern.  The caller can then emit the pattern
using whichever interface is appropriate.

Thanks,
Richard

>x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
>   gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
>rtx jump = emit_jump_insn (gen_rtx_SET (pc_rtx, x));


Re: [PATCH] aarch64: Fix __builtin_apply with -mgeneral-regs-only [PR113486]

2024-01-24 Thread Richard Sandiford
Andrew Pinski  writes:
> The problem here is the builtin apply mechanism thinks the FP registers
> are to be used due to get_raw_arg_mode not returning VOIDmode. This
> fixes that oversight and the backend now returns VOIDmode for non-general-regs
> if TARGET_GENERAL_REGS_ONLY is true.
>
> Built and tested for aarch64-linux-gnu with no regressions.
>
>   PR target/113486
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_get_reg_raw_mode): For
>   TARGET_GENERAL_REGS_ONLY, return VOIDmode for non-GP_REGNUM_P regno.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/builtin_apply-1.c: New test.

OK, thanks.

Richard

> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.cc  |  4 
>  gcc/testsuite/gcc.target/aarch64/builtin_apply-1.c | 12 
>  2 files changed, 16 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_apply-1.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e6bd3fd0bb4..a838cbba51d 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -7221,6 +7221,10 @@ aarch64_function_arg_boundary (machine_mode mode, const_tree type)
>  static fixed_size_mode
>  aarch64_get_reg_raw_mode (int regno)
>  {
> +  /* Don't use any non GP registers for __builtin_apply and
> + __builtin_return if general registers only mode is requested. */
> +  if (TARGET_GENERAL_REGS_ONLY && !GP_REGNUM_P (regno))
> +return as_a <fixed_size_mode> (VOIDmode);
>if (TARGET_SVE && FP_REGNUM_P (regno))
>  /* Don't use the SVE part of the register for __builtin_apply and
> __builtin_return.  The SVE registers aren't used by the normal PCS,
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_apply-1.c b/gcc/testsuite/gcc.target/aarch64/builtin_apply-1.c
> new file mode 100644
> index 000..d70abe037d2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_apply-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mgeneral-regs-only" } */
> +/* PR target/113486 */
> +
> +
> +/* __builtin_apply should not use FP registers if 
> +   general registers only mode is requested. */
> +void
> +foo (void)
> +{
> +  __builtin_apply (foo, 0, 0);
> +}


Re: [PATCH] testsuite: require libc sym for -shared

2024-01-24 Thread Jeff Law

On 1/23/24 00:15, Alexandre Oliva wrote:

Targets whose binutils support -shared, but that don't have a shared
libc, and that can't add PDC (non-PIC) to shared libraries, may
succeed at the effective target test for -shared, because it brings
nothing from libc, but tests that rely on -shared and that use bits
from libc, such as g++.dg/lto/pr108772, fail despite requiring the
shared effective target.

Extend the effective target test to bring in malloc() from libc, which is
likely to be present in libc and to bring in a substantial amount of code
if no shared libc is available.

Regstrapped on x86_64-linux-gnu, also tested on aarch64-elf with gcc-13,
where the problem was observed.  Ok to install?


for  gcc/testsuite/ChangeLog

* lib/target-supports.exp (check_effective_target_shared):
Check for a static-only libc.

OK
jeff


Re: [PATCH] testsuite: no dfp run without dfprt

2024-01-24 Thread Jeff Law

On 1/23/24 00:13, Alexandre Oliva wrote:

newlib-src/libc/include/sys/fenv.h doesn't define the FE_* macros that
libgcc expects to enable decimal float support.  Only after newlib is
configured and built does an overriding header that defines those
macros become available in objdir//newlib/targ-include/, but
by then, libgcc has already been built without dfp and libbid.

This has exposed a number of tests that attempt to link dfp programs
without requiring a dfprt effective target.

dfp.exp already skips if dfp support is missing altogether, and sets
the default to compile rather than run if dfp support is present in
the compiler but missing in the runtime libraries.

However, some of the dfp tests override the default without requiring
dfprt.  Drop the overriders where reasonable, and add the explicit
requirement elsewhere.

Regstrapped on x86_64-linux-gnu; also tested on aarch64-elf with gcc-13,
where the problem was observed.  Ok to install?


for  gcc/testsuite/ChangeLog

* c-c++-common/dfp/pr36800.c: Drop dg-do overrider.
* c-c++-common/dfp/pr39034.c: Likewise.
* c-c++-common/dfp/pr39035.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-2.c: Likewise.
* gcc.dg/dfp/builtin-tgmath-dfp.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-4.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-5.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-6.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-7.c: Likewise.
* gcc.dg/dfp/pr108068.c: Likewise.
* gcc.dg/dfp/pr97439.c: Likewise.
* g++.dg/compat/decimal/pass-1_main.C: Require dfprt.
* g++.dg/compat/decimal/pass-2_main.C: Likewise.
* g++.dg/compat/decimal/pass-3_main.C: Likewise.
* g++.dg/compat/decimal/pass-4_main.C: Likewise.
* g++.dg/compat/decimal/pass-5_main.C: Likewise.
* g++.dg/compat/decimal/pass-6_main.C: Likewise.
* g++.dg/compat/decimal/return-1_main.C: Likewise.
* g++.dg/compat/decimal/return-2_main.C: Likewise.
* g++.dg/compat/decimal/return-3_main.C: Likewise.
* g++.dg/compat/decimal/return-4_main.C: Likewise.
* g++.dg/compat/decimal/return-5_main.C: Likewise.
* g++.dg/compat/decimal/return-6_main.C: Likewise.
* g++.dg/eh/dfp-1.C: Likewise.
* g++.dg/eh/dfp-2.C: Likewise.
* g++.dg/eh/dfp-saves-aarch64.C: Likewise.
* gcc.c-torture/execute/pr80692.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-3.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-4.c: Likewise.

OK
jeff


Re: [PATCH] libgccjit: Fix float playback for cross-compilation

2024-01-24 Thread David Malcolm
On Thu, 2024-01-11 at 18:42 -0500, Antoni Boucher wrote:
> Hi.
> This patch fixes the bug 113343.
> I'm wondering if there's a better solution than using mpfr.
> The only other solution I found is real_from_string, but that seems
> overkill to convert the number to a string.
> I could not find a better way to create a real value from a host
> double.

I took a look, and I don't see a better way; it seems weird to go
through a string stage.  Ideally there would be a
real_from_host_double, but I don't see one.

Is there a cross-platform way to directly access the representation of
a host double?
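
One way to look at the bits of a host `double` in portable C is to copy them into a fixed-width integer. The sketch below assumes the host `double` is IEEE 754 binary64, which the C standard does not guarantee, so it is only a partial answer to the question; `double_bits` is a hypothetical helper, not an existing GCC function.

```c
#include <stdint.h>
#include <string.h>

/* The copy below is only meaningful if the sizes match.  */
_Static_assert (sizeof (double) == sizeof (uint64_t),
                "host double is not 64 bits wide");

/* Copy the object representation of a host double into a 64-bit
   integer.  memcpy avoids the aliasing problems of a pointer cast.
   Interpreting the result assumes IEEE 754 binary64 on the host.  */
static uint64_t
double_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}
```

On a binary64 host the sign, exponent, and mantissa fields can then be extracted with shifts and masks. A host with a different `double` format would still need another path.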

> If there's no solution, do we lose some precision by using mpfr?
> Running Rust's core library tests, there was a difference of one
> decimal, so I'm wondering if there's some lost precision, or if it's
> just because those tests don't work on m68k which was my test target.

Sorry, can you clarify what you mean by "a difference of one decimal"
above?

> Also, I'm not sure how to write a test for this fix. Any ideas?

I think we don't need cross-compilation-specific tests, we should just
use and/or extend the existing coverage for
gcc_jit_context_new_rvalue_from_double e.g. in test-constants.c and
test-types.c

We probably should have test coverage for "awkward" values; we already
have coverage for DBL_MIN and DBL_MAX, but we don't yet have test
coverage for:
* quiet/signaling NaN
* +ve/-ve inf
* -ve zero
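
Most of the values listed above are available through standard C, so a test could pass them straight to gcc_jit_context_new_rvalue_from_double. The sketch below only builds and classifies them; the `awkward_values` table and the `classify` helper are hypothetical, and the signaling-NaN case is left out because older ISO C has no portable macro for one.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical table of "awkward" doubles a test might iterate over.  */
static const double awkward_values[] = {
  NAN,        /* quiet NaN */
  INFINITY,   /* +inf */
  -INFINITY,  /* -inf */
  -0.0,       /* negative zero */
};

enum { NUM_AWKWARD = sizeof awkward_values / sizeof awkward_values[0] };

/* 'n' = NaN, 'P'/'N' = +inf/-inf, 'z' = negative zero,
   'o' = any other value.  */
static char
classify (double d)
{
  if (isnan (d))
    return 'n';
  if (isinf (d))
    return signbit (d) ? 'N' : 'P';
  if (d == 0.0 && signbit (d))
    return 'z';
  return 'o';
}
```

Negative zero needs the `signbit` check because `-0.0 == 0.0` compares equal, which is the kind of detail such a test would want to catch after a round trip through the JIT.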

Thanks
Dave



Re: [PATCH] Fix vect_long_mult for aarch64 [PR109705]

2024-01-24 Thread Richard Sandiford
Andrew Pinski  writes:
> On aarch64, vectorization of `long` multiply can be done if SVE is enabled
> or if long is 32-bit (ILP32). It can also be done for constants, but there
> is no effective target test for that just yet.
>
> Built and tested on aarch64-linux-gnu with no regressions (also tested with
> SVE enabled).
>
> gcc/testsuite/ChangeLog:
>
>   * lib/target-supports.exp (check_effective_target_vect_long_mult):
>   Fix aarch64*-*-* checks.

OK, thanks!

Richard

> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/lib/target-supports.exp | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 9ca8355b3e1..178d1a73064 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -9090,7 +9090,9 @@ proc check_effective_target_vect_long_mult { } {
> && [check_effective_target_has_arch_pwr10])
>|| [is-effective-target arm_neon]
>|| ([istarget sparc*-*-*] && [check_effective_target_ilp32])
> -  || [istarget aarch64*-*-*]
> +  || ([istarget aarch64*-*-*]
> +  && ([check_effective_target_ilp32]
> +  || [check_effective_target_aarch64_sve]))
>|| ([istarget mips*-*-*]
> && [et-is-effective-target mips_msa])
>|| ([istarget riscv*-*-*]


Re: [PATCH] jit, Darwin: Implement library exports list.

2024-01-24 Thread David Malcolm
On Tue, 2024-01-16 at 11:10 +, Iain Sandoe wrote:
> Tested on x86_64, i686 Darwin and x86_64 Linux,
> OK for trunk? When?
> thanks,
> Iain

Hi Iain, thanks for the patch.

I'll have to defer to your Darwin expertise here; given that you've
tested it on the above configurations I'll assume it's correct, but...

> 
> --- 8< ---
> 
> Currently, we have no exports list for libgccjit, which means that
> all symbols are exported, including those from libstdc++ which is
> linked statically into the lib.  This causes failures when the
> shared libstdc++ is used but some c++ symbols are satisfied from
> libgccjit.
> 
> This implements an export file for Darwin (which is currently
> manually created by cross-checking libgccjit.map).

...I'm a little nervous about this; Antoyo has a number of out-of-tree
patches we're working towards merging, and almost all of these touch
libgccjit.map.


>   Ideally we'd
> script this, at some point.  

Yes.  How about a Python 3 script (inside "contrib", or in "gcc/jit")
that would do that?  Then whenever a patch touches libgccjit.map we'd
run that script to regenerate libgccjit.exp in the source tree.  I can
have a go at writing it, if you think that's the best way to go.
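
The core of such a generator is small. The sketch below is in C rather than Python, and it rests on two assumptions that should be checked against the real files: that each exported symbol in libgccjit.map sits on its own line as `name;`, and that the Darwin list wants the same name with a leading underscore. `map_line_to_export` is a hypothetical helper.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* If LINE is a version-script symbol entry of the form "  name;",
   write "_name" to OUT and return 1.  Return 0 for anything else
   (version node names, "global:", "local: *;", closing braces,
   comments) or if OUT is too small.  */
static int
map_line_to_export (const char *line, char *out, size_t out_size)
{
  /* Skip leading indentation.  */
  while (*line == ' ' || *line == '\t')
    line++;

  /* Measure the identifier at the start of the line.  */
  size_t len = 0;
  while (isalnum ((unsigned char) line[len]) || line[len] == '_')
    len++;

  /* A symbol entry is an identifier immediately followed by ';'.  */
  if (len == 0 || line[len] != ';')
    return 0;

  /* Room for the underscore, the name, and the terminating NUL.  */
  if (len + 2 > out_size)
    return 0;

  out[0] = '_';
  memcpy (out + 1, line, len);
  out[len + 1] = '\0';
  return 1;
}
```

A driver would read libgccjit.map line by line, call this helper on each line, and write every accepted name to libgccjit.exp.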

I take it .exp is the standard extension for these export files in the
Darwin world.  If so, it's a shame (but unavoidable) that it clashes
with the existing uses of .exp in our source tree for our
expect/Tcl/DejaGnu sources.

I think the patch as-is is OK for trunk now, assuming that you've
tested it as above.

Dave


> Update libtool current and age to
> reflect the current ABI version (we are not bumping the SO name
> at this stage).
> 
> This fixes a number of new failures in jit testing.
> 
> gcc/jit/ChangeLog:
> 
> * Make-lang.in: Implement exports list, and use a shared
> libgcc.
> * libgccjit.exp: New file.
> 
> Signed-off-by: Iain Sandoe 
> ---
>  gcc/jit/Make-lang.in  |  38 ---
>  gcc/jit/libgccjit.exp | 229 ++
>  2 files changed, 251 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/jit/libgccjit.exp
> 
> diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in
> index b1f0ce73e12..52dc2c24908 100644
> --- a/gcc/jit/Make-lang.in
> +++ b/gcc/jit/Make-lang.in
> @@ -55,7 +55,10 @@ else
>  
>  ifneq (,$(findstring darwin,$(host)))
>  
> -LIBGCCJIT_AGE = 1
> +LIBGCCJIT_CURRENT = 26
> +LIBGCCJIT_REVISION = 0
> +LIBGCCJIT_AGE = 26
> +LIBGCCJIT_COMPAT = 0
>  LIBGCCJIT_BASENAME = libgccjit
>  
>  LIBGCCJIT_SONAME = \
> @@ -63,15 +66,15 @@ LIBGCCJIT_SONAME = \
>  LIBGCCJIT_FILENAME = $(LIBGCCJIT_BASENAME).$(LIBGCCJIT_VERSION_NUM).dylib
>  LIBGCCJIT_LINKER_NAME = $(LIBGCCJIT_BASENAME).dylib
>  
> -# Conditionalize the use of the LD_VERSION_SCRIPT_OPTION and
> -# LD_SONAME_OPTION depending if configure found them, using $(if)
> -# We have to define a COMMA here, otherwise the commas in the "true"
> -# result are treated as separators by the $(if).
> -COMMA := ,
> +# Darwin does not have a version script option. Exported symbols are controlled
> +# by the following, and library versioning is done using libtool.
>  LIBGCCJIT_VERSION_SCRIPT_OPTION = \
> -   $(if $(LD_VERSION_SCRIPT_OPTION),\
> - -Wl$(COMMA)$(LD_VERSION_SCRIPT_OPTION)$(COMMA)$(srcdir)/jit/libgccjit.map)
> +  -Wl,-exported_symbols_list,$(srcdir)/jit/libgccjit.exp
>  
> +# Conditionalize the use of  LD_SONAME_OPTION on configure finding it, using
> +# $(if).  We have to define a COMMA here, otherwise the commas in the "true"
> +# result are treated as separators by the $(if).
> +COMMA := ,
>  LIBGCCJIT_SONAME_OPTION = \
> $(if $(LD_SONAME_OPTION), \
>  -Wl$(COMMA)$(LD_SONAME_OPTION)$(COMMA)$(LIBGCCJIT_SONAME))
> @@ -143,15 +146,18 @@ ifneq (,$(findstring mingw,$(target)))
>  # Create import library
>  LIBGCCJIT_EXTRA_OPTS = -Wl,--out-implib,$(LIBGCCJIT_IMPORT_LIB)
>  else
> -
>  ifneq (,$(findstring darwin,$(host)))
> -# TODO : Construct a Darwin-style symbol export file.
> -LIBGCCJIT_EXTRA_OPTS = -Wl,-compatibility_version,$(LIBGCCJIT_VERSION_NUM) \
> -   -Wl,-current_version,$(LIBGCCJIT_VERSION_NUM).$(LIBGCCJIT_MINOR_NUM).$(LIBGCCJIT_AGE) \
> -   $(LIBGCCJIT_VERSION_SCRIPT_OPTION) \
> -   $(LIBGCCJIT_SONAME_OPTION)
> +LIBGCCJIT_VERS = $(LIBGCCJIT_CURRENT).$(LIBGCCJIT_REVISION).$(LIBGCCJIT_AGE)
> +LIBGCCJIT_EXTRA_OPTS = -Wl,-current_version,$(LIBGCCJIT_VERS) \
> + -Wl,-compatibility_version,$(LIBGCCJIT_COMPAT) \
> +  $(LIBGCCJIT_VERSION_SCRIPT_OPTION) $(LIBGCCJIT_SONAME_OPTION)
> +# Use the default (shared) libgcc.
> +JIT_LDFLAGS = $(filter-out -static-libgcc, $(LDFLAGS))
> +ifeq (,$(findstring darwin8,$(host)))
> +JIT_LDFLAGS += -Wl,-rpath,@loader_path
> +endif
>  else
> -
> +JIT_LDFLAGS = $(LDFLAGS)
>  LIBGCCJIT_EXTRA_OPTS = $(LIBGCCJIT_VERSION_SCRIPT_OPTION) \
> $(LIBGCCJIT_SONAME_OPTION)
>  endif
> @@ -170,7 +176,7 @@ $(LIBGCCJIT_FILENAME): $(jit_OBJS) \
> $(LIBDEPS) $(srcdir)/jit

Re: [PATCH] testsuite, jit: Stabilize error output.

2024-01-24 Thread David Malcolm
On Tue, 2024-01-16 at 11:12 +, Iain Sandoe wrote:
> Tested on x86_64, i686 Darwin, x86_64 Linux,
> OK for trunk? When?
> thanks
> Iain

Thanks; looks good to me for trunk.

Given that the scope is just the jit testsuite and that you've tested
it on 3 configurations (and presumably made use of this when debugging
the other issue), I think this is OK to go in now.

Dave

> 
> --- 8< ---
> 
> Currently when a test fails, we print out a lot of information,
> this includes items that are not stable between invocations (e.g.
> the PID for the executable).  That makes automated comparisons
> between test runs flag any persistent fails as new ones each time
> which is not usually what is wanted.
> 
> This patch amends the error output to drop the variable portion
> of the message and retain items that should only change if the
> failure mode changes.
> 
> gcc/testsuite/ChangeLog:
> 
> * jit.dg/jit.exp: Filter error output to remove per-run
> variable content.
> 
> Signed-off-by: Iain Sandoe 
> ---
>  gcc/testsuite/jit.dg/jit.exp | 21 +++--
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp
> index 286cfa8192a..893ff5f6dd0 100644
> --- a/gcc/testsuite/jit.dg/jit.exp
> +++ b/gcc/testsuite/jit.dg/jit.exp
> @@ -94,25 +94,34 @@ proc parse_valgrind_logfile {name logfile} {
>  # unexpected exits.
>  
>  proc verify_exit_status { executable wres } {
> -    lassign $wres pid spawnid os_error_flag value
> +    set extra [lassign $wres pid spawnid os_error_flag value]
>  verbose "pid: $pid" 3
>  verbose "spawnid: $spawnid" 3
>  verbose "os_error_flag: $os_error_flag" 3
>  verbose "value: $value" 3
>  
>  # Detect segfaults etc:
> -    if { [llength $wres] > 4 } {
> -   if { [lindex $wres 4] == "CHILDKILLED" } {
> -   fail "$executable killed: $wres"
> +    set len [llength $extra]
> +    if { $len >= 1 } {
> +   if { [lindex $extra 0] == "CHILDKILLED" } {
> +   set reason "Unknown Reason"
> +   set detail "No Details"
> +   if { $len >= 2 } {
> +   set reason [lindex $extra 1]
> +   if { $len >= 3 } {
> +   set detail [lindex $extra 2]
> +   }
> +   }
> +   fail "$executable killed: $reason $detail"
>     return
> }
>  }
>  if { $os_error_flag != 0 } {
> -   fail "$executable: OS error: $wres"
> +   fail "$executable: OS error: $os_error_flag $extra"
> return
>  }
>  if { $value != 0 } {
> -   fail "$executable: non-zero exit code: $wres"
> +   fail "$executable: non-zero exit code: $value $extra"
> return
>  }
>  pass "$executable exited cleanly"



Re: [PATCH] Fortran: passing of optional dummies to elemental procedures [PR113377]

2024-01-24 Thread Mikael Morin

Le 23/01/2024 à 21:36, Harald Anlauf a écrit :

Dear all,

here's the second part of a series for the treatment of missing
optional arguments passed to optional dummies, now fixing the
case that the latter procedures are elemental.  Adjustments
were necessary when the missing dummy has the VALUE attribute.

I factored out the code for the treatment of VALUE, hoping that the
monster loop in gfc_conv_procedure_call will become slightly
easier to follow.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


Looks good, but...


Thanks,
Harald





diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 128add47516..0fac0523670 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc



@@ -6392,12 +6479,23 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
}
}

+ /* Scalar dummy arguments of intrinsic type with VALUE attribute.  */
+ if (fsym
+ && fsym->attr.value
+ && !fsym->attr.dimension
+ // && (fsym->ts.type != BT_CHARACTER
+ //  || gfc_length_one_character_type_p (&fsym->ts))


... please remove the commented code here.  OK with that change.
The !fsym->attr.dimension condition could be removed as well, since we are
in the case of an elemental procedure at this point, but it does no harm
if you prefer to keep it.

Thanks for the patch.

Mikael


+ && fsym->ts.type != BT_DERIVED
+ && fsym->ts.type != BT_CLASS)
+   conv_dummy_value (&parmse, e, fsym, optionalargs);
+
  /* If we are passing an absent array as optional dummy to an
 elemental procedure, make sure that we pass NULL when the data
 pointer is NULL.  We need this extra conditional because of
 scalarization which passes arrays elements to the procedure,
 ignoring the fact that the array can be absent/unallocated/...  */
- if (ss->info->can_be_null_ref && ss->info->type != GFC_SS_REFERENCE)
+ else if (ss->info->can_be_null_ref
+  && ss->info->type != GFC_SS_REFERENCE)
{
  tree descriptor_data;





Re: [PATCH 2/2] libstdc++: Implement P2165R4 changes to std::pair/tuple/etc

2024-01-24 Thread Patrick Palka
On Wed, 24 Jan 2024, Jonathan Wakely wrote:

> On Wed, 24 Jan 2024 at 15:24, Patrick Palka  wrote:
> >
> > On Wed, 24 Jan 2024, Jonathan Wakely wrote:
> >
> > > On Tue, 23 Jan 2024 at 23:54, Patrick Palka wrote:
> > > > diff --git a/libstdc++-v3/include/bits/stl_pair.h b/libstdc++-v3/include/bits/stl_pair.h
> > > > index b81b479ad43..a9b20fbe7ca 100644
> > > > --- a/libstdc++-v3/include/bits/stl_pair.h
> > > > +++ b/libstdc++-v3/include/bits/stl_pair.h
> > > > @@ -85,12 +85,70 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >/// @cond undocumented
> > > >
> > > >// Forward declarations.
> > > > +  template
> > > > +struct pair;
> > >
> > > We have a compiler bug where a forward declaration without template
> > > parameter names causes bad diagnostics later. The compiler seems to
> > > try to use the parameter names from the first decl it sees, so we end
> > > up with things like  even when there's a name
> > > available at the site of the actual error. So I think we should name
> > > these _T1 and _T2 here.
> >
> > Will fix.
> >
> > >
> > > > +
> > > >template
> > > >  class tuple;
> > > >
> > > > +  // Declarations of std::array and its std::get overloads, so that
> > > > +  // std::tuple_cat can use them if  is included before .
> > > > +  // We also declare the other std::get overloads here so that they're
> > > > +  // visible to the P2165R4 tuple-like constructors of pair and tuple.
> > > > +  template
> > > > +struct array;
> > > > +
> > > >template
> > > >  struct _Index_tuple;
> > > >
> > > > +  template
> > > > +constexpr typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&
> > > > +get(pair<_Tp1, _Tp2>& __in) noexcept;
> > > > +
> > > > +  template
> > > > +constexpr typename tuple_element<_Int, pair<_Tp1, _Tp2>>::type&&
> > > > +get(pair<_Tp1, _Tp2>&& __in) noexcept;
> > > > +
> > > > +  template
> > > > +constexpr const typename tuple_element<_Int, pair<_Tp1, 
> > > > _Tp2>>::type&
> > > > +get(const pair<_Tp1, _Tp2>& __in) noexcept;
> > > > +
> > > > +  template
> > > > +constexpr const typename tuple_element<_Int, pair<_Tp1, 
> > > > _Tp2>>::type&&
> > > > +get(const pair<_Tp1, _Tp2>&& __in) noexcept;
> > > > +
> > > > +  template
> > > > +constexpr __tuple_element_t<__i, tuple<_Elements...>>&
> > > > +get(tuple<_Elements...>& __t) noexcept;
> > > > +
> > > > +  template
> > > > +constexpr const __tuple_element_t<__i, tuple<_Elements...>>&
> > > > +get(const tuple<_Elements...>& __t) noexcept;
> > > > +
> > > > +  template
> > > > +constexpr __tuple_element_t<__i, tuple<_Elements...>>&&
> > > > +get(tuple<_Elements...>&& __t) noexcept;
> > > > +
> > > > +  template
> > > > +constexpr const __tuple_element_t<__i, tuple<_Elements...>>&&
> > > > +get(const tuple<_Elements...>&& __t) noexcept;
> > > > +
> > > > +  template
> > > > +constexpr _Tp&
> > > > +get(array<_Tp, _Nm>&) noexcept;
> > > > +
> > > > +  template
> > > > +constexpr _Tp&&
> > > > +get(array<_Tp, _Nm>&&) noexcept;
> > > > +
> > > > +  template
> > > > +constexpr const _Tp&
> > > > +get(const array<_Tp, _Nm>&) noexcept;
> > > > +
> > > > +  template
> > > > +constexpr const _Tp&&
> > > > +get(const array<_Tp, _Nm>&&) noexcept;
> > > > +
> > > >  #if ! __cpp_lib_concepts
> > > >// Concept utility functions, reused in conditionally-explicit
> > > >// constructors.
> > > > @@ -159,6 +217,46 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >  #endif // lib concepts
> > > >  #endif // C++11
> > > >
> > > > +#if __glibcxx_tuple_like // >= C++23
> > > > +  template
> > > > +inline constexpr bool __is_tuple_v = false;
> > > > +
> > > > +  template
> > > > +inline constexpr bool __is_tuple_v> = true;
> > > > +
> > > > +  // TODO: Reuse __is_tuple_like from ?
> > > > +  template
> > > > +inline constexpr bool __is_tuple_like_v = false;
> > > > +
> > > > +  template
> > > > +inline constexpr bool __is_tuple_like_v> = 
> > > > true;
> > > > +
> > > > +  template
> > > > +inline constexpr bool __is_tuple_like_v> = true;
> > > > +
> > > > +  template
> > > > +inline constexpr bool __is_tuple_like_v> = true;
> > > > +
> > > > +  // __is_tuple_like_v is defined in .
> > > > +
> > > > +  template
> > > > +concept __tuple_like = __is_tuple_like_v>;
> > > > +
> > > > +  template
> > > > +concept __pair_like = __tuple_like<_Tp> && 
> > > > tuple_size_v> == 2;
> > > > +
> > > > +  template
> > > > +concept __eligible_tuple_like
> > > > +  = __detail::__different_from<_Tp, _Tuple> && __tuple_like<_Tp>
> > > > +   && (tuple_size_v> == tuple_size_v<_Tuple>)
> > > > +   && !ranges::__detail::__is_subrange>;
> > > > +
> > > > +  template
> > > > +concept __eligible_pair_like
> > > > +  = __detail::__different_from<_Tp, _Pair> && __pair_like<_Tp>
> > > > +   && !ranges::__detail::__is_subrange>;
> > > > +#endif // C++23
> > > > +
> > > >template class __pair_base

Re: [Fortran] half-cycle trig functions and atan[d] fixes

2024-01-24 Thread Steve Kargl
On Wed, Jan 24, 2024 at 11:13:05AM +0200, Janne Blomqvist wrote:
> On Wed, Jan 24, 2024 at 10:28 AM FX Coudert  wrote:
> > > Now, if
> > > the OS adds cospi() to libm and it's in libm's symbol map, then the
> > > cospi() used by gfortran depends on the search order of the loaded
> > > libraries.
> >
> > We only include the fallback math functions in libgfortran when they are 
> > not available on the system. configure detects what is present in the libc 
> > being targeted, and conditionally compiles the necessary fallback functions 
> > (and only them).
> 
> Exactly. However, there is the (corner?) case when libgfortran has
> been compiled, and cospi() not found and thus the fallback
> implementation is included, and then later libc is updated to a
> version that does provide cospi(). I believe in that case which
> version gets used is down to the library search order (i.e. the order
> that "ldd /path/to/binary" prints the libs), it will use the first
> symbol it finds.  Also, it's not necessary to do some ifdef tricks
> with gfortran.map, if a symbol listed there isn't found in the library
> it's just ignored. So the *pi() trig functions can be unconditionally
> added there, and then depending on whether the target libm includes
> those or not they are then included in the exported symbol list.
> 
> It's possible to override this to look for specific symbol versions
> etc., but that probably goes deep into the weeds of target-specific
> stuff (e.g. are we looking for cospi@FBSD_1.7, cospi@GLIBC_X.Y.Z, or
> something else?). I'm sure you don't wanna go there.
> 

Ah, so that's the part I was missing.  I was under the impression
that if a symbol appears in a library's symbol map, then the
library had to contain a function by that name.  If the loader
ignores symbols for a missing function, then yes, I think I can
get rid of the indirection via _gfortran_cospi_r4().  It will
take a few days for me to redesign this, which shouldn't be too
problematic in that GCC is in stage 4 and this is neither a
regression nor a doc fix.

Janne, FX, Harald, thanks for taking a peek.

-- 
Steve


[no subject]

2024-01-24 Thread Andi Kleen
This version addresses all the feedback so far (Thanks!).  The largest
change is support for using [[musttail]] in C23, not just C++.

-Andi



[PATCH v2 1/5] Improve must tail in RTL backend

2024-01-24 Thread Andi Kleen
- Give error messages for all causes of non sibling call generation
- Don't override choices of other non sibling call checks with
must tail. This causes ICEs. The must tail attribute now only
overrides flag_optimize_sibling_calls locally.
- Error out when tree-tailcall failed to mark a must-tail call as a
sibcall. In this case it doesn't know the true reason and only gives
a vague message (this could be improved, but it's already useful without
that). tree-tailcall usually fails without optimization, so the
existing must-tail plugin test is adjusted to specify -O2.
---
 gcc/calls.cc  | 31 +--
 .../gcc.dg/plugin/must-tail-call-1.c  |  1 +
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 01f447347437..3115807b7788 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -2650,7 +2650,9 @@ expand_call (tree exp, rtx target, int ignore)
   /* The type of the function being called.  */
   tree fntype;
   bool try_tail_call = CALL_EXPR_TAILCALL (exp);
-  bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp);
+  /* tree-tailcall decided not to do tail calls. Error for the musttail case.  */
+  if (!try_tail_call)
+  maybe_complain_about_tail_call (exp, "cannot tail-call: other reasons");
   int pass;
 
   /* Register in which non-BLKmode value will be returned,
@@ -3021,10 +3023,22 @@ expand_call (tree exp, rtx target, int ignore)
  pushed these optimizations into -O2.  Don't try if we're already
  expanding a call, as that means we're an argument.  Don't try if
  there's cleanups, as we know there's code to follow the call.  */
-  if (currently_expanding_call++ != 0
-  || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp))
-  || args_size.var
-  || dbg_cnt (tail_call) == false)
+  if (currently_expanding_call++ != 0)
+{
+  maybe_complain_about_tail_call (exp, "cannot tail-call: inside another call");
+  try_tail_call = 0;
+}
+  if (!flag_optimize_sibling_calls
+   && !CALL_FROM_THUNK_P (exp)
+   && !CALL_EXPR_MUST_TAIL_CALL (exp))
+try_tail_call = 0;
+  if (args_size.var)
+{
+  /* ??? correct message?  */
+  maybe_complain_about_tail_call (exp, "cannot tail-call: stack space needed");
+  try_tail_call = 0;
+}
+  if (dbg_cnt (tail_call) == false)
 try_tail_call = 0;
 
   /* Workaround buggy C/C++ wrappers around Fortran routines with
@@ -3045,15 +3059,12 @@ expand_call (tree exp, rtx target, int ignore)
if (MEM_P (*iter))
  {
try_tail_call = 0;
+   maybe_complain_about_tail_call (exp,
+   "cannot tail-call: hidden string length argument");
break;
  }
}
 
-  /* If the user has marked the function as requiring tail-call
- optimization, attempt it.  */
-  if (must_tail_call)
-try_tail_call = 1;
-
   /*  Rest of purposes for tail call optimizations to fail.  */
   if (try_tail_call)
 try_tail_call = can_implement_as_sibling_call_p (exp,
diff --git a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
index 3a6d4cceaba7..44af361e2925 100644
--- a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile { target tail_call } } */
+/* { dg-options "-O2" } */
 /* { dg-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 extern void abort (void);
-- 
2.43.0



[PATCH v2 3/5] C: Implement musttail attribute for returns

2024-01-24 Thread Andi Kleen
Implement a clang-compatible C23 musttail attribute in the C parser,
similar to the earlier C++ implementation.
---
 gcc/c/c-parser.cc | 59 +--
 gcc/c/c-tree.h|  2 +-
 gcc/c/c-typeck.cc | 15 ++--
 3 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c31349dae2ff..30f3fe042a2b 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1616,6 +1616,11 @@ struct omp_for_parse_data {
   bool fail : 1;
 };
 
+struct attr_state
+{
+  bool musttail_p; // parsed a musttail for return
+};
+
 static bool c_parser_nth_token_starts_std_attributes (c_parser *,
  unsigned int);
 static tree c_parser_std_attribute_specifier_sequence (c_parser *);
@@ -1660,7 +1665,7 @@ static location_t c_parser_compound_statement_nostart (c_parser *);
 static void c_parser_label (c_parser *, tree);
 static void c_parser_statement (c_parser *, bool *, location_t * = NULL);
 static void c_parser_statement_after_labels (c_parser *, bool *,
-vec * = NULL);
+vec * = NULL, attr_state = {});
 static tree c_parser_c99_block_statement (c_parser *, bool *,
  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec *);
@@ -6943,6 +6948,28 @@ c_parser_handle_directive_omp_attributes (tree &attrs,
 }
 }
 
+/* Check if STD_ATTRS contains a musttail attribute and handle it.
+   PARSER is the parser and A is the output attr_state.  */
+
+static tree
+c_parser_handle_musttail (c_parser *parser, tree std_attrs, attr_state &a)
+{
+  if (c_parser_next_token_is_keyword (parser, RID_RETURN))
+{
+  if (lookup_attribute ("gnu", "musttail", std_attrs))
+   {
+ std_attrs = remove_attribute ("gnu", "musttail", std_attrs);
+ a.musttail_p = true;
+   }
+  if (lookup_attribute ("clang", "musttail", std_attrs))
+   {
+ std_attrs = remove_attribute ("clang", "musttail", std_attrs);
+ a.musttail_p = true;
+   }
+}
+  return std_attrs;
+}
+
 /* Parse a compound statement except for the opening brace.  This is
used for parsing both compound statements and statement expressions
(which follow different paths to handling the opening).  */
@@ -6959,6 +6986,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
   bool in_omp_loop_block
 = omp_for_parse_state ? omp_for_parse_state->want_nested_loop : false;
   tree sl = NULL_TREE;
+  attr_state a = {};
 
   if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE))
 {
@@ -7097,7 +7125,10 @@ c_parser_compound_statement_nostart (c_parser *parser)
= c_parser_nth_token_starts_std_attributes (parser, 1);
   tree std_attrs = NULL_TREE;
   if (have_std_attrs)
-   std_attrs = c_parser_std_attribute_specifier_sequence (parser);
+   {
+ std_attrs = c_parser_std_attribute_specifier_sequence (parser);
+ std_attrs = c_parser_handle_musttail (parser, std_attrs, a);
+   }
   if (c_parser_next_token_is_keyword (parser, RID_CASE)
  || c_parser_next_token_is_keyword (parser, RID_DEFAULT)
  || (c_parser_next_token_is (parser, CPP_NAME)
@@ -7245,7 +7276,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
  last_stmt = true;
  mark_valid_location_for_stdc_pragma (false);
  if (!omp_for_parse_state)
-   c_parser_statement_after_labels (parser, NULL);
+   c_parser_statement_after_labels (parser, NULL, NULL, a);
  else
{
  /* In canonical loop nest form, nested loops can only appear
@@ -7287,15 +7318,18 @@ c_parser_compound_statement_nostart (c_parser *parser)
 /* Parse all consecutive labels, possibly preceded by standard
attributes.  In this context, a statement is required, not a
declaration, so attributes must be followed by a statement that is
-   not just a semicolon.  */
+   not just a semicolon.  Returns an attr_state.  */
 
-static void
+static attr_state
 c_parser_all_labels (c_parser *parser)
 {
+  attr_state a = {};
   bool have_std_attrs;
   tree std_attrs = NULL;
   if ((have_std_attrs = c_parser_nth_token_starts_std_attributes (parser, 1)))
-std_attrs = c_parser_std_attribute_specifier_sequence (parser);
+std_attrs = c_parser_handle_musttail (parser,
+   c_parser_std_attribute_specifier_sequence (parser), a);
+
   while (c_parser_next_token_is_keyword (parser, RID_CASE)
 || c_parser_next_token_is_keyword (parser, RID_DEFAULT)
 || (c_parser_next_token_is (parser, CPP_NAME)
@@ -7317,6 +7351,7 @@ c_parser_all_labels (c_parser *parser)
 }
   else if (have_std_attrs && c_parser_next_token_is (parser, CPP_SEMICOLON))
 c_parser_error (parser, "expected statement");
+  return a;
 }
 
 /* Parse a label (C90 6.6.1, C99 6.8.1, C11 6.8.1).
@@ -7560,11 +7595,1

[PATCH v2 2/5] C++: Support clang compatible [[musttail]] (PR83324)

2024-01-24 Thread Andi Kleen
This patch implements a clang compatible [[musttail]] attribute for
returns.

musttail is useful as an alternative to computed goto for interpreters.
With computed goto the interpreter function usually ends up very big
which causes problems with register allocation and other per function
optimizations not scaling. With musttail the interpreter can be instead
written as a sequence of smaller functions that call each other. To
avoid unbounded stack growth this requires forcing a sibling call, which
this attribute does. It guarantees an error if the call cannot be tail
called which allows the programmer to fix it instead of risking a stack
overflow. Unlike computed goto it is also type-safe.
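
To make the interpreter pattern described above concrete, here is a minimal
sketch in C. It is only an illustration, not code from the patch: the MUSTTAIL
macro, the opcode names and run() are all hypothetical. The macro expands to
the GNU-style attribute spelling only where the compiler reports support via
__has_attribute, and to nothing elsewhere, in which case the calls are ordinary
calls and the stack-growth guarantee is lost.

```c
#include <assert.h>

/* MUSTTAIL is a hypothetical helper macro, not part of the patch.  It
   expands to the attribute only where the compiler reports support.  */
#if defined(__has_attribute)
# if __has_attribute(musttail)
#  define MUSTTAIL __attribute__((musttail))
# endif
#endif
#ifndef MUSTTAIL
# define MUSTTAIL
#endif

/* Hypothetical three-instruction bytecode.  */
enum opcode { OP_INC, OP_DBL, OP_HALT };

/* Every handler shares one signature, so each call can be a sibling call.  */
typedef long (*handler_fn) (const unsigned char *pc, long acc);

static long dispatch (const unsigned char *pc, long acc);

static long
op_inc (const unsigned char *pc, long acc)
{
  MUSTTAIL return dispatch (pc + 1, acc + 1);
}

static long
op_dbl (const unsigned char *pc, long acc)
{
  MUSTTAIL return dispatch (pc + 1, acc * 2);
}

static long
op_halt (const unsigned char *pc, long acc)
{
  (void) pc;
  return acc;
}

static const handler_fn handlers[] = { op_inc, op_dbl, op_halt };

/* Jump to the handler for the current opcode.  */
static long
dispatch (const unsigned char *pc, long acc)
{
  assert (*pc <= OP_HALT);
  MUSTTAIL return handlers[*pc] (pc, acc);
}

/* Run PROGRAM, which must end in OP_HALT, starting from accumulator 0.  */
long
run (const unsigned char *program)
{
  return dispatch (program, 0);
}
```

Each handler is a small function of its own, so register allocation works on
one opcode at a time rather than on one large switch or computed-goto body.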

It turns out that David Malcolm had already implemented middle/backend
support for a musttail attribute back in 2016, but it wasn't exposed
to any frontend other than a special plugin.

This patch adds a [[gnu::musttail]] attribute for C++ that can be added
to return statements. The return statement must be a direct call
(it does not follow dependencies), which is similar to what clang
implements. It then uses the existing must tail infrastructure.
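
As a rough illustration of the "direct call" requirement, the sketch below
contrasts a return whose whole expression is a call with one that still has
work to do after the call. The function names are hypothetical and the
attribute itself is deliberately left out so that the code builds with any C
compiler; the comments mark where it could or could not be applied.

```c
#include <assert.h>
#include <limits.h>

/* Tail form: the call is the entire return expression, so a statement
   like this is a candidate for the musttail attribute.  */
static unsigned long
sum_to_acc (unsigned long n, unsigned long acc)
{
  if (n == 0)
    return acc;
  assert (acc <= ULONG_MAX - n);	/* Guard against wraparound.  */
  return sum_to_acc (n - 1, acc + n);
}

/* Not tail form: the addition runs after the call returns, so the
   return expression is not a direct call and could not be marked.  */
static unsigned long
sum_to (unsigned long n)
{
  if (n == 0)
    return 0;
  return n + sum_to (n - 1);
}
```

Both functions compute the same sum; only the first keeps the stack depth
constant once the call is actually emitted as a sibling call.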

For compatibility it also detects clang::musttail.

One problem is that tree-tailcall usually fails when optimization
is disabled, which implies the attribute only really works with
optimization on. But that seems to be a reasonable limitation.

Passes bootstrap and full test
---
 gcc/cp/cp-tree.h|  4 ++--
 gcc/cp/parser.cc| 28 +++-
 gcc/cp/semantics.cc |  6 +++---
 gcc/cp/typeck.cc| 20 ++--
 4 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 60e6dafc5494..bed52e860a00 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7763,7 +7763,7 @@ extern void finish_while_stmt (tree);
 extern tree begin_do_stmt  (void);
 extern void finish_do_body (tree);
 extern void finish_do_stmt (tree, tree, bool, tree, bool);
-extern tree finish_return_stmt (tree);
+extern tree finish_return_stmt (tree, bool = false);
 extern tree begin_for_scope(tree *);
 extern tree begin_for_stmt (tree, tree);
 extern void finish_init_stmt   (tree);
@@ -8275,7 +8275,7 @@ extern tree composite_pointer_type(const op_location_t &,
 tsubst_flags_t);
 extern tree merge_types(tree, tree);
 extern tree strip_array_domain (tree);
-extern tree check_return_expr  (tree, bool *, bool *);
+extern tree check_return_expr  (tree, bool *, bool *, bool);
 extern tree spaceship_type (tree, tsubst_flags_t = tf_warning_or_error);
 extern tree genericize_spaceship   (location_t, tree, tree, tree);
 extern tree cp_build_binary_op  (const op_location_t &,
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 3748ccd49ff3..5a32804c0201 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2462,7 +2462,7 @@ static tree cp_parser_perform_range_for_lookup
 static tree cp_parser_range_for_member_function
   (tree, tree);
 static tree cp_parser_jump_statement
-  (cp_parser *);
+  (cp_parser *, bool = false);
 static void cp_parser_declaration_statement
   (cp_parser *);
 
@@ -12719,9 +12719,27 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,
 NULL_TREE, false);
  break;
 
+   case RID_RETURN:
+ {
+   bool musttail_p = false;
+   std_attrs = process_stmt_hotness_attribute (std_attrs, attrs_loc);
+   if (lookup_attribute ("", "musttail", std_attrs))
+ {
+   musttail_p = true;
+   std_attrs = remove_attribute ("", "musttail", std_attrs);
+ }
+   // support this for compatibility
+   if (lookup_attribute ("clang", "musttail", std_attrs))
+ {
+   musttail_p = true;
+   std_attrs = remove_attribute ("clang", "musttail", std_attrs);
+ }
+   statement = cp_parser_jump_statement (parser, musttail_p);
+ }
+ break;
+
case RID_BREAK:
case RID_CONTINUE:
-   case RID_RETURN:
case RID_CO_RETURN:
case RID_GOTO:
  std_attrs = process_stmt_hotness_attribute (std_attrs, attrs_loc);
@@ -14767,7 +14785,7 @@ cp_parser_init_statement (cp_parser *parser, tree *decl)
   return false;
 }
 
-/* Parse a jump-statement.
+/* Parse a jump-statement. MUSTTAIL_P indicates a musttail attribute.
 
jump-statement:
  break ;
@@ -14785,7 +14803,7 @@ cp_parser_init_statement (cp_parser *parser, tree *decl)
Returns the new BREAK_STMT, CONTINUE_STMT, RETURN_EXPR, or GOTO_EXPR.  */
 
 static tree
-cp_parser_jump_statem

[PATCH v2 5/5] Add documentation for musttail attribute

2024-01-24 Thread Andi Kleen
---
 gcc/doc/extend.texi | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0bc586d120e7..c68d32bed8de 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9867,6 +9867,22 @@ foo (int x, int y)
 @code{y} is not actually incremented and the compiler can but does not
 have to optimize it to just @code{return 42 + 42;}.
 
+@cindex @code{musttail} statement attribute
+@item musttail
+
+The @code{gnu::musttail} or @code{clang::musttail} attribute
+can be applied to a return statement that returns the value
+of a call to indicate that the call must be a tail call
+that does not allocate extra stack space.
+
+@smallexample
+[[gnu::musttail]] return foo();
+@end smallexample
+
+If the compiler cannot generate a tail call it will generate
+an error. Tail calls generally require enabling optimization.
+On some targets they may not be supported.
+
 @end table
 
 @node Attribute Syntax
-- 
2.43.0


