[PATCH] sccvn: Avoid UB in ao_ref_init_from_vn_reference [PR105533]

2024-03-06 Thread Jakub Jelinek
Hi!

When compiling libgcc or on e.g.
int a[64];
int p;

void
foo (void)
{
  int s = 1;
  while (p)
{
  s -= 11;
  a[s] != 0;
}
}
sccvn invokes UB in the compiler as detected by ubsan:
../../gcc/poly-int.h:1089:5: runtime error: left shift of negative value -40
The problem is that we still use C++11..C++17 as the implementation language
and in those C++ versions shifting negative values left is UB (well defined
since C++20) and above in
   offset += op->off << LOG2_BITS_PER_UNIT;
op->off is poly_int64 with -40 value (in libgcc with -8).
I understand the offset_int << LOG2_BITS_PER_UNIT shifts but it is then well
defined during underlying implementation which is done on the uhwi limbs,
but for poly_int64 we use
offset += pop->off * BITS_PER_UNIT;
a few lines earlier and I think that is both more readable in what it
actually does and triggers UB only if there would be signed multiply
overflow.  In the end, the compiler will treat them the same at least at the
RTL level (at least, if not and they aren't the same cost, it should).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-07  Jakub Jelinek  

PR middle-end/105533
* tree-ssa-sccvn.cc (ao_ref_init_from_vn_reference) :
Multiple op->off by BITS_PER_UNIT instead of shifting it left by
LOG2_BITS_PER_UNIT.

--- gcc/tree-ssa-sccvn.cc.jj2024-02-28 22:57:18.318658827 +0100
+++ gcc/tree-ssa-sccvn.cc   2024-03-06 14:52:16.819229719 +0100
@@ -1221,7 +1221,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
  if (maybe_eq (op->off, -1))
max_size = -1;
  else
-   offset += op->off << LOG2_BITS_PER_UNIT;
+   offset += op->off * BITS_PER_UNIT;
  break;
 
case REALPART_EXPR:


Jakub



回复: RE: [PATCH] RISC-V: Fix ICE in riscv vector costs

2024-03-06 Thread juzhe.zh...@rivai.ai
I suggest open an PR with a PR id.



juzhe.zh...@rivai.ai
 
发件人: Demin Han
发送时间: 2024-03-07 15:39
收件人: juzhe.zh...@rivai.ai; gcc-patches
抄送: kito.cheng; pan2.li; jeffreyalaw
主题: RE: [PATCH] RISC-V: Fix ICE in riscv vector costs
OK.
Which is better for testcase name?
1.  ice-biggestmode.c or
2.  Report a bug and name the testcase with PR id
 
From: juzhe.zh...@rivai.ai  
Sent: 2024年3月7日 15:20
To: Demin Han ; gcc-patches 

Cc: kito.cheng ; pan2.li ; jeffreyalaw 

Subject: Re: [PATCH] RISC-V: Fix ICE in riscv vector costs
 
Could you plz add testcase ? I just noticed you didn't append a testcase (jpeg) 
in this patch.
 


juzhe.zh...@rivai.ai
 
From: demin.han
Date: 2024-03-07 13:54
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; jeffreyalaw
Subject: [PATCH] RISC-V: Fix ICE in riscv vector costs
The following code can result in ICE:
-march=rv64gcv_zba_zbb --param riscv-autovec-lmul=dynamic -O3
 
char *jpeg_difference7_input_buf;
void jpeg_difference7(int *diff_buf) {
  unsigned width;
  int samp, Rb;
  while (--width) {
Rb = samp = *jpeg_difference7_input_buf;
*diff_buf++ = -(int)(samp + (long)Rb >> 1);
  }
}
 
One biggest_mode update missed in one branch and trigger assertion fail.
gcc_assert (biggest_size >= mode_size);
 
Tested On RV64 and no regression.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-costs.cc: Fix ICE
 
Signed-off-by: demin.han 
---
gcc/config/riscv/riscv-vector-costs.cc | 2 ++
1 file changed, 2 insertions(+)
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 7c9840df4e9..f13a1296b31 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -413,6 +413,8 @@ compute_local_live_ranges (
  auto *r = get_live_range (live_ranges, arg);
  gcc_assert (r);
  (*r).second = MAX (point, (*r).second);
+   biggest_mode = get_biggest_mode (
+ biggest_mode, TYPE_MODE (TREE_TYPE (arg)));
}
}
  else
-- 
2.44.0
 
 


RE: [PATCH] RISC-V: Fix ICE in riscv vector costs

2024-03-06 Thread Demin Han
OK.
Which is better for testcase name?

1.  ice-biggestmode.c or

2.  Report a bug and name the testcase with PR id

From: juzhe.zh...@rivai.ai 
Sent: 2024年3月7日 15:20
To: Demin Han ; gcc-patches 

Cc: kito.cheng ; pan2.li ; jeffreyalaw 

Subject: Re: [PATCH] RISC-V: Fix ICE in riscv vector costs

Could you plz add testcase ? I just noticed you didn't append a testcase (jpeg) 
in this patch.


juzhe.zh...@rivai.ai

From: demin.han
Date: 2024-03-07 13:54
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; pan2.li; 
jeffreyalaw
Subject: [PATCH] RISC-V: Fix ICE in riscv vector costs
The following code can result in ICE:
-march=rv64gcv_zba_zbb --param riscv-autovec-lmul=dynamic -O3

char *jpeg_difference7_input_buf;
void jpeg_difference7(int *diff_buf) {
  unsigned width;
  int samp, Rb;
  while (--width) {
Rb = samp = *jpeg_difference7_input_buf;
*diff_buf++ = -(int)(samp + (long)Rb >> 1);
  }
}

One biggest_mode update missed in one branch and trigger assertion fail.
gcc_assert (biggest_size >= mode_size);

Tested On RV64 and no regression.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc: Fix ICE

Signed-off-by: demin.han 
mailto:demin@starfivetech.com>>
---
gcc/config/riscv/riscv-vector-costs.cc | 2 ++
1 file changed, 2 insertions(+)

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 7c9840df4e9..f13a1296b31 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -413,6 +413,8 @@ compute_local_live_ranges (
  auto *r = get_live_range (live_ranges, arg);
  gcc_assert (r);
  (*r).second = MAX (point, (*r).second);
+   biggest_mode = get_biggest_mode (
+ biggest_mode, TYPE_MODE (TREE_TYPE (arg)));
}
}
  else
--
2.44.0




Re: [PATCH] RISC-V: Fix ICE in riscv vector costs

2024-03-06 Thread juzhe.zh...@rivai.ai
Could you plz add testcase ? I just noticed you didn't append a testcase (jpeg) 
in this patch.



juzhe.zh...@rivai.ai
 
From: demin.han
Date: 2024-03-07 13:54
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; jeffreyalaw
Subject: [PATCH] RISC-V: Fix ICE in riscv vector costs
The following code can result in ICE:
-march=rv64gcv_zba_zbb --param riscv-autovec-lmul=dynamic -O3
 
char *jpeg_difference7_input_buf;
void jpeg_difference7(int *diff_buf) {
  unsigned width;
  int samp, Rb;
  while (--width) {
Rb = samp = *jpeg_difference7_input_buf;
*diff_buf++ = -(int)(samp + (long)Rb >> 1);
  }
}
 
One biggest_mode update missed in one branch and trigger assertion fail.
gcc_assert (biggest_size >= mode_size);
 
Tested On RV64 and no regression.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-costs.cc: Fix ICE
 
Signed-off-by: demin.han 
---
gcc/config/riscv/riscv-vector-costs.cc | 2 ++
1 file changed, 2 insertions(+)
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 7c9840df4e9..f13a1296b31 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -413,6 +413,8 @@ compute_local_live_ranges (
  auto *r = get_live_range (live_ranges, arg);
  gcc_assert (r);
  (*r).second = MAX (point, (*r).second);
+   biggest_mode = get_biggest_mode (
+ biggest_mode, TYPE_MODE (TREE_TYPE (arg)));
}
}
  else
-- 
2.44.0
 
 


Re: [PATCH] RISC-V: Fix ICE in riscv vector costs

2024-03-06 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: demin.han
Date: 2024-03-07 13:54
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; jeffreyalaw
Subject: [PATCH] RISC-V: Fix ICE in riscv vector costs
The following code can result in ICE:
-march=rv64gcv_zba_zbb --param riscv-autovec-lmul=dynamic -O3
 
char *jpeg_difference7_input_buf;
void jpeg_difference7(int *diff_buf) {
  unsigned width;
  int samp, Rb;
  while (--width) {
Rb = samp = *jpeg_difference7_input_buf;
*diff_buf++ = -(int)(samp + (long)Rb >> 1);
  }
}
 
One biggest_mode update missed in one branch and trigger assertion fail.
gcc_assert (biggest_size >= mode_size);
 
Tested On RV64 and no regression.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-costs.cc: Fix ICE
 
Signed-off-by: demin.han 
---
gcc/config/riscv/riscv-vector-costs.cc | 2 ++
1 file changed, 2 insertions(+)
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 7c9840df4e9..f13a1296b31 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -413,6 +413,8 @@ compute_local_live_ranges (
  auto *r = get_live_range (live_ranges, arg);
  gcc_assert (r);
  (*r).second = MAX (point, (*r).second);
+   biggest_mode = get_biggest_mode (
+ biggest_mode, TYPE_MODE (TREE_TYPE (arg)));
}
}
  else
-- 
2.44.0
 
 


Re: [PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2024-03-06 Thread Iain Sandoe
Hi Dimitry, folks,

> On 6 Mar 2024, at 23:02, Dimitry Andric  wrote:
> 
> On 6 Mar 2024, at 15:57, FX Coudert  wrote:
>> 
>>> Hmm I recall trying it and finding a problem - was there some different fix 
>>> applied
>>> in the end?
>> 
>> The bug is still open, I don’t think a patch was applied, and I don’t find 
>> any email to the list stating what the problem could be.
> 
> The original patch (https://gcc.gnu.org/bugzilla/attachment.cgi?id=56010) 
> still applies to the master branch.

I have retested this on various Darwin versions and confirm that it fixes the 
bootstrap fail on x86_64-darwin23 and works OK on other versions (including 32b 
hosts). 

+1 for applying this soon.



the second patch really needs to be posted separately to make review easier (if 
we were not in stage 4, I’d say it’s ‘obvious’ anyway).

thanks
Iain


> It turned out there is also a related problem in libcc1plugin.cc and 
> libcp1plugin.cc , which is fixed by 
> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57639 :
> 
> commit 49222b98ac8e30a4a042ada0ece3d7df93f049d2
> Author: Dimitry Andric 
> Date:   2024-03-06T23:46:27+01:00
> 
>Fix libcc1plugin and libc1plugin to use INCLUDE_VECTOR before including
>system.h, instead of directly including , to avoid running into
>poisoned identifiers.
> 
> diff --git a/libcc1/libcc1plugin.cc b/libcc1/libcc1plugin.cc
> index 72d17c3b81c..e64847466f4 100644
> --- a/libcc1/libcc1plugin.cc
> +++ b/libcc1/libcc1plugin.cc
> @@ -32,6 +32,7 @@
> #undef PACKAGE_VERSION
> 
> #define INCLUDE_MEMORY
> +#define INCLUDE_VECTOR
> #include "gcc-plugin.h"
> #include "system.h"
> #include "coretypes.h"
> @@ -69,8 +70,6 @@
> #include "gcc-c-interface.h"
> #include "context.hh"
> 
> -#include 
> -
> using namespace cc1_plugin;
> 
> 
> diff --git a/libcc1/libcp1plugin.cc b/libcc1/libcp1plugin.cc
> index 0eff7c68d29..da68c5d0ac1 100644
> --- a/libcc1/libcp1plugin.cc
> +++ b/libcc1/libcp1plugin.cc
> @@ -33,6 +33,7 @@
> #undef PACKAGE_VERSION
> 
> #define INCLUDE_MEMORY
> +#define INCLUDE_VECTOR
> #include "gcc-plugin.h"
> #include "system.h"
> #include "coretypes.h"
> @@ -71,8 +72,6 @@
> #include "rpc.hh"
> #include "context.hh"
> 
> -#include 
> -
> using namespace cc1_plugin;
> 
> 
> 



RE: [PATCH] c++/c-common: Fix convert_vector_to_array_for_subscript for qualified vector types [PR89224]

2024-03-06 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Andrew Pinski (QUIC) 
> Sent: Tuesday, February 20, 2024 7:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Andrew Pinski (QUIC) 
> Subject: [PATCH] c++/c-common: Fix convert_vector_to_array_for_subscript
> for qualified vector types [PR89224]
> 
> After r7-987-gf17a223de829cb, the access for the elements of a vector type
> would lose the qualifiers.
> So if we had `constvector[0]`, the type of the element of the array would not
> have const on it.
> This was due to a missing build_qualified_type for the inner type of the 
> vector
> when building the array type.
> We need to add back the call to build_qualified_type and now the access has
> the correct qualifiers. So the overloads and even if it is a lvalue or rvalue 
> is
> correctly done.
> 
> Note we correctly now reject the testcase gcc.dg/pr83415.c which was
> incorrectly accepted after r7-987-gf17a223de829cb.


Ping?

> 
> Built and tested for aarch64-linux-gnu.
> 
>   PR c++/89224
> 
> gcc/c-family/ChangeLog:
> 
>   * c-common.cc (convert_vector_to_array_for_subscript): Call
> build_qualified_type
>   for the inner type.
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (cxx_eval_array_reference): Compare main variants
>   for the vector/array types instead of the types directly.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/torture/vector-subaccess-1.C: New test.
>   * gcc.dg/pr83415.c: Change warning to error.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/c-family/c-common.cc  |  7 +-
>  gcc/cp/constexpr.cc   |  3 ++-
>  .../g++.dg/torture/vector-subaccess-1.C   | 23 +++
>  gcc/testsuite/gcc.dg/pr83415.c|  2 +-
>  4 files changed, 32 insertions(+), 3 deletions(-)  create mode 100644
> gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
> 
> diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc index
> e15eff698df..884dd9043f9 100644
> --- a/gcc/c-family/c-common.cc
> +++ b/gcc/c-family/c-common.cc
> @@ -8936,6 +8936,7 @@ convert_vector_to_array_for_subscript (location_t
> loc,
>if (gnu_vector_type_p (TREE_TYPE (*vecp)))
>  {
>tree type = TREE_TYPE (*vecp);
> +  tree newitype;
> 
>ret = !lvalue_p (*vecp);
> 
> @@ -8950,8 +8951,12 @@ convert_vector_to_array_for_subscript
> (location_t loc,
>for function parameters.  */
>c_common_mark_addressable_vec (*vecp);
> 
> +  /* Make sure qualifiers are copied from the vector type to the new
> element
> +  of the array type.  */
> +  newitype = build_qualified_type (TREE_TYPE (type), TYPE_QUALS
> +(type));
> +
>*vecp = build1 (VIEW_CONVERT_EXPR,
> -   build_array_type_nelts (TREE_TYPE (type),
> +   build_array_type_nelts (newitype,
> TYPE_VECTOR_SUBPARTS (type)),
> *vecp);
>  }
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index
> fa346fe01c9..1fe91d16e8e 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -4421,7 +4421,8 @@ cxx_eval_array_reference (const constexpr_ctx
> *ctx, tree t,
>if (!lval
>&& TREE_CODE (ary) == VIEW_CONVERT_EXPR
>&& VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (ary, 0)))
> -  && TREE_TYPE (t) == TREE_TYPE (TREE_TYPE (TREE_OPERAND (ary, 0
> +  && TYPE_MAIN_VARIANT (TREE_TYPE (t))
> +   == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND
> (ary,
> +0)
>  ary = TREE_OPERAND (ary, 0);
> 
>tree oldidx = TREE_OPERAND (t, 1);
> diff --git a/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
> b/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
> new file mode 100644
> index 000..0c8958a4e03
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
> @@ -0,0 +1,23 @@
> +/* PR c++/89224 */
> +
> +/* The access of `vector[i]` has the same qualifiers as the original
> +   vector which was missing. */
> +
> +typedef __attribute__((vector_size(16))) unsigned char  Int8x8_t;
> +
> +template 
> +void g(T ) {
> +__builtin_abort();
> +}
> +template 
> +void g(const T ) {
> +  __builtin_exit(0);
> +}
> +void f(const Int8x8_t x) {
> +  g(x[0]);
> +}
> +int main(void)
> +{
> +Int8x8_t x ={};
> +f(x);
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr83415.c b/gcc/testsuite/gcc.dg/pr83415.c
> index 5934c16d97c..2fc85031505 100644
> --- a/gcc/testsuite/gcc.dg/pr83415.c
> +++ b/gcc/testsuite/gcc.dg/pr83415.c
> @@ -7,6 +7,6 @@ int
>  main (int argc, short *argv[])
>  {
>int i = argc;
> -  y[i] = 7 - i; /* { dg-warning "read-only" } */
> +  y[i] = 7 - i; /* { dg-error "read-only" } */
>return 0;
>  }
> --
> 2.43.0



[PATCH] RISC-V: Fix ICE in riscv vector costs

2024-03-06 Thread demin.han
The following code can result in ICE:
-march=rv64gcv_zba_zbb --param riscv-autovec-lmul=dynamic -O3

char *jpeg_difference7_input_buf;
void jpeg_difference7(int *diff_buf) {
  unsigned width;
  int samp, Rb;
  while (--width) {
Rb = samp = *jpeg_difference7_input_buf;
*diff_buf++ = -(int)(samp + (long)Rb >> 1);
  }
}

One biggest_mode update missed in one branch and trigger assertion fail.
gcc_assert (biggest_size >= mode_size);

Tested On RV64 and no regression.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc: Fix ICE

Signed-off-by: demin.han 
---
 gcc/config/riscv/riscv-vector-costs.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 7c9840df4e9..f13a1296b31 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -413,6 +413,8 @@ compute_local_live_ranges (
  auto *r = get_live_range (live_ranges, arg);
  gcc_assert (r);
  (*r).second = MAX (point, (*r).second);
+ biggest_mode = get_biggest_mode (
+   biggest_mode, TYPE_MODE (TREE_TYPE (arg)));
}
}
  else
-- 
2.44.0



Re: [PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-03-06 Thread chenglulu



在 2024/3/7 下午12:05, mengqinggang 写道:

Hi,

Thanks, this patch is LGTM.


I don't have a problem either.

Thanks.




在 2024/3/7 上午10:56, Xi Ruoyao 写道:

On Thu, 2024-03-07 at 10:43 +0800, mengqinggang wrote:

Hi,

Whether to add an option to control the generation of R_LARCH_RELAX,
similar to as -mrelax/-mno-relax.

There are already -mrelax and -mno-relax, they can be checked in the
compiler code with TARGET_LINKER_RELAXATION.

/* snip */


+    case 'Q':
+  if (!TARGET_LINKER_RELAXATION)
+ break;

So with -mno-relax we'll break early here, then no R_LARCH_RELAX will be
printed.


+  if (code == HIGH)
+ op = XEXP (op, 0);
+
+  if (loongarch_classify_symbolic_expression (op) == 
SYMBOL_TLS_IE)

+ fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
+
+  break;

The tls-ie-norelax.c test case also checks for -mno-relax:


+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" 
} */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target 
tls_native } } } */

i.e. -mno-relax is used compiling this test case, and the compiled
assembly code should not contain R_LARCH_RELAX.





Re: [PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-03-06 Thread mengqinggang

Hi,

Thanks, this patch is LGTM.


在 2024/3/7 上午10:56, Xi Ruoyao 写道:

On Thu, 2024-03-07 at 10:43 +0800, mengqinggang wrote:

Hi,

Whether to add an option to control the generation of R_LARCH_RELAX,
similar to as -mrelax/-mno-relax.

There are already -mrelax and -mno-relax, they can be checked in the
compiler code with TARGET_LINKER_RELAXATION.

/* snip */


+    case 'Q':
+  if (!TARGET_LINKER_RELAXATION)
+break;

So with -mno-relax we'll break early here, then no R_LARCH_RELAX will be
printed.


+  if (code == HIGH)
+op = XEXP (op, 0);
+
+  if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE)
+fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
+
+  break;

The tls-ie-norelax.c test case also checks for -mno-relax:


+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */

i.e. -mno-relax is used compiling this test case, and the compiled
assembly code should not contain R_LARCH_RELAX.





Re: [patch, libgfortran] Part 2: PR105456 Child I/O does not propage iostat

2024-03-06 Thread Jerry D

On 3/6/24 9:13 AM, Harald Anlauf wrote:

Hi Jerry,

can you please replace the user message in e.g. your new testcase
pr105456-wf.f90 by say:

piomsg="The users message containing % and %% and %s and other stuff"

This behaves as expected with Intel, but dies horribly with gfortran
after your patch!

Cheers,
Harald




Fixed with:

commit 03932d3203bce244edd812b81921c2f16ea18d86 (HEAD -> master, 
origin/master, origin/HEAD)

Author: Jerry DeLisle 
Date:   Wed Mar 6 19:46:04 2024 -0800

Fortran: Fix issue with using snprintf function.

The previous patch used snprintf to set the message
string. The message string is not a formatted string
and the snprintf will interpret '%' related characters
as format specifiers when there are no associated
output variables. A segfault ensues.

This change replaces snprintf with a fortran string copy
function and null terminates the message string.

PR libfortran/105456

libgfortran/ChangeLog:

* io/list_read.c (list_formatted_read_scalar): Use fstrcpy
from libgfortran/runtime/string.c to replace snprintf.
(nml_read_obj): Likewise.
* io/transfer.c (unformatted_read): Likewise.
(unformatted_write): Likewise.
(formatted_transfer_scalar_read): Likewise.
(formatted_transfer_scalar_write): Likewise.
* io/write.c (list_formatted_write_scalar): Likewise.
(nml_write_obj): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr105456.f90: Revise using '%' characters
in users error message.

Jerry -



Re: [PATCH] c++: lambda capturing structured bindings [PR85889]

2024-03-06 Thread Jason Merrill

On 3/4/24 12:49, Marek Polacek wrote:

On Fri, Mar 01, 2024 at 07:58:24PM -0500, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for 15?  (Or even trunk?)

-- >8 --
 clarifies that it's OK to capture structured
bindings.

[expr.prim.lambda.capture]/4 says "The identifier in a simple-capture shall
denote a local entity" and [basic.pre]/3: "An entity is a [...] structured
binding".

It doesn't appear that this was made a DR, so, strictly speaking, we
should have a -Wc++20-extensions warning, like clang++.

PR c++/85889

gcc/cp/ChangeLog:

* lambda.cc (add_capture): Add a pedwarn for capturing structured
bindings.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/decomp3.C: Use -Wno-c++20-extensions.
* g++.dg/cpp1z/decomp60.C: New test.
---
  gcc/cp/lambda.cc  |  9 +
  gcc/testsuite/g++.dg/cpp1z/decomp60.C | 12 
  gcc/testsuite/g++.dg/cpp2a/decomp3.C  |  2 +-
  3 files changed, 22 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp60.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index 4b1f9391fee..470f9d2c4f1 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -607,6 +607,15 @@ add_capture (tree lambda, tree id, tree orig_init, bool 
by_reference_p,
 TCTX_CAPTURE_BY_COPY, type))
return error_mark_node;
}
+
+  if (cxx_dialect < cxx20)
+   {
+ tree stripped_init = tree_strip_any_location_wrapper (initializer);


I was missing an auto_diagnostic_group here.  Fixed.


OK for 15 with that fix.


+ if (DECL_DECOMPOSITION_P (stripped_init)
+ && pedwarn (input_location, OPT_Wc__20_extensions,
+ "captured structured bindings are a C++20 extension"))
+   inform (DECL_SOURCE_LOCATION (stripped_init), "declared here");






Re: [PATCH] c++: ICE with variable template and [[deprecated]] [PR110031]

2024-03-06 Thread Jason Merrill

On 3/4/24 17:11, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?


OK.


-- >8 --
lookup_and_finish_template_variable already has and uses the complain
parameter but it is not passing it down to mark_used so we got the
default tf_warning_or_error, which causes various problems when
lookup_and_finish_template_variable gets called with complain=tf_none.

PR c++/110031

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Pass complain to
mark_used.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/inline-var11.C: New test.
---
  gcc/cp/pt.cc  |  2 +-
  gcc/testsuite/g++.dg/cpp1z/inline-var11.C | 32 +++
  2 files changed, 33 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/inline-var11.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index c4bc54a8fdb..48d2b3cbac6 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10533,7 +10533,7 @@ lookup_and_finish_template_variable (tree templ, tree 
targs,
if (var == error_mark_node)
  return error_mark_node;
var = finish_template_variable (var, complain);
-  mark_used (var);
+  mark_used (var, complain);
return var;
  }
  
diff --git a/gcc/testsuite/g++.dg/cpp1z/inline-var11.C b/gcc/testsuite/g++.dg/cpp1z/inline-var11.C

new file mode 100644
index 000..d92911ed3a9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/inline-var11.C
@@ -0,0 +1,32 @@
+// PR c++/110031
+// { dg-do compile { target c++17 } }
+
+template 
+[[deprecated]]
+inline constexpr bool t = true ;
+
+template 
+struct enableif;
+
+template<>
+struct enableif
+{
+using y = int;
+};
+template 
+using enableif_t = typename enableif::y;
+
+template > = 0>   // { dg-warning "deprecated" }
+struct A {  A(T &&)  {  }};
+
+template 
+struct A {
+  A(T &&) = delete;
+  A() = delete;
+};
+
+int main(void)
+{
+  A a(5.3); // { dg-error "use of deleted function" }
+  return 0;
+}

base-commit: a89c5df317d1de74871e2a05c36aed9cbbb21f42




Re: [PATCH] c++/modules: member alias tmpl partial inst [PR103994]

2024-03-06 Thread Jason Merrill

On 3/4/24 17:26, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

Alias templates are weird in that their specializations can appear in
both decl_specializations and type_specializations.  They appear in the
latter only at parse time via finish_template_type.  This should probably
be revisited in GCC 15 since it seems sufficient to store them only in
decl_specializations.


It looks like most all of lookup_template_class is wrong for alias 
templates.


Can we move the alias template handling up higher and unconditionally 
return the result of tsubst?


Jason



Re: [PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-03-06 Thread Xi Ruoyao
On Thu, 2024-03-07 at 10:43 +0800, mengqinggang wrote:
> Hi,
> 
> Whether to add an option to control the generation of R_LARCH_RELAX,
> similar to as -mrelax/-mno-relax.

There are already -mrelax and -mno-relax, they can be checked in the
compiler code with TARGET_LINKER_RELAXATION.

/* snip */

> > +    case 'Q':
> > +  if (!TARGET_LINKER_RELAXATION)
> > +break;

So with -mno-relax we'll break early here, then no R_LARCH_RELAX will be
printed.

> > +  if (code == HIGH)
> > +op = XEXP (op, 0);
> > +
> > +  if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE)
> > +fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
> > +
> > +  break;

The tls-ie-norelax.c test case also checks for -mno-relax:

> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */
> > +/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } 
> > } */

i.e. -mno-relax is used compiling this test case, and the compiled
assembly code should not contain R_LARCH_RELAX.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-03-06 Thread mengqinggang

Hi,

Whether to add an option to control the generation of R_LARCH_RELAX,
similar to as -mrelax/-mno-relax.


在 2024/2/29 下午3:11, Xi Ruoyao 写道:

In Binutils we need to make IE to LE relaxation only allowed when there
is an R_LARCH_RELAX after R_LARCH_TLE_IE_PC_{HI20,LO12} so an invalid
"partial" relaxation won't happen with the extreme code model.  So if we
are emitting %ie_pc_{hi20,lo12} in a non-extreme code model, emit an
R_LARCH_RELAX to allow the relaxation.  The IE to LE relaxation does not
require the pcalau12i and the ld instruction to be adjacent, so we don't
need to limit ourselves to use the macro.

For the distro maintainers backporting changes: this change depends on
r14-8721, without r14-8721 R_LARCH_RELAX can be emitted mistakenly in
the extreme code model.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Support 'Q' for R_LARCH_RELAX for TLS IE.
(loongarch_output_move): Use 'Q' to print R_LARCH_RELAX for TLS
IE.
* config/loongarch/loongarch.md (ld_from_got): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/tls-ie-relax.c: New test.
* gcc.target/loongarch/tls-ie-norelax.c: New test.
* gcc.target/loongarch/tls-ie-extreme.c: New test.
---

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.cc | 15 ++-
  gcc/config/loongarch/loongarch.md |  2 +-
  .../gcc.target/loongarch/tls-ie-extreme.c |  5 +
  .../gcc.target/loongarch/tls-ie-norelax.c |  5 +
  gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c | 11 +++
  5 files changed, 36 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 0428b6e65d5..70e31bb831c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4981,7 +4981,7 @@ loongarch_output_move (rtx dest, rtx src)
  if (type == SYMBOL_TLS_LE)
return "lu12i.w\t%0,%h1";
  else
-   return "pcalau12i\t%0,%h1";
+   return "%Q1pcalau12i\t%0,%h1";
}
  
if (src_code == CONST_INT)

@@ -6145,6 +6145,7 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool 
hi64_part,
 'L'  Print the low-part relocation associated with OP.
 'm'Print one less than CONST_INT OP in decimal.
 'N'Print the inverse of the integer branch condition for 
comparison OP.
+   'Q'  Print R_LARCH_RELAX for TLS IE.
 'r'  Print address 12-31bit relocation associated with OP.
 'R'  Print address 32-51bit relocation associated with OP.
 'T'Print 'f' for (eq:CC ...), 't' for (ne:CC ...),
@@ -6282,6 +6283,18 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
letter);
break;
  
+case 'Q':

+  if (!TARGET_LINKER_RELAXATION)
+   break;
+
+  if (code == HIGH)
+   op = XEXP (op, 0);
+
+  if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE)
+   fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
+
+  break;
+
  case 'r':
loongarch_print_operand_reloc (file, op, false /* hi64_part */,
 true /* lo_reloc */);
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index f3b5c641fce..525e1e82183 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2620,7 +2620,7 @@ (define_insn "@ld_from_got"
(match_operand:P 2 "symbolic_operand")))]
UNSPEC_LOAD_FROM_GOT))]
""
-  "ld.\t%0,%1,%L2"
+  "%Q2ld.\t%0,%1,%L2"
[(set_attr "type" "move")]
  )
  
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c

new file mode 100644
index 000..00c545a3e8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d -mcmodel=extreme 
-mexplicit-relocs=auto -mrelax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */
+
+#include "tls-ie-relax.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c 
b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
new file mode 100644
index 000..dd6bf3634a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */
+
+#include "tls-ie-relax.c"
diff --git 

Re: [PATCH] c++/modules: inline namespace abi_tag streaming [PR110730]

2024-03-06 Thread Patrick Palka
On Wed, 6 Mar 2024, Jason Merrill wrote:

> On 3/6/24 14:10, Patrick Palka wrote:
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> > 
> > -- >8 --
> > 
> > The unreduced testcase from this PR crashes at runtime ultimately
> > because we don't stream the abi_tag attribute on inline namespaces and
> > so the filesystem::current_path() call resolves to the non-C++11 ABI
> > version even though the C++11 ABI is active, leading to a crash when
> > destructing the call result (which contains an std::string member).
> > 
> > While we do stream the DECL_ATTRIBUTES of all decls that go through
> > the generic tree streaming routines, it seems namespaces are streamed
> > separately from other decls and we don't use the generic routines for
> > them.  So this patch makes us stream the abi_tag manually for (inline)
> > namespaces.
> 
> Why not stream all DECL_ATTRIBUTES of all namespaces?

AFAICT abi_tag and deprecated are the only attributes that we recognize
on a namespace, and for deprecated it should suffice to stream the
TREE_DEPRECATED flag instead of the actual attribute, so hardcoding
abi_tag streaming seems convenient.

If we wanted to stream all DECL_ATTRIBUTES of a namespace then we'd have
to assume up front what kind of tree arguments of the attributes can be,
e.g. an INTEGER_CST or a STRING_CST etc and implement streaming of these
trees within the bytes_in/out base classes instead of trees_in/out (we
only have access to a bytes_in/out object from read/write_namespaces).
I'm not sure that's worth it unless there's other namespace attributes
we want to stream?

> 
> Jason
> 
> 



Re: [PATCH] c++: Fix ICE diagnosing incomplete type of overloaded function set [PR98356]

2024-03-06 Thread Jason Merrill

On 3/4/24 18:30, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

In the linked PR the result of 'get_first_fn' is a USING_DECL against
the template parameter, to be filled in on instantiation. But we don't
actually need to get the first set of the member functions: it's enough
to know that we have a (possibly overloaded) member function at all.

PR c++/98356

gcc/cp/ChangeLog:

* typeck2.cc (cxx_incomplete_type_diagnostic): Don't assume
'member' will be a FUNCTION_DECL (or something like it).

gcc/testsuite/ChangeLog:

* g++.dg/pr98356.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/typeck2.cc  | 11 +--
  gcc/testsuite/g++.dg/pr98356.C |  9 +
  2 files changed, 14 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/pr98356.C

diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 9608bdccd8b..31198b2f9f5 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -350,16 +350,15 @@ cxx_incomplete_type_diagnostic (location_t loc, 
const_tree value,
  bad_member:
{
tree member = TREE_OPERAND (value, 1);
-   if (is_overloaded_fn (member))
- member = get_first_fn (member);
-
-   if (DECL_FUNCTION_MEMBER_P (member)
-   && ! flag_ms_extensions)
+   if (is_overloaded_fn (member) && !flag_ms_extensions)
  {
gcc_rich_location richloc (loc);
/* If "member" has no arguments (other than "this"), then
   add a fix-it hint.  */
-   if (type_num_arguments (TREE_TYPE (member)) == 1)
+   member = MAYBE_BASELINK_FUNCTIONS (member);
+   if (TREE_CODE (member) == FUNCTION_DECL
+   && DECL_OBJECT_MEMBER_FUNCTION_P (member)
+   && type_num_arguments (TREE_TYPE (member)) == 1)
  richloc.add_fixit_insert_after ("()");
complained = emit_diagnostic (diag_kind, , 0,
 "invalid use of member function %qD "
diff --git a/gcc/testsuite/g++.dg/pr98356.C b/gcc/testsuite/g++.dg/pr98356.C
new file mode 100644
index 000..acea238593b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr98356.C
@@ -0,0 +1,9 @@
+// PR c++/98356
+// { dg-do compile { target c++11 } }
+
+template  class T> struct S {
+  using A = T;
+  using A::foo;
+  void foo ();
+  void bar () {foo.}  // { dg-error "invalid use of member function" }
+};




RE: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-03-06 Thread Li, Pan2
Thanks Richard for comments.

> gen_int_libfunc will no longer make it emit libcalls for fixed point
> modes, so this can't be correct
> and there's no libgcc implementation for integer mode saturating ops,
> so it's pointless to emit calls
> to them.

Got the pointer here, the OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", 
'3', gen_unsigned_fixed_libfunc)
Is designed for the fixed point, cannot cover integer mode right now.

Given we have saturating integer alu like below, could you help to coach me the 
most reasonable way to represent
It in scalar as well as vectorize part? Sorry not familiar with this part and 
still dig into how it works...

uint32_t sat_uadd (uint32_t a, uint32_t b)
{
  uint32_t add = a + b;
  return add | -(add < a);
}

sint32_t sat_sadd (sint32_t a, sint32_t b)
{
  sint32_t add = a + b;
  sint32_t x = a ^ b;
  sint32_t y = add ^ x;
  return x < 0 ? add : (y >= 0 ? add : INT32_MAX + (x < 0));
}

uint32_t sat_usub (uint32_t a, uint32_t b)
{
  return a >= b ? a - b : 0;
}

sint32_t sat_ssub (sint32_t a, sint32_t b)
{
  sint32_t sub = a - b;
  sint32_t x = a ^ b;
  sint32_t y = sub ^ x;
  return x >= 0 ? sub : (y >= 0 ? sub : INT32_MAX + (x < 0));
}

uint32_t sat_umul (uint32_t a, uint32_t b)
{
  uint64_t mul = a * b;

  return mul <= (uint64_t)UINT32_MAX ? (uint32_t)mul : UINT32_MAX;
}

sint32_t sat_smul (sint32_t a, sint32_t b)
{
  sint64_t mul = a * b;

  return mul >= (sint64_t)INT32_MIN && mul <= (sint64_t)INT32_MAX ? 
(sint32_t)mul : INT32_MAX + ((x ^ y) < 0);
}

uint32_t sat_udiv (uint32_t a, uint32_t b)
{
  return a / b; // never overflow
}

sint32_t sat_sdiv (sint32_t a, sint32_t b)
{
  return a == INT32_MIN && b == -1 ? INT32_MAX : a / b;
}

sint32_t sat_abs (sint32_t a)
{
  return a >= 0 ? a : (a == INT32_MIN ? INT32_MAX : -a);
}

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, March 5, 2024 4:41 PM
To: Li, Pan2 
Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
kito.ch...@gmail.com; jeffreya...@gmail.com
Subject: Re: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation 
US_PLUS

On Tue, Mar 5, 2024 at 8:09 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > I do wonder what the existing usadd patterns with integer vector modes
> > in various targets do?
> > Those define_insn will at least not end up in the optab set I guess,
> > so they must end up
> > being either unused or used by explicit gen_* (via intrinsic
> > functions?) or by combine?
>
> For usadd with vector modes, I think the backend like RISC-V try to leverage 
> instructions
> like Vector Single-Width Saturating Add(aka vsaddu.vv/x/i).
>
> > I think simply changing gen_*_fixed_libfunc to gen_int_libfunc won't
> > work.  Since there's
> > no libgcc support I'd leave it as gen_*_fixed_libfunc thus no library
> > fallback for integers?
>
> Change to gen_int_libfunc follows other int optabs. I am not sure if it will 
> hit the standard name usaddm3 for vector mode.
> But the happy path for scalar modes works up to a point, please help to 
> correct me if any misunderstanding.

gen_int_libfunc will no longer make it emit libcalls for fixed point
modes, so this can't be correct
and there's no libgcc implementation for integer mode saturating ops,
so it's pointless to emit calls
to them.

> #0  riscv_expand_usadd (dest=0x76a8c7c8, x=0x76a8c798, 
> y=0x76a8c7b0) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:10662
> #1  0x029f142a in gen_usaddsi3 (operand0=0x76a8c7c8, 
> operand1=0x76a8c798, operand2=0x76a8c7b0) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.md:3848
> #2  0x01751e60 in insn_gen_fn::operator() rtx_def*> (this=0x4910e70 ) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/recog.h:441
> #3  0x0180f553 in maybe_gen_insn (icode=CODE_FOR_usaddsi3, nops=3, 
> ops=0x7fffd2c0) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs.cc:8232
> #4  0x0180fa42 in maybe_expand_insn (icode=CODE_FOR_usaddsi3, nops=3, 
> ops=0x7fffd2c0) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs.cc:8275
> #5  0x0180fade in expand_insn (icode=CODE_FOR_usaddsi3, nops=3, 
> ops=0x7fffd2c0) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs.cc:8306
> #6  0x015cebdc in expand_fn_using_insn (stmt=0x76a36480, 
> icode=CODE_FOR_usaddsi3, noutputs=1, ninputs=2) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/internal-fn.cc:254
> #7  0x015de146 in expand_direct_optab_fn (fn=IFN_SAT_ADD, 
> stmt=0x76a36480, optab=usadd_optab, nargs=2) at 
> /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/internal-fn.cc:3818
> #8  0x015e3610 in expand_SAT_ADD (fn=IFN_SAT_ADD, 
> stmt=0x76a36480) at 
> 

[PATCH] LoongArch: testsuite: Add compilation options to the regname-fp-s9.c.

2024-03-06 Thread Lulu Cheng
When the value of the macro DEFAULT_CFLAGS is set to '-ansi -pedantic-errors',
regname-s9-fp.c will test to fail. To solve this problem, add the compilation
option '-Wno-pedantic -std=gnu90' to this test case.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/regname-fp-s9.c: Add compilation option
'-Wno-pedantic -std=gnu90'.
---
 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c 
b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
index d2e3b80f83c..77a74f1f667 100644
--- a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
+++ b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
@@ -1,3 +1,4 @@
 /* { dg-do compile } */
+/* { dg-additional-options "-Wno-pedantic -std=gnu90" } */
 register long s9 asm("s9"); /* { dg-note "conflicts with 's9'" } */
 register long fp asm("fp"); /* { dg-warning "register of 'fp' used for 
multiple global register variables" } */
-- 
2.39.3



Re: [PATCH] c++: Don't set DECL_CONTEXT to nested template-template parameters [PR98881]

2024-03-06 Thread Jason Merrill

On 3/5/24 00:24, Nathaniel Shead wrote:

On Mon, Mar 04, 2024 at 10:07:33PM -0500, Patrick Palka wrote:

On Tue, 5 Mar 2024, Nathaniel Shead wrote:


On Mon, Mar 04, 2024 at 09:26:00PM -0500, Patrick Palka wrote:

On Tue, 5 Mar 2024, Nathaniel Shead wrote:


On Mon, Mar 04, 2024 at 07:14:54PM -0500, Patrick Palka wrote:

On Sat, 2 Mar 2024, Nathaniel Shead wrote:


Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

When streaming in a nested template-template parameter as in the
attached testcase, we end up reaching the containing template-template
parameter in 'tpl_parms_fini'. We should not set the DECL_CONTEXT to
this (nested) template-template parameter, as it should already be the
struct that the outer template-template parameter is declared on.

PR c++/98881

gcc/cp/ChangeLog:

* module.cc (trees_out::tpl_parms_fini): Clarify logic purely
for checking purposes. Don't consider a template template
parameter as the owning template.
(trees_in::tpl_parms_fini): Don't consider a template template
parameter as the owning template.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-tpl-parm-3_a.H: New test.
* g++.dg/modules/tpl-tpl-parm-3_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc| 17 -
  gcc/testsuite/g++.dg/modules/tpl-tpl-parm-3_a.H | 11 +++
  gcc/testsuite/g++.dg/modules/tpl-tpl-parm-3_b.C | 13 +
  3 files changed, 36 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-tpl-parm-3_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-tpl-parm-3_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 67f132d28d7..5663d01ed9c 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -10126,10 +10126,14 @@ trees_out::tpl_parms_fini (tree tmpl, unsigned 
tpl_levels)
  tree dflt = TREE_PURPOSE (parm);
  tree_node (dflt);
  
-	  if (streaming_p ())

+ if (CHECKING_P && streaming_p ())
{
+ /* Sanity check that the DECL_CONTEXT we'll infer when
+streaming in is correct.  */
  tree decl = TREE_VALUE (parm);
- if (TREE_CODE (decl) == TEMPLATE_DECL)
+ if (TREE_CODE (decl) == TEMPLATE_DECL
+ /* A template template parm is not the owning template.  */
+ && !DECL_TEMPLATE_TEMPLATE_PARM_P (tmpl))
{
  tree ctx = DECL_CONTEXT (decl);
  tree inner = DECL_TEMPLATE_RESULT (decl);
@@ -10164,8 +10168,13 @@ trees_in::tpl_parms_fini (tree tmpl, unsigned 
tpl_levels)
return false;
  TREE_PURPOSE (parm) = dflt;
  
+	  /* Original template template parms have a context

+of their owning template.  Reduced ones do not.
+But if TMPL is itself a template template parm
+then it cannot be the owning template.  */
  tree decl = TREE_VALUE (parm);
- if (TREE_CODE (decl) == TEMPLATE_DECL)
+ if (TREE_CODE (decl) == TEMPLATE_DECL
+ && !DECL_TEMPLATE_TEMPLATE_PARM_P (tmpl))


IIUC a TEMPLATE_DECL inside a template parameter list always represents
a template template parm, so won't this effectively disable the
DECL_CONTEXT setting logic?


This is only when 'tmpl' (i.e. the containing TEMPLATE_DECL that we're
streaming) is itself a template template parm.


D'oh, makes sense.




{
  tree inner = DECL_TEMPLATE_RESULT (decl);
  tree tpi = (TREE_CODE (inner) == TYPE_DECL
@@ -10173,8 +10182,6 @@ trees_in::tpl_parms_fini (tree tmpl, unsigned 
tpl_levels)
  : DECL_INITIAL (inner));
  bool original = (TEMPLATE_PARM_LEVEL (tpi)
   == TEMPLATE_PARM_ORIG_LEVEL (tpi));
- /* Original template template parms have a context
-of their owning template.  Reduced ones do not.  */
  if (original)
DECL_CONTEXT (decl) = tmpl;
}
diff --git a/gcc/testsuite/g++.dg/modules/tpl-tpl-parm-3_a.H 
b/gcc/testsuite/g++.dg/modules/tpl-tpl-parm-3_a.H
new file mode 100644
index 000..21bbc054fa3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-tpl-parm-3_a.H
@@ -0,0 +1,11 @@
+// PR c++/98881
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+template  struct X {};
+
+template typename TT>
+struct X> {
+  template typename UU>
+  void f (X>&);
+};


I wonder why the partial specialization is relevant here?  I can't
seem to trigger the ICE without the partial specialization.
Slightly further reduced to not use bound ttps:

 template class TT>
 struct X { };

 template class TT> requires true
 struct X {
   template class UU>
   void f(X);
 };

Maybe the expectation is that tpl_parms_fini for UU should be called
with tpl_levels=1 (so that we stream only its own 

Re: [PATCH] c++/modules: inline namespace abi_tag streaming [PR110730]

2024-03-06 Thread Jason Merrill

On 3/6/24 14:10, Patrick Palka wrote:

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

The unreduced testcase from this PR crashes at runtime ultimately
because we don't stream the abi_tag attribute on inline namespaces and
so the filesystem::current_path() call resolves to the non-C++11 ABI
version even though the C++11 ABI is active, leading to a crash when
destructing the call result (which contains an std::string member).

While we do stream the DECL_ATTRIBUTES of all decls that go through
the generic tree streaming routines, it seems namespaces are streamed
separately from other decls and we don't use the generic routines for
them.  So this patch makes us stream the abi_tag manually for (inline)
namespaces.


Why not stream all DECL_ATTRIBUTES of all namespaces?

Jason



[PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-06 Thread Lulu Cheng
If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be
implemented through "ll.w+sc.w". In the implementation of the instruction 
sequence,
it is necessary to determine whether the two registers are equal.
Since LoongArch's comparison instructions do not distinguish between 32-bit
and 64-bit, the two operand registers that need to be compared are symbolically
extended, and one of the operand registers is obtained from memory through the
"ll.w" instruction, which can ensure that the symbolic expansion is carried out.
However, the value of the other operand register is not guaranteed to be the
value of the sign extension.

gcc/ChangeLog:

* config/loongarch/sync.md (atomic_cas_value_strong):
In loongarch64, a sign extension operation is added when
operands[2] is a register operand and the mode is SImode.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/atomic-cas-int.C: New test.
---
 gcc/config/loongarch/sync.md  | 46 ++-
 .../g++.target/loongarch/atomic-cas-int.C | 32 +
 2 files changed, 67 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/loongarch/atomic-cas-int.C

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 8f35a5b48d2..d41c2d26811 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -245,18 +245,42 @@ (define_insn "atomic_cas_value_strong"
(clobber (match_scratch:GPR 5 "="))]
   ""
 {
-  return "1:\\n\\t"
-"ll.\\t%0,%1\\n\\t"
-"bne\\t%0,%z2,2f\\n\\t"
-"or%i3\\t%5,$zero,%3\\n\\t"
-"sc.\\t%5,%1\\n\\t"
-"beqz\\t%5,1b\\n\\t"
-"b\\t3f\\n\\t"
-"2:\\n\\t"
-"%G4\\n\\t"
-"3:\\n\\t";
+  output_asm_insn ("1:", operands);
+  output_asm_insn ("ll.\t%0,%1", operands);
+
+  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the
+ return value of the val_without_const_folding will not be truncated and
+ will be passed directly to the function compare_exchange_strong.
+ However, the instruction 'bne' does not distinguish between 32-bit and
+ 64-bit operations.  so if the upper 32 bits of the register are not
+ extended by the 32nd bit symbol, then the comparison may not be valid
+ here.  This will affect the result of the operation.  */
+
+  if (TARGET_64BIT && REG_P (operands[2])
+  && GET_MODE (operands[2]) == SImode)
+{
+  output_asm_insn ("addi.w\t%5,%2,0", operands);
+  output_asm_insn ("bne\t%0,%5,2f", operands);
+}
+  else
+output_asm_insn ("bne\t%0,%z2,2f", operands);
+
+  output_asm_insn ("or%i3\t%5,$zero,%3", operands);
+  output_asm_insn ("sc.\t%5,%1", operands);
+  output_asm_insn ("beqz\t%5,1b", operands);
+  output_asm_insn ("b\t3f", operands);
+  output_asm_insn ("2:", operands);
+  output_asm_insn ("%G4", operands);
+  output_asm_insn ("3:", operands);
+
+  return "";
 }
-  [(set (attr "length") (const_int 28))])
+  [(set (attr "length")
+ (if_then_else
+   (and (match_test "GET_MODE (operands[2]) == SImode")
+(match_test "REG_P (operands[2])"))
+   (const_int 32)
+   (const_int 28)))])
 
 (define_insn "atomic_cas_value_strong_amcas"
   [(set (match_operand:QHWD 0 "register_operand" "=")
diff --git a/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C 
b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
new file mode 100644
index 000..830ce48267a
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+__attribute__ ((noinline)) long
+val_without_const_folding (long val)
+{
+  return val;
+}
+
+int
+main ()
+{
+  int oldval = 0xaa;
+  int newval = 0xbb;
+  std::atomic amo;
+
+  amo.store (oldval);
+
+  long longval = val_without_const_folding (0xff80 + oldval);
+  oldval = static_cast (longval);
+
+  amo.compare_exchange_strong (oldval, newval);
+
+  if (newval != amo.load (std::memory_order_relaxed))
+__builtin_abort ();
+
+  return 0;
+}
+
-- 
2.39.3



Re: [PATCH] c-family, c++, v2: Fix up handling of types which may have padding in __atomic_{compare_}exchange

2024-03-06 Thread Jason Merrill

On 2/20/24 05:02, Jakub Jelinek wrote:

On Tue, Feb 20, 2024 at 09:01:10AM +0100, Richard Biener wrote:

I'm not sure those would be really equivalent (MEM_REF vs. V_C_E
as well as combined vs. split).  It really depends how RTL expansion
handles this (as you can see padding can be fun here).

So I'd be nervous for a match.pd rule here (also we can't match
memory defs).


Ok.  Perhaps forwprop then; anyway, that would be an optimization.


As for your patch I'd go with a MEM_REF unconditionally, I don't
think we want different behavior whether there's padding or not ...


I've made it conditional so that the MEM_REFs don't appear that often in the
FE trees, but maybe that is fine.

The unconditional patch would then be:


OK.


2024-02-20  Jakub Jelinek  

gcc/c-family/
* c-common.cc (resolve_overloaded_atomic_exchange): Instead of setting
p1 to VIEW_CONVERT_EXPR (*p1), set it to MEM_REF with p1 and
(typeof (p1)) 0 operands and I_type type.
(resolve_overloaded_atomic_compare_exchange): Similarly for p2.
gcc/cp/
* pt.cc (tsubst_expr): Handle MEM_REF.
gcc/testsuite/
* g++.dg/ext/atomic-5.C: New test.

--- gcc/c-family/c-common.cc.jj 2024-02-17 16:40:42.831571693 +0100
+++ gcc/c-family/c-common.cc2024-02-20 10:58:56.599865656 +0100
@@ -7793,9 +7793,14 @@ resolve_overloaded_atomic_exchange (loca
/* Convert object pointer to required type.  */
p0 = build1 (VIEW_CONVERT_EXPR, I_type_ptr, p0);
(*params)[0] = p0;
-  /* Convert new value to required type, and dereference it.  */
-  p1 = build_indirect_ref (loc, p1, RO_UNARY_STAR);
-  p1 = build1 (VIEW_CONVERT_EXPR, I_type, p1);
+  /* Convert new value to required type, and dereference it.
+ If *p1 type can have padding or may involve floating point which
+ could e.g. be promoted to wider precision and demoted afterwards,
+ state of padding bits might not be preserved.  */
+  build_indirect_ref (loc, p1, RO_UNARY_STAR);
+  p1 = build2_loc (loc, MEM_REF, I_type,
+  build1 (VIEW_CONVERT_EXPR, I_type_ptr, p1),
+  build_zero_cst (TREE_TYPE (p1)));
(*params)[1] = p1;
  
/* Move memory model to the 3rd position, and end param list.  */

@@ -7873,9 +7878,14 @@ resolve_overloaded_atomic_compare_exchan
p1 = build1 (VIEW_CONVERT_EXPR, I_type_ptr, p1);
(*params)[1] = p1;
  
-  /* Convert desired value to required type, and dereference it.  */

-  p2 = build_indirect_ref (loc, p2, RO_UNARY_STAR);
-  p2 = build1 (VIEW_CONVERT_EXPR, I_type, p2);
+  /* Convert desired value to required type, and dereference it.
+ If *p2 type can have padding or may involve floating point which
+ could e.g. be promoted to wider precision and demoted afterwards,
+ state of padding bits might not be preserved.  */
+  build_indirect_ref (loc, p2, RO_UNARY_STAR);
+  p2 = build2_loc (loc, MEM_REF, I_type,
+  build1 (VIEW_CONVERT_EXPR, I_type_ptr, p2),
+  build_zero_cst (TREE_TYPE (p2)));
(*params)[2] = p2;
  
/* The rest of the parameters are fine. NULL means no special return value

--- gcc/cp/pt.cc.jj 2024-02-17 16:40:42.868571182 +0100
+++ gcc/cp/pt.cc2024-02-20 10:57:36.646973603 +0100
@@ -20088,6 +20088,14 @@ tsubst_expr (tree t, tree args, tsubst_f
RETURN (r);
}
  
+case MEM_REF:

+  {
+   tree op0 = RECUR (TREE_OPERAND (t, 0));
+   tree op1 = RECUR (TREE_OPERAND (t, 0));
+   tree new_type = tsubst (TREE_TYPE (t), args, complain, in_decl);
+   RETURN (build2_loc (EXPR_LOCATION (t), MEM_REF, new_type, op0, op1));
+  }
+
  case NOP_EXPR:
{
tree type = tsubst (TREE_TYPE (t), args, complain, in_decl);
--- gcc/testsuite/g++.dg/ext/atomic-5.C.jj  2024-02-20 10:57:36.647973589 
+0100
+++ gcc/testsuite/g++.dg/ext/atomic-5.C 2024-02-20 10:57:36.647973589 +0100
@@ -0,0 +1,42 @@
+// { dg-do compile { target c++14 } }
+
+template 
+void
+foo (long double *ptr, long double *val, long double *ret)
+{
+  __atomic_exchange (ptr, val, ret, __ATOMIC_RELAXED);
+}
+
+template 
+bool
+bar (long double *ptr, long double *exp, long double *des)
+{
+  return __atomic_compare_exchange (ptr, exp, des, false,
+   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+}
+
+bool
+baz (long double *p, long double *q, long double *r)
+{
+  foo<0> (p, q, r);
+  foo<1> (p + 1, q + 1, r + 1);
+  return bar<0> (p + 2, q + 2, r + 2) || bar<1> (p + 3, q + 3, r + 3);
+}
+
+constexpr int
+qux (long double *ptr, long double *val, long double *ret)
+{
+  __atomic_exchange (ptr, val, ret, __ATOMIC_RELAXED);
+  return 0;
+}
+
+constexpr bool
+corge (long double *ptr, long double *exp, long double *des)
+{
+  return __atomic_compare_exchange (ptr, exp, des, false,
+   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+}
+
+long double a[6];
+const int b = qux (a, a + 1, a + 2);
+const bool c = corge (a + 3, a + 4, a + 5);


Jakub





Re: [PATCH] dwarf2out, v3: Emit DW_AT_export_symbols on anon unions/structs [PR113918]

2024-03-06 Thread Jason Merrill

On 2/16/24 16:06, Jakub Jelinek wrote:

On Fri, Feb 16, 2024 at 03:40:39PM -0500, Jason Merrill wrote:

--- gcc/cp/cp-objcp-common.cc.jj2024-02-13 12:50:21.666846296 +0100
+++ gcc/cp/cp-objcp-common.cc   2024-02-16 20:40:51.374763528 +0100
@@ -410,6 +410,15 @@ cp_type_dwarf_attribute (const_tree type
return 1;
 break;
+case DW_AT_export_symbols:


For C++ this can use ANON_AGGR_TYPE_P, so it doesn't need to involve the
FIELD_DECL at all.  But I suppose the C front-end doesn't have a similar
flag?


Yes, using ANON_AGGR_TYPE_P there works for C++, but C doesn't have anything
like that.  All it uses is DECL_NAME == NULL on FIELD_DECL +
RECORD_OR_UNION_TYPE_P on its type to determine anon struct/union.

The patch below has updated cp_type_dwarf_attribute, otherwise the same as
before.


OK.


2024-02-16  Jakub Jelinek  

PR debug/113918
gcc/
* dwarf2out.cc (gen_field_die): Emit DW_AT_export_symbols
on anonymous unions or structs for -gdwarf-5 or -gno-strict-dwarf.
gcc/c/
* c-tree.h (c_type_dwarf_attribute): Declare.
* c-objc-common.h (LANG_HOOKS_TYPE_DWARF_ATTRIBUTE): Redefine.
* c-objc-common.cc: Include dwarf2.h.
(c_type_dwarf_attribute): New function.
gcc/cp/
* cp-objcp-common.cc (cp_type_dwarf_attribute): Return 1
for DW_AT_export_symbols on anonymous structs or unions.
gcc/testsuite/
* c-c++-common/dwarf2/pr113918.c: New test.

--- gcc/dwarf2out.cc.jj 2024-02-15 13:54:29.284358101 +0100
+++ gcc/dwarf2out.cc2024-02-16 20:38:19.718841259 +0100
@@ -25153,6 +25153,17 @@ gen_field_die (tree decl, struct vlr_con
  
add_accessibility_attribute (decl_die, decl);
  
+  /* Add DW_AT_export_symbols to anonymous unions or structs.  */

+  if ((dwarf_version >= 5 || !dwarf_strict) && DECL_NAME (decl) == NULL_TREE)
+if (tree type = member_declared_type (decl))
+  if (lang_hooks.types.type_dwarf_attribute (TYPE_MAIN_VARIANT (type),
+DW_AT_export_symbols) != -1)
+   {
+ dw_die_ref type_die = lookup_type_die (TYPE_MAIN_VARIANT (type));
+ if (type_die && get_AT (type_die, DW_AT_export_symbols) == NULL)
+   add_AT_flag (type_die, DW_AT_export_symbols, 1);
+   }
+
/* Equate decl number to die, so that we can look up this decl later on.  */
equate_decl_number_to_die (decl, decl_die);
  }
--- gcc/c/c-tree.h.jj   2024-01-31 10:46:35.164761720 +0100
+++ gcc/c/c-tree.h  2024-02-16 20:43:45.993372908 +0100
@@ -731,6 +731,7 @@ extern bool c_warn_unused_global_decl (c
  extern void c_initialize_diagnostics (diagnostic_context *);
  extern bool c_var_mod_p (tree x, tree fn);
  extern alias_set_type c_get_alias_set (tree);
+extern int c_type_dwarf_attribute (const_tree, int);
  
  /* in c-typeck.cc */

  extern int in_alignof;
--- gcc/c/c-objc-common.h.jj2024-01-03 12:06:52.973862999 +0100
+++ gcc/c/c-objc-common.h   2024-02-16 20:42:21.073535465 +0100
@@ -119,6 +119,9 @@ static const scoped_attribute_specs *con
  #undef LANG_HOOKS_GIMPLIFY_EXPR
  #define LANG_HOOKS_GIMPLIFY_EXPR c_gimplify_expr
  
+#undef LANG_HOOKS_TYPE_DWARF_ATTRIBUTE

+#define LANG_HOOKS_TYPE_DWARF_ATTRIBUTE c_type_dwarf_attribute
+
  #undef LANG_HOOKS_OMP_PREDETERMINED_SHARING
  #define LANG_HOOKS_OMP_PREDETERMINED_SHARING c_omp_predetermined_sharing
  
--- gcc/c/c-objc-common.cc.jj	2024-01-03 12:06:53.213859637 +0100

+++ gcc/c/c-objc-common.cc  2024-02-16 20:45:24.649022305 +0100
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
  #include "gcc-rich-location.h"
  #include "stringpool.h"
  #include "attribs.h"
+#include "dwarf2.h"
  
  static bool c_tree_printer (pretty_printer *, text_info *, const char *,

int, bool, bool, bool, bool *, const char **);
@@ -446,3 +447,25 @@ instantiation_dependent_expression_p (tr
  {
return false;
  }
+
+/* Return -1 if dwarf ATTR shouldn't be added for TYPE, or the attribute
+   value otherwise.  */
+int
+c_type_dwarf_attribute (const_tree type, int attr)
+{
+  if (type == NULL_TREE)
+return -1;
+
+  switch (attr)
+{
+case DW_AT_export_symbols:
+  if (RECORD_OR_UNION_TYPE_P (type) && TYPE_NAME (type) == NULL_TREE)
+   return 1;
+  break;
+
+default:
+  break;
+}
+
+  return -1;
+}
--- gcc/cp/cp-objcp-common.cc.jj2024-02-13 12:50:21.666846296 +0100
+++ gcc/cp/cp-objcp-common.cc   2024-02-16 21:48:33.880458318 +0100
@@ -410,6 +410,11 @@ cp_type_dwarf_attribute (const_tree type
return 1;
break;
  
+case DW_AT_export_symbols:

+  if (ANON_AGGR_TYPE_P (type))
+   return 1;
+  break;
+
  default:
break;
  }
--- gcc/testsuite/c-c++-common/dwarf2/pr113918.c.jj 2024-02-16 
20:27:13.996961811 +0100
+++ gcc/testsuite/c-c++-common/dwarf2/pr113918.c2024-02-16 
20:27:13.996961811 +0100
@@ -0,0 +1,33 @@
+/* PR debug/113918 */
+/* { dg-do compile } */
+/* { dg-options 

Re: [PATCH] c++: Fix up parameter pack diagnostics on xobj vs. varargs functions [PR113802]

2024-03-06 Thread Jason Merrill

On 2/16/24 17:15, Jakub Jelinek wrote:

On Fri, Feb 16, 2024 at 10:47:47PM +0100, Jakub Jelinek wrote:

The following patch works.


Or yet another option would be instead of (sometimes) clearing
declarator->parameter_pack_p when we diagnose this bug for error
recovery ignore the this specifier.


Let's go with this one.  OK.


With the following patch (testsuite patch remains the same),
I get excess errors though:
/usr/src/gcc/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics3.C:30:25: 
error: expansion pattern 'Selves' contains no parameter packs
/usr/src/gcc/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics3.C:42:26: 
error: expansion pattern 'Selves' contains no parameter packs
/usr/src/gcc/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics3.C:56:26: error: 
expansion pattern 'Selves&' contains no parameter packs
/usr/src/gcc/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics3.C:68:27: error: 
expansion pattern 'Selves&' contains no parameter packs
/usr/src/gcc/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics3.C:82:27: error: 
expansion pattern 'Selves&&' contains no parameter packs
/usr/src/gcc/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics3.C:94:28: error: 
expansion pattern 'Selves&&' contains no parameter packs
/usr/src/gcc/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics3.C:108:32: error: 
expansion pattern 'const Selves&' contains no parameter packs
/usr/src/gcc/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics3.C:120:33: error: 
expansion pattern 'const Selves&' contains no parameter packs
/usr/src/gcc/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics3.C:134:33: error: 
expansion pattern 'const Selves&&' contains no parameter packs
/usr/src/gcc/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics3.C:146:34: error: 
expansion pattern 'const Selves&&' contains no parameter packs
though, that is e.g. on
struct S0 {
   template
   void g(this Selves... selves) {}  // { dg-error "an explicit object parameter 
cannot be a function parameter pack" }
}
where such an extra error would have been emitted if the this keyword was
omitted.

--- gcc/cp/parser.cc.jj 2024-02-16 17:38:27.802845433 +0100
+++ gcc/cp/parser.cc2024-02-16 23:08:40.835437740 +0100
@@ -25734,22 +25734,6 @@ cp_parser_parameter_declaration (cp_pars
decl_specifiers.locations[ds_this] = 0;
  }
  
-  if (xobj_param_p

-  && ((declarator && declarator->parameter_pack_p)
- || cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS)))
-{
-  location_t xobj_param
-   = make_location (decl_specifiers.locations[ds_this],
-decl_spec_token_start->location,
-input_location);
-  error_at (xobj_param,
-   "an explicit object parameter cannot "
-   "be a function parameter pack");
-  /* Suppress errors that occur down the line.  */
-  if (declarator)
-   declarator->parameter_pack_p = false;
-}
-
/* If a function parameter pack was specified and an implicit template
   parameter was introduced during cp_parser_parameter_declaration,
   change any implicit parameters introduced into packs.  */
@@ -25762,9 +25746,10 @@ cp_parser_parameter_declaration (cp_pars
(INNERMOST_TEMPLATE_PARMS (current_template_parms));
  
if (latest_template_parm_idx != template_parm_idx)

-   decl_specifiers.type = convert_generic_types_to_packs
- (decl_specifiers.type,
-  template_parm_idx, latest_template_parm_idx);
+   decl_specifiers.type
+ = convert_generic_types_to_packs (decl_specifiers.type,
+   template_parm_idx,
+   latest_template_parm_idx);
  }
  
if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS))

@@ -25794,6 +25779,21 @@ cp_parser_parameter_declaration (cp_pars
}
  }
  
+  if (xobj_param_p

+  && (declarator ? declarator->parameter_pack_p
+: PACK_EXPANSION_P (decl_specifiers.type)))
+{
+  location_t xobj_param
+   = make_location (decl_specifiers.locations[ds_this],
+decl_spec_token_start->location,
+input_location);
+  error_at (xobj_param,
+   "an explicit object parameter cannot "
+   "be a function parameter pack");
+  xobj_param_p = false;
+  decl_specifiers.locations[ds_this] = 0;
+}
+
/* The restriction on defining new types applies only to the type
   of the parameter, not to the default argument.  */
parser->type_definition_forbidden_message = saved_message;

Jakub





Re: [PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2024-03-06 Thread Dimitry Andric
On 6 Mar 2024, at 15:57, FX Coudert  wrote:
> 
>> Hmm I recall trying it and finding a problem - was there some different fix 
>> applied
>> in the end?
> 
> The bug is still open, I don’t think a patch was applied, and I don’t find 
> any email to the list stating what the problem could be.

The original patch (https://gcc.gnu.org/bugzilla/attachment.cgi?id=56010) still 
applies to the master branch. It turned out there is also a related problem in 
libcc1plugin.cc and libcp1plugin.cc , which is fixed 
by https://gcc.gnu.org/bugzilla/attachment.cgi?id=57639 :

commit 49222b98ac8e30a4a042ada0ece3d7df93f049d2
Author: Dimitry Andric 
Date:   2024-03-06T23:46:27+01:00

Fix libcc1plugin and libc1plugin to use INCLUDE_VECTOR before including
system.h, instead of directly including , to avoid running into
poisoned identifiers.

diff --git a/libcc1/libcc1plugin.cc b/libcc1/libcc1plugin.cc
index 72d17c3b81c..e64847466f4 100644
--- a/libcc1/libcc1plugin.cc
+++ b/libcc1/libcc1plugin.cc
@@ -32,6 +32,7 @@
 #undef PACKAGE_VERSION
 
 #define INCLUDE_MEMORY
+#define INCLUDE_VECTOR
 #include "gcc-plugin.h"
 #include "system.h"
 #include "coretypes.h"
@@ -69,8 +70,6 @@
 #include "gcc-c-interface.h"
 #include "context.hh"
 
-#include 
-
 using namespace cc1_plugin;
 
 
diff --git a/libcc1/libcp1plugin.cc b/libcc1/libcp1plugin.cc
index 0eff7c68d29..da68c5d0ac1 100644
--- a/libcc1/libcp1plugin.cc
+++ b/libcc1/libcp1plugin.cc
@@ -33,6 +33,7 @@
 #undef PACKAGE_VERSION
 
 #define INCLUDE_MEMORY
+#define INCLUDE_VECTOR
 #include "gcc-plugin.h"
 #include "system.h"
 #include "coretypes.h"
@@ -71,8 +72,6 @@
 #include "rpc.hh"
 #include "context.hh"
 
-#include 
-
 using namespace cc1_plugin;
 
 



[PATCH] lto-streamer: Ensure src_pwd is remapped

2024-03-06 Thread Morten Linderud
I've made an attempt at patching this issue as it produces unreproducible
unreproducible binaries for Golang. I don't know C/C++ and it's my first gcc
patch so please bear with me :)

-- >8 --

When -flto is used with line macros containing bare symbols instead of
absolute paths the full path is added but never remapped with
-ffile-prefix-map. This could produce unreproducible binaries.

Fixes: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108534

Signed-off-by: Morten Linderud 
---
 gcc/lto-streamer-in.cc  | 6 --
 gcc/lto-streamer-out.cc | 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index ad0ca24007a..799f9eec478 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "debug.h"
 #include "alloc-pool.h"
 #include "toplev.h"
+#include "file-prefix-map.h" /* remap_debug_filename()  */
 
 /* Allocator used to hold string slot entries for line map streaming.  */
 static struct object_allocator *string_slot_allocator;
@@ -557,11 +558,12 @@ lto_location_cache::input_location_and_block (location_t 
*loc,
{
  const char *pwd = bp_unpack_string (data_in, bp);
  const char *src_pwd = get_src_pwd ();
- if (strcmp (pwd, src_pwd) == 0)
+ const char *remapped_src_pwd = remap_debug_filename (src_pwd);
+ if (strcmp (pwd, remapped_src_pwd) == 0)
stream_relative_path_prefix = NULL;
  else
stream_relative_path_prefix
- = canon_relative_path_prefix (pwd, src_pwd);
+ = canon_relative_path_prefix (pwd, remapped_src_pwd);
}
   stream_file = canon_file_name (stream_relative_path_prefix,
 bp_unpack_string (data_in, bp));
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index d4f728094ed..379e256a1e7 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -230,8 +230,10 @@ lto_output_location_1 (struct output_block *ob, struct 
bitpack_d *bp,
  ob->emit_pwd = false;
}
  bp_pack_value (bp, stream_pwd, 1);
- if (stream_pwd)
-   bp_pack_string (ob, bp, get_src_pwd (), true);
+ if (stream_pwd){
+   const char *remapped_pwd = remap_debug_filename (get_src_pwd ());
+   bp_pack_string (ob, bp, remapped_pwd, true);
+ }
  bp_pack_string (ob, bp, remapped, true);
  bp_pack_value (bp, xloc.sysp, 1);
}
-- 
2.44.0


Re: [C++ coroutines] Initial implementation pushed to master.

2024-03-06 Thread H.J. Lu
On Wed, Mar 6, 2024 at 1:03 AM Iain Sandoe  wrote:
>
>
>
> > On 5 Mar 2024, at 17:31, H.J. Lu  wrote:
> >
> > On Sat, Jan 18, 2020 at 4:54 AM Iain Sandoe  wrote:
> >>
>
> >> 2020-01-18  Iain Sandoe  
> >>
> >>* Makefile.in: Add coroutine-passes.o.
> >>* builtin-types.def (BT_CONST_SIZE): New.
> >>(BT_FN_BOOL_PTR): New.
> >>(BT_FN_PTR_PTR_CONST_SIZE_BOOL): New.
> >>* builtins.def (DEF_COROUTINE_BUILTIN): New.
> >>* coroutine-builtins.def: New file.
> >>* coroutine-passes.cc: New file.
> >
> > There are
> >
> >  tree res_tgt = TREE_OPERAND (gimple_call_arg (stmt, 2), 0);
> >  tree _dest = destinations.get_or_insert (idx, );
> >  if (existed && dump_file)
> >Why does this behavior depend on dump_file?
>
> This was checking for a potential wrong-code error during development;
> there is no point in making it into a diagnostic (since the user could not fix
> the problem if it happened).  I guess changing to a gcc_checking_assert()
> would be reasonable but I’d prefer to do that once GCC-15 opens.
>
> Have you found any instance where this results in a reported bug?

No, I haven't.  I only noticed it by chance.

> (I do not recall anything on my coroutines bug list that would seem to 
> indicate this).
>
> thanks for noting it.
> Iain
>
>
> >{
> >  fprintf (
> >dump_file,
> >"duplicate YIELD RESUME point (" HOST_WIDE_INT_PRINT_DEC
> >") ?\n",
> >idx);
> >  print_gimple_stmt (dump_file, stmt, 0, 
> > TDF_VOPS|TDF_MEMSYMS);
> >}
> >  else
> >res_dest = res_tgt;
> >
> > H.J.
>


-- 
H.J.


Re: [PATCH] arm: fix c23 0-named-args caller-side stdarg

2024-03-06 Thread Jakub Jelinek
On Wed, Mar 06, 2024 at 05:28:21PM -0300, Alexandre Oliva wrote:
> On Mar  1, 2024, "Richard Earnshaw (lists)"  wrote:
> 
> > On 01/03/2024 04:38, Alexandre Oliva wrote:
> >> Thanks for the review.
> 
> > For closure, Jakub has just pushed a patch to the generic code, so I
> > don't think we need this now.
> 
> ACK.  I see the c2x-stdarg-4.c test is now passing in our arm-eabi
> gcc-13 tree.  Thank you all.
> 
> Alas, the same nightly build showed a new riscv fail in c23-stdarg-6.c,
> that also got backported to gcc-13.  Presumably it's failing in the
> trunk as well, both riscv32-elf and riscv64-elf.

That is PR114175 I guess.  The test just makes it clear what has been broken
already in GCC 13 there.

Jakub



Re: [PATCH] arm: fix c23 0-named-args caller-side stdarg

2024-03-06 Thread Alexandre Oliva
On Mar  1, 2024, "Richard Earnshaw (lists)"  wrote:

> On 01/03/2024 04:38, Alexandre Oliva wrote:
>> Thanks for the review.

> For closure, Jakub has just pushed a patch to the generic code, so I
> don't think we need this now.

ACK.  I see the c2x-stdarg-4.c test is now passing in our arm-eabi
gcc-13 tree.  Thank you all.

Alas, the same nightly build showed a new riscv fail in c23-stdarg-6.c,
that also got backported to gcc-13.  Presumably it's failing in the
trunk as well, both riscv32-elf and riscv64-elf.

I haven't looked into whether it's a regression brought about by the
patch or just a new failure mode that the new test exposed.  Either way,
I'm not sure whether to link this new failure to any of the associated
PRs or to file a new one, but, FTR, I'm going to look into it.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] vect: Do not peel epilogue for partial vectors [PR114196].

2024-03-06 Thread Robin Dapp
Hi,

r14-7036-gcbf569486b2dec added an epilogue vectorization guard for early
break but PR114196 shows that we also run into the problem without early
break.  Therefore remove early break from the conditions.

gcc/ChangeLog:

PR middle-end/114196

* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Remove
early break check from guards.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr114196.c: New test.
* gcc.target/riscv/rvv/autovec/pr114196.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/pr114196.c   | 19 +++
 .../gcc.target/riscv/rvv/autovec/pr114196.c   | 19 +++
 gcc/tree-vect-loop-manip.cc   |  6 +++---
 3 files changed, 41 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr114196.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c

diff --git a/gcc/testsuite/gcc.target/aarch64/pr114196.c 
b/gcc/testsuite/gcc.target/aarch64/pr114196.c
new file mode 100644
index 000..15e4b0e31b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr114196.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options { -O3 -fno-vect-cost-model -march=armv9-a 
-msve-vector-bits=256 } } */
+
+unsigned a;
+int b;
+long *c;
+
+int
+main ()
+{
+  for (int d = 0; d < 22; d += 4) {
+  b = ({
+   int e = c[d];
+   e;
+   })
+  ? 0 : -c[d];
+  a *= 3;
+  }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
new file mode 100644
index 000..7ba9cbbed70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options { -O3 -fno-vect-cost-model -march=rv64gcv_zvl256b -mabi=lp64d 
-mrvv-vector-bits=zvl } } */
+
+unsigned a;
+int b;
+long *c;
+
+int
+main ()
+{
+  for (int d = 0; d < 22; d += 4) {
+  b = ({
+   int e = c[d];
+   e;
+   })
+  ? 0 : -c[d];
+  a *= 3;
+  }
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index f72da915103..c3cd20eef70 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2183,9 +2183,9 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo,
  perform the peeling.  The below condition mirrors that of
  vect_gen_vector_loop_niters  where niters_vector_mult_vf_var then sets
  step_vector to VF rather than 1.  This is what creates the nonlinear
- IV.  PR113163.  */
-  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
-  && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
+ IV.  PR113163.
+ This also happens without early breaks, see PR114196.  */
+  if (LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
   && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
   && induction_type != vect_step_op_neg)
 {
-- 
2.43.2


[committed] i386: Fix and improve insn constraint for V2QI arithmetic/shift insns

2024-03-06 Thread Uros Bizjak
optimize_function_for_size_p predicate is not stable during optab selection,
because it also depends on node->count/node->frequency of the current function,
which are updated during IPA, so they may change between early opts and
late opts.  Use optimize_size instead - optimize_size implies
optimize_function_for_size_p (cfun), so if a named pattern uses
"&& optimize_size" and the insn it splits into uses
optimize_function_for_size_p (cfun), it shouldn't fail.

PR target/114232

gcc/ChangeLog:

* config/i386/mmx.md (negv2qi2): Enable for optimize_size instead
of optimize_function_for_size_p.  Explicitly enable for TARGET_SSE2.
(negv2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.  Explicitly enable for TARGET_SSE2.
(v2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 2856ae6ffef..9a8d6030d8b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2874,11 +2874,18 @@ (define_insn "negv2qi2"
 (neg:V2QI
  (match_operand:V2QI 1 "register_operand" "0,Yw")))
(clobber (reg:CC FLAGS_REG))]
-  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "!TARGET_PARTIAL_REG_STALL || optimize_size || TARGET_SSE2"
   "#"
   [(set_attr "isa" "*,sse2")
(set_attr "type" "multi")
-   (set_attr "mode" "QI,TI")])
+   (set_attr "mode" "QI,TI")
+   (set (attr "enabled")
+   (cond [(and (eq_attr "alternative" "0")
+   (and (match_test "TARGET_PARTIAL_REG_STALL")
+(not (match_test "optimize_function_for_size_p 
(cfun)"
+   (symbol_ref "false")
+ ]
+ (const_string "*")))])
 
 (define_split
   [(set (match_operand:V2QI 0 "general_reg_operand")
@@ -2912,8 +2919,7 @@ (define_split
 (neg:V2QI
  (match_operand:V2QI 1 "sse_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
-   && TARGET_SSE2 && reload_completed"
+  "TARGET_SSE2 && reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 0)
(minus:V16QI (match_dup 0) (match_dup 1)))]
@@ -2975,11 +2981,18 @@ (define_insn "v2qi3"
  (match_operand:V2QI 1 "register_operand" "0,0,Yw")
  (match_operand:V2QI 2 "register_operand" "Q,x,Yw")))
(clobber (reg:CC FLAGS_REG))]
-  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "!TARGET_PARTIAL_REG_STALL || optimize_size || TARGET_SSE2"
   "#"
   [(set_attr "isa" "*,sse2_noavx,avx")
(set_attr "type" "multi,sseadd,sseadd")
-   (set_attr "mode" "QI,TI,TI")])
+   (set_attr "mode" "QI,TI,TI")
+   (set (attr "enabled")
+   (cond [(and (eq_attr "alternative" "0")
+   (and (match_test "TARGET_PARTIAL_REG_STALL")
+(not (match_test "optimize_function_for_size_p 
(cfun)"
+   (symbol_ref "false")
+ ]
+ (const_string "*")))])
 
 (define_split
   [(set (match_operand:V2QI 0 "general_reg_operand")
@@ -3021,8 +3034,7 @@ (define_split
  (match_operand:V2QI 1 "sse_reg_operand")
  (match_operand:V2QI 2 "sse_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
-   && TARGET_SSE2 && reload_completed"
+  "TARGET_SSE2 && reload_completed"
   [(set (match_dup 0)
 (plusminus:V16QI (match_dup 1) (match_dup 2)))]
 {
@@ -3684,9 +3696,10 @@ (define_insn_and_split "v2qi3"
  (match_operand:V2QI 1 "register_operand" "0")
  (match_operand:QI 2 "nonmemory_operand" "cI")))
(clobber (reg:CC FLAGS_REG))]
-  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "!TARGET_PARTIAL_REG_STALL || optimize_size"
   "#"
-  "&& reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && reload_completed"
   [(parallel
  [(set (zero_extract:HI (match_dup 3) (const_int 8) (const_int 8))
   (subreg:HI


Re: [PATCH v2] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-06 Thread Vineet Gupta



On 3/5/24 23:27, pan2...@intel.com wrote:
> From: Pan Li 
>
> Update in v2:
> * Cleanup some unused code.
> * Fix some typo of commit log.
>
> Original log:
>
> This patch would like to introduce one new gcc attribute for RVV.
> This attribute is used to define fixed-length variants of one
> existing sizeless RVV types.
>
> This attribute is valid if and only if the mrvv-vector-bits=zvl, the only
> one args should be the integer constant and its' value is terminated
> by the LMUL and the vector register bits in zvl*b.  For example:
>
> typedef vint32m2_t fixed_vint32m2_t 
> __attribute__((riscv_rvv_vector_bits(128)));
>
> The above type define is valid when -march=rv64gc_zve64d_zvl64b
> (aka 2(m2) * 64 = 128 for vin32m2_t), and will report error when
> -march=rv64gcv_zvl128b similar to below.
>
> "error: invalid RVV vector size '128', expected size is '256' based on
> LMUL of type and '-mrvv-vector-bits=zvl'"
>
> For the vint*m*_t below operations are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=
> * ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -
>
> For the vfloat*m*_t below operations are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=
> * ALU: +, -, *, /, -
>
> For the vbool*_t types only below operations are allowed except
> the CMP and ALU. The CMP and ALU operations on vbool*_t is not
> well defined currently.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
>
> For the vint*x*m*_t tuple types are not suppored in this patch
> which is compatible with clang.
>
> This patch passed the below testsuites.
> * The riscv fully regression tests.

While at it, can you also add the support for feature detection macro
|__riscv_v_fixed_vlen

Thx,
-Vineet
|

>
> gcc/ChangeLog:
>
>   * config/riscv/riscv.cc (riscv_handle_rvv_vector_bits_attribute):
>   New static func to take care of the RVV types decorated by
>   the attributes.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits.h: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv.cc |  87 +-
>  .../riscv/rvv/base/riscv_rvv_vector_bits-1.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-10.c |  53 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-11.c |  76 
>  .../riscv/rvv/base/riscv_rvv_vector_bits-12.c |  14 +++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-2.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-3.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-4.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-5.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-6.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-7.c  |  76 
>  .../riscv/rvv/base/riscv_rvv_vector_bits-8.c  |  75 
>  .../riscv/rvv/base/riscv_rvv_vector_bits-9.c  |  76 
>  .../riscv/rvv/base/riscv_rvv_vector_bits.h| 108 ++
>  14 files changed, 599 insertions(+), 2 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c
>  create mode 100644 
> 

[PATCH] c++/modules: inline namespace abi_tag streaming [PR110730]

2024-03-06 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

The unreduced testcase from this PR crashes at runtime ultimately
because we don't stream the abi_tag attribute on inline namespaces and
so the filesystem::current_path() call resolves to the non-C++11 ABI
version even though the C++11 ABI is active, leading to a crash when
destructing the call result (which contains an std::string member).

While we do stream the DECL_ATTRIBUTES of all decls that go through
the generic tree streaming routines, it seems namespaces are streamed
separately from other decls and we don't use the generic routines for
them.  So this patch makes us stream the abi_tag manually for (inline)
namespaces.

PR c++/110730

gcc/cp/ChangeLog:

* module.cc (module_state::write_namespaces): Stream the
abi_tag attribute of an inline namespace.
(module_state::read_namespaces): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/namespace-6_a.H: New test.
* g++.dg/modules/namespace-6_b.C: New test.
---
 gcc/cp/module.cc | 32 
 gcc/testsuite/g++.dg/modules/namespace-6_a.H | 10 ++
 gcc/testsuite/g++.dg/modules/namespace-6_b.C |  7 +
 3 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/namespace-6_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/namespace-6_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index d9e34e9a4b9..ce62c3341a6 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -15411,6 +15411,22 @@ module_state::write_namespaces (elf_out *to, 
vec spaces,
 
   sec.u (flags);
   write_location (sec, DECL_SOURCE_LOCATION (ns));
+
+  if (DECL_NAMESPACE_INLINE_P (ns))
+   {
+ if (tree tags = lookup_attribute ("abi_tag", DECL_ATTRIBUTES (ns)))
+   {
+ tags = TREE_VALUE (tags);
+ unsigned count = 0;
+ for (tree tag = tags; tag; tag = TREE_CHAIN (tag))
+   ++count;
+ sec.u (count);
+ for (tree tag = tags; tag; tag = TREE_CHAIN (tag))
+   sec.str (TREE_STRING_POINTER (TREE_VALUE (tag)));
+   }
+ else
+   sec.u (0);
+   }
 }
 
   sec.end (to, to->name (MOD_SNAME_PFX ".nms"), crc_p);
@@ -15441,6 +15457,7 @@ module_state::read_namespaces (unsigned num)
   /* See comment in write_namespace about why not bits.  */
   unsigned flags = sec.u ();
   location_t src_loc = read_location (sec);
+  unsigned tags_count = (flags & 2) ? sec.u () : 0;
 
   if (entity_index >= entity_num
  || !parent
@@ -15449,6 +15466,17 @@ module_state::read_namespaces (unsigned num)
   if (sec.get_overrun ())
break;
 
+  tree tags = NULL_TREE;
+  while (tags_count--)
+   {
+ size_t len;
+ const char *str = sec.str ();
+ if (sec.get_overrun ())
+   break;
+ tags = tree_cons (NULL_TREE, build_string (len + 1, str), tags);
+ tags = nreverse (tags);
+   }
+
   tree id = name ? get_identifier (from ()->name (name)) : NULL_TREE;
 
   dump () && dump ("Read namespace:%u %P%s%s%s%s",
@@ -15477,6 +15505,10 @@ module_state::read_namespaces (unsigned num)
DECL_MODULE_EXPORT_P (inner) = true;
}
 
+  if (tags)
+   DECL_ATTRIBUTES (inner)
+ = tree_cons (get_identifier ("abi_tag"), tags, DECL_ATTRIBUTES 
(inner));
+
   /* Install the namespace.  */
   (*entity_ary)[entity_lwm + entity_index] = inner;
   if (DECL_MODULE_IMPORT_P (inner))
diff --git a/gcc/testsuite/g++.dg/modules/namespace-6_a.H 
b/gcc/testsuite/g++.dg/modules/namespace-6_a.H
new file mode 100644
index 000..b412cbe6cbf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/namespace-6_a.H
@@ -0,0 +1,10 @@
+// PR c++/110730
+// { dg-additional-options -fmodule-header }
+// { dg-module-cmi {} }
+
+namespace std::filesystem {
+  inline namespace __cxx11 __attribute__((__abi_tag__("cxx11", "foo"))) {
+struct path { };
+  }
+  inline path current_path() { return {}; };
+}
diff --git a/gcc/testsuite/g++.dg/modules/namespace-6_b.C 
b/gcc/testsuite/g++.dg/modules/namespace-6_b.C
new file mode 100644
index 000..2ce41c7c9c7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/namespace-6_b.C
@@ -0,0 +1,7 @@
+// PR c++/110730
+// { dg-additional-options -fmodules-ts }
+
+import "namespace-6_a.H";
+
+int main() { std::filesystem::current_path(); }
+// { dg-final { scan-assembler _ZNSt10filesystem12current_pathB5cxx11B3fooEv } 
}
-- 
2.44.0.117.g43072b4ca1



[PATCH v8] C, ObjC: Add -Wunterminated-string-initialization

2024-03-06 Thread Alejandro Colomar
Warn about the following:

char  s[3] = "foo";

Initializing a char array with a string literal of the same length as
the size of the array is usually a mistake.  Rarely is the case where
one wants to create a non-terminated character sequence from a string
literal.

In some cases, for writing faster code, one may want to use arrays
instead of pointers, since that removes the need for storing an array of
pointers apart from the strings themselves.

char  *log_levels[]   = { "info", "warning", "err" };
vs.
char  log_levels[][7] = { "info", "warning", "err" };

This forces the programmer to specify a size, which might change if a
new entry is later added.  Having no way to enforce null termination is
very dangerous, however, so it is useful to have a warning for this, so
that the compiler can make sure that the programmer didn't make any
mistakes.  This warning catches the bug above, so that the programmer
will be able to fix it and write:

char  log_levels[][8] = { "info", "warning", "err" };

This warning already existed as part of -Wc++-compat, but this patch
allows enabling it separately.  It is also included in -Wextra, since
it may not always be desired (when unterminated character sequences are
wanted), but it's likely to be desired in most cases.

Since Wc++-compat now includes this warning, the test has to be modified
to expect the text of the new warning too, in .

gcc/c-family/ChangeLog:

* c.opt: Add -Wunterminated-string-initialization.

gcc/c/ChangeLog:

* c-typeck.cc (digest_init): Separate warnings about character
  arrays being initialized as unterminated character sequences
  with string literals, from -Wc++-compat, into a new warning,
  -Wunterminated-string-initialization.

gcc/ChangeLog:

* doc/invoke.texi: Document the new
  -Wunterminated-string-initialization.

gcc/testsuite/ChangeLog:

* gcc.dg/Wcxx-compat-14.c: Adapt the test to match the new text
  of the warning, which doesn't say anything about C++ anymore.
* gcc.dg/Wunterminated-string-initialization.c: New test.

Link: 
Link: 
Link: 

Acked-by: Doug McIlroy 
[Sandra: The documentation parts of the patch are OK.]
Reviewed-by: Sandra Loosemore 
Cc: "G. Branden Robinson" 
Cc: Ralph Corderoy 
Cc: Dave Kemper 
Cc: Larry McVoy 
Cc: Andrew Pinski 
Cc: Jonathan Wakely 
Cc: Andrew Clayton 
Cc: Martin Uecker 
Cc: David Malcolm 
Cc: Mike Stump 
Cc: Joseph Myers 
Signed-off-by: Alejandro Colomar 
---

Hi!

I've added a changelog to the commit message, as requested by Sandra,
and noted her review.

Have a lovely day!
Alex


Range-diff against v7:
1:  c0f3ffcca7a ! 1:  06236d0aa05 C, ObjC: Add 
-Wunterminated-string-initialization
@@ Commit message
 Since Wc++-compat now includes this warning, the test has to be 
modified
 to expect the text of the new warning too, in 
.
 
+gcc/c-family/ChangeLog:
+
+* c.opt: Add -Wunterminated-string-initialization.
+
+gcc/c/ChangeLog:
+
+* c-typeck.cc (digest_init): Separate warnings about character
+  arrays being initialized as unterminated character sequences
+  with string literals, from -Wc++-compat, into a new warning,
+  -Wunterminated-string-initialization.
+
+gcc/ChangeLog:
+
+* doc/invoke.texi: Document the new
+  -Wunterminated-string-initialization.
+
+gcc/testsuite/ChangeLog:
+
+* gcc.dg/Wcxx-compat-14.c: Adapt the test to match the new text
+  of the warning, which doesn't say anything about C++ anymore.
+* gcc.dg/Wunterminated-string-initialization.c: New test.
+
 Link: 
 Link: 
 Link: 

 Acked-by: Doug McIlroy 
+[Sandra: The documentation parts of the patch are OK.]
+Reviewed-by: Sandra Loosemore 
 Cc: "G. Branden Robinson" 
 Cc: Ralph Corderoy 
 Cc: Dave Kemper 
@@ Commit message
 Cc: David Malcolm 
 Cc: Mike Stump 
 Cc: Joseph Myers 
-Cc: Sandra Loosemore 
 Signed-off-by: Alejandro Colomar 
 
  ## gcc/c-family/c.opt ##

 gcc/c-family/c.opt|  4 
 gcc/c/c-typeck.cc |  6 +++---
 gcc/doc/invoke.texi   | 20 ++-
 gcc/testsuite/gcc.dg/Wcxx-compat-14.c |  2 +-
 .../Wunterminated-string-initialization.c |  6 

[PATCH] gomp: testsuite: improve compatibility of bad-array-section-3.c [PR113428]

2024-03-06 Thread Richard Earnshaw

This test generates different warnings on ilp32 targets because the size
of an integer matches the size of a pointer.  Avoid this by using
signed char.

gcc/testsuite:

PR testsuite/113428
* gcc.dg/gomp/bad-array-section-c-3.c: Use signed char instead
of int.
---

I think this fixes the issues seen on ilp32 machines, without substantially
changing what the test does, but a second set of eyes wouldn't hurt.

 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c b/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
index 8be15ced8c0..431af71c422 100644
--- a/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
+++ b/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
@@ -1,15 +1,15 @@
 /* { dg-do compile } */
 
 struct S {
-  int *ptr;
+  signed char *ptr;
 };
 
 int main()
 {
-  int arr[20];
+  signed char arr[20];
 
   /* Reject array section in compound initialiser.  */
-#pragma omp target map( (struct S) { .ptr = (int *) arr[5:5] } )
+#pragma omp target map( (struct S) { .ptr = (signed char *) arr[5:5] } )
 /* { dg-error {expected '\]' before ':' token} "" { target *-*-* } .-1 } */
 /* { dg-warning {cast to pointer from integer of different size} "" { target *-*-* } .-2 } */
 /* { dg-message {sorry, unimplemented: unsupported map expression} "" { target *-*-* } .-3 } */


Re: [patch, libgfortran] Part 2: PR105456 Child I/O does not propage iostat

2024-03-06 Thread Harald Anlauf

Hi Jerry,

can you please replace the user message in e.g. your new testcase
pr105456-wf.f90 by say:

piomsg="The users message containing % and %% and %s and other stuff"

This behaves as expected with Intel, but dies horribly with gfortran
after your patch!

Cheers,
Harald


On 3/6/24 05:06, Jerry D wrote:

On 3/5/24 1:51 PM, Harald Anlauf wrote:

Hi Jerry,

on further thought, do we sanitize 'child_iomsg'?
We pass it to snprintf as format.

Wouldn't a strncpy be sufficient?

Harald




Just to be safe I will bump char message[IOMSG_LEN] to char
message[IOMSG_LEN + 1]

This is like a C string vs a Fortran string length situation. snprintf
guarantees we don't exceed the child_iomsg_len and null terminates it.

I added 1 to:
  child_iomsg_len = string_len_trim (IOMSG_LEN, child_iomsg) + 1

Because snprintf was chopping off the last character of the fortran
string to put the null in. (zero based vs one based char array). I test
this with a very long string which exceeded the length and then reduced
it until I could see the desired end.

I have not tried running a test case with sanitize. I did check with
valgrind.  I will try the sanitize flags to see if we get a problem.  If
not will push.

Thanks for comments,

Jerry -


On 3/5/24 22:37, Harald Anlauf wrote:

Hi Jerry,

I think there is the risk of buffer overrun in the following places:

+ char message[IOMSG_LEN];
+ child_iomsg_len = string_len_trim (IOMSG_LEN, child_iomsg)
+ 1;
   free_line (dtp);
   snprintf (message, child_iomsg_len, child_iomsg);
   generate_error (>common,
dtp->u.p.child_saved_iostat,

plus several more.  Wouldn't it be better to increase the size of
message by one?

Thanks,
Harald


On 3/5/24 04:15, Jerry D wrote:

On 3/1/24 11:24 AM, rep.dot@gmail.com wrote:

Hi Jerry and Steve,

On 29 February 2024 19:28:19 CET, Jerry D 
wrote:

On 2/29/24 10:13 AM, Steve Kargl wrote:

On Thu, Feb 29, 2024 at 09:36:43AM -0800, Jerry D wrote:

On 2/29/24 1:47 AM, Bernhard Reutner-Fischer wrote:


And, just for my own education, the length limitation of iomsg to
255
chars is not backed by the standard AFAICS, right? It's just our
STRERR_MAXSZ?


Yes, its what we have had for a long lone time. Once you throw an
error
things get very processor dependent. I found MSGLEN set to 100 and
IOMSG_len
to 256. Nothing magic about it.



There is no restriction on the length for the iomsg-variable
that receives the generated error message.  In fact, if the
iomsg-variable has a deferred-length type parameter, then
(re)-allocation to the exact length is expected.

    F2023

    12.11.6 IOMSG= specifier

    If an error, end-of-file, or end-of-record condition occurs
during
    execution of an input/output statement, iomsg-variable is
assigned
    an explanatory message, as if by intrinsic assignment. If no
such
    condition occurs, the definition status and value of
iomsg-variable
    are unchanged.
   character(len=23) emsg
read(fd,*,iomsg=emsg)

Here, the generated iomsg is either truncated to a length of 23
or padded with blanks to a length of 23.

character(len=:), allocatable :: emsg
read(fd,*,iomsg=emsg)

Here, emsg should have the length of whatever error message was
generated.
   HTH



Well, currently, if someone uses a larger string than 256 we are
going to chop it off.

Do we want to process this differently now?


Yes. There is some odd hunk about discrepancy of passed len and
actual len afterwards in 22-007-r1, IIRC. Didn't look closely though.


--- snip ---

Attached is the revised patch using the already available
string_len_trim function.

This hunk is only executed if a user has not passed an iostat or iomsg
variable in the parent I/O statement and an error is triggered which
terminates execution of the program. In this case, the iomsg string is
provided in the usual error message in a "processor defined" way.

(F2023):

12.6.4.8.3 Executing defined input/output data transfers
---
11 If the iostat argument of the defined input/output procedure has a
nonzero value when that procedure returns, and the processor therefore
terminates execution of the program as described in 12.11, the
processor shall make the value of the iomsg argument available in a
processor-dependent manner.
---

OK for trunk?

Regards,

Jerry















Re: [PATCH] Fortran: error recovery while simplifying expressions [PR103707, PR106987]

2024-03-06 Thread Harald Anlauf

Hi Paul,

thanks for reviewing the patch, and your trust in me :-)

Backporting to 13-branch seems easily feasible (needs another small
queued backport on which this patch depends), but going further is
definitely out of the question...  Will wait a couple of weeks though.

Harald

On 3/6/24 11:51, Paul Richard Thomas wrote:

Hi Harald,

This all looks good to me. OK for mainline and, according to intestinal
fortitude on your part, earlier branches.

Thanks

Paul


On Tue, 5 Mar 2024 at 21:24, Harald Anlauf  wrote:


Dear all,

error recovery on arithmetic errors during simplification has bugged
me for a long time, especially since the occurence of ICEs depended
on whether -frange-check is specified or not, whether array ctors
were involved, etc.

I've now come up with the attached patch that classifies the arithmetic
result codes into "hard" and "soft" errors.

A "soft" error means that it is an overflow or other exception (e.g. NaN)
that is ignored with -fno-range-check.  After the patch, a soft error
will not stop simplification (a hard one will), and error status will be
passed along.

I took this opportunity to change the emitted error for division by zero
for real and complex division dependent on whether the numerator is
regular or not.  This makes e.g. (0.)/0 a NaN and now says so, in
accordance with some other brands.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Other comments?

Thanks,
Harald








Re: [PATCH] OpenMP: warn about iteration var modifications in loop body

2024-03-06 Thread Frederik Harwath

Ping.


The Linaro CI has kindly pointed me to two test regressions that I had
missed. I have adjust the test expectations in the updated patch which I
have attached.

Frederik


On 28.02.24 8:32 PM, Frederik Harwath wrote:

Hi,

this patch implements a warning about (some simple cases of direct)
modifications of iteration variables in OpenMP loops which are
forbidden according to the OpenMP specification. I think this can be
helpful, especially for new OpenMP users. I have implemented this
after I observed some confusion concerning this topic recently.
The check is implemented during gimplification. It reuses the
"loop_iter_var" vector in the "gimplify_omp_ctx" which was previously
only used for "doacross" handling to identify the loop iteration
variables during the gimplification of MODIFY_EXPRs in omp_for bodies.
I have only added a common C/C++ test because I don't see any special
C++ constructs for which a warning *should* be emitted and Fortran
rejects modifications of iteration variables in do loops in general.

I have run "make check" on x86_64-linux-gnu and not observed any
regressions.

Is it ok to commit this?

Best regards,
Frederik
From d4fb1710bfa1d5b66979db1f0aea2d5c68ab2264 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 27 Feb 2024 21:07:00 +
Subject: [PATCH] OpenMP: warn about iteration var modifications in loop body

OpenMP loop iteration variables may not be changed by user code in the
loop body according to the OpenMP specification.  In general, the
compiler cannot enforce this, but nevertheless simple cases in which
the user modifies the iteration variable directly in the loop body
(in contrast to, e.g., modifications through a pointer) can be recognized. A
warning should be useful, for instance, to new users of OpenMP.

This commit implements a warning about forbidden iteration var modifications
during gimplification. It reuses the "loop_iter_var" vector in the
"gimplify_omp_ctx" which was previously only used for "doacross" handling to
identify the loop iteration variables during the gimplification of MODIFY_EXPRs
in omp_for bodies.

gcc/ChangeLog:

	* gimplify.cc (struct gimplify_omp_ctx): Add field "in_omp_for_body" to
	recognize the gimplification state during which the new warning should
	be emitted. Add field "is_doacross" to distinguish the original use of
	"loop_iter_var" from its new use.
	(new_omp_context): Initialize new gimplify_omp_ctx fields.
	(gimplify_modify_expr): Emit warning if iter var is modified.
	(gimplify_omp_for): Make initialization and filling of loop_iter_var
	vector unconditional and adjust new gimplify_omp_ctx fields before
	gimplifying the omp_for body.
	(gimplify_omp_ordered): Check for do_across field in addition to
	emptiness check on loop_iter_var vector since the vector is now always
	being filled.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr92347.c: Adjust.
	* gcc.target/aarch64/sve/pr96195.c: Adjust.
	* c-c++-common/gomp/iter-var-modification.c: New test.

Signed-off-by: Frederik Harwath  
---
 gcc/gimplify.cc   |  54 +++---
 .../c-c++-common/gomp/iter-var-modification.c | 100 ++
 gcc/testsuite/gcc.dg/vect/pr92347.c   |   2 +-
 .../gcc.target/aarch64/sve/pr96195.c  |   2 +-
 4 files changed, 140 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/iter-var-modification.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 7f79b3cc7e6..a74ad987cf7 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -235,6 +235,8 @@ struct gimplify_omp_ctx
   bool order_concurrent;
   bool has_depend;
   bool in_for_exprs;
+  bool in_omp_for_body;
+  bool is_doacross;
   int defaultmap[5];
 };
 
@@ -456,6 +458,10 @@ new_omp_context (enum omp_region_type region_type)
   c->privatized_types = new hash_set;
   c->location = input_location;
   c->region_type = region_type;
+  c->loop_iter_var.create (0);
+  c->in_omp_for_body = false;
+  c->is_doacross = false;
+
   if ((region_type & ORT_TASK) == 0)
 c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
@@ -6312,6 +6318,18 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR
 	  || TREE_CODE (*expr_p) == INIT_EXPR);
 
+  if (gimplify_omp_ctxp && gimplify_omp_ctxp->in_omp_for_body)
+{
+  size_t num_vars = gimplify_omp_ctxp->loop_iter_var.length () / 2;
+  for (size_t i = 0; i < num_vars; i++)
+	{
+	  if (*to_p == gimplify_omp_ctxp->loop_iter_var[2 * i + 1])
+	warning_at (input_location, OPT_Wopenmp,
+			"forbidden modification of iteration variable %qE in "
+			"OpenMP loop", *to_p);
+	}
+}
+
   /* Trying to simplify a clobber using normal logic doesn't work,
  so handle it here.  */
   if (TREE_CLOBBER_P (*from_p))
@@ -15334,6 +15352,8 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 	  == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt)));
   gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt))
 	

[PR target/113001] Fix incorrect operand swapping in conditional move

2024-03-06 Thread Jeff Law

This bug totally fell off my radar.  Sorry about that.

We have some special casing the conditional move expander to simplify a 
conditional move when comparing a register against zero and that same 
register is one of the arms.


Specifically a (eq (reg) (const_int 0)) where reg is also the true arm 
or (ne (reg) (const_int 0)) where reg is the false arm need not use the 
fully generalized conditional move, thus saving an instruction for those 
cases.


In the NE case we swapped the operands, but didn't swap the condition, 
which led to the ICE due to an unrecognized pattern.  THe backend 
actually has distinct patterns for those two cases.  So swapping the 
operands is neither needed nor advisable.


Regression tested on rv64gc and verified the new tests pass.

Pushing to the trunk.

jeff
commit 10cbfcd60f9e5bdbe486e1c0192e0f168d899b77
Author: Jeff Law 
Date:   Wed Mar 6 09:50:44 2024 -0700

[PR target/113001] Fix incorrect operand swapping in conditional move

This bug totally fell off my radar.  Sorry about that.

We have some special casing the conditional move expander to simplify a
conditional move when comparing a register against zero and that same 
register
is one of the arms.

Specifically a (eq (reg) (const_int 0)) where reg is also the true arm or 
(ne
(reg) (const_int 0)) where reg is the false arm need not use the fully
generalized conditional move, thus saving an instruction for those cases.

In the NE case we swapped the operands, but didn't swap the condition, which
led to the ICE due to an unrecognized pattern.  THe backend actually has
distinct patterns for those two cases.  So swapping the operands is neither
needed nor advisable.

Regression tested on rv64gc and verified the new tests pass.

Pushing to the trunk.

PR target/113001
PR target/112871
gcc/
* config/riscv/riscv.cc (expand_conditional_move): Do not swap
operands when the comparison operand is the same as the false
arm for a NE test.

gcc/testsuite
* gcc.target/riscv/zicond-ice-3.c: New test.
* gcc.target/riscv/zicond-ice-4.c: New test.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 691d967de29..680c4a728e9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4633,8 +4633,6 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
   || (code == NE && rtx_equal_p (alt, op0)))
{
  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
- if (!rtx_equal_p (cons, op0))
-   std::swap (alt, cons);
  alt = force_reg (mode, alt);
  emit_insn (gen_rtx_SET (dest,
  gen_rtx_IF_THEN_ELSE (mode, cond,
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-ice-3.c 
b/gcc/testsuite/gcc.target/riscv/zicond-ice-3.c
new file mode 100644
index 000..650986825ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-ice-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicond -mabi=ilp32d" { target { rv32 } } } */
+
+long a, b;
+int c, d;
+void e(long *f) {
+  (b = *f) && --b;
+  for (; c;)
+;
+}
+void g() {
+  for (; d; d--)
+e();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-ice-4.c 
b/gcc/testsuite/gcc.target/riscv/zicond-ice-4.c
new file mode 100644
index 000..2be02c78a08
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-ice-4.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicond -mabi=ilp32d" { target { rv32 } } } */
+
+short a, c;
+int b, d, i;
+volatile char e;
+static int f[] = {1, 1};
+long g;
+int volatile h;
+short(j)() { return b ? a : 0; }
+void k() {
+l:
+  h;
+  g = 0;
+  for (; g <= 2; g++) {
+d | ((i || j() & (0 == f[g])) ^ i) && e;
+if (c)
+  goto l;
+  }
+}
+


[committed] i386: Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move

2024-03-06 Thread Uros Bizjak
Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move and
use generic code instead.

No functional changes.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_move) [TARGET_MACHO]:
Eliminate common code and use generic code instead.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32} and by
Iain on i686-darwin9, i686 and x86_64-darwin17, x86_64-darwin19, 21,
23.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 3b1685ae448..2210e6f7cc8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -471,9 +471,9 @@ ix86_expand_move (machine_mode mode, rtx operands[])
   if ((flag_pic || MACHOPIC_INDIRECT)
   && symbolic_operand (op1, mode))
 {
+#if TARGET_MACHO
   if (TARGET_MACHO && !TARGET_64BIT)
{
-#if TARGET_MACHO
  /* dynamic-no-pic */
  if (MACHOPIC_INDIRECT)
{
@@ -490,33 +490,18 @@ ix86_expand_move (machine_mode mode, rtx operands[])
  emit_insn (insn);
  return;
}
- if (GET_CODE (op0) == MEM)
-   op1 = force_reg (Pmode, op1);
- else
-   {
- rtx temp = op0;
- if (GET_CODE (temp) != REG)
-   temp = gen_reg_rtx (Pmode);
- temp = legitimize_pic_address (op1, temp);
- if (temp == op0)
-   return;
- op1 = temp;
-   }
-  /* dynamic-no-pic */
-#endif
}
-  else
+#endif
+
+  if (MEM_P (op0))
+   op1 = force_reg (mode, op1);
+  else if (!(TARGET_64BIT && x86_64_movabs_operand (op1, DImode)))
{
- if (MEM_P (op0))
-   op1 = force_reg (mode, op1);
- else if (!(TARGET_64BIT && x86_64_movabs_operand (op1, DImode)))
-   {
- rtx reg = can_create_pseudo_p () ? NULL_RTX : op0;
- op1 = legitimize_pic_address (op1, reg);
- if (op0 == op1)
-   return;
- op1 = convert_to_mode (mode, op1, 1);
-   }
+ rtx reg = can_create_pseudo_p () ? NULL_RTX : op0;
+ op1 = legitimize_pic_address (op1, reg);
+ if (op0 == op1)
+   return;
+ op1 = convert_to_mode (mode, op1, 1);
}
 }
   else


Re: [PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2024-03-06 Thread FX Coudert
> Hmm I recall trying it and finding a problem - was there some different fix 
> applied
> in the end?

The bug is still open, I don’t think a patch was applied, and I don’t find any 
email to the list stating what the problem could be.

FX

Re: [PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2024-03-06 Thread Iain Sandoe



> On 6 Mar 2024, at 13:54, Sam James  wrote:
> 
> FX Coudert  writes:
> 
>> I would like to patch this patch from September 2023:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631611.html
>> 
>> This bug is now hitting macOS in the latest version of Xcode (it was 
>> originally seen on freebsd).
>> I confirm that the patch is restoring bootstrap on x86_64-apple-darwin23
> 
> Iain hit an issue with it and I never got a chance to look into how to
> fix it.

Hmm I recall trying it and finding a problem - was there some different fix 
applied
in the end?

Iain

> 
>> 
>> OK to push?
>> FX
>> 
>> 
>> 
>> 
>> 
>>> Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632
>>> 
>>> When building gcc's C++ sources against recent libc++, the poisoning of
>>> the ctype macros due to including safe-ctype.h before including C++
>>> standard headers such as , , etc, causes many compilation
>>> errors, similar to:
>>> 
>>> In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
>>> In file included from /home/dim/src/gcc/master/gcc/system.h:233:
>>> In file included from /usr/include/c++/v1/vector:321:
>>> In file included from
>>> /usr/include/c++/v1/__format/formatter_bool.h:20:
>>> In file included from
>>> /usr/include/c++/v1/__format/formatter_integral.h:32:
>>> In file included from /usr/include/c++/v1/locale:202:
>>> /usr/include/c++/v1/__locale:546:5: error: '__abi_tag__' attribute
>>> only applies to structs, variables, functions, and namespaces
>>> 546 | _LIBCPP_INLINE_VISIBILITY
>>> | ^
>>> /usr/include/c++/v1/__config:813:37: note: expanded from macro
>>> '_LIBCPP_INLINE_VISIBILITY'
>>> 813 | # define _LIBCPP_INLINE_VISIBILITY _LIBCPP_HIDE_FROM_ABI
>>> | ^
>>> /usr/include/c++/v1/__config:792:26: note: expanded from macro
>>> '_LIBCPP_HIDE_FROM_ABI'
>>> 792 |
>>> __attribute__((__abi_tag__(_LIBCPP_TOSTRING(
>>> _LIBCPP_VERSIONED_IDENTIFIER
>>> | ^
>>> In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
>>> In file included from /home/dim/src/gcc/master/gcc/system.h:233:
>>> In file included from /usr/include/c++/v1/vector:321:
>>> In file included from
>>> /usr/include/c++/v1/__format/formatter_bool.h:20:
>>> In file included from
>>> /usr/include/c++/v1/__format/formatter_integral.h:32:
>>> In file included from /usr/include/c++/v1/locale:202:
>>> /usr/include/c++/v1/__locale:547:37: error: expected ';' at end of
>>> declaration list
>>> 547 | char_type toupper(char_type __c) const
>>> | ^
>>> /usr/include/c++/v1/__locale:553:48: error: too many arguments
>>> provided to function-like macro invocation
>>> 553 | const char_type* toupper(char_type* __low, const
>>> char_type* __high) const
>>> | ^
>>> /home/dim/src/gcc/master/gcc/../include/safe-ctype.h:146:9: note:
>>> macro 'toupper' defined here
>>> 146 | #define toupper(c) do_not_use_toupper_with_safe_ctype
>>> | ^
>>> 
>>> This is because libc++ uses different transitive includes than
>>> libstdc++, and some of those transitive includes pull in various ctype
>>> declarations (typically via ).
>>> 
>>> There was already a special case for including  before
>>> safe-ctype.h, so move the rest of the C++ standard header includes to
>>> the same location, to fix the problem.
>>> 



[PATCH v2 00/13] Add aarch64-w64-mingw32 target

2024-03-06 Thread Evgeny Karpov
Wednesday, March 6, 2024
LIU Hao wrote:

> May I suggest you keep the mcf thread model for aarch-w64-mingw32? I
> requested Martin Storsjö to test it on a physical Windows 11 on ARM machine
> with Clang and all tests passed. I think it should work once the GCC support 
> is
> complete.
> 
> 
> --
> Best regards,
> LIU Hao

The MCF thread model will be added to the aarch64-w64-mingw32 target
in the next patch series. Most probably, this will occur when the
mingw unit tests are integrated into the aarch64-w64-mingw32 target.

Regards,
Evgeny


Re: [PATCH] RISC-V: Use vmv1r.v instead of vmv.v.v for fma output reloads [PR114200].

2024-03-06 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-03-06 21:44
To: gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai
CC: rdapp.gcc; jeffreyalaw
Subject: [PATCH] RISC-V: Use vmv1r.v instead of vmv.v.v for fma output reloads 
[PR114200].
Hi,
 
three-operand instructions like vmacc are modeled with an implicit
output reload when the output does not match one of the operands.  For
this we use vmv.v.v which is subject to length masking.
 
In a situation where the current vl is less than the full vlenb
and the fma's result value is used as input for a vector reduction
(which is never length masked) we effectively only reduce vl
elements.  The masked-out elements are relevant for the
reduction, though, leading to a wrong result.
 
This patch replaces the vmv reloads by full-register reloads.
 
Regtested on rv64, rv32 is running.
 
Regards
Robin
 
gcc/ChangeLog:
 
PR target/114200
PR target/114202
 
* config/riscv/vector.md: Use vmv[1248]r.v instead of vmv.v.v.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/pr114200.c: New test.
* gcc.target/riscv/rvv/autovec/pr114202.c: New test.
---
gcc/config/riscv/vector.md| 96 +--
.../gcc.target/riscv/rvv/autovec/pr114200.c   | 18 
.../gcc.target/riscv/rvv/autovec/pr114202.c   | 20 
3 files changed, 86 insertions(+), 48 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114200.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114202.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f89f9c2fa86..8b1c24c5d79 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -5351,10 +5351,10 @@ (define_insn "*pred_mul_plus_undef"
   "@
vmadd.vv\t%0,%4,%5%p1
vmacc.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%4\;vmacc.vv\t%0,%3,%4%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vv\t%0,%3,%4%p1
vmadd.vv\t%0,%4,%5%p1
vmacc.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%5\;vmacc.vv\t%0,%3,%4%p1"
+   vmv%m5r.v\t%0,%5\;vmacc.vv\t%0,%3,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")])
@@ -5378,9 +5378,9 @@ (define_insn "*pred_madd"
   "TARGET_VECTOR"
   "@
vmadd.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1
+   vmv%m2r.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1
vmadd.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1"
+   vmv%m2r.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "2")
@@ -5409,9 +5409,9 @@ (define_insn "*pred_macc"
   "TARGET_VECTOR"
   "@
vmacc.vv\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vv\t%0,%2,%3%p1
+   vmv%m4r.v\t%0,%4;vmacc.vv\t%0,%2,%3%p1
vmacc.vv\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vv\t%0,%2,%3%p1"
+   vmv%m4r.v\t%0,%4\;vmacc.vv\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "4")
@@ -5462,9 +5462,9 @@ (define_insn "*pred_madd_scalar"
   "TARGET_VECTOR"
   "@
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1
+   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1"
+   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "3")
@@ -5494,9 +5494,9 @@ (define_insn "*pred_macc_scalar"
   "TARGET_VECTOR"
   "@
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "4")
@@ -5562,9 +5562,9 @@ (define_insn "*pred_madd_extended_scalar"
   "TARGET_VECTOR && !TARGET_64BIT"
   "@
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1
+   vmv%m2r.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1"
+   vmv%m2r.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "3")
@@ -5595,9 +5595,9 @@ (define_insn "*pred_macc_extended_scalar"
   "TARGET_VECTOR && !TARGET_64BIT"
   "@
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "4")
@@ -5649,10 +5649,10 @@ (define_insn "*pred_minus_mul_undef"
   "@
vnmsub.vv\t%0,%4,%5%p1
vnmsac.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1
+   vmv%m3r.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1
vnmsub.vv\t%0,%4,%5%p1
vnmsac.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1"
+   vmv%m3r.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")])
@@ -5676,9 +5676,9 @@ (define_insn "*pred_nmsub"
   "TARGET_VECTOR"
   "@

C++ patch ping

2024-03-06 Thread Jakub Jelinek
Hi!

https://gcc.gnu.org/pipermail/gcc-patches/2024-February/thread.html#645781
[PATCH] c++: Fix up parameter pack diagnostics on xobj vs. varargs functions 
[PR113802]
The thread contains two possible further versions of the patch.

https://gcc.gnu.org/pipermail/gcc-patches/2024-February/thread.html#645782
[PATCH] dwarf2out: Emit DW_AT_export_symbols on anon unions/structs [PR113918]
The thread contains another version of the patch at the end.

https://gcc.gnu.org/pipermail/gcc-patches/2024-February/thread.html#645916
[PATCH] c-family, c++: Fix up handling of types which may have padding in 
__atomic_{compare_}exchange
Seems Richi would like to use MEM_REF in the c-family code, that is then
the https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646040.html
version.

Thanks

Jakub



Re: amdgcn: additional gfx1030/gfx1100 support: adjust test cases

2024-03-06 Thread Andrew Stubbs

On 06/03/2024 13:49, Thomas Schwinge wrote:

Hi!

On 2024-01-24T12:43:04+, Andrew Stubbs  wrote:

This [...]


... became commit 99890e15527f1f04caef95ecdd135c9f1a077f08
"amdgcn: additional gfx1030/gfx1100 support", and included the following:


--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -3555,30 +3555,63 @@
  ;; }}}
  ;; {{{ Int/int conversions
  
+(define_code_iterator all_convert [truncate zero_extend sign_extend])

  (define_code_iterator zero_convert [truncate zero_extend])
  (define_code_attr convop [
(sign_extend "extend")
(zero_extend "zero_extend")
(truncate "trunc")])
  
-(define_insn "2"

+(define_expand "2"
+  [(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
+(all_convert:V_INT_1REG
+ (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
+  "")
+
+(define_insn "*_sdwa"
[(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
  (zero_convert:V_INT_1REG
  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
-  ""
+  "!TARGET_RDNA3"
"v_mov_b32_sdwa\t%0, %1 dst_sel: dst_unused:UNUSED_PAD 
src0_sel:"
[(set_attr "type" "vop_sdwa")
 (set_attr "length" "8")])
  
-(define_insn "extend2"

+(define_insn "extend_sdwa"
[(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
  (sign_extend:V_INT_1REG
  (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
-  ""
+  "!TARGET_RDNA3"
"v_mov_b32_sdwa\t%0, sext(%1) src0_sel:"
[(set_attr "type" "vop_sdwa")
 (set_attr "length" "8")])
  
+(define_insn "*_shift"

+  [(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
+(all_convert:V_INT_1REG
+ (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
+  "TARGET_RDNA3"
+  {
+enum {extend, zero_extend, trunc};
+rtx shiftwidth = (mode == QImode
+ || mode == QImode
+ ? GEN_INT (24)
+ : mode == HImode
+   || mode == HImode
+ ? GEN_INT (16)
+ : NULL);
+operands[2] = shiftwidth;
+
+if (!shiftwidth)
+  return "v_mov_b32 %0, %1";
+else if ( == extend ||  == trunc)
+  return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0";
+else
+  return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0";
+  }
+  [(set_attr "type" "mult")
+   (set_attr "length" "8")])


OK to push the attached
"amdgcn: additional gfx1030/gfx1100 support: adjust test cases"?
Tested 'gcn.exp' for all '-march'es.


LGTM.

Andrew



Re: [PATCH] c++/modules: Prevent emission of really-extern vtables in importers [PR114229]

2024-03-06 Thread Jason Merrill

On 3/5/24 22:06, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently, reading a variable definition always marks that decl as
DECL_NOT_REALLY_EXTERN, with anything else imported still being
considered external. This is not sufficient for vtables, however; for an
extern template, a vtable may be generated (and its definition emitted)
but nonetheless the vtable should only be emitted in the TU where that
template is actually instantiated.


Does the vtable go through import_export_decl?  I've been thinking that 
that function (and import_export_class) need to be more module-aware. 
Would it make sense to do that rather than stream DECL_NOT_REALLY_EXTERN?


Jason



Re: [PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2024-03-06 Thread Sam James
FX Coudert  writes:

> I would like to patch this patch from September 2023:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631611.html
>
> This bug is now hitting macOS in the latest version of Xcode (it was 
> originally seen on freebsd).
> I confirm that the patch is restoring bootstrap on x86_64-apple-darwin23

Iain hit an issue with it and I never got a chance to look into how to
fix it.

>
> OK to push?
> FX
>
>
>
>
>
>> Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632
>> 
>> When building gcc's C++ sources against recent libc++, the poisoning of
>> the ctype macros due to including safe-ctype.h before including C++
>> standard headers such as , , etc, causes many compilation
>> errors, similar to:
>> 
>> In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
>> In file included from /home/dim/src/gcc/master/gcc/system.h:233:
>> In file included from /usr/include/c++/v1/vector:321:
>> In file included from
>> /usr/include/c++/v1/__format/formatter_bool.h:20:
>> In file included from
>> /usr/include/c++/v1/__format/formatter_integral.h:32:
>> In file included from /usr/include/c++/v1/locale:202:
>> /usr/include/c++/v1/__locale:546:5: error: '__abi_tag__' attribute
>> only applies to structs, variables, functions, and namespaces
>> 546 | _LIBCPP_INLINE_VISIBILITY
>> | ^
>> /usr/include/c++/v1/__config:813:37: note: expanded from macro
>> '_LIBCPP_INLINE_VISIBILITY'
>> 813 | # define _LIBCPP_INLINE_VISIBILITY _LIBCPP_HIDE_FROM_ABI
>> | ^
>> /usr/include/c++/v1/__config:792:26: note: expanded from macro
>> '_LIBCPP_HIDE_FROM_ABI'
>> 792 |
>> __attribute__((__abi_tag__(_LIBCPP_TOSTRING(
>> _LIBCPP_VERSIONED_IDENTIFIER
>> | ^
>> In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
>> In file included from /home/dim/src/gcc/master/gcc/system.h:233:
>> In file included from /usr/include/c++/v1/vector:321:
>> In file included from
>> /usr/include/c++/v1/__format/formatter_bool.h:20:
>> In file included from
>> /usr/include/c++/v1/__format/formatter_integral.h:32:
>> In file included from /usr/include/c++/v1/locale:202:
>> /usr/include/c++/v1/__locale:547:37: error: expected ';' at end of
>> declaration list
>> 547 | char_type toupper(char_type __c) const
>> | ^
>> /usr/include/c++/v1/__locale:553:48: error: too many arguments
>> provided to function-like macro invocation
>> 553 | const char_type* toupper(char_type* __low, const
>> char_type* __high) const
>> | ^
>> /home/dim/src/gcc/master/gcc/../include/safe-ctype.h:146:9: note:
>> macro 'toupper' defined here
>> 146 | #define toupper(c) do_not_use_toupper_with_safe_ctype
>> | ^
>> 
>> This is because libc++ uses different transitive includes than
>> libstdc++, and some of those transitive includes pull in various ctype
>> declarations (typically via ).
>> 
>> There was already a special case for including  before
>> safe-ctype.h, so move the rest of the C++ standard header includes to
>> the same location, to fix the problem.
>> 


signature.asc
Description: PGP signature


amdgcn: additional gfx1030/gfx1100 support: adjust test cases (was: [PATCH] amdgcn: additional gfx1100 support)

2024-03-06 Thread Thomas Schwinge
Hi!

On 2024-01-24T12:43:04+, Andrew Stubbs  wrote:
> This [...]

... became commit 99890e15527f1f04caef95ecdd135c9f1a077f08
"amdgcn: additional gfx1030/gfx1100 support", and included the following:

> --- a/gcc/config/gcn/gcn-valu.md
> +++ b/gcc/config/gcn/gcn-valu.md
> @@ -3555,30 +3555,63 @@
>  ;; }}}
>  ;; {{{ Int/int conversions
>  
> +(define_code_iterator all_convert [truncate zero_extend sign_extend])
>  (define_code_iterator zero_convert [truncate zero_extend])
>  (define_code_attr convop [
>   (sign_extend "extend")
>   (zero_extend "zero_extend")
>   (truncate "trunc")])
>  
> -(define_insn "2"
> +(define_expand "2"
> +  [(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
> +(all_convert:V_INT_1REG
> +   (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> +  "")
> +
> +(define_insn "*_sdwa"
>[(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
>  (zero_convert:V_INT_1REG
> (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> -  ""
> +  "!TARGET_RDNA3"
>"v_mov_b32_sdwa\t%0, %1 dst_sel: dst_unused:UNUSED_PAD 
> src0_sel:"
>[(set_attr "type" "vop_sdwa")
> (set_attr "length" "8")])
>  
> -(define_insn "extend2"
> +(define_insn "extend_sdwa"
>[(set (match_operand:V_INT_1REG 0 "register_operand"   "=v")
>  (sign_extend:V_INT_1REG
> (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> -  ""
> +  "!TARGET_RDNA3"
>"v_mov_b32_sdwa\t%0, sext(%1) src0_sel:"
>[(set_attr "type" "vop_sdwa")
> (set_attr "length" "8")])
>  
> +(define_insn "*_shift"
> +  [(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
> +(all_convert:V_INT_1REG
> +   (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> +  "TARGET_RDNA3"
> +  {
> +enum {extend, zero_extend, trunc};
> +rtx shiftwidth = (mode == QImode
> +   || mode == QImode
> +   ? GEN_INT (24)
> +   : mode == HImode
> + || mode == HImode
> +   ? GEN_INT (16)
> +   : NULL);
> +operands[2] = shiftwidth;
> +
> +if (!shiftwidth)
> +  return "v_mov_b32 %0, %1";
> +else if ( == extend ||  == trunc)
> +  return "v_lshlrev_b32\t%0, %2, %1\;v_ashrrev_i32\t%0, %2, %0";
> +else
> +  return "v_lshlrev_b32\t%0, %2, %1\;v_lshrrev_b32\t%0, %2, %0";
> +  }
> +  [(set_attr "type" "mult")
> +   (set_attr "length" "8")])

OK to push the attached
"amdgcn: additional gfx1030/gfx1100 support: adjust test cases"?
Tested 'gcn.exp' for all '-march'es.


Grüße
 Thomas


>From 04b83e9aa19b02b9805e03f31db14325bb00e737 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 4 Mar 2024 10:40:39 +0100
Subject: [PATCH] amdgcn: additional gfx1030/gfx1100 support: adjust test cases

The "SDWA" changes in commit 99890e15527f1f04caef95ecdd135c9f1a077f08
"amdgcn: additional gfx1030/gfx1100 support" caused a few regressions:

PASS: gcc.target/gcn/sram-ecc-3.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-3.c scan-assembler zero_extendv64qiv64si2

PASS: gcc.target/gcn/sram-ecc-4.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-4.c scan-assembler zero_extendv64hiv64si2

PASS: gcc.target/gcn/sram-ecc-7.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-7.c scan-assembler zero_extendv64qiv64si2

PASS: gcc.target/gcn/sram-ecc-8.c (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-8.c scan-assembler zero_extendv64hiv64si2

Those test cases need corresponding adjustment.

	gcc/testsuite/
	* gcc.target/gcn/sram-ecc-3.c: Adjust.
	* gcc.target/gcn/sram-ecc-4.c: Likewise.
	* gcc.target/gcn/sram-ecc-7.c: Likewise.
	* gcc.target/gcn/sram-ecc-8.c: Likewise.
---
 gcc/testsuite/gcc.target/gcn/sram-ecc-3.c | 2 +-
 gcc/testsuite/gcc.target/gcn/sram-ecc-4.c | 2 +-
 gcc/testsuite/gcc.target/gcn/sram-ecc-7.c | 2 +-
 gcc/testsuite/gcc.target/gcn/sram-ecc-8.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c b/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c
index 692d4578b66..bc89e3542d2 100644
--- a/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c
+++ b/gcc/testsuite/gcc.target/gcn/sram-ecc-3.c
@@ -18,4 +18,4 @@ f ()
 a[n] = b[n];
 }
 
-/* { dg-final { scan-assembler "zero_extendv64qiv64si2" } } */
+/* { dg-final { scan-assembler "(\\\*zero_extendv64qiv64si_sdwa|\\\*zero_extendv64qiv64si_shift)" } } */
diff --git a/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c b/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c
index 61b8d552759..ff7e2d0bda5 100644
--- a/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c
+++ b/gcc/testsuite/gcc.target/gcn/sram-ecc-4.c
@@ -18,4 +18,4 @@ f ()
 a[n] = b[n];
 }
 
-/* { dg-final { scan-assembler "zero_extendv64hiv64si2" } } */
+/* { dg-final { scan-assembler "(\\\*zero_extendv64hiv64si_sdwa|\\\*zero_extendv64hiv64si_shift)" } } */
diff --git 

[PATCH] RISC-V: Use vmv1r.v instead of vmv.v.v for fma output reloads [PR114200].

2024-03-06 Thread Robin Dapp
Hi,

three-operand instructions like vmacc are modeled with an implicit
output reload when the output does not match one of the operands.  For
this we use vmv.v.v which is subject to length masking.

In a situation where the current vl is less than the full vlenb
and the fma's result value is used as input for a vector reduction
(which is never length masked) we effectively only reduce vl
elements.  The masked-out elements are relevant for the
reduction, though, leading to a wrong result.

This patch replaces the vmv reloads by full-register reloads.

Regtested on rv64, rv32 is running.

Regards
 Robin

gcc/ChangeLog:

PR target/114200
PR target/114202

* config/riscv/vector.md: Use vmv[1248]r.v instead of vmv.v.v.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr114200.c: New test.
* gcc.target/riscv/rvv/autovec/pr114202.c: New test.
---
 gcc/config/riscv/vector.md| 96 +--
 .../gcc.target/riscv/rvv/autovec/pr114200.c   | 18 
 .../gcc.target/riscv/rvv/autovec/pr114202.c   | 20 
 3 files changed, 86 insertions(+), 48 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114200.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114202.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f89f9c2fa86..8b1c24c5d79 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -5351,10 +5351,10 @@ (define_insn "*pred_mul_plus_undef"
   "@
vmadd.vv\t%0,%4,%5%p1
vmacc.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%4\;vmacc.vv\t%0,%3,%4%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vv\t%0,%3,%4%p1
vmadd.vv\t%0,%4,%5%p1
vmacc.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%5\;vmacc.vv\t%0,%3,%4%p1"
+   vmv%m5r.v\t%0,%5\;vmacc.vv\t%0,%3,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")])
 
@@ -5378,9 +5378,9 @@ (define_insn "*pred_madd"
   "TARGET_VECTOR"
   "@
vmadd.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1
+   vmv%m2r.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1
vmadd.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1"
+   vmv%m2r.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "2")
@@ -5409,9 +5409,9 @@ (define_insn "*pred_macc"
   "TARGET_VECTOR"
   "@
vmacc.vv\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vv\t%0,%2,%3%p1
+   vmv%m4r.v\t%0,%4;vmacc.vv\t%0,%2,%3%p1
vmacc.vv\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vv\t%0,%2,%3%p1"
+   vmv%m4r.v\t%0,%4\;vmacc.vv\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "4")
@@ -5462,9 +5462,9 @@ (define_insn "*pred_madd_scalar"
   "TARGET_VECTOR"
   "@
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1
+   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1"
+   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "3")
@@ -5494,9 +5494,9 @@ (define_insn "*pred_macc_scalar"
   "TARGET_VECTOR"
   "@
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "4")
@@ -5562,9 +5562,9 @@ (define_insn "*pred_madd_extended_scalar"
   "TARGET_VECTOR && !TARGET_64BIT"
   "@
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1
+   vmv%m2r.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1
vmadd.vx\t%0,%2,%4%p1
-   vmv.v.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1"
+   vmv%m2r.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "3")
@@ -5595,9 +5595,9 @@ (define_insn "*pred_macc_extended_scalar"
   "TARGET_VECTOR && !TARGET_64BIT"
   "@
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
vmacc.vx\t%0,%2,%3%p1
-   vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "4")
@@ -5649,10 +5649,10 @@ (define_insn "*pred_minus_mul_undef"
   "@
vnmsub.vv\t%0,%4,%5%p1
vnmsac.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1
+   vmv%m3r.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1
vnmsub.vv\t%0,%4,%5%p1
vnmsac.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1"
+   vmv%m3r.v\t%0,%3\;vnmsub.vv\t%0,%4,%5%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")])
 
@@ -5676,9 +5676,9 @@ (define_insn "*pred_nmsub"
   "TARGET_VECTOR"
   "@
vnmsub.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%2\;vnmsub.vv\t%0,%3,%4%p1
+   vmv%m2r.v\t%0,%2\;vnmsub.vv\t%0,%3,%4%p1
vnmsub.vv\t%0,%3,%4%p1
-   vmv.v.v\t%0,%2\;vnmsub.vv\t%0,%3,%4%p1"
+   

Re: [PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2024-03-06 Thread FX Coudert
I would like to patch this patch from September 2023:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631611.html

This bug is now hitting macOS in the latest version of Xcode (it was originally 
seen on freebsd).
I confirm that the patch is restoring bootstrap on x86_64-apple-darwin23

OK to push?
FX





> Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632
> 
> When building gcc's C++ sources against recent libc++, the poisoning of
> the ctype macros due to including safe-ctype.h before including C++
> standard headers such as , , etc, causes many compilation
> errors, similar to:
> 
> In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
> In file included from /home/dim/src/gcc/master/gcc/system.h:233:
> In file included from /usr/include/c++/v1/vector:321:
> In file included from
> /usr/include/c++/v1/__format/formatter_bool.h:20:
> In file included from
> /usr/include/c++/v1/__format/formatter_integral.h:32:
> In file included from /usr/include/c++/v1/locale:202:
> /usr/include/c++/v1/__locale:546:5: error: '__abi_tag__' attribute
> only applies to structs, variables, functions, and namespaces
> 546 | _LIBCPP_INLINE_VISIBILITY
> | ^
> /usr/include/c++/v1/__config:813:37: note: expanded from macro
> '_LIBCPP_INLINE_VISIBILITY'
> 813 | # define _LIBCPP_INLINE_VISIBILITY _LIBCPP_HIDE_FROM_ABI
> | ^
> /usr/include/c++/v1/__config:792:26: note: expanded from macro
> '_LIBCPP_HIDE_FROM_ABI'
> 792 |
> __attribute__((__abi_tag__(_LIBCPP_TOSTRING(
> _LIBCPP_VERSIONED_IDENTIFIER
> | ^
> In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
> In file included from /home/dim/src/gcc/master/gcc/system.h:233:
> In file included from /usr/include/c++/v1/vector:321:
> In file included from
> /usr/include/c++/v1/__format/formatter_bool.h:20:
> In file included from
> /usr/include/c++/v1/__format/formatter_integral.h:32:
> In file included from /usr/include/c++/v1/locale:202:
> /usr/include/c++/v1/__locale:547:37: error: expected ';' at end of
> declaration list
> 547 | char_type toupper(char_type __c) const
> | ^
> /usr/include/c++/v1/__locale:553:48: error: too many arguments
> provided to function-like macro invocation
> 553 | const char_type* toupper(char_type* __low, const
> char_type* __high) const
> | ^
> /home/dim/src/gcc/master/gcc/../include/safe-ctype.h:146:9: note:
> macro 'toupper' defined here
> 146 | #define toupper(c) do_not_use_toupper_with_safe_ctype
> | ^
> 
> This is because libc++ uses different transitive includes than
> libstdc++, and some of those transitive includes pull in various ctype
> declarations (typically via ).
> 
> There was already a special case for including  before
> safe-ctype.h, so move the rest of the C++ standard header includes to
> the same location, to fix the problem.
> 


Re: [PATCH] match.pd, v2: Optimize a * !a to 0 [PR114009]

2024-03-06 Thread Richard Biener
On Wed, 6 Mar 2024, Jakub Jelinek wrote:

> On Wed, Mar 06, 2024 at 11:45:42AM +0100, Richard Biener wrote:
> > OK, though feel free to add ARG_UNUSED to 'captures' as well.
> 
> Ok, done below.
> 
> > I think the INTEGRAL_TYPE_P should be redundant - the pattern
> > should work for vectors and complex as well (with integer components
> > which integer_zerop constrains).
> 
> Guess you're right.  I think it is correct as is for both complex and
> int vector, while float etc. shouldn't appear there because of the
> integer_zerop and pointer because we don't do multiplication on pointer
> types.
> 
> That said, it won't really actually match the vector int case actually
> emitted, where we have a VEC_COND_EXPR, added those as new patterns.
> And it doesn't match what is emitted for _Complex, though I think it
> isn't that important to simplify that.
> _Complex int
> freddy (_Complex int x)
> {
>   return x * !x;
> }
> 
> _Complex int
> garply (_Complex int x)
> {
>   return x * (x == 0);
> }
> 
> results in original in:
>   return COMPLEX_EXPR > == 0 && IMAGPART_EXPR 
> > == 0, 0> * x;
> and
>   return (x == __complex__ (0, 0) ? __complex__ (1, 0) : __complex__ (0, 0)) 
> * x;
> but after gimplification it is already something that is really
> too hard to optimize.
> 
> So, is the following still ok if it passes bootstrap/regtest?

OK.

Thanks,
Richard.

> 2024-03-06  Jakub Jelinek  
> 
>   PR tree-optimization/114009
>   * genmatch.cc (decision_tree::gen): Emit ARG_UNUSED for captures
>   argument even for GENERIC, not just for GIMPLE.
>   * match.pd (a * !a -> 0): New simplifications.
> 
>   * gcc.dg/tree-ssa/pr114009.c: New test.
> 
> --- gcc/genmatch.cc.jj2024-02-26 10:09:16.594613314 +0100
> +++ gcc/genmatch.cc   2024-03-06 13:47:35.001458638 +0100
> @@ -4071,7 +4071,7 @@ decision_tree::gen (vec  ,
> for (unsigned i = 0;
>  i < as_a (s->s->s->match)->ops.length (); ++i)
>   fp_decl (f, " tree ARG_UNUSED (_p%d),", i);
> -   fp_decl (f, " tree *captures");
> +   fp_decl (f, " tree *ARG_UNUSED (captures)");
>   }
>for (unsigned i = 0; i < s->s->s->for_subst_vec.length (); ++i)
>   {
> --- gcc/match.pd.jj   2024-03-06 09:34:36.608271525 +0100
> +++ gcc/match.pd  2024-03-06 13:58:27.400588289 +0100
> @@ -1219,6 +1219,17 @@ (define_operator_list SYNC_FETCH_AND_AND
> && tree_nop_conversion_p (type, TREE_TYPE (@1)))
> (lshift @0 @2)))
>  
> +/* Fold a * !a into 0.  */
> +(simplify
> + (mult:c @0 (convert? (eq @0 integer_zerop)))
> +  { build_zero_cst (type); })
> +(simplify
> + (mult:c @0 (vec_cond (eq @0 integer_zerop) @1 integer_zerop))
> +  { build_zero_cst (type); })
> +(simplify
> + (mult:c @0 (vec_cond (ne @0 integer_zerop) integer_zerop @1))
> +  { build_zero_cst (type); })
> +
>  /* Shifts by precision or greater result in zero.  */
>  (for shift (lshift rshift)
>   (simplify
> --- gcc/testsuite/gcc.dg/tree-ssa/pr114009.c.jj   2024-03-06 
> 13:44:33.481924326 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr114009.c  2024-03-06 14:04:09.724918390 
> +0100
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/114009 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop1" } */
> +/* { dg-final { scan-tree-dump-times "  return 0;" 3 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "  (?:return| =) { 0, 0, 0, 0 
> };" 1 "forwprop1" } } */
> +
> +int
> +foo (int x)
> +{
> +  x = (x / 2) * 2;
> +  return (!x) * x;
> +}
> +
> +int
> +bar (int x, int y)
> +{
> +  (void) x;
> +  return y * !y;
> +}
> +
> +unsigned long long
> +baz (unsigned long long x)
> +{
> +  return (!x) * x;
> +}
> +
> +typedef int V __attribute__((vector_size (4 * sizeof (int;
> +
> +V
> +qux (V x)
> +{
> +  return x * (x == 0);
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: Stabilize flaky GCN target/offloading testing

2024-03-06 Thread Richard Biener
On Wed, 6 Mar 2024, Andrew Stubbs wrote:

> On 06/03/2024 12:09, Thomas Schwinge wrote:
> > Hi!
> > 
> > On 2024-02-21T17:32:13+0100, Richard Biener  wrote:
> >> Am 21.02.2024 um 13:34 schrieb Thomas Schwinge :
> >>> [...] per my work on 
> >>> "libgomp make check time is excessive", all execution testing in libgomp
> >>> is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'.  [...]
> >>> (... with the caveat that execution tests for
> >>> effective-targets are *not* governed by that, as I've found yesterday.
> >>> I have a WIP hack for that, too.)
> > 
> >>> What disturbs the testing a lot is, that the GPU may get into a bad
> >>> state, upon which any use either fails with a
> >>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in
> >>> 'libhsa-runtime64.so.1'...
> >>>
> >>> I've now tried to debug the latter case (hang).  When the GPU gets into
> >>> this bad state (whatever exactly that is),
> >>> 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but
> >>> then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze'
> >>> vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right
> >>> before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'.
> >>> There it hangs until killed (for example, until DejaGnu's timeout
> >>> mechanism kills the process -- just that the next GPU-using execution
> >>> test then runs into the same thing again...).
> >>>
> >>> In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state),
> >>> we're able to recover via:
> >>>
> >>> $ flock /tmp/gpu.lock sudo cat
> >>> /sys/kernel/debug/dri/0/amdgpu_gpu_recover
> >>> 0
> > 
> > At least most of the times.  I've found that -- sometimes... ;-( -- if
> > you run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', then do
> > 'amdgpu_gpu_recover', and then immediately re-execute, you'll again run
> > into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.  That appears to be avoidable
> > by injecting some artificial "cool-down period"...  (The latter I've not
> > yet tested extensively.)
> > 
> >>> This is, obviously, a hack, probably needs a serial lock to not disturb
> >>> other things, has hard-coded 'dri/0', and as I said in
> >>> 
> >>> "GCN RDNA2+ vs. GCC SLP vectorizer":
> >>>
> >>> | I've no idea what
> >>> | 'amdgpu_gpu_recover' would do if the GPU is also used for display.
> >>
> >> It ends up terminating your X session?
> > 
> > Eh  ;'-|
> > 
> >> (there?s some automatic driver recovery that?s also sometimes triggered
> >> which sounds like the same thing).
> > 
> >> I need to try using the integrated graphics for X11 to see if that avoids
> >> the issue.
> > 
> > A few years ago, I tried that for a Nvidia GPU laptop, and -- if I now
> > remember correctly -- basically got it to work, via hand-editing
> > '/etc/X11/xorg.conf' and all that...  But: I couldn't get external HDMI
> > to work in that setup, and therefore reverted to "standard".
> > 
> >> Guess AMD needs to improve the driver/runtime (or we - it?s open source at
> >> least up to the firmware).
> > 
> >>> However, it's very useful in my testing.  :-|
> >>>
> >>> The questions is, how to detect the "hang" state without first running
> >>> into a timeout (and disambiguating such a timeout from a user code
> >>> timeout)?  Add a watchdog: call 'alarm([a few seconds])' before device
> >>> initialization, and before the actual GPU kernel launch cancel it with
> >>> 'alarm(0)'?  (..., and add a handler for 'SIGALRM' to print a distinct
> >>> error message that we can then react on, like for
> >>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.)  Probably 'alarm'/'SIGALRM' is a
> >>> no-go in libgomp -- instead, use a helper thread to similarly implement a
> >>> watchdog?  ('libgomp/plugin/plugin-gcn.c' already is using pthreads for
> >>> other purposes.)  Any other clever ideas?  What's a suitable value for
> >>> "a few seconds"?
> > 
> > I'm attaching my current "GCN: Watchdog for device image load", covering
> > both 'gcc/config/gcn/gcn-run.cc' and 'libgomp/plugin/plugin-gcn.c'.
> > (That's using 'timer_create' etc. instead of 'alarm'/'SIGALRM'. )
> > 
> > That, plus routing *all* potential GPU usage (in particular: including
> > execution tests for effective-targets, see above) through a serial lock
> > ('flock', implemented in DejaGnu board file, outside of the the
> > "DejaGnu timeout domain", similar to
> > 'libgomp/testsuite/lib/libgomp.exp:libgomp_load', see above), plus
> > catching 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (both the "real" ones and
> > the "fake" ones via "GCN: Watchdog for device image load") and in that
> > case 'amdgpu_gpu_recover' and re-execution of the respective executable,
> > does greatly stabilize flaky GCN target/offloading testing.
> > 
> > Do we have consensus to move forward with this approach, generally?
> 
> I've also observed a number of random hangs in 

[PATCH] match.pd, v2: Optimize a * !a to 0 [PR114009]

2024-03-06 Thread Jakub Jelinek
On Wed, Mar 06, 2024 at 11:45:42AM +0100, Richard Biener wrote:
> OK, though feel free to add ARG_UNUSED to 'captures' as well.

Ok, done below.

> I think the INTEGRAL_TYPE_P should be redundant - the pattern
> should work for vectors and complex as well (with integer components
> which integer_zerop constrains).

Guess you're right.  I think it is correct as is for both complex and
int vector, while float etc. shouldn't appear there because of the
integer_zerop and pointer because we don't do multiplication on pointer
types.

That said, it won't really actually match the vector int case actually
emitted, where we have a VEC_COND_EXPR, added those as new patterns.
And it doesn't match what is emitted for _Complex, though I think it
isn't that important to simplify that.
_Complex int
freddy (_Complex int x)
{
  return x * !x;
}

_Complex int
garply (_Complex int x)
{
  return x * (x == 0);
}

results in original in:
  return COMPLEX_EXPR > == 0 && IMAGPART_EXPR 
> == 0, 0> * x;
and
  return (x == __complex__ (0, 0) ? __complex__ (1, 0) : __complex__ (0, 0)) * 
x;
but after gimplification it is already something that is really
too hard to optimize.

So, is the following still ok if it passes bootstrap/regtest?

2024-03-06  Jakub Jelinek  

PR tree-optimization/114009
* genmatch.cc (decision_tree::gen): Emit ARG_UNUSED for captures
argument even for GENERIC, not just for GIMPLE.
* match.pd (a * !a -> 0): New simplifications.

* gcc.dg/tree-ssa/pr114009.c: New test.

--- gcc/genmatch.cc.jj  2024-02-26 10:09:16.594613314 +0100
+++ gcc/genmatch.cc 2024-03-06 13:47:35.001458638 +0100
@@ -4071,7 +4071,7 @@ decision_tree::gen (vec  ,
  for (unsigned i = 0;
   i < as_a (s->s->s->match)->ops.length (); ++i)
fp_decl (f, " tree ARG_UNUSED (_p%d),", i);
- fp_decl (f, " tree *captures");
+ fp_decl (f, " tree *ARG_UNUSED (captures)");
}
   for (unsigned i = 0; i < s->s->s->for_subst_vec.length (); ++i)
{
--- gcc/match.pd.jj 2024-03-06 09:34:36.608271525 +0100
+++ gcc/match.pd2024-03-06 13:58:27.400588289 +0100
@@ -1219,6 +1219,17 @@ (define_operator_list SYNC_FETCH_AND_AND
&& tree_nop_conversion_p (type, TREE_TYPE (@1)))
(lshift @0 @2)))
 
+/* Fold a * !a into 0.  */
+(simplify
+ (mult:c @0 (convert? (eq @0 integer_zerop)))
+  { build_zero_cst (type); })
+(simplify
+ (mult:c @0 (vec_cond (eq @0 integer_zerop) @1 integer_zerop))
+  { build_zero_cst (type); })
+(simplify
+ (mult:c @0 (vec_cond (ne @0 integer_zerop) integer_zerop @1))
+  { build_zero_cst (type); })
+
 /* Shifts by precision or greater result in zero.  */
 (for shift (lshift rshift)
  (simplify
--- gcc/testsuite/gcc.dg/tree-ssa/pr114009.c.jj 2024-03-06 13:44:33.481924326 
+0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr114009.c2024-03-06 14:04:09.724918390 
+0100
@@ -0,0 +1,33 @@
+/* PR tree-optimization/114009 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop1" } */
+/* { dg-final { scan-tree-dump-times "  return 0;" 3 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "  (?:return| =) { 0, 0, 0, 0 };" 
1 "forwprop1" } } */
+
+int
+foo (int x)
+{
+  x = (x / 2) * 2;
+  return (!x) * x;
+}
+
+int
+bar (int x, int y)
+{
+  (void) x;
+  return y * !y;
+}
+
+unsigned long long
+baz (unsigned long long x)
+{
+  return (!x) * x;
+}
+
+typedef int V __attribute__((vector_size (4 * sizeof (int;
+
+V
+qux (V x)
+{
+  return x * (x == 0);
+}


Jakub



Re: Stabilize flaky GCN target/offloading testing

2024-03-06 Thread Andrew Stubbs

On 06/03/2024 12:09, Thomas Schwinge wrote:

Hi!

On 2024-02-21T17:32:13+0100, Richard Biener  wrote:

Am 21.02.2024 um 13:34 schrieb Thomas Schwinge :

[...] per my work on 
"libgomp make check time is excessive", all execution testing in libgomp
is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'.  [...]
(... with the caveat that execution tests for
effective-targets are *not* governed by that, as I've found yesterday.
I have a WIP hack for that, too.)



What disturbs the testing a lot is, that the GPU may get into a bad
state, upon which any use either fails with a
'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in
'libhsa-runtime64.so.1'...

I've now tried to debug the latter case (hang).  When the GPU gets into
this bad state (whatever exactly that is),
'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but
then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze'
vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right
before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'.
There it hangs until killed (for example, until DejaGnu's timeout
mechanism kills the process -- just that the next GPU-using execution
test then runs into the same thing again...).

In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state),
we're able to recover via:

$ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
0


At least most of the times.  I've found that -- sometimes... ;-( -- if
you run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', then do
'amdgpu_gpu_recover', and then immediately re-execute, you'll again run
into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.  That appears to be avoidable
by injecting some artificial "cool-down period"...  (The latter I've not
yet tested extensively.)


This is, obviously, a hack, probably needs a serial lock to not disturb
other things, has hard-coded 'dri/0', and as I said in

"GCN RDNA2+ vs. GCC SLP vectorizer":

| I've no idea what
| 'amdgpu_gpu_recover' would do if the GPU is also used for display.


It ends up terminating your X session…


Eh  ;'-|


(there’s some automatic driver recovery that’s also sometimes triggered which 
sounds like the same thing).



I need to try using the integrated graphics for X11 to see if that avoids the 
issue.


A few years ago, I tried that for a Nvidia GPU laptop, and -- if I now
remember correctly -- basically got it to work, via hand-editing
'/etc/X11/xorg.conf' and all that...  But: I couldn't get external HDMI
to work in that setup, and therefore reverted to "standard".


Guess AMD needs to improve the driver/runtime (or we - it’s open source at 
least up to the firmware).



However, it's very useful in my testing.  :-|

The questions is, how to detect the "hang" state without first running
into a timeout (and disambiguating such a timeout from a user code
timeout)?  Add a watchdog: call 'alarm([a few seconds])' before device
initialization, and before the actual GPU kernel launch cancel it with
'alarm(0)'?  (..., and add a handler for 'SIGALRM' to print a distinct
error message that we can then react on, like for
'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.)  Probably 'alarm'/'SIGALRM' is a
no-go in libgomp -- instead, use a helper thread to similarly implement a
watchdog?  ('libgomp/plugin/plugin-gcn.c' already is using pthreads for
other purposes.)  Any other clever ideas?  What's a suitable value for
"a few seconds"?


I'm attaching my current "GCN: Watchdog for device image load", covering
both 'gcc/config/gcn/gcn-run.cc' and 'libgomp/plugin/plugin-gcn.c'.
(That's using 'timer_create' etc. instead of 'alarm'/'SIGALRM'. )

That, plus routing *all* potential GPU usage (in particular: including
execution tests for effective-targets, see above) through a serial lock
('flock', implemented in DejaGnu board file, outside of the the
"DejaGnu timeout domain", similar to
'libgomp/testsuite/lib/libgomp.exp:libgomp_load', see above), plus
catching 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (both the "real" ones and
the "fake" ones via "GCN: Watchdog for device image load") and in that
case 'amdgpu_gpu_recover' and re-execution of the respective executable,
does greatly stabilize flaky GCN target/offloading testing.

Do we have consensus to move forward with this approach, generally?


I've also observed a number of random hangs in host-side code outside 
our control, but after the kernel has exited. In general this watchdog 
approach might help with these. I do feel like it's "papering over the 
cracks", but if we can't fix it at the end of the day it's just a 
little extra code.


My only concern is that it might actually cause failures, perhaps on 
heavily loaded systems, or with network filesystems, or during debugging.


Andrew


[patch,avr.applied] Adjusted rtx costs of plus + zero_extend

2024-03-06 Thread Georg-Johann Lay

Adjusted rtx costs of (plus (zero_extend (...)) reg).

Johann

--

AVR: Adjust rtx cost of plus + zero_extend.

gcc/
* config/avr/avr.cc (avr_rtx_costs_1) [PLUS+ZERO_EXTEND]: Adjust
rtx cost.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 36995e05cbe..b87ae6a256d 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -12513,6 +12513,13 @@ avr_rtx_costs_1 (rtx x, machine_mode mode, int 
outer_code,

   return true;

 case PLUS:
+  if (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+ && REG_P (XEXP (x, 1)))
+   {
+ *total = COSTS_N_INSNS (GET_MODE_SIZE (mode) - 1);
+ return true;
+   }
+
   switch (mode)
{
case E_QImode:


Stabilize flaky GCN target/offloading testing

2024-03-06 Thread Thomas Schwinge
Hi!

On 2024-02-21T17:32:13+0100, Richard Biener  wrote:
> Am 21.02.2024 um 13:34 schrieb Thomas Schwinge :
>> [...] per my work on 
>> "libgomp make check time is excessive", all execution testing in libgomp
>> is serialized in 'libgomp/testsuite/lib/libgomp.exp:libgomp_load'.  [...]
>> (... with the caveat that execution tests for
>> effective-targets are *not* governed by that, as I've found yesterday.
>> I have a WIP hack for that, too.)

>> What disturbs the testing a lot is, that the GPU may get into a bad
>> state, upon which any use either fails with a
>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' error -- or by just hanging, deep in
>> 'libhsa-runtime64.so.1'...
>> 
>> I've now tried to debug the latter case (hang).  When the GPU gets into
>> this bad state (whatever exactly that is),
>> 'hsa_executable_load_code_object' still returns 'HSA_STATUS_SUCCESS', but
>> then GCN target execution ('gcn-run') hangs in 'hsa_executable_freeze'
>> vs. GCN offloading execution ('libgomp-plugin-gcn.so.1') hangs right
>> before 'hsa_executable_freeze', in the GCN heap setup 'hsa_memory_copy'.
>> There it hangs until killed (for example, until DejaGnu's timeout
>> mechanism kills the process -- just that the next GPU-using execution
>> test then runs into the same thing again...).
>> 
>> In this state (and also the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' state),
>> we're able to recover via:
>> 
>>$ flock /tmp/gpu.lock sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
>>0

At least most of the times.  I've found that -- sometimes... ;-( -- if
you run into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', then do
'amdgpu_gpu_recover', and then immediately re-execute, you'll again run
into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.  That appears to be avoidable
by injecting some artificial "cool-down period"...  (The latter I've not
yet tested extensively.)

>> This is, obviously, a hack, probably needs a serial lock to not disturb
>> other things, has hard-coded 'dri/0', and as I said in
>> 
>> "GCN RDNA2+ vs. GCC SLP vectorizer":
>> 
>> | I've no idea what
>> | 'amdgpu_gpu_recover' would do if the GPU is also used for display.
>
> It ends up terminating your X session…

Eh  ;'-|

> (there’s some automatic driver recovery that’s also sometimes triggered which 
> sounds like the same thing).

> I need to try using the integrated graphics for X11 to see if that avoids the 
> issue.

A few years ago, I tried that for a Nvidia GPU laptop, and -- if I now
remember correctly -- basically got it to work, via hand-editing
'/etc/X11/xorg.conf' and all that...  But: I couldn't get external HDMI
to work in that setup, and therefore reverted to "standard".

> Guess AMD needs to improve the driver/runtime (or we - it’s open source at 
> least up to the firmware).

>> However, it's very useful in my testing.  :-|
>> 
>> The questions is, how to detect the "hang" state without first running
>> into a timeout (and disambiguating such a timeout from a user code
>> timeout)?  Add a watchdog: call 'alarm([a few seconds])' before device
>> initialization, and before the actual GPU kernel launch cancel it with
>> 'alarm(0)'?  (..., and add a handler for 'SIGALRM' to print a distinct
>> error message that we can then react on, like for
>> 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'.)  Probably 'alarm'/'SIGALRM' is a
>> no-go in libgomp -- instead, use a helper thread to similarly implement a
>> watchdog?  ('libgomp/plugin/plugin-gcn.c' already is using pthreads for
>> other purposes.)  Any other clever ideas?  What's a suitable value for
>> "a few seconds"?

I'm attaching my current "GCN: Watchdog for device image load", covering
both 'gcc/config/gcn/gcn-run.cc' and 'libgomp/plugin/plugin-gcn.c'.
(That's using 'timer_create' etc. instead of 'alarm'/'SIGALRM'. )

That, plus routing *all* potential GPU usage (in particular: including
execution tests for effective-targets, see above) through a serial lock
('flock', implemented in DejaGnu board file, outside of the the
"DejaGnu timeout domain", similar to
'libgomp/testsuite/lib/libgomp.exp:libgomp_load', see above), plus
catching 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' (both the "real" ones and
the "fake" ones via "GCN: Watchdog for device image load") and in that
case 'amdgpu_gpu_recover' and re-execution of the respective executable,
does greatly stabilize flaky GCN target/offloading testing.

Do we have consensus to move forward with this approach, generally?


Grüße
 Thomas


>From 21795353483c263c91a5efa80da41a75a6b2b629 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 22 Feb 2024 21:50:45 +0100
Subject: [PATCH] GCN: Watchdog for device image load

---
 gcc/config/gcn/gcn-run.cc   | 76 ++
 libgomp/plugin/plugin-gcn.c | 81 -
 2 files changed, 156 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn-run.cc 

[PATCH] tree-optimizaton/114239 - rework reduction epilogue driving

2024-03-06 Thread Richard Biener
The following reworks vectorizable_live_operation to pass the
live stmt to vect_create_epilog_for_reduction also for early breaks
and a peeled main exit.  This is to be able to figure the scalar
definition to replace.  This reverts the PR114192 fix as it is
subsumed by this cleanup.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimizaton/114239
* tree-vect-loop.cc (vect_get_vect_def): Remove.
(vect_create_epilog_for_reduction): The passed in stmt_info
should now be the live stmt that produces the scalar reduction
result.  Revert PR114192 fix.  Base reduction info off
info_for_reduction.  Remove special handling of
early-break/peeled, restore original vector def gathering.
Make sure to pick the correct exit PHIs.
(vectorizable_live_operation): Pass in the proper stmt_info
for early break exits.

* gcc.dg/vect/vect-early-break_122-pr114239.c: New testcase.
---
 .../vect/vect-early-break_122-pr114239.c  |  29 +
 gcc/tree-vect-loop.cc | 105 --
 2 files changed, 53 insertions(+), 81 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_122-pr114239.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_122-pr114239.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_122-pr114239.c
new file mode 100644
index 000..7bf4db14209
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_122-pr114239.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+
+int ip4_getbit_a, ip4_getbit_pos, ip4_clrbit_pos;
+void ip4_clrbit(int *a) { *a &= ip4_clrbit_pos; }
+typedef struct {
+  char pxlen;
+  int prefix;
+} net_addr_ip4;
+void fib_get_chain();
+int trie_match_longest_ip4();
+int trie_match_next_longest_ip4(net_addr_ip4 *n) {
+  int __trans_tmp_1;
+  while (n->pxlen) {
+n->pxlen--;
+ip4_clrbit(>prefix);
+__trans_tmp_1 = ip4_getbit_a >> ip4_getbit_pos;
+if (__trans_tmp_1)
+  return 1;
+  }
+  return 0;
+}
+void net_roa_check_ip4_trie_tab() {
+  net_addr_ip4 px0;
+  for (int _n = trie_match_longest_ip4(); _n;
+   _n = trie_match_next_longest_ip4())
+fib_get_chain();
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 761cdc67570..20ee0aad932 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5897,35 +5897,6 @@ vect_create_partial_epilog (tree vec_def, tree vectype, 
code_helper code,
   return new_temp;
 }
 
-/* Retrieves the definining statement to be used for a reduction.
-   For LAST_VAL_REDUC_P we use the current VEC_STMTs which correspond to the
-   final value after vectorization and otherwise we look at the reduction
-   definitions to get the first.  */
-
-tree
-vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
-  slp_instance slp_node_instance, bool last_val_reduc_p,
-  unsigned i, vec  _stmts)
-{
-  tree def;
-
-  if (slp_node)
-{
-  if (!last_val_reduc_p)
-slp_node = slp_node_instance->reduc_phis;
-  def = vect_get_slp_vect_def (slp_node, i);
-}
-  else
-{
-  if (!last_val_reduc_p)
-   reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
-  vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
-  def = gimple_get_lhs (vec_stmts[0]);
-}
-
-  return def;
-}
-
 /* Function vect_create_epilog_for_reduction
 
Create code at the loop-epilog to finalize the result of a reduction
@@ -5989,8 +5960,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
  loop-closed PHI of the inner loop which we remember as
  def for the reduction PHI generation.  */
   bool double_reduc = false;
-  bool last_val_reduc_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit
- && !LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
   stmt_vec_info rdef_info = stmt_info;
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
 {
@@ -6000,8 +5969,6 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
(stmt_info->stmt, 0));
   stmt_info = vect_stmt_to_vectorize (stmt_info);
 }
-  gphi *reduc_def_stmt
-= as_a  (STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))->stmt);
   code_helper code = STMT_VINFO_REDUC_CODE (reduc_info);
   internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
   tree vectype;
@@ -6066,33 +6033,9 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 
   stmt_vec_info single_live_out_stmt[] = { stmt_info };
   array_slice live_out_stmts = single_live_out_stmt;
-  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
-  && loop_exit != LOOP_VINFO_IV_EXIT (loop_vinfo)
-  /* ???  We should fend this off earlier.  For conversions we create
-multiple epilogues, one dead.  */
-  && stmt_info == reduc_info->reduc_def)
-{
-  gcc_assert 

Re: [PATCH] s390: Streamline NNPA builtins with POP mnemonics

2024-03-06 Thread Stefan Schulze Frielinghaus
Since there is no straight forward way to introduce an overload with
different return types where we would expand differently depending on an
immediate operand, lets drop this patch.

On Fri, Mar 01, 2024 at 04:18:31PM +0100, Stefan Schulze Frielinghaus wrote:
> At the moment there are no extended mnemonics for vclfn(h,l) and vcrnf
> defined in the Principles of Operation.  Thus, remove the suffix "s"
> from the builtins and expanders and introduce a further operand for the
> data type.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-builtin-types.def: Update to reflect latest
>   changes.
>   * config/s390/s390-builtins.def: Remove suffix s from
>   s390_vclfn(h,l)s and s390_vcrnfs.
>   * config/s390/s390.md: Similar, remove suffix s from unspec
>   definitions.
>   * config/s390/vecintrin.h (vec_extend_to_fp32_hi): Redefine.
>   (vec_extend_to_fp32_lo): Redefine.
>   (vec_round_from_fp32): Redefine.
>   * config/s390/vx-builtins.md (vclfnhs_v8hi): Remove suffix s.
>   (vclfnh_v8hi): Add with extra operand.
>   (vclfnls_v8hi): Remove suffix s.
>   (vclfnl_v8hi): Add with extra operand.
>   (vcrnfs_v8hi): Remove suffix s.
>   (vcrnf_v8hi): Add with extra operand.
> ---
> OK for mainline?
> 
>  gcc/config/s390/s390-builtin-types.def |  4 ++--
>  gcc/config/s390/s390-builtins.def  |  6 +++---
>  gcc/config/s390/s390.md|  6 +++---
>  gcc/config/s390/vecintrin.h|  6 +++---
>  gcc/config/s390/vx-builtins.md | 27 ++
>  5 files changed, 26 insertions(+), 23 deletions(-)
> 
> diff --git a/gcc/config/s390/s390-builtin-types.def 
> b/gcc/config/s390/s390-builtin-types.def
> index ce51ae8cd3f..c3d09b42835 100644
> --- a/gcc/config/s390/s390-builtin-types.def
> +++ b/gcc/config/s390/s390-builtin-types.def
> @@ -273,7 +273,6 @@ DEF_FN_TYPE_2 (BT_FN_V2DI_V2DF_V2DF, BT_V2DI, BT_V2DF, 
> BT_V2DF)
>  DEF_FN_TYPE_2 (BT_FN_V2DI_V2DI_V2DI, BT_V2DI, BT_V2DI, BT_V2DI)
>  DEF_FN_TYPE_2 (BT_FN_V2DI_V4SI_V4SI, BT_V2DI, BT_V4SI, BT_V4SI)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_FLT_INT, BT_V4SF, BT_FLT, BT_INT)
> -DEF_FN_TYPE_2 (BT_FN_V4SF_UV8HI_UINT, BT_V4SF, BT_UV8HI, BT_UINT)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_UCHAR, BT_V4SF, BT_V4SF, BT_UCHAR)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_V4SF, BT_V4SF, BT_V4SF, BT_V4SF)
>  DEF_FN_TYPE_2 (BT_FN_V4SI_BV4SI_V4SI, BT_V4SI, BT_BV4SI, BT_V4SI)
> @@ -324,7 +323,6 @@ DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_USHORT_INT, BT_UV8HI, 
> BT_UV8HI, BT_USHORT, BT_I
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INT, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_INT)
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INTPTR, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_UV8HI)
> -DEF_FN_TYPE_3 (BT_FN_UV8HI_V4SF_V4SF_UINT, BT_UV8HI, BT_V4SF, BT_V4SF, 
> BT_UINT)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_UV16QI_UV16QI_INTPTR, BT_V16QI, BT_UV16QI, 
> BT_UV16QI, BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_INTPTR, BT_V16QI, BT_V16QI, BT_V16QI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_V16QI, BT_V16QI, BT_V16QI, BT_V16QI, 
> BT_V16QI)
> @@ -340,6 +338,7 @@ DEF_FN_TYPE_3 (BT_FN_V2DI_V2DF_INT_INTPTR, BT_V2DI, 
> BT_V2DF, BT_INT, BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V2DI_V2DF_V2DF_INTPTR, BT_V2DI, BT_V2DF, BT_V2DF, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V2DI_V2DI_V2DI_INTPTR, BT_V2DI, BT_V2DI, BT_V2DI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V2DI_V4SI_V4SI_V2DI, BT_V2DI, BT_V4SI, BT_V4SI, BT_V2DI)
> +DEF_FN_TYPE_3 (BT_FN_V4SF_UV8HI_UINT_UINT, BT_V4SF, BT_UV8HI, BT_UINT, 
> BT_UINT)
>  DEF_FN_TYPE_3 (BT_FN_V4SF_V2DF_INT_INT, BT_V4SF, BT_V2DF, BT_INT, BT_INT)
>  DEF_FN_TYPE_3 (BT_FN_V4SF_V4SF_FLT_INT, BT_V4SF, BT_V4SF, BT_FLT, BT_INT)
>  DEF_FN_TYPE_3 (BT_FN_V4SF_V4SF_UCHAR_UCHAR, BT_V4SF, BT_V4SF, BT_UCHAR, 
> BT_UCHAR)
> @@ -377,6 +376,7 @@ DEF_FN_TYPE_4 
> (BT_FN_UV4SI_UV4SI_UV4SI_UINTCONSTPTR_UCHAR, BT_UV4SI, BT_UV4SI, B
>  DEF_FN_TYPE_4 (BT_FN_UV4SI_UV4SI_UV4SI_UV4SI_INT, BT_UV4SI, BT_UV4SI, 
> BT_UV4SI, BT_UV4SI, BT_INT)
>  DEF_FN_TYPE_4 (BT_FN_UV8HI_UV8HI_UV8HI_INT_INTPTR, BT_UV8HI, BT_UV8HI, 
> BT_UV8HI, BT_INT, BT_INTPTR)
>  DEF_FN_TYPE_4 (BT_FN_UV8HI_UV8HI_UV8HI_UV8HI_INT, BT_UV8HI, BT_UV8HI, 
> BT_UV8HI, BT_UV8HI, BT_INT)
> +DEF_FN_TYPE_4 (BT_FN_UV8HI_V4SF_V4SF_UINT_UINT, BT_UV8HI, BT_V4SF, BT_V4SF, 
> BT_UINT, BT_UINT)
>  DEF_FN_TYPE_4 (BT_FN_VOID_UV2DI_UV2DI_ULONGLONGPTR_ULONGLONG, BT_VOID, 
> BT_UV2DI, BT_UV2DI, BT_ULONGLONGPTR, BT_ULONGLONG)
>  DEF_FN_TYPE_4 (BT_FN_VOID_UV4SI_UV4SI_UINTPTR_ULONGLONG, BT_VOID, BT_UV4SI, 
> BT_UV4SI, BT_UINTPTR, BT_ULONGLONG)
>  DEF_FN_TYPE_4 (BT_FN_VOID_V4SI_V4SI_INTPTR_ULONGLONG, BT_VOID, BT_V4SI, 
> BT_V4SI, BT_INTPTR, BT_ULONGLONG)
> diff --git a/gcc/config/s390/s390-builtins.def 
> b/gcc/config/s390/s390-builtins.def
> index 02ff516c677..0d4e20ea425 100644
> --- a/gcc/config/s390/s390-builtins.def
> +++ b/gcc/config/s390/s390-builtins.def
> @@ -3025,10 +3025,10 @@ B_DEF  (s390_vstrszf, 

Re: [PATCH] Fortran: error recovery while simplifying expressions [PR103707, PR106987]

2024-03-06 Thread Paul Richard Thomas
Hi Harald,

This all looks good to me. OK for mainline and, according to intestinal
fortitude on your part, earlier branches.

Thanks

Paul


On Tue, 5 Mar 2024 at 21:24, Harald Anlauf  wrote:

> Dear all,
>
> error recovery on arithmetic errors during simplification has bugged
> me for a long time, especially since the occurence of ICEs depended
> on whether -frange-check is specified or not, whether array ctors
> were involved, etc.
>
> I've now come up with the attached patch that classifies the arithmetic
> result codes into "hard" and "soft" errors.
>
> A "soft" error means that it is an overflow or other exception (e.g. NaN)
> that is ignored with -fno-range-check.  After the patch, a soft error
> will not stop simplification (a hard one will), and error status will be
> passed along.
>
> I took this opportunity to change the emitted error for division by zero
> for real and complex division dependent on whether the numerator is
> regular or not.  This makes e.g. (0.)/0 a NaN and now says so, in
> accordance with some other brands.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> Other comments?
>
> Thanks,
> Harald
>
>


Re: [PATCH] match.pd: Optimize a * !a to 0 [PR114009]

2024-03-06 Thread Richard Biener
On Wed, 6 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following patch attempts to fix an optimization regression through
> adding a simple simplification.  We already have the
> /* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0  */
> (if (!canonicalize_math_p ())
>  (for cmp (tcc_comparison)
>   (simplify
>(mult:c (convert (cmp@0 @1 @2)) @3)
>(if (INTEGRAL_TYPE_P (type)
> && INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>  (cond @0 @3 { build_zero_cst (type); })))
> optimization which otherwise triggers during the a * !a multiplication,
> but that is done only late and we aren't able through range assumptions
> optimize it yet anyway.
> 
> The patch adds a specific simplification for it.
> If a is zero, then a * !a will be 0 * 1 (or for signed 1-bit 0 * -1)
> and so 0.
> If a is non-zero, then a * !a will be a * 0 and so again 0.
> 
> I had to use a hack, TREE_TYPE (@0) in the condition as opposed to
> using just type, because otherwise I get a false positive warning.
> The captures parameter is marked ARG_UNUSED in gimple, but not in
> generic and we end up with
> tree
> generic_simplify_168 (location_t ARG_UNUSED (loc), const tree ARG_UNUSED 
> (type),
>  tree ARG_UNUSED (_p0), tree ARG_UNUSED (_p1), tree *captures)
> {
>   const bool debug_dump = dump_file && (dump_flags & TDF_FOLDING);
>   if (INTEGRAL_TYPE_P (type)
> )
> {
>   if (TREE_SIDE_EFFECTS (_p1)) goto next_after_fail309;
>   if (UNLIKELY (!dbg_cnt (match))) goto next_after_fail309;
>   {
> tree _r;
> _r =  build_zero_cst (type);
> if (UNLIKELY (debug_dump)) generic_dump_logs ("match.pd", 216, 
> __FILE__, __LINE__, true);
> return _r;
>   }
> next_after_fail309:;
> }
>   return NULL_TREE;
> }
> there.  I've looked at similar cases and in
>  (simplify
>   (mod @0 integer_onep)
>   { build_zero_cst (type); })
> case we don't get captures unused because it is used a single time
> and therefore evaluate it for side-effects instead of punting on them,
> and in
>  (pointer_diff @@0 @0)
>  { build_zero_cst (type); })
> we do so as well.  Though, using @@0 instead of @0 on either operand
> of this simplification doesn't help.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK, though feel free to add ARG_UNUSED to 'captures' as well.
I think the INTEGRAL_TYPE_P should be redundant - the pattern
should work for vectors and complex as well (with integer components
which integer_zerop constrains).

Thanks,
Richard.

> 2024-03-06  Jakub Jelinek  
> 
>   PR tree-optimization/114009
>   * match.pd (a * !a -> 0): New simplification.
> 
>   * gcc.dg/tree-ssa/pr114009.c: New test.
> 
> --- gcc/match.pd.jj   2024-03-01 14:56:42.442810053 +0100
> +++ gcc/match.pd  2024-03-05 22:53:25.202579435 +0100
> @@ -1219,6 +1219,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && tree_nop_conversion_p (type, TREE_TYPE (@1)))
> (lshift @0 @2)))
>  
> +/* Fold a * !a into 0.  */
> +(simplify
> + (mult:c @0 (convert? (eq @0 integer_zerop)))
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> +   { build_zero_cst (type); }))
> +
>  /* Shifts by precision or greater result in zero.  */
>  (for shift (lshift rshift)
>   (simplify
> --- gcc/testsuite/gcc.dg/tree-ssa/pr114009.c.jj   2024-03-05 
> 15:18:41.274541636 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr114009.c  2024-03-05 15:16:09.056589675 
> +0100
> @@ -0,0 +1,24 @@
> +/* PR tree-optimization/114009 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "  return 0;" 3 "optimized" } } */
> +
> +int
> +foo (int x)
> +{
> +  x = (x / 2) * 2;
> +  return (!x) * x;
> +}
> +
> +int
> +bar (int x, int y)
> +{
> +  (void) x;
> +  return y * !y;
> +}
> +
> +unsigned long long
> +baz (unsigned long long x)
> +{
> +  return (!x) * x;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [PATCH] MAINTAINERS: Add myself to write after approval

2024-03-06 Thread Demin Han
Hi,

I will commit the patch you mentioned with [NFC] added.
And split and resubmit other patches after more check and discussion.

Regards,
Demin

From: juzhe.zh...@rivai.ai 
Sent: 2024年3月6日 10:13
To: Demin Han ; gcc-patches 

Cc: kito.cheng 
Subject: Re: [PATCH] MAINTAINERS: Add myself to write after approval

Hi, han.

I think you can commit this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/646931.html
RISC-V: Refactor expand_vec_cmp
It's an NFC patch that I approved.

juzhe.zh...@rivai.ai

From: demin.han
Date: 2024-03-04 14:51
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng
Subject: [PATCH] MAINTAINERS: Add myself to write after approval
ChangeLog:

* MAINTAINERS: Add myself

Signed-off-by: demin.han 
mailto:demin@starfivetech.com>>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b01fab16061..a681518d704 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -448,6 +448,7 @@ Wei Guozhi mailto:car...@google.com>>
Vineet Gupta mailto:vine...@rivosinc.com>>
Naveen H.S mailto:nave...@marvell.com>>
Mostafa Hagog mailto:ha...@gcc.gnu.org>>
+Demin Han mailto:demin@starfivetech.com>>
Jivan Hakobyan mailto:jivanhakoby...@gmail.com>>
Andrew Haley mailto:a...@redhat.com>>
Frederik Harwath mailto:frede...@harwath.name>>
--
2.43.2




Re: [PATCH] Revert "Pass GUILE down to subdirectories"

2024-03-06 Thread Andrew Burgess


Tom Tromey  writes:

> This reverts commit b7e5a29602143b53267efcd9c8d5ecc78cd5a62f.
>
> This patch caused problems for some users when building gdb, because
> it would cause 'guild' to be invoked with the wrong versin of guile.
> On the whole it seems simpler to just back this out.

Tom,

After once again forgetting to add GUILE=guile2.2 to my GDB build I was
thinking about this issue again.

Given that GDB has a --with-guile=... configure option, and that our
configure scripts try to identify a matching version of guild and a
shared library to link GDB against, I wondered why we don't just force
the use of a matching version of guile.

I guess I'm suggesting that for building GDB's guile components we
should respect either the --with-guile=... configure option, or what the
configure script auto-detected, and should not be picking up any build
time GUILE=... flag -- setting GUILE=... isn't going to change the
version of guild used, nor is it going to change the shared library that
GDB links against, so just changing the guile version seems like a
recipe for incompatibility problems.

Below is a patch that forces GDB to compile our guile scripts using a
suitable version of guile.  This could be applied irrespective of
whether you revert b7e5a29602143b53267efcd9c8d5ecc78cd5a62f or not.

What do you think?

Thanks,
Andrew

---

commit 6d326010217a2c94ad27d4063f90f689f7c11615
Author: Andrew Burgess 
Date:   Tue Mar 5 12:35:05 2024 +

gdb/data-directory: set GUILE for compiling GDB's guile scripts

After this commit:

  commit d006ec41c448d9f4a84df50391e87cbf0aa8c152
  Date:   Thu Dec 28 14:08:39 2023 -0700

  Pass GUILE down to subdirectories

I have had issues with GDB's guile support.  Commit d006ec41c448 sets
the GUILE variable in the top-level Makefile to the default value
'guile'.

The problem this causes for me is that 'guile' on my system is
2.0.14.  I also have guile2.2 available on my system, and this is
version 2.2.6.

When GDB configures it selects looks for a suitable guile version to
use, and it looks in the order:

  guile-3.0
  guile-2.2
  guile-2.0

This means that on my system GDB chooses to use the guile2.2 library
to link against, and uses guild2.2 as the compiler to pre-compile
GDB's guile scripts.

However, guild (and guile2.2) check for the GUILE environment
variable, and use the value of this variable as the version of guile
interpreter which is used when compiling things.

What this means is that after commit d006ec41c448, I am now using the
guild2.2 binary (version 2.2.6), but this is then invoking the guile
binary (version 2.0.14).  The compiled scripts are then loaded by GDB,
which is linking against libguile-2.2.so.1, this results in the
following error:

  $ ./gdb/gdb --data-directory ./gdb/data-directory
  Exception caught while booting Guile.
  Error in function "load-thunk-from-memory":
  not an ELF file
  ./gdb/gdb: warning: Could not complete Guile gdb module initialization 
from:
  /tmp/binutils-gdb/build/gdb/data-directory/guile/gdb/boot.scm.
  Limited Guile support is available.
  Suggest passing --data-directory=/path/to/gdb/data-directory.
  GDB Version: 15.1

  (gdb)

We could fix this by reverting some parts of d006ec41c448,
specifically, the part which sets 'guile' as the default value of
GUILE in the top-level Makefile.  However, I wonder if there's a
better solution.

GDB already has a configure flag `--with-guile=...` if this isn't
provided then GDB looks for a sensible default.

I think that when compiling GDB's guile files we should use a
consistent set of tools, i.e. a matching version of guild and guile,
both of which correspond to the shared library GDB will link against.

I propose that this should be done by setting the GUILE environment
variable in gdb/data-directory/Makefile.in based on the version of
guile that GDB's configure script selected.

This does mean that doing:

  make all-gdb GUILE=

will no longer work, GDB will ignore this setting, however, if we
really want to compile GDB's guile scripts using a different guile
version then surely we should also be linking to a corresponding
shared library, and (maybe) using the corresponding guild version too?
This can be achieved by configuring GDB using `--with-guile=`.

After this commit I no longer have issues compiling GDB with guile
support.

diff --git a/gdb/acinclude.m4 b/gdb/acinclude.m4
index 915082056dd..6e4b96f0acf 100644
--- a/gdb/acinclude.m4
+++ b/gdb/acinclude.m4
@@ -284,7 +284,29 @@ AC_DEFUN([GDB_GUILE_PROGRAM_NAMES], [
   fi
 
   GUILD="$ac_cv_guild_program_name"
+
+  AC_CACHE_CHECK([for the absolute file name of the 'guile' command],
+

[pushed] aarch64: Define out-of-class static constants

2024-03-06 Thread Richard Sandiford
While reworking the aarch64 feature descriptions, I forgot
to add out-of-class definitions of some static constants.
This could lead to a build failure with some compilers.

This was seen with some WIP to increase the number of extensions
beyond 64.  It's latent on trunk though, and a regression from
before the rework.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64-feature-deps.h (feature_deps::info): Add
out-of-class definitions of static constants.
---
 gcc/config/aarch64/aarch64-feature-deps.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-feature-deps.h 
b/gcc/config/aarch64/aarch64-feature-deps.h
index a1b81f9070b..3641badb82f 100644
--- a/gcc/config/aarch64/aarch64-feature-deps.h
+++ b/gcc/config/aarch64/aarch64-feature-deps.h
@@ -71,6 +71,9 @@ template struct info;
 static constexpr auto enable = flag | get_enable REQUIRES; \
 static constexpr auto explicit_on = enable | get_enable EXPLICIT_ON; \
   };   \
+  const aarch64_feature_flags info::flag;  \
+  const aarch64_feature_flags info::enable;\
+  const aarch64_feature_flags info::explicit_on; \
   constexpr info IDENT ()  \
   {\
 return info ();\
-- 
2.25.1



Re: [PATCH] arm: Support -mfdpic for more targets

2024-03-06 Thread Richard Earnshaw (lists)
On 06/03/2024 05:07, Fangrui Song wrote:
> On Fri, Feb 23, 2024 at 7:33 PM Fangrui Song  wrote:
>>
>> From: Fangrui Song 
>>
>> Targets that are not arm*-*-uclinuxfdpiceabi can use -S -mfdpic, but -c
>> -mfdpic does not pass --fdpic to gas.  This is an unnecessary
>> restriction.  Just define the ASM_SPEC in bpabi.h.
>>
>> Additionally, use armelf[b]_linux_fdpiceabi emulations for -mfdpic in
>> linux-eabi.h.  This will allow a future musl fdpic port to use the
>> desired BFD emulation.
>>
>> gcc/ChangeLog:
>>
>> * config/arm/bpabi.h (TARGET_FDPIC_ASM_SPEC): Transform -mfdpic.
>> * config/arm/linux-eabi.h (TARGET_FDPIC_LINKER_EMULATION): Define.
>> (SUBTARGET_EXTRA_LINK_SPEC): Use TARGET_FDPIC_LINKER_EMULATION
>> if -mfdpic.
>> ---
>>  gcc/config/arm/bpabi.h  | 2 +-
>>  gcc/config/arm/linux-eabi.h | 5 -
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
>> index 7a279f3ed3c..6778be1a8bf 100644
>> --- a/gcc/config/arm/bpabi.h
>> +++ b/gcc/config/arm/bpabi.h
>> @@ -55,7 +55,7 @@
>>  #define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*"\
>>    "|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}"
>>
>> -#define TARGET_FDPIC_ASM_SPEC ""
>> +#define TARGET_FDPIC_ASM_SPEC "%{mfdpic: --fdpic}"
>>
>>  #define BE8_LINK_SPEC  \
>>    "%{!r:%{!mbe32:%:be8_linkopt(%{mlittle-endian:little}"   \
>> diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
>> index eef791f6a02..0c5c58e4928 100644
>> --- a/gcc/config/arm/linux-eabi.h
>> +++ b/gcc/config/arm/linux-eabi.h
>> @@ -46,12 +46,15 @@
>>  #undef  TARGET_LINKER_EMULATION
>>  #if TARGET_BIG_ENDIAN_DEFAULT
>>  #define TARGET_LINKER_EMULATION "armelfb_linux_eabi"
>> +#define TARGET_FDPIC_LINKER_EMULATION "armelfb_linux_fdpiceabi"
>>  #else
>>  #define TARGET_LINKER_EMULATION "armelf_linux_eabi"
>> +#define TARGET_FDPIC_LINKER_EMULATION "armelf_linux_fdpiceabi"
>>  #endif
>>
>>  #undef  SUBTARGET_EXTRA_LINK_SPEC
>> -#define SUBTARGET_EXTRA_LINK_SPEC " -m " TARGET_LINKER_EMULATION
>> +#define SUBTARGET_EXTRA_LINK_SPEC " -m %{mfdpic: " \
>> +  TARGET_FDPIC_LINKER_EMULATION ";:" TARGET_LINKER_EMULATION "}"
>>
>>  /* GNU/Linux on ARM currently supports three dynamic linkers:
>> - ld-linux.so.2 - for the legacy ABI
>> --
>> 2.44.0.rc1.240.g4c46232300-goog
>>
> 
> Ping:)
> 

We're in stage4 at present and this is new material.  I'll look at it after the 
branch has been cut.

R.

> 
> -- 
> 宋方睿



Re: [PATCH] LoongArch: testsuite: Rewrite {x, }vfcmp-{d, f}.c to avoid named registers

2024-03-06 Thread chenglulu

This test case is so cleverly designed!

I have no problem. Thank you!

在 2024/3/5 下午9:00, Xi Ruoyao 写道:

Loops on named vector register are not vectorized (see comment 11 of
PR113622), so the these test cases have been failing for a while.
Rewrite them using check-function-bodies to remove hard coding register
names.  A barrier is needed to always load the first operand before the
second operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vfcmp-f.c: Rewrite to avoid named
registers.
* gcc.target/loongarch/vfcmp-d.c: Likewise.
* gcc.target/loongarch/xvfcmp-f.c: Likewise.
* gcc.target/loongarch/xvfcmp-d.c: Likewise.
---

Tested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/testsuite/gcc.target/loongarch/vfcmp-d.c  | 202 --
  gcc/testsuite/gcc.target/loongarch/vfcmp-f.c  | 347 ++
  gcc/testsuite/gcc.target/loongarch/xvfcmp-d.c | 202 --
  gcc/testsuite/gcc.target/loongarch/xvfcmp-f.c | 204 --
  4 files changed, 816 insertions(+), 139 deletions(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c 
b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
index 8b870ef38a0..87e4ed19e96 100644
--- a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
+++ b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
@@ -1,28 +1,188 @@
  /* { dg-do compile } */
-/* { dg-options "-O2 -mlsx -ffixed-f0 -ffixed-f1 -ffixed-f2 
-fno-vect-cost-model" } */
+/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */
+/* { dg-final { check-function-bodies "**" "" } } */
  
  #define F double

  #define I long long
  
  #include "vfcmp-f.c"
  
-/* { dg-final { scan-assembler "compare_quiet_equal:.*\tvfcmp\\.ceq\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_equal\n" } } */

-/* { dg-final { scan-assembler 
"compare_quiet_not_equal:.*\tvfcmp\\.cune\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_greater:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_less:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_less:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_greater:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_unordered:.*\tvfcmp\\.cun\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_ordered:.*\tvfcmp\\.cor\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_ordered\n"
 } } */
+/*
+** compare_quiet_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.ceq.d (\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_quiet_not_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.cune.d(\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_signaling_greater:
+** vld 

Re: [PATCH V3] rs6000: Don't ICE when compiling the __builtin_vsx_splat_2di built-in [PR113950]

2024-03-06 Thread Kewen.Lin
Hi,

on 2024/3/4 02:55, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
>   
> When we expand the __builtin_vsx_splat_2di function, we were allowing 
> immediate
> value for second operand which causes an unrecognizable insn ICE. Even though
> the immediate value was forced into a register, it wasn't correctly assigned
> to the second operand. So corrected the assignment of op1 to operands[1].
> 
> 2024-02-29  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/113950
>   * config/rs6000/vsx.md (vsx_splat_): Corrected assignment to
>   operand1.

Nit: s/Corrected/Correct/, maybe add "and simplify else if with else.".

> 
> gcc/testsuite/
>   PR target/113950
>   * gcc.target/powerpc/pr113950.c: New testcase.
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 6111cc90eb7..f135fa079bd 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4666,8 +4666,8 @@
>rtx op1 = operands[1];
>if (MEM_P (op1))
>  operands[1] = rs6000_force_indexed_or_indirect_mem (op1);
> -  else if (!REG_P (op1))
> -op1 = force_reg (mode, op1);
> +  else
> +operands[1] = force_reg (mode, op1);
>  })
>  
>  (define_insn "vsx_splat__reg"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr113950.c 
> b/gcc/testsuite/gcc.target/powerpc/pr113950.c
> new file mode 100644
> index 000..64566a580d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr113950.c
> @@ -0,0 +1,24 @@
> +/* PR target/113950 */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O1 -mvsx" } */

Nit: s/-O1/-O2/, -O2 is preferred when the failure can be reproduced
with -O2 (not just -O1), since optimization may change the level in
which it's placed, -O2 is more general.

As the discussions in the thread of the previous versions, I think
Segher agreed this approach, so OK for trunk with the above nits
tweaked, thanks!

BR,
Kewen

> +
> +/* Verify we do not ICE on the following.  */
> +
> +void abort (void);
> +
> +int main ()
> +{
> +  int i;
> +  vector signed long long vsll_result, vsll_expected_result;
> +  signed long long sll_arg1;
> +
> +  sll_arg1 = 300;
> +  vsll_expected_result = (vector signed long long) {300, 300};
> +  vsll_result = __builtin_vsx_splat_2di (sll_arg1);  
> +
> +  for (i = 0; i < 2; i++)
> +if (vsll_result[i] != vsll_expected_result[i])
> +  abort();
> +
> +  return 0;
> +}
> 
> 



Re: [PATCH, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-03-06 Thread Kewen.Lin
Hi,

on 2024/3/1 10:41, HAO CHEN GUI wrote:
> Hi,
>   This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In
> combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
> lshiftrt with an out AND. It matches a DImode rotate and mask insert on
> rs6000.
> 
> Trying 2 -> 7:
> 2: r122:DI=r129:DI
>   REG_DEAD r129:DI
> 7: r125:SI=r122:DI#0 0>>0x1f
>   REG_DEAD r122:DI
> Failed to match this instruction:
> (set (subreg:DI (reg:SI 125 [ x ]) 0)
> (zero_extract:DI (reg:DI 129)
> (const_int 32 [0x20])
> (const_int 1 [0x1])))
> Successfully matched this instruction:
> (set (subreg:DI (reg:SI 125 [ x ]) 0)
> (and:DI (lshiftrt:DI (reg:DI 129)
> (const_int 31 [0x1f]))
> (const_int 4294967295 [0x])))
> 
> This conversion blocks the further combination which combines to a SImode
> rotate and mask insert insn.
> 
> Trying 9, 7 -> 10:
> 9: r127:SI=r130:DI#0&0xfffe
>   REG_DEAD r130:DI
> 7: r125:SI#0=r129:DI 0>>0x1f&0x
>   REG_DEAD r129:DI
>10: r124:SI=r127:SI|r125:SI
>   REG_DEAD r125:SI
>   REG_DEAD r127:SI
> Failed to match this instruction:
> (set (reg:SI 124)
> (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
> (const_int -2 [0xfffe]))
> (subreg:SI (zero_extract:DI (reg:DI 129)
> (const_int 32 [0x20])
> (const_int 1 [0x1])) 0)))
> Failed to match this instruction:
> (set (reg:SI 124)
> (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
> (const_int -2 [0xfffe]))
> (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
> (const_int 31 [0x1f]))
> (const_int 4294967295 [0x])) 0)))
> 
>   The root cause of the issue is if it's necessary to do the widen mode for
> lshiftrt when the target already has the narrow mode lshiftrt and its cost
> is not high. My former patch tried to fix the problem but not accepted yet.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

Hope Segher can chime in this proposal updating combine pass, I can understand
the new proposal by introducing new patterns in target code is able to fix the
issue, but IMHO it's likely there are some other mis-optimization which don't
get noticed and they need some similar pattern extension (duplicate some pattern
& adjust with subreg) to optimize, from this perspective, it would be nice if
it's possible to have a more general fix.

Some minor comments for this patch itself are inlined.

> 
>   As it's stage 4 now, I drafted this patch to fix the regression by adding
> subreg patterns of SImode rotate and mask insert. It actually does reversed
> things and narrow the mode for lshiftrt so that it can matches the SImode
> rotate and mask insert.
> 
>   The case "rlwimi-2.c" is fixed and restore the corresponding number of
> insns to original ones. The case "rlwinm-0.c" is also changed and 9 "rlwinm"
> is replaced with 9 "rldicl" as the sequence of combine is changed. It's not
> a regression as the total number of insns isn't changed.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Add subreg patterns for SImode rotate and mask insert
> 
> In combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
> lshiftrt with an AND.  The new pattern matches rotate and mask insert on
> rs6000.  Thus it blocks the pattern to be further combined to a SImode rotate
> and mask insert pattern.  This patch fixes the problem by adding two subreg
> pattern for SImode rotate and mask insert patterns.
> 
> gcc/
>   PR target/93738
>   * config/rs6000/rs6000.md (*rotlsi3_insert_9): New.
>   (*rotlsi3_insert_8): New.
> 
> gcc/testsuite/
>   PR target/93738
>   * gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
>   rotate instructions.
>   * gcc.target/powerpc/rlwinm-0.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index bc8bc6ab060..b0b40f91e3e 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -4253,6 +4253,36 @@ (define_insn "*rotl3_insert"
>  ; difference between rlwimi and rldimi.  We also might want dot forms,
>  ; but not for rlwimi on POWER4 and similar processors.
> 
> +; Subreg pattern of insn "*rotlsi3_insert"
> +(define_insn_and_split "*rotlsi3_insert_9"

Nit: "*rotlsi3_insert_subreg" seems a better name, ...

> +  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
> + (ior:SI (and:SI
> +  (match_operator:SI 8 "lowpart_subreg_operator"
> +   [(and:DI (match_operator:DI 4 "rotate_mask_operator"
> + [(match_operand:DI 1 "gpc_reg_operand" "r")
> +  (match_operand:SI 2 "const_int_operand" "n")])
> +(match_operand:DI 3 

[PATCH] tree-optimization/114246 - invalid call argument from DSE

2024-03-06 Thread Richard Biener
The following makes sure to strip type conversions added by
build_fold_addr_expr before placing the result in a call argument.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114246
* tree-ssa-dse.cc (increment_start_addr): Strip useless
type conversions from the adjusted address.

* gcc.dg/torture/pr114246.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr114246.c | 11 +++
 gcc/tree-ssa-dse.cc |  2 ++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr114246.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr114246.c 
b/gcc/testsuite/gcc.dg/torture/pr114246.c
new file mode 100644
index 000..eb20db594cd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr114246.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+int a, b;
+
+void
+foo (void)
+{
+  __builtin_memcpy (, (char *) - 1, 2);
+  __builtin_memcpy (, , 1);
+}
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 7c348516ddf..fce4fc76a56 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-data-ref.h"
 #include "internal-fn.h"
+#include "tree-ssa.h"
 
 /* This file implements dead store elimination.
 
@@ -658,6 +659,7 @@ increment_start_addr (gimple *stmt, tree *where, int 
increment)
  *where,
  build_int_cst (ptr_type_node,
 increment)));
+  STRIP_USELESS_TYPE_CONVERSION (*where);
 }
 
 /* STMT is builtin call that writes bytes in bitmap ORIG, some bytes are dead
-- 
2.35.3


[PATCH] tree-optimization/114249 - ICE with BB reduction vectorization

2024-03-06 Thread Richard Biener
When we scrap the last def of an odd lane numbered BB reduction
we can end up recording a pattern def which will later wreck
code generation.  The following puts this logic where it better
belongs, avoiding this issue.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114249
* tree-vect-slp.cc (vect_build_slp_instance): Move making
a BB reduction lane number even ...
(vect_slp_check_for_roots): ... here to avoid leaking
pattern defs.

* gcc.dg/vect/bb-slp-pr114249.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr114249.c | 20 
 gcc/tree-vect-slp.cc| 20 ++--
 2 files changed, 30 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr114249.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr114249.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr114249.c
new file mode 100644
index 000..64c93cd9a2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr114249.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+enum { SEG_THIN_POOL } read_only;
+struct {
+  unsigned skip_block_zeroing;
+  unsigned ignore_discard;
+  unsigned no_discard_passdown;
+  unsigned error_if_no_space;
+} _thin_pool_emit_segment_line_seg;
+void dm_snprintf();
+void _emit_segment()
+{
+  int features =
+  (_thin_pool_emit_segment_line_seg.error_if_no_space ? 1 : 0) +
+  (read_only ? 1 : 0) +
+  (_thin_pool_emit_segment_line_seg.ignore_discard ? 1 : 0) +
+  (_thin_pool_emit_segment_line_seg.no_discard_passdown ? 1 : 0) +
+  (_thin_pool_emit_segment_line_seg.skip_block_zeroing ? 1 : 0);
+  dm_snprintf(features);
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 324400db19e..527b06c9f9c 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3288,15 +3288,6 @@ vect_build_slp_instance (vec_info *vinfo,
 "  %G", scalar_stmts[i]->stmt);
 }
 
-  /* When a BB reduction doesn't have an even number of lanes
- strip it down, treating the remaining lane as scalar.
- ???  Selecting the optimal set of lanes to vectorize would be nice
- but SLP build for all lanes will fail quickly because we think
- we're going to need unrolling.  */
-  if (kind == slp_inst_kind_bb_reduc
-  && (scalar_stmts.length () & 1))
-remain.safe_insert (0, gimple_get_lhs (scalar_stmts.pop ()->stmt));
-
   /* Build the tree for the SLP instance.  */
   unsigned int group_size = scalar_stmts.length ();
   bool *matches = XALLOCAVEC (bool, group_size);
@@ -7549,6 +7540,7 @@ vect_slp_check_for_roots (bb_vec_info bb_vinfo)
  /* ???  For now do not allow mixing ops or externs/constants.  */
  bool invalid = false;
  unsigned remain_cnt = 0;
+ unsigned last_idx = 0;
  for (unsigned i = 0; i < chain.length (); ++i)
{
  if (chain[i].code != code)
@@ -7563,7 +7555,13 @@ vect_slp_check_for_roots (bb_vec_info bb_vinfo)
  (chain[i].op)->stmt)
  != chain[i].op))
remain_cnt++;
+ else
+   last_idx = i;
}
+ /* Make sure to have an even number of lanes as we later do
+all-or-nothing discovery, not trying to split further.  */
+ if ((chain.length () - remain_cnt) & 1)
+   remain_cnt++;
  if (!invalid && chain.length () - remain_cnt > 1)
{
  vec stmts;
@@ -7576,7 +7574,9 @@ vect_slp_check_for_roots (bb_vec_info bb_vinfo)
  stmt_vec_info stmt_info;
  if (chain[i].dt == vect_internal_def
  && ((stmt_info = bb_vinfo->lookup_def (chain[i].op)),
- gimple_get_lhs (stmt_info->stmt) == chain[i].op))
+ gimple_get_lhs (stmt_info->stmt) == chain[i].op)
+ && (i != last_idx
+ || (stmts.length () & 1)))
stmts.quick_push (stmt_info);
  else
remain.quick_push (chain[i].op);
-- 
2.35.3


Re: [C++ coroutines] Initial implementation pushed to master.

2024-03-06 Thread Iain Sandoe



> On 5 Mar 2024, at 17:31, H.J. Lu  wrote:
> 
> On Sat, Jan 18, 2020 at 4:54 AM Iain Sandoe  wrote:
>> 

>> 2020-01-18  Iain Sandoe  
>> 
>>* Makefile.in: Add coroutine-passes.o.
>>* builtin-types.def (BT_CONST_SIZE): New.
>>(BT_FN_BOOL_PTR): New.
>>(BT_FN_PTR_PTR_CONST_SIZE_BOOL): New.
>>* builtins.def (DEF_COROUTINE_BUILTIN): New.
>>* coroutine-builtins.def: New file.
>>* coroutine-passes.cc: New file.
> 
> There are
> 
>  tree res_tgt = TREE_OPERAND (gimple_call_arg (stmt, 2), 0);
>  tree _dest = destinations.get_or_insert (idx, );
>  if (existed && dump_file)
>Why does this behavior depend on dump_file?

This was checking for a potential wrong-code error during development;
there is no point in making it into a diagnostic (since the user could not fix
the problem if it happened).  I guess changing to a gcc_checking_assert()
would be reasonable but I’d prefer to do that once GCC-15 opens.

Have you found any instance where this results in a reported bug?
(I do not recall anything on my coroutines bug list that would seem to indicate 
this).

thanks for noting it.
Iain


>{
>  fprintf (
>dump_file,
>"duplicate YIELD RESUME point (" HOST_WIDE_INT_PRINT_DEC
>") ?\n",
>idx);
>  print_gimple_stmt (dump_file, stmt, 0, TDF_VOPS|TDF_MEMSYMS);
>}
>  else
>res_dest = res_tgt;
> 
> H.J.



[PATCH v1] LoongArch: testsuite:Fix problems with incorrect results in vector test cases.

2024-03-06 Thread chenxiaolong
In simd_correctness_check.h, the role of the macro ASSERTEQ_64 is to check the
result of the passed vector values for the 64-bit data of each array element.
It turns out that it uses the abs() function to check only the lower 32 bits
of the data at a time, so it replaces abs() with the llabs() function.

However, the following two problems may occur after modification:

1.FAIL in lasx-xvfrint_s.c and lsx-vfrint_s.c
The reason for the error is because vector test cases that use __m{128,256} to
define vector types are composed of 32-bit primitive types, they should use
ASSERTEQ_32 instead of ASSERTEQ_64 to check for correctness.

2.FAIL in lasx-xvshuf_b.c and lsx-vshuf.c
The cause of the error is that the expected result of the function setting in
the test case is incorrect.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c: Replace
ASSERTEQ_64 with the macro ASSERTEQ_32.
* gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c: Modify the expected
test results of some functions according to the function of the vector
instruction.
* gcc.target/loongarch/vector/lsx/lsx-vfrint_s.c: Same
modification as lasx-xvfrint_s.c.
* gcc.target/loongarch/vector/lsx/lsx-vshuf.c: Same
modification as lasx-xvshuf_b.c.
* gcc.target/loongarch/vector/simd_correctness_check.h: Use the llabs()
function instead of abs() to check the correctness of the results.
---
 .../loongarch/vector/lasx/lasx-xvfrint_s.c| 58 +--
 .../loongarch/vector/lasx/lasx-xvshuf_b.c | 14 ++---
 .../loongarch/vector/lsx/lsx-vfrint_s.c   | 50 
 .../loongarch/vector/lsx/lsx-vshuf.c  | 12 ++--
 .../loongarch/vector/simd_correctness_check.h |  2 +-
 5 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c 
b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c
index fbfe300eac4..4538528a67f 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c
@@ -184,7 +184,7 @@ main ()
   *((int *)&__m256_result[1]) = 0x;
   *((int *)&__m256_result[0]) = 0x;
   __m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
 
   *((int *)&__m256_op0[7]) = 0x;
   *((int *)&__m256_op0[6]) = 0x;
@@ -203,7 +203,7 @@ main ()
   *((int *)&__m256_result[1]) = 0x;
   *((int *)&__m256_result[0]) = 0x;
   __m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
 
   *((int *)&__m256_op0[7]) = 0x;
   *((int *)&__m256_op0[6]) = 0x;
@@ -222,7 +222,7 @@ main ()
   *((int *)&__m256_result[1]) = 0x;
   *((int *)&__m256_result[0]) = 0x;
   __m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
 
   *((int *)&__m256_op0[7]) = 0x01010101;
   *((int *)&__m256_op0[6]) = 0x01010101;
@@ -241,7 +241,7 @@ main ()
   *((int *)&__m256_result[1]) = 0x;
   *((int *)&__m256_result[0]) = 0x;
   __m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
 
   *((int *)&__m256_op0[7]) = 0x;
   *((int *)&__m256_op0[6]) = 0x;
@@ -260,7 +260,7 @@ main ()
   *((int *)&__m256_result[1]) = 0x;
   *((int *)&__m256_result[0]) = 0x;
   __m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
 
   *((int *)&__m256_op0[7]) = 0x;
   *((int *)&__m256_op0[6]) = 0x;
@@ -279,7 +279,7 @@ main ()
   *((int *)&__m256_result[1]) = 0x;
   *((int *)&__m256_result[0]) = 0x;
   __m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
 
   *((int *)&__m256_op0[7]) = 0x;
   *((int *)&__m256_op0[6]) = 0x;
@@ -298,7 +298,7 @@ main ()
   *((int *)&__m256_result[1]) = 0x;
   *((int *)&__m256_result[0]) = 0x;
   __m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
 
   *((int *)&__m256_op0[7]) = 0x01010101;
   *((int *)&__m256_op0[6]) = 0x01010101;
@@ -317,7 +317,7 @@ main ()
   *((int *)&__m256_result[1]) = 0x;
   *((int *)&__m256_result[0]) = 0x;
   __m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
 
   *((int *)&__m256_op0[7]) 

Re: [PATCH v2 00/13] Add aarch64-w64-mingw32 target

2024-03-06 Thread LIU Hao
May I suggest you keep the mcf thread model for aarch-w64-mingw32? I requested Martin Storsjö to 
test it on a physical Windows 11 on ARM machine with Clang and all tests passed. I think it should 
work once the GCC support is complete.



--
Best regards,
LIU Hao


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [PATCH] i386: Fix up the vzeroupper REG_DEAD/REG_UNUSED note workaround [PR114190]

2024-03-06 Thread Uros Bizjak
On Wed, Mar 6, 2024 at 9:10 AM Jakub Jelinek  wrote:
>
> Hi!
>
> When writing the rest_of_handle_insert_vzeroupper workaround to manually
> remove all the REG_DEAD/REG_UNUSED notes from the IL, I've missed that
> there is a df_analyze () call right after it and that the problems added
> earlier in the pass, like df_note_add_problem () done during mode switching,
> doesn't affect just the next df_analyze () call right after it, but all
> other df_analyze () calls until the end of the current pass where
> df_finish_pass removes the optional problems.
>
> So, as can be seen on the following patch, the workaround doesn't actually
> work there, because while rest_of_handle_insert_vzeroupper carefully removes
> all REG_DEAD/REG_UNUSED notes, the df_analyze () call at the end of the
> function immediately adds them in again (so, I must say I have no idea
> why the workaround worked on the earlier testcases).
>
> Now, I could move the df_analyze () call just before the REG_DEAD/REG_UNUSED
> note removal loop, but I think the following patch is better, because
> the df_analyze () call doesn't have to recompute the problem when we don't
> care about it and will actively strip all traces of it away.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-03-06  Jakub Jelinek  
>
> PR rtl-optimization/114190
> * config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
> Call df_remove_problem for df_note before calling df_analyze.
>
> * gcc.target/i386/avx-pr114190.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-features.cc.jj 2024-02-22 10:10:18.658032517 +0100
> +++ gcc/config/i386/i386-features.cc2024-03-05 09:23:54.496112264 +0100
> @@ -2690,6 +2690,7 @@ rest_of_handle_insert_vzeroupper (void)
> }
> }
>
> +  df_remove_problem (df_note);
>df_analyze ();
>return 0;
>  }
> --- gcc/testsuite/gcc.target/i386/avx-pr114190.c.jj 2024-03-05 
> 10:07:24.869454305 +0100
> +++ gcc/testsuite/gcc.target/i386/avx-pr114190.c2024-03-05 
> 10:06:52.870889687 +0100
> @@ -0,0 +1,27 @@
> +/* PR rtl-optimization/114190 */
> +/* { dg-do run { target avx } } */
> +/* { dg-options "-O2 -fno-dce -fharden-compares -mavx 
> --param=max-rtl-if-conversion-unpredictable-cost=136 -mno-avx512f -Wno-psabi" 
> } */
> +
> +#include "avx-check.h"
> +
> +typedef unsigned char U __attribute__((vector_size (64)));
> +typedef unsigned int V __attribute__((vector_size (64)));
> +U u;
> +
> +V
> +foo (V a, V b)
> +{
> +  u[0] = __builtin_sub_overflow (0, (int) a[0], [b[7] & 5]) ? -u[1] : 
> -b[3];
> +  b ^= 0 != b;
> +  return (V) u + (V) a + (V) b;
> +}
> +
> +static void
> +avx_test (void)
> +{
> +  V x = foo ((V) { 1 }, (V) { 0, 0, 0, 1 });
> +  if (x[0] != -1U)
> +__builtin_abort ();
> +  if (x[3] != -2U)
> +__builtin_abort ();
> +}
>
> Jakub
>


[PATCH] match.pd: Optimize a * !a to 0 [PR114009]

2024-03-06 Thread Jakub Jelinek
Hi!

The following patch attempts to fix an optimization regression through
adding a simple simplification.  We already have the
/* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0  */
(if (!canonicalize_math_p ())
 (for cmp (tcc_comparison)
  (simplify
   (mult:c (convert (cmp@0 @1 @2)) @3)
   (if (INTEGRAL_TYPE_P (type)
&& INTEGRAL_TYPE_P (TREE_TYPE (@0)))
 (cond @0 @3 { build_zero_cst (type); })))
optimization which otherwise triggers during the a * !a multiplication,
but that is done only late and we aren't able through range assumptions
optimize it yet anyway.

The patch adds a specific simplification for it.
If a is zero, then a * !a will be 0 * 1 (or for signed 1-bit 0 * -1)
and so 0.
If a is non-zero, then a * !a will be a * 0 and so again 0.

I had to use a hack, TREE_TYPE (@0) in the condition as opposed to
using just type, because otherwise I get a false positive warning.
The captures parameter is marked ARG_UNUSED in gimple, but not in
generic and we end up with
tree
generic_simplify_168 (location_t ARG_UNUSED (loc), const tree ARG_UNUSED (type),
 tree ARG_UNUSED (_p0), tree ARG_UNUSED (_p1), tree *captures)
{
  const bool debug_dump = dump_file && (dump_flags & TDF_FOLDING);
  if (INTEGRAL_TYPE_P (type)
)
{
  if (TREE_SIDE_EFFECTS (_p1)) goto next_after_fail309;
  if (UNLIKELY (!dbg_cnt (match))) goto next_after_fail309;
  {
tree _r;
_r =  build_zero_cst (type);
if (UNLIKELY (debug_dump)) generic_dump_logs ("match.pd", 216, 
__FILE__, __LINE__, true);
return _r;
  }
next_after_fail309:;
}
  return NULL_TREE;
}
there.  I've looked at similar cases and in
 (simplify
  (mod @0 integer_onep)
  { build_zero_cst (type); })
case we don't get captures unused because it is used a single time
and therefore evaluate it for side-effects instead of punting on them,
and in
 (pointer_diff @@0 @0)
 { build_zero_cst (type); })
we do so as well.  Though, using @@0 instead of @0 on either operand
of this simplification doesn't help.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-06  Jakub Jelinek  

PR tree-optimization/114009
* match.pd (a * !a -> 0): New simplification.

* gcc.dg/tree-ssa/pr114009.c: New test.

--- gcc/match.pd.jj 2024-03-01 14:56:42.442810053 +0100
+++ gcc/match.pd2024-03-05 22:53:25.202579435 +0100
@@ -1219,6 +1219,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& tree_nop_conversion_p (type, TREE_TYPE (@1)))
(lshift @0 @2)))
 
+/* Fold a * !a into 0.  */
+(simplify
+ (mult:c @0 (convert? (eq @0 integer_zerop)))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+   { build_zero_cst (type); }))
+
 /* Shifts by precision or greater result in zero.  */
 (for shift (lshift rshift)
  (simplify
--- gcc/testsuite/gcc.dg/tree-ssa/pr114009.c.jj 2024-03-05 15:18:41.274541636 
+0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr114009.c2024-03-05 15:16:09.056589675 
+0100
@@ -0,0 +1,24 @@
+/* PR tree-optimization/114009 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "  return 0;" 3 "optimized" } } */
+
+int
+foo (int x)
+{
+  x = (x / 2) * 2;
+  return (!x) * x;
+}
+
+int
+bar (int x, int y)
+{
+  (void) x;
+  return y * !y;
+}
+
+unsigned long long
+baz (unsigned long long x)
+{
+  return (!x) * x;
+}

Jakub



[PATCH] i386: Fix up the vzeroupper REG_DEAD/REG_UNUSED note workaround [PR114190]

2024-03-06 Thread Jakub Jelinek
Hi!

When writing the rest_of_handle_insert_vzeroupper workaround to manually
remove all the REG_DEAD/REG_UNUSED notes from the IL, I've missed that
there is a df_analyze () call right after it and that the problems added
earlier in the pass, like df_note_add_problem () done during mode switching,
doesn't affect just the next df_analyze () call right after it, but all
other df_analyze () calls until the end of the current pass where
df_finish_pass removes the optional problems.

So, as can be seen on the following patch, the workaround doesn't actually
work there, because while rest_of_handle_insert_vzeroupper carefully removes
all REG_DEAD/REG_UNUSED notes, the df_analyze () call at the end of the
function immediately adds them in again (so, I must say I have no idea
why the workaround worked on the earlier testcases).

Now, I could move the df_analyze () call just before the REG_DEAD/REG_UNUSED
note removal loop, but I think the following patch is better, because
the df_analyze () call doesn't have to recompute the problem when we don't
care about it and will actively strip all traces of it away.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-06  Jakub Jelinek  

PR rtl-optimization/114190
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Call df_remove_problem for df_note before calling df_analyze.

* gcc.target/i386/avx-pr114190.c: New test.

--- gcc/config/i386/i386-features.cc.jj 2024-02-22 10:10:18.658032517 +0100
+++ gcc/config/i386/i386-features.cc2024-03-05 09:23:54.496112264 +0100
@@ -2690,6 +2690,7 @@ rest_of_handle_insert_vzeroupper (void)
}
}
 
+  df_remove_problem (df_note);
   df_analyze ();
   return 0;
 }
--- gcc/testsuite/gcc.target/i386/avx-pr114190.c.jj 2024-03-05 
10:07:24.869454305 +0100
+++ gcc/testsuite/gcc.target/i386/avx-pr114190.c2024-03-05 
10:06:52.870889687 +0100
@@ -0,0 +1,27 @@
+/* PR rtl-optimization/114190 */
+/* { dg-do run { target avx } } */
+/* { dg-options "-O2 -fno-dce -fharden-compares -mavx 
--param=max-rtl-if-conversion-unpredictable-cost=136 -mno-avx512f -Wno-psabi" } 
*/
+
+#include "avx-check.h"
+
+typedef unsigned char U __attribute__((vector_size (64)));
+typedef unsigned int V __attribute__((vector_size (64)));
+U u;
+
+V
+foo (V a, V b)
+{
+  u[0] = __builtin_sub_overflow (0, (int) a[0], [b[7] & 5]) ? -u[1] : -b[3];
+  b ^= 0 != b;
+  return (V) u + (V) a + (V) b;
+}
+
+static void
+avx_test (void)
+{
+  V x = foo ((V) { 1 }, (V) { 0, 0, 0, 1 });
+  if (x[0] != -1U)
+__builtin_abort ();
+  if (x[3] != -2U)
+__builtin_abort ();
+}

Jakub