Re: [PATCH] testsuite: Fix up scev-16.c test [PR113446]

2024-01-17 Thread Jakub Jelinek
On Thu, Jan 18, 2024 at 08:40:04AM +0100, Richard Biener wrote:
> > This test FAILs on i686-linux or e.g. sparc*-solaris*, because
> > it uses vect_int effective target outside of */vect/ testsuite.
> > That is wrong, vect_int assumes the extra added flags by vect.exp
> > by default, which aren't added in other testsuites.
> > 
> > The following patch fixes that by moving the test into gcc.dg/vect/
> > and doing small tweaks.
> > 
> > Regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> OK, note -O2 -ftree-vectorize are default in vect.exp (so is
> -fno-vect-cost-model)

It is actually the flags added by check_vect_support_and_set_flags
that matter here: -msse2 on i686-linux, -mcpu=ultrasparc -mvis on
sparc*-*, -mvsx -mno-allow-movmisalign or -mcpu=970 on some powerpc*,
etc.

Jakub



Re: [PATCH] testsuite: Fix up scev-16.c test [PR113446]

2024-01-17 Thread Richard Biener
On Thu, 18 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> This test FAILs on i686-linux or e.g. sparc*-solaris*, because
> it uses vect_int effective target outside of */vect/ testsuite.
> That is wrong, vect_int assumes the extra added flags by vect.exp
> by default, which aren't added in other testsuites.
> 
> The following patch fixes that by moving the test into gcc.dg/vect/
> and doing small tweaks.
> 
> Regtested on x86_64-linux and i686-linux, ok for trunk?

OK, note -O2 -ftree-vectorize are default in vect.exp (so is
-fno-vect-cost-model)

> 2024-01-18  Jakub Jelinek  
> 
>   PR tree-optimization/112774
>   PR testsuite/113446
>   * gcc.dg/tree-ssa/scev-16.c: Move test ...
>   * gcc.dg/vect/pr112774.c: ... here.  Add PR comment line, use
>   dg-additional-options instead of dg-options and drop
>   -fdump-tree-vect-details.
> 
> --- gcc/testsuite/gcc.dg/tree-ssa/scev-16.c.jj2023-12-08 
> 08:28:23.790168953 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/scev-16.c   2024-01-17 18:21:26.397146209 
> +0100
> @@ -1,18 +0,0 @@
> -/* { dg-do compile } */
> -/* { dg-require-effective-target vect_int } */
> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> -
> -int A[1024 * 2];
> -
> -int foo (unsigned offset, unsigned N)
> -{
> -  int sum = 0;
> -
> -  for (unsigned i = 0; i < N; i++)
> -sum += A[i + offset];
> -
> -  return sum;
> -}
> -
> -/* Loop can be vectorized by referring "i + offset" is nonwrapping from 
> array.  */
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { 
> avr-*-* msp430-*-* pru-*-* } } } } } */
> --- gcc/testsuite/gcc.dg/vect/pr112774.c.jj   2024-01-17 18:20:25.401978923 
> +0100
> +++ gcc/testsuite/gcc.dg/vect/pr112774.c  2024-01-17 18:21:16.194285496 
> +0100
> @@ -0,0 +1,19 @@
> +/* PR tree-optimization/112774 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O2 -ftree-vectorize" } */
> +
> +int A[1024 * 2];
> +
> +int foo (unsigned offset, unsigned N)
> +{
> +  int sum = 0;
> +
> +  for (unsigned i = 0; i < N; i++)
> +sum += A[i + offset];
> +
> +  return sum;
> +}
> +
> +/* Loop can be vectorized by referring "i + offset" is nonwrapping from 
> array.  */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { 
> avr-*-* msp430-*-* pru-*-* } } } } } */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-17 Thread Xi Ruoyao
On Thu, 2024-01-18 at 15:15 +0800, chenglulu wrote:

> > gcc.dg/tree-ssa/scev-16.c is OK to move
> > gcc.dg/pr104992.c should simply add -fno-tree-vectorize to the used
> > options and remove the vect_* stuff
> 
> Hi Richard:
> 
> I have a question. I don't understand the purpose of adding 
> '-fno-tree-vectorize' here.

I don't think -fno-tree-vectorize will make a difference here.  This
test case uses __attribute__((vector_size(...))) explicitly, so the
vector operations will be used even with -fno-tree-vectorize.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] testsuite: Fix up gcc.target/i386/sse4_1-stv-1.c test [PR113452]

2024-01-17 Thread Richard Biener
On Thu, 18 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> From what I can see, this test has been written for a backend fix and
> assumes the loop isn't vectorized (at least, it wasn't when the test was
> added, it contains an early exit), but that is no longer true and because
> of the vectorization it now contains an instruction which the test scans
> for not being present.
> 
> I think we should just disable vectorization here.
> 
> Regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> 2024-01-18  Jakub Jelinek  
> 
>   PR testsuite/113452
>   * gcc.target/i386/sse4_1-stv-1.c: Add -fno-tree-vectorize to
>   dg-options.
> 
> --- gcc/testsuite/gcc.target/i386/sse4_1-stv-1.c.jj   2022-05-31 
> 11:33:51.603250042 +0200
> +++ gcc/testsuite/gcc.target/i386/sse4_1-stv-1.c  2024-01-17 
> 17:46:21.999689350 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target ia32 } } */
> -/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
> +/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign -fno-tree-vectorize" } 
> */
>  long long a[1024];
>  long long b[1024];
>  
> 
>   Jakub


[PATCH] testsuite: Fix up scev-16.c test [PR113446]

2024-01-17 Thread Jakub Jelinek
Hi!

This test FAILs on i686-linux or e.g. sparc*-solaris*, because
it uses vect_int effective target outside of */vect/ testsuite.
That is wrong, vect_int assumes the extra added flags by vect.exp
by default, which aren't added in other testsuites.

The following patch fixes that by moving the test into gcc.dg/vect/
and doing small tweaks.

Regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-18  Jakub Jelinek  

PR tree-optimization/112774
PR testsuite/113446
* gcc.dg/tree-ssa/scev-16.c: Move test ...
* gcc.dg/vect/pr112774.c: ... here.  Add PR comment line, use
dg-additional-options instead of dg-options and drop
-fdump-tree-vect-details.

--- gcc/testsuite/gcc.dg/tree-ssa/scev-16.c.jj  2023-12-08 08:28:23.790168953 
+0100
+++ gcc/testsuite/gcc.dg/tree-ssa/scev-16.c 2024-01-17 18:21:26.397146209 
+0100
@@ -1,18 +0,0 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target vect_int } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
-
-int A[1024 * 2];
-
-int foo (unsigned offset, unsigned N)
-{
-  int sum = 0;
-
-  for (unsigned i = 0; i < N; i++)
-sum += A[i + offset];
-
-  return sum;
-}
-
-/* Loop can be vectorized by referring "i + offset" is nonwrapping from array. 
 */
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { 
avr-*-* msp430-*-* pru-*-* } } } } } */
--- gcc/testsuite/gcc.dg/vect/pr112774.c.jj 2024-01-17 18:20:25.401978923 
+0100
+++ gcc/testsuite/gcc.dg/vect/pr112774.c2024-01-17 18:21:16.194285496 
+0100
@@ -0,0 +1,19 @@
+/* PR tree-optimization/112774 */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O2 -ftree-vectorize" } */
+
+int A[1024 * 2];
+
+int foo (unsigned offset, unsigned N)
+{
+  int sum = 0;
+
+  for (unsigned i = 0; i < N; i++)
+sum += A[i + offset];
+
+  return sum;
+}
+
+/* Loop can be vectorized by referring "i + offset" is nonwrapping from array. 
 */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { 
avr-*-* msp430-*-* pru-*-* } } } } } */

Jakub



Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2024-01-17 Thread Andrew Pinski
On Wed, Jan 17, 2024 at 11:28 PM Matthias Kretz  wrote:
>
> On Thursday, 4 January 2024 10:10:12 CET Andrew Pinski wrote:
> > I really doubt this would work in the end. Because HW which is 128bits
> > only, can't support -msve-vector-bits=2048 . I am thinking
> > std::experimental::simd is not the right way of supporting this.
> > Really the route the standard should be heading towards is non
> > constant at compile time sized vectors instead and then you could use
> > the constant sized ones to emulate the Variable length ones.
>
> I don't follow. "non-constant at compile time sized vectors" implies
> sizeless (no constexpr sizeof), no?
> Let me try to explain where I'm coming from. One of the motivating use
> cases for simd types is composition. Like this:
>
> template <typename T>
> struct Point
> {
>   T x, y, z;
>
>   T distance_to_origin() {
> return sqrt(x * x + y * y + z * z);
>   }
> };
>
> Point<float> is one point in 3D space, Point<simd<float>> stores multiple
> points in 3D space and can work on them in parallel.
>
> This implies that simd must have a sizeof. C++ is unlikely to get
> sizeless types (the discussions were long, there were many papers, ...).
> Should sizeless types in C++ ever happen, then composition is likely going
> to be constrained to the last data member.

Even this is a bad design in general for simd: it means the code needs
to know the size.
Also, AoS vs. SoA is always an interesting point here. In some cases you
want an array of structs for speed, and Point<simd<float>> does not work
there at all. I guess this is all water under the bridge with how folks
design code. You are basically pushing the AoSoA idea here, which is a
much worse idea than before.

That being said, sometimes it is not one vector of N elements you want
to work on but rather 1/2/3 vectors of N elements. This seems to be
pushing the idea of one vector of one type of element, which again is
the wrong push.
Moreover, I guess pushing one particular idea of SIMD is worse than not
pushing any idea of SIMD. For mathematical code, it is better for the
compiler to do the vectorization than for the user to try to be
semi-portable between different targets. This is what was learned with
Fortran, but I guess some folks in the C++ world like to expose the
underlying HW instead of thinking at a high level here.

Thanks,
Andrew Pinski

>
> With the above as our design constraints, SVE at first seems to be a bad
> fit for implementing std::simd. However, if (at least initially) we accept
> the need for different binaries for different SVE implementations, then you
> can look at the "scalable" part of SVE as an efficient way of reducing the
> number of opcodes necessary for supporting all kinds of different vector
> lengths. But otherwise you can treat it as fixed-size registers - which it
> is for a given CPU. In the case of a multi-CPU shared-memory system (e.g.
> RDMA between different ARM implementations) all you need is a different
> name for incompatible types. So std::simd on SVE256 must have a
> different name on SVE512. Same for std::simd (which is currently
> not the case with Sriniva's patch, I think, and needs to be resolved).

For SVE that is a bad design. It means the code is not portable at all.

>
> > I think we should not depend on __ARM_FEATURE_SVE_BITS being set here
> > and being meanful in any way.
>
> I'd love to. In the same way I'd love to *not depend* on __AVX__,
> __AVX512F__ etc.
>
> - Matthias
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
>  std::simd
> ──


[PATCH] testsuite: Fix up gcc.target/i386/sse4_1-stv-1.c test [PR113452]

2024-01-17 Thread Jakub Jelinek
Hi!

From what I can see, this test has been written for a backend fix and
assumes the loop isn't vectorized (at least, it wasn't when the test was
added, it contains an early exit), but that is no longer true and because
of the vectorization it now contains an instruction which the test scans
for not being present.

I think we should just disable vectorization here.

Regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-18  Jakub Jelinek  

PR testsuite/113452
* gcc.target/i386/sse4_1-stv-1.c: Add -fno-tree-vectorize to
dg-options.

--- gcc/testsuite/gcc.target/i386/sse4_1-stv-1.c.jj 2022-05-31 
11:33:51.603250042 +0200
+++ gcc/testsuite/gcc.target/i386/sse4_1-stv-1.c2024-01-17 
17:46:21.999689350 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile { target ia32 } } */
-/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign -fno-tree-vectorize" } */
 long long a[1024];
 long long b[1024];
 

Jakub



Re: [PATCH] opts: Fix up -ffold-mem-offsets option keywords

2024-01-17 Thread Richard Biener
On Thu, 18 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> While the option was originally meant to be a Target option for a single
> target, it is an option for all targets, so should be Common rather than
> Target, and because it is an optimization option which could be different
> in between different LTO TUs, I've added Optimization keyword too.
> From what I can see, Bool is a non-documented non-existing keyword (at
> least, grep Bool *.awk shows nothing), so I've dropped that too.  Seems
> that the option parsing simply parses and ignores any non-existing keywords.
> 
> Guess we should drop the Bool keywords from the gcc/config/riscv/riscv.opt
> file eventually, so that people don't copy this around.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> 2024-01-18  Jakub Jelinek  
> 
>   PR other/113399
>   * common.opt (ffold-mem-offsets): Remove Target and Bool keywords, add
>   Common and Optimization.
> 
> --- gcc/common.opt.jj 2024-01-03 11:51:31.467732078 +0100
> +++ gcc/common.opt2024-01-17 17:22:05.975424001 +0100
> @@ -1262,7 +1262,7 @@ Common Var(flag_cprop_registers) Optimiz
>  Perform a register copy-propagation optimization pass.
>  
>  ffold-mem-offsets
> -Target Bool Var(flag_fold_mem_offsets) Init(1)
> +Common Var(flag_fold_mem_offsets) Init(1) Optimization
>  Fold instructions calculating memory offsets to the memory access 
> instruction if possible.
>  
>  fcrossjumping
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] tree-optimization/113374 - early break vect and virtual operands

2024-01-17 Thread Richard Biener
The following fixes wrong virtual operands being used for peeled
early breaks where we can have different live ones and for multiple
exits it makes sure to update the correct PHI arguments.

I've introduced SET_PHI_ARG_DEF_ON_EDGE so we can avoid using
a wrong edge to compute the PHI arg index from.

I've took the liberty to understand the code again and refactor
and comment it a bit differently.  The main functional change
is that we preserve the live virtual operand on all exits.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/113374
* tree-ssa-operands.h (SET_PHI_ARG_DEF_ON_EDGE): New.
* tree-vect-loop.cc (move_early_exit_stmts): Update
virtual LC PHIs.
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Refactor.  Preserve virtual LC PHIs on all exits.

* gcc.dg/vect/vect-early-break_106-pr113374.c: New testcase.
---
 .../vect/vect-early-break_106-pr113374.c  |  19 ++
 gcc/tree-ssa-operands.h   |   3 +
 gcc/tree-vect-loop-manip.cc   | 202 +-
 gcc/tree-vect-loop.cc |   6 +
 4 files changed, 124 insertions(+), 106 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_106-pr113374.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_106-pr113374.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_106-pr113374.c
new file mode 100644
index 000..e2995322af2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_106-pr113374.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+
+typedef __SIZE_TYPE__ size_t;
+struct S { unsigned char *a, *b; };
+
+void
+foo (struct S x)
+{
+  for (size_t i = x.b - x.a; i > 0; --i)
+{
+  size_t t = x.b - x.a;
+  size_t u = i - 1;
+  if (u >= t)
+__builtin_abort ();
+  if (x.a[i - 1]--)
+break;
+}
+}
diff --git a/gcc/tree-ssa-operands.h b/gcc/tree-ssa-operands.h
index 7276bc63e44..8072932564a 100644
--- a/gcc/tree-ssa-operands.h
+++ b/gcc/tree-ssa-operands.h
@@ -82,6 +82,9 @@ struct GTY(()) ssa_operands {
 #define PHI_ARG_DEF(PHI, I)gimple_phi_arg_def ((PHI), (I))
 #define SET_PHI_ARG_DEF(PHI, I, V) \
SET_USE (PHI_ARG_DEF_PTR ((PHI), (I)), (V))
+#define SET_PHI_ARG_DEF_ON_EDGE(PHI, E, V)   \
+   SET_USE (gimple_phi_arg_imm_use_ptr_from_edge \
+  ((PHI), (E)), (V))
 #define PHI_ARG_DEF_FROM_EDGE(PHI, E)  \
gimple_phi_arg_def_from_edge ((PHI), (E))
 #define PHI_ARG_DEF_PTR_FROM_EDGE(PHI, E)  \
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 8aa9224e1a9..983ed2e9b1f 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1578,97 +1578,105 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
  flush_pending_stmts (new_exit);
}
 
+  bool need_virtual_phi = get_virtual_phi (loop->header);
+
+  /* For the main loop exit preserve the LC PHI nodes.  For vectorization
+we need them to continue or finalize reductions.  Since we do not
+copy the loop exit blocks we have to materialize PHIs at the
+new destination before redirecting edges.  */
+  for (auto gsi_from = gsi_start_phis (loop_exit->dest);
+  !gsi_end_p (gsi_from); gsi_next (&gsi_from))
+   {
+ tree res = gimple_phi_result (*gsi_from);
+ create_phi_node (copy_ssa_name (res), new_preheader);
+   }
+  edge e = redirect_edge_and_branch (loop_exit, new_preheader);
+  gcc_assert (e == loop_exit);
+  flush_pending_stmts (loop_exit);
+  set_immediate_dominator (CDI_DOMINATORS, new_preheader, loop_exit->src);
+
   bool multiple_exits_p = loop_exits.length () > 1;
   basic_block main_loop_exit_block = new_preheader;
   basic_block alt_loop_exit_block = NULL;
-  /* Create intermediate edge for main exit.  But only useful for early
-exits.  */
+  /* Create the CFG for multiple exits.
+  | loop_exit   | alt1   | altN
+  v v   ...  v
+   main_loop_exit_block:   alt_loop_exit_block:
+  |  /
+  v v
+   new_preheader:
+where in the new preheader we need merge PHIs for
+the continuation values into the epilogue header.
+Do not bother with exit PHIs for the early exits but
+their live virtual operand.  We'll fix up things below.  */
   if (multiple_exits_p)
{
  edge loop_e = single_succ_edge (new_preheader);
  new_preheader = split_edge (loop_e);
-   }
 
-  auto_vec  

[PATCH] opts: Fix up -ffold-mem-offsets option keywords

2024-01-17 Thread Jakub Jelinek
Hi!

While the option was originally meant to be a Target option for a single
target, it is an option for all targets, so should be Common rather than
Target, and because it is an optimization option which could be different
in between different LTO TUs, I've added Optimization keyword too.
From what I can see, Bool is a non-documented non-existing keyword (at
least, grep Bool *.awk shows nothing), so I've dropped that too.  Seems
that the option parsing simply parses and ignores any non-existing keywords.

Guess we should drop the Bool keywords from the gcc/config/riscv/riscv.opt
file eventually, so that people don't copy this around.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-18  Jakub Jelinek  

PR other/113399
* common.opt (ffold-mem-offsets): Remove Target and Bool keywords, add
Common and Optimization.

--- gcc/common.opt.jj   2024-01-03 11:51:31.467732078 +0100
+++ gcc/common.opt  2024-01-17 17:22:05.975424001 +0100
@@ -1262,7 +1262,7 @@ Common Var(flag_cprop_registers) Optimiz
 Perform a register copy-propagation optimization pass.
 
 ffold-mem-offsets
-Target Bool Var(flag_fold_mem_offsets) Init(1)
+Common Var(flag_fold_mem_offsets) Init(1) Optimization
 Fold instructions calculating memory offsets to the memory access instruction 
if possible.
 
 fcrossjumping

Jakub



[PATCH] tree-optimization/113431 - wrong dependence with invariant load

2024-01-17 Thread Richard Biener
The vectorizer dependence analysis is confused with invariant loads
when figuring whether the circumstances are so that we preserve
scalar stmt execution order.  The following rectifies this.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/113431
* tree-vect-data-refs.cc (vect_preserves_scalar_order_p):
When there is an invariant load we might not preserve
scalar order.

* gcc.dg/vect/pr113431.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr113431.c | 18 ++
 gcc/tree-vect-data-refs.cc   |  6 ++
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113431.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr113431.c 
b/gcc/testsuite/gcc.dg/vect/pr113431.c
new file mode 100644
index 000..04448d9dd81
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113431.c
@@ -0,0 +1,18 @@
+/* { dg-additional-options "-O3 -fdump-tree-slp1-details" } */
+
+#include "tree-vect.h"
+
+int a[2][9];
+int b;
+int main()
+{
+  check_vect ();
+  for (b = 0; b < 2; b++)
+for (long e = 8; e > 0; e--)
+  a[b][e] = a[0][1] == 0;
+  if (a[1][1] != 0)
+__builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "optimized: basic block part vectorized" 
2 "slp1" { target vect_int } } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 0495842b350..f592aeb8028 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -282,6 +282,12 @@ vect_preserves_scalar_order_p (dr_vec_info *dr_info_a, 
dr_vec_info *dr_info_b)
   && !STMT_VINFO_GROUPED_ACCESS (stmtinfo_b))
 return true;
 
+  /* If there is a loop invariant read involved we might vectorize it in
 the prologue, breaking scalar order with respect to the in-loop store.  */
+  if ((DR_IS_READ (dr_info_a->dr) && integer_zerop (DR_STEP (dr_info_a->dr)))
+  || (DR_IS_READ (dr_info_b->dr) && integer_zerop (DR_STEP 
(dr_info_b->dr
+return false;
+
   /* STMT_A and STMT_B belong to overlapping groups.  All loads are
  emitted at the position of the first scalar load.
  Stores in a group are emitted at the position of the last scalar store.
-- 
2.35.3


Re: [PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-17 Thread Richard Biener
On Thu, 18 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> On aarch64 the backend decides to use non-BLKmode for some arrays
> like unsigned long[4] - OImode in that case, but the corresponding
> BITINT_TYPEs have BLKmode (like structures containing that many limb
> elements).  This both isn't a good idea (we really want such underlying vars
> to live in memory and access them there, rather than live in registers and
> access their parts in there) and causes ICEs during expansion
> (VIEW_CONVERT_EXPR from such OImode array to BLKmode BITINT_TYPE), so the
> following patch makes sure such arrays reflect the BLKmode of BITINT_TYPEs
> it is accessed with (if any).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

So the issue is only manifesting during expansion?  I think it would
be better to detect the specific issue (V_C_E from register to BLKmode)
in discover_nonconstant_array_refs_r and force the register argument
to stack?

> 2024-01-18  Jakub Jelinek  
> 
>   * gimple-lower-bitint.cc (gimple_lower_bitint): When creating
>   array VAR_DECL for BITINT_TYPE SSA_NAMEs which have BLKmode, force
>   DECL_MODE of those vars to be BLKmode as well.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-01-17 14:43:33.498961304 +0100
> +++ gcc/gimple-lower-bitint.cc2024-01-17 14:50:50.252889131 +0100
> @@ -6348,7 +6348,15 @@ gimple_lower_bitint (void)
> tree s = ssa_name (i);
> int p = var_to_partition (large_huge.m_map, s);
> if (large_huge.m_vars[p] != NULL_TREE)
> - continue;
> + {
> +   /* If BITINT_TYPE is BLKmode, make sure the underlying
> +  variable is BLKmode as well.  */
> +   if (TYPE_MODE (TREE_TYPE (s)) == BLKmode
> +   && VAR_P (large_huge.m_vars[p])
> +   && DECL_MODE (large_huge.m_vars[p]) != BLKmode)
> + DECL_MODE (large_huge.m_vars[p]) = BLKmode;
> +   continue;
> + }
> if (atype == NULL_TREE
> || !tree_int_cst_equal (TYPE_SIZE (atype),
> TYPE_SIZE (TREE_TYPE (s
> @@ -6359,6 +6367,11 @@ gimple_lower_bitint (void)
>   }
> large_huge.m_vars[p] = create_tmp_var (atype, "bitint");
> mark_addressable (large_huge.m_vars[p]);
> +   /* If BITINT_TYPE is BLKmode, make sure the underlying
> +  variable is BLKmode as well.  */
> +   if (TYPE_MODE (TREE_TYPE (s)) == BLKmode
> +   && DECL_MODE (large_huge.m_vars[p]) != BLKmode)
> + DECL_MODE (large_huge.m_vars[p]) = BLKmode;
>   }
>  }
>  
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] i386: Add -masm=intel profiling support [PR113122]

2024-01-17 Thread Jakub Jelinek
Hi!

x86_function_profiler emits assembly directly into the file and only emits
AT&T syntax.  The following patch adjusts it to emit MASM syntax
if -masm=intel.
As it doesn't use asm_fprintf, I can't use {|} syntax for the dialects.

I've tested using
for i in -mcmodel=large "-mcmodel=large -fpic" "" -fpic "-m32 -fpic" "-m32"; do
./xgcc -B ./ -c -O2 -fprofile $i -masm=att pr113122.c -o pr113122.o1;
./xgcc -B ./ -c -O2 -fprofile $i -masm=intel pr113122.c -o pr113122.o2;
objdump -dr pr113122.o1 > /tmp/1; objdump -dr pr113122.o2 > /tmp/2;
diff -up /tmp/1 /tmp/2; done
that the emitted sequences are identical after assembly.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-18  Jakub Jelinek  

PR target/113122
* config/i386/i386.cc (x86_function_profiler): Add -masm=intel
support.  Add missing space after , in emitted assembly in some
cases.  Formatting fixes.

* gcc.target/i386/pr113122-1.c: New test.
* gcc.target/i386/pr113122-2.c: New test.
* gcc.target/i386/pr113122-3.c: New test.
* gcc.target/i386/pr113122-4.c: New test.

--- gcc/config/i386/i386.cc.jj  2024-01-05 15:22:21.810685516 +0100
+++ gcc/config/i386/i386.cc 2024-01-17 16:52:48.026177278 +0100
@@ -22746,7 +22746,10 @@ x86_function_profiler (FILE *file, int l
   if (TARGET_64BIT)
 {
 #ifndef NO_PROFILE_COUNTERS
-  fprintf (file, "\tleaq\t%sP%d(%%rip),%%r11\n", LPREFIX, labelno);
+  if (ASSEMBLER_DIALECT == ASM_INTEL)
+   fprintf (file, "\tlea\tr11, %sP%d[rip]\n", LPREFIX, labelno);
+  else
+   fprintf (file, "\tleaq\t%sP%d(%%rip), %%r11\n", LPREFIX, labelno);
 #endif
 
   if (!TARGET_PECOFF)
@@ -22757,12 +22760,29 @@ x86_function_profiler (FILE *file, int l
  /* NB: R10 is caller-saved.  Although it can be used as a
 static chain register, it is preserved when calling
 mcount for nested functions.  */
- fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
-  mcount_name);
+ if (ASSEMBLER_DIALECT == ASM_INTEL)
+   fprintf (file, "1:\tmovabs\tr10, OFFSET FLAT:%s\n"
+  "\tcall\tr10\n", mcount_name);
+ else
+   fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
+mcount_name);
  break;
case CM_LARGE_PIC:
 #ifdef NO_PROFILE_COUNTERS
- fprintf (file, "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %%r11\n");
+ if (ASSEMBLER_DIALECT == ASM_INTEL)
+   {
+ fprintf (file, "1:movabs\tr11, "
+"OFFSET FLAT:_GLOBAL_OFFSET_TABLE_-1b\n");
+ fprintf (file, "\tlea\tr10, 1b[rip]\n");
+ fprintf (file, "\tadd\tr10, r11\n");
+ fprintf (file, "\tmovabs\tr11, OFFSET FLAT:%s@PLTOFF\n",
+  mcount_name);
+ fprintf (file, "\tadd\tr10, r11\n");
+ fprintf (file, "\tcall\tr10\n");
+ break;
+   }
+ fprintf (file,
+  "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %%r11\n");
  fprintf (file, "\tleaq\t1b(%%rip), %%r10\n");
  fprintf (file, "\taddq\t%%r11, %%r10\n");
  fprintf (file, "\tmovabsq\t$%s@PLTOFF, %%r11\n", mcount_name);
@@ -22776,7 +22796,12 @@ x86_function_profiler (FILE *file, int l
case CM_MEDIUM_PIC:
  if (!ix86_direct_extern_access)
{
- fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", 
mcount_name);
+ if (ASSEMBLER_DIALECT == ASM_INTEL)
+   fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]",
+mcount_name);
+ else
+   fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n",
+mcount_name);
  break;
}
  /* fall through */
@@ -22791,23 +22816,37 @@ x86_function_profiler (FILE *file, int l
   else if (flag_pic)
 {
 #ifndef NO_PROFILE_COUNTERS
-  fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%" PROFILE_COUNT_REGISTER 
"\n",
-  LPREFIX, labelno);
+  if (ASSEMBLER_DIALECT == ASM_INTEL)
+   fprintf (file,
+"\tlea\t" PROFILE_COUNT_REGISTER ", %sP%d@GOTOFF[ebx]\n",
+LPREFIX, labelno);
+  else
+   fprintf (file,
+"\tleal\t%sP%d@GOTOFF(%%ebx), %%" PROFILE_COUNT_REGISTER "\n",
+LPREFIX, labelno);
 #endif
-  fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
+  if (ASSEMBLER_DIALECT == ASM_INTEL)
+   fprintf (file, "1:\tcall\t[DWORD PTR %s@GOT[ebx]]\n", mcount_name);
+  else
+   fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
 }
   else
 {
 #ifndef NO_PROFILE_COUNTERS
-  fprintf (file, "\tmovl\t$%sP%d,%%" 

Re: [pushed][PATCH v2] LoongArch: testsuite:Fix fail in gen-vect-{2,25}.c file.

2024-01-17 Thread chenglulu

Pushed to r14-8204.

On 2024/1/13 at 3:28 PM, chenxiaolong wrote:

1. Added dg-do compile on LoongArch.
   When binutils does not support vector instruction sets, an error occurs
because the assembler does not recognize vector instructions.

2. Added the "-mlsx" option for vectorization on LoongArch.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/gen-vect-2.c: Added detection of compilation
behavior and "-mlsx" option on LoongArch.
	* gcc.dg/tree-ssa/gen-vect-25.c: Ditto.
---
  gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c  | 2 ++
  gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c | 2 ++
  2 files changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
index b84f3184427..a35999a172a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
@@ -1,6 +1,8 @@
  /* { dg-do run { target vect_cmdline_needed } } */
+/* { dg-do compile { target { loongarch_sx && {! loongarch_sx_hw } } } } */
  /* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize 
-fdump-tree-vect-details -fvect-cost-model=dynamic" } */
  /* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-mlsx" { target { loongarch*-*-* } } } */
  
  #include 
  
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c

index 18fe1aa1502..9f14a54c413 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
@@ -1,6 +1,8 @@
  /* { dg-do run { target vect_cmdline_needed } } */
+/* { dg-do compile { target { loongarch_sx && {! loongarch_sx_hw } } } } */
  /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details 
-fvect-cost-model=dynamic" } */
  /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details 
-fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-mlsx" { target { loongarch*-*-* } } } */
  
  #include 
  




Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2024-01-17 Thread Matthias Kretz
On Sunday, 10 December 2023 14:29:45 CET Richard Sandiford wrote:
> Thanks for the patch and sorry for the slow review.

Sorry for my slow reaction. I needed a long vacation. For now I'll focus on 
the design question wrt. multi-arch compilation.

> I can only comment on the usage of SVE, rather than on the scaffolding
> around it.  Hopefully Jonathan or others can comment on the rest.

That's very useful!

> The main thing that worries me is:
> 
> #if _GLIBCXX_SIMD_HAVE_SVE
> constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS/8;
> #else
> constexpr inline int __sve_vectorized_size_bytes = 0;
> #endif
> 
> Although -msve-vector-bits is currently a per-TU setting, that isn't
> necessarily going to be the case in future.

This is a topic that I care about a lot... as simd user, implementer, and WG21 
proposal author. Are you thinking of a plan to implement the target_clones 
function attribute for different SVE lengths? Or does it go further than that? 
PR83875 is raising the same issue and solution ideas for x86. If I understand 
your concern correctly, then the issue you're raising exists in the same form 
for x86.

If anyone is interested in working on a "translation phase 7 replacement" for 
compiler flags macros I'd be happy to give some input of what I believe is 
necessary to make target_clones work with std(x)::simd. This seems to be about 
constant expressions that return compiler-internal state - probably similar to 
how static reflection needs to work.

For a sketch of a direction: what I'm already doing in 
std::experimental::simd, is to tag all non-always_inline function names with a 
bitflag, representing a relevant subset of -f and -m flags. That way, I'm 
guarding against surprises on linking TUs compiled with different flags.
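A minimal sketch of that guarding idea (the flag names, bit assignments, and function names here are hypothetical, not the actual libstdc++ encoding): fold the relevant predefined macros into a constexpr bitmask and use it as a defaulted non-type template argument, so the flag set becomes part of the mangled symbol and mismatched TUs fail to link instead of silently misbehaving.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits; the real subset and encoding in
// std::experimental::simd differ.
constexpr std::uint64_t __flag_avx2      = 1ull << 0;
constexpr std::uint64_t __flag_avx512f   = 1ull << 1;
constexpr std::uint64_t __flag_fast_math = 1ull << 2;

constexpr std::uint64_t
__translation_unit_flags()
{
  std::uint64_t __r = 0;
#ifdef __AVX2__
  __r |= __flag_avx2;
#endif
#ifdef __AVX512F__
  __r |= __flag_avx512f;
#endif
#ifdef __FAST_MATH__
  __r |= __flag_fast_math;
#endif
  return __r;
}

// The defaulted non-type template argument makes the flag set part of
// the mangled name: a TU compiled with -mavx2 instantiates a different
// symbol than one compiled without it, so an accidental mix is a link
// error rather than a runtime surprise.
template <std::uint64_t _Flags = __translation_unit_flags()>
int
__simd_entry_point(int __x)
{
  return __x + 1;
}
```

Calling __simd_entry_point(1) in two TUs compiled with different -m/-f flags then references two distinct instantiations.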

> Ideally it would be
> possible to define different implementations of a function with
> different (fixed) vector lengths within the same TU.  The value at
> the point that the header file is included is then somewhat arbitrary.
> 
> So rather than have:
> >  using __neon128 = _Neon<16>;
> >  using __neon64 = _Neon<8>;
> > 
> > +using __sve = _Sve<>;
> 
> would it be possible instead to have:
> 
>   using __sve128 = _Sve<128>;
>   using __sve256 = _Sve<256>;
>   ...etc...
> 
> ?  Code specialised for 128-bit vectors could then use __sve128 and
> code specialised for 256-bit vectors could use __sve256.

Hmm, as things stand we'd need two numbers, IIUC:
_Sve<NumberOfUsedBytes, SizeofRegister>

On x86, "NumberOfUsedBytes" is sufficient, because 33-64 implies zmm registers 
(and -mavx512f), 17-32 implies ymm, and <=16 implies xmm (except where it 
doesn't ;) ).
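For illustration, that single-number x86 mapping could be spelled as a constexpr function (a sketch under the simplifying assumption above; the library's real selection logic also covers the "except where it doesn't" cases):

```cpp
#include <cassert>

// Sketch: on x86 the number of bytes a simd type uses already
// determines the register class, so one template parameter suffices.
constexpr int
__x86_register_bits(int __used_bytes)
{
  if (__used_bytes > 32)
    return 512;   // zmm, implies -mavx512f
  if (__used_bytes > 16)
    return 256;   // ymm
  return 128;     // xmm
}
```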

> Perhaps that's not possible as things stand, but it'd be interesting
> to know the exact failure mode if so.  Either way, it would be good to
> remove direct uses of __ARM_FEATURE_SVE_BITS from simd_sve.h if possible,
> and instead rely on information derived from template parameters.

The TS spec requires std::experimental::native_simd to basically give you 
the largest, most efficient, full SIMD register. (And it can't be a sizeless 
type because they don't exist in C++). So how would you do that without 
looking at __ARM_FEATURE_SVE_BITS in the simd implementation?
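To make the constraint concrete, here is roughly why the macro is needed (a sketch with illustrative names, not the actual libstdc++ code): native_simd must have a constexpr size, and on SVE the only compile-time constant describing the register width is __ARM_FEATURE_SVE_BITS.

```cpp
#include <cassert>

// Sketch: without a constant like __ARM_FEATURE_SVE_BITS there is no
// constexpr value available to give native_simd a sizeof on SVE.
#if defined __ARM_FEATURE_SVE_BITS
constexpr int __native_simd_bytes = __ARM_FEATURE_SVE_BITS / 8;
#else
constexpr int __native_simd_bytes = 16;   // NEON-sized fallback
#endif

// Number of elements a full native register holds for element type _Tp.
template <typename _Tp>
constexpr int __native_simd_size = __native_simd_bytes / int(sizeof(_Tp));
```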


> It should be possible to use SVE to optimise some of the __neon*
> implementations, which has the advantage of working even for VLA SVE.
> That's obviously a separate patch, though.  Just saying for the record.

I learned that NVidia Grace CPUs alias NEON and SVE registers. But I must 
assume that other SVE implementations (especially those with 
__ARM_FEATURE_SVE_BITS > 128) don't do that and might incur a significant 
latency when going from a NEON register to an SVE register and back (which 
each requires a store-load, IIUC). So are you thinking of implementing 
everything via SVE? That would break ABI, no?

- Matthias

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
 std::simd
──





Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2024-01-17 Thread Matthias Kretz
On Thursday, 4 January 2024 10:10:12 CET Andrew Pinski wrote:
> I really doubt this would work in the end. Because HW which is 128bits
> only, can't support -msve-vector-bits=2048 . I am thinking
> std::experimental::simd is not the right way of supporting this.
> Really the route the standard should be heading towards is non
> constant at compile time sized vectors instead and then you could use
> the constant sized ones to emulate the Variable length ones.

I don't follow. "non-constant at compile time sized vectors" implies 
sizeless (no constexpr sizeof), no?
Let me try to explain where I'm coming from. One of the motivating use 
cases for simd types is composition. Like this:

template <typename T>
struct Point
{
  T x, y, z;

  T distance_to_origin() {
    return sqrt(x * x + y * y + z * z);
  }
};

Point<float> is one point in 3D space, Point<simd<float>> stores multiple 
points in 3D space and can work on them in parallel.
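As a usage sketch with the std::experimental::simd that ships in libstdc++ today, the same Point template then processes native_simd<float>::size() points per call:

```cpp
#include <cassert>
#include <cmath>
#include <experimental/simd>

namespace stdx = std::experimental;

template <typename T>
struct Point
{
  T x, y, z;

  T distance_to_origin() {
    using std::sqrt;           // scalar case; simd overload is found by ADL
    return sqrt(x * x + y * y + z * z);
  }
};

// One Point<native_simd<float>> holds native_simd<float>::size() points
// and computes all of their distances with a single call.
inline float
first_distance()
{
  using V = stdx::native_simd<float>;
  Point<V> p{V(3.f), V(4.f), V(0.f)};  // broadcast: every lane is (3,4,0)
  return p.distance_to_origin()[0];
}
```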

This implies that simd<T> must have a sizeof. C++ is unlikely to get 
sizeless types (the discussions were long, there were many papers, ...). 
Should sizeless types in C++ ever happen, then composition is likely going 
to be constrained to the last data member.

With the above as our design constraints, SVE at first seems to be a bad 
fit for implementing std::simd. However, if (at least initially) we accept 
the need for different binaries for different SVE implementations, then you 
can look at the "scalable" part of SVE as an efficient way of reducing the 
number of opcodes necessary for supporting all kinds of different vector 
lengths. But otherwise you can treat it as fixed-size registers - which it 
is for a given CPU. In the case of a multi-CPU shared-memory system (e.g. 
RDMA between different ARM implementations) all you need is a different 
name for incompatible types. So std::simd on SVE256 must have a 
different name than on SVE512. Same for std::simd (which is currently 
not the case with Srinivas's patch, I think, and needs to be resolved).

> I think we should not depend on __ARM_FEATURE_SVE_BITS being set here
> and being meanful in any way.

I'd love to. In the same way I'd love to *not depend* on __AVX__, 
__AVX512F__ etc.

- Matthias

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
 std::simd
──


Re: [PATCH v2] test regression fix: Add !vect128 for variable length targets of bb-slp-subgroups-3.c

2024-01-17 Thread Richard Biener
On Thu, 18 Jan 2024, Juzhe-Zhong wrote:

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/bb-slp-subgroups-3.c: Add !vect128.

OK

> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
> index fb719915db7..d1d79125731 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
> @@ -42,7 +42,7 @@ main (int argc, char **argv)
>  /* Because we disable the cost model, targets with variable-length
> vectors can end up vectorizing the store to a[0..7] on its own.
> With the cost model we do something sensible.  */
> -/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { target { ! amdgcn-*-* } xfail vect_variable_length } } } */
> +/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { target { ! amdgcn-*-* } xfail { vect_variable_length && { ! vect128 } } } } } */
>  
>  /* amdgcn can do this in one vector.  */
>  /* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2" { target amdgcn-*-* } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-17 Thread Jakub Jelinek
Hi!

On aarch64 the backend decides to use non-BLKmode for some arrays
like unsigned long[4] - OImode in that case, but the corresponding
BITINT_TYPEs have BLKmode (like structures containing that many limb
elements).  This both isn't a good idea (we really want such underlying vars
to live in memory and access them there, rather than live in registers and
access their parts in there) and causes ICEs during expansion
(VIEW_CONVERT_EXPR from such OImode array to BLKmode BITINT_TYPE), so the
following patch makes sure such arrays reflect the BLKmode of BITINT_TYPEs
it is accessed with (if any).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-18  Jakub Jelinek  

* gimple-lower-bitint.cc (gimple_lower_bitint): When creating
array VAR_DECL for BITINT_TYPE SSA_NAMEs which have BLKmode, force
DECL_MODE of those vars to be BLKmode as well.

--- gcc/gimple-lower-bitint.cc.jj   2024-01-17 14:43:33.498961304 +0100
+++ gcc/gimple-lower-bitint.cc  2024-01-17 14:50:50.252889131 +0100
@@ -6348,7 +6348,15 @@ gimple_lower_bitint (void)
  tree s = ssa_name (i);
  int p = var_to_partition (large_huge.m_map, s);
  if (large_huge.m_vars[p] != NULL_TREE)
-   continue;
+   {
+ /* If BITINT_TYPE is BLKmode, make sure the underlying
+variable is BLKmode as well.  */
+ if (TYPE_MODE (TREE_TYPE (s)) == BLKmode
+ && VAR_P (large_huge.m_vars[p])
+ && DECL_MODE (large_huge.m_vars[p]) != BLKmode)
+   DECL_MODE (large_huge.m_vars[p]) = BLKmode;
+ continue;
+   }
  if (atype == NULL_TREE
  || !tree_int_cst_equal (TYPE_SIZE (atype),
  TYPE_SIZE (TREE_TYPE (s
@@ -6359,6 +6367,11 @@ gimple_lower_bitint (void)
}
  large_huge.m_vars[p] = create_tmp_var (atype, "bitint");
  mark_addressable (large_huge.m_vars[p]);
+ /* If BITINT_TYPE is BLKmode, make sure the underlying
+variable is BLKmode as well.  */
+ if (TYPE_MODE (TREE_TYPE (s)) == BLKmode
+ && DECL_MODE (large_huge.m_vars[p]) != BLKmode)
+   DECL_MODE (large_huge.m_vars[p]) = BLKmode;
}
 }
 

Jakub



Re: [pushed][PATCH] LoongArch: Assign the '/u' attribute to the mem to which the global offset table belongs.

2024-01-17 Thread chenglulu

Pushed to r14-8203.

On 2024/1/13 2:37 PM, Lulu Cheng wrote:

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_split_symbol):
Assign the '/u' attribute to the mem.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/got-load.C: New test.
---
  gcc/config/loongarch/loongarch.cc |  5 +
  gcc/testsuite/g++.target/loongarch/got-load.C | 19 +++
  2 files changed, 24 insertions(+)
  create mode 100644 gcc/testsuite/g++.target/loongarch/got-load.C

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 3b8559bfdc8..82467474288 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3202,6 +3202,11 @@ loongarch_split_symbol (rtx temp, rtx addr, machine_mode mode, rtx *low_out)
  rtx mem = gen_rtx_MEM (Pmode, low);
  *low_out = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, mem),
 UNSPEC_LOAD_FROM_GOT);
+
+ /* Nonzero in a mem, if the memory is statically allocated and
+read-only.  A common example of the latter is a shared library's
+global offset table.  */
+ MEM_READONLY_P (mem) = 1;
}
  
  	  break;

diff --git a/gcc/testsuite/g++.target/loongarch/got-load.C b/gcc/testsuite/g++.target/loongarch/got-load.C
new file mode 100644
index 000..20924c73942
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/got-load.C
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64d -O2 -mexplicit-relocs -mcmodel=normal -fdump-rtl-expand" } */
+/* { dg-final { scan-rtl-dump-times "mem/u" 2 "expand" } } */
+
+#include <iostream>
+
+using namespace std;
+
+int lr[15][2];
+
+void
+test(void)
+{
+  int n;
+
+  cin >> n;
+  for (int i = 0; i < n; ++i)
+cin >> lr[i][0] >> lr[i][1];
+}




Re: [PATCH] modula2: Many powerpc platforms do _not_ have support for IEEE754 long double [PR111956]

2024-01-17 Thread Richard Biener
On Thu, Jan 18, 2024 at 1:58 AM Gaius Mulley  wrote:
>
>
> ok for master ?
>
> Bootstrapped on power8 (cfarm135), power9 (cfarm120) and
> x86_64-linux-gnu.

OK.

I wonder what this does to the libm2 ABI?

> ---
>
> This patch corrects commit
> r14-4149-g81d5ca0b9b8431f1bd7a5ec8a2c94f04bb0cf032 which assumed
> all powerpc platforms would have IEEE754 long double.  The patch
> ensures that cc1gm2 obtains the default IEEE754 long double availability
> from the configure generated tm_defines.  The user command
> line switches -mabi=ibmlongdouble and -mabi=ieeelongdouble are implemented
> to override the configuration defaults.
>
> gcc/m2/ChangeLog:
>
> PR modula2/111956
> * Make-lang.in (host_mc_longreal): Remove.
> * configure: Regenerate.
> * configure.ac (M2C_LONGREAL_FLOAT128): Remove.
> (M2C_LONGREAL_PPC64LE): Remove.
> * gm2-compiler/M2Options.def (SetIBMLongDouble): New procedure.
> (GetIBMLongDouble): New procedure function.
> (SetIEEELongDouble): New procedure.
> (GetIEEELongDouble): New procedure function.
> * gm2-compiler/M2Options.mod (SetIBMLongDouble): New procedure.
> (GetIBMLongDouble): New procedure function.
> (SetIEEELongDouble): New procedure.
> (GetIEEELongDouble): New procedure function.
> (InitializeLongDoubleFlags): New procedure called during
> module block initialization.
> * gm2-gcc/m2configure.cc: Remove duplicate includes.
> (m2configure_M2CLongRealFloat128): Remove.
> (m2configure_M2CLongRealIBM128): Remove.
> (m2configure_M2CLongRealLongDouble): Remove.
> (m2configure_M2CLongRealLongDoublePPC64LE): Remove.
> (m2configure_TargetIEEEQuadDefault): New function.
> * gm2-gcc/m2configure.def (M2CLongRealFloat128): Remove.
> (M2CLongRealIBM128): Remove.
> (M2CLongRealLongDouble): Remove.
> (M2CLongRealLongDoublePPC64LE): Remove.
> (TargetIEEEQuadDefault): New function.
> * gm2-gcc/m2configure.h (m2configure_M2CLongRealFloat128): Remove.
> (m2configure_M2CLongRealIBM128): Remove.
> (m2configure_M2CLongRealLongDouble): Remove.
> (m2configure_M2CLongRealLongDoublePPC64LE): Remove.
> (m2configure_TargetIEEEQuadDefault): New function.
> * gm2-gcc/m2options.h (M2Options_SetIBMLongDouble): New prototype.
> (M2Options_GetIBMLongDouble): New prototype.
> (M2Options_SetIEEELongDouble): New prototype.
> (M2Options_GetIEEELongDouble): New prototype.
> * gm2-gcc/m2type.cc (build_m2_long_real_node): Re-implement using
> results of M2Options_GetIBMLongDouble and M2Options_GetIEEELongDouble.
> * gm2-lang.cc (gm2_langhook_handle_option): Add case
> OPT_mabi_ibmlongdouble and call M2Options_SetIBMLongDouble.
> Add case OPT_mabi_ieeelongdouble and call M2Options_SetIEEELongDouble.
> * gm2config.aci.in: Regenerate.
> * gm2spec.cc (lang_specific_driver): Remove block defined by
> M2C_LONGREAL_PPC64LE.
> Remove case OPT_mabi_ibmlongdouble.
> Remove case OPT_mabi_ieeelongdouble.
>
> libgm2/ChangeLog:
>
> PR modula2/111956
> * Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * Makefile.in: Regenerate.
> * libm2cor/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * libm2cor/Makefile.in: Regenerate.
> * libm2iso/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * libm2iso/Makefile.in: Regenerate.
> * libm2log/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * libm2log/Makefile.in: Regenerate.
> * libm2min/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * libm2min/Makefile.in: Regenerate.
> * libm2pim/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
> * libm2pim/Makefile.in: Regenerate.
>
> ---
>
> diff --git a/gcc/m2/Make-lang.in b/gcc/m2/Make-lang.in
> index d7bc7362bbf..45bfa933dca 100644
> --- a/gcc/m2/Make-lang.in
> +++ b/gcc/m2/Make-lang.in
> @@ -98,9 +98,6 @@ GM2_PROG_DEP=gm2$(exeext) xgcc$(exeext) cc1gm2$(exeext)
>
>  include m2/config-make
>
> -# Determine if float128 should represent the Modula-2 type LONGREAL.
> -host_mc_longreal := $(if $(strip $(filter powerpc64le%,$(host))),--longreal=__float128)
> -
>  LIBSTDCXX=../$(TARGET_SUBDIR)/libstdc++-v3/src/.libs/libstdc++.a
>
>  PGE=m2/pge$(exeext)
> @@ -474,8 +471,7 @@ MC_ARGS= --olang=c++ \
>   -I$(srcdir)/m2/gm2-gcc \
>   --quiet \
>   $(MC_COPYRIGHT) \
> - --gcc-config-system \
> - $(host_mc_longreal)
> + --gcc-config-system
>
>  MCDEPS=m2/boot-bin/mc$(exeext)
>
> diff --git a/gcc/m2/configure b/gcc/m2/configure
> index f62f3d8729c..46530970785 100755
> --- a/gcc/m2/configure
> +++ b/gcc/m2/configure
> @@ -3646,24 +3646,6 @@ $as_echo "#define HAVE_OPENDIR 1" >>confdefs.h
>  fi
>
>
> -case $target in #(
> -  powerpc64le*) :
> -
> -$as_echo "#define M2C_LONGREAL_FLOAT128 1" >>confdefs.h
> - ;; #(
> -  *) :
> 

[PATCH v2] test regression fix: Add !vect128 for variable length targets of bb-slp-subgroups-3.c

2024-01-17 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-subgroups-3.c: Add !vect128.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
index fb719915db7..d1d79125731 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
@@ -42,7 +42,7 @@ main (int argc, char **argv)
 /* Because we disable the cost model, targets with variable-length
vectors can end up vectorizing the store to a[0..7] on its own.
With the cost model we do something sensible.  */
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { target { ! amdgcn-*-* } xfail vect_variable_length } } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { target { ! amdgcn-*-* } xfail { vect_variable_length && { ! vect128 } } } } } */
 
 /* amdgcn can do this in one vector.  */
 /* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2" { target amdgcn-*-* } } } */
-- 
2.36.3



Re: [PATCH] sra: Disqualify bases of operands of asm gotos

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Martin Jambor wrote:

> Hi,
> 
> PR 110422 shows that SRA can ICE assuming there is a single edge
> outgoing from a block terminated with an asm goto.  We need that for
> BB-terminating statements so that any adjustments they make to the
> aggregates can be copied over to their replacements.  Because we can't
> have that after ASM gotos, we need to punt.
> 
> Bootstrapped and tested on x86_64-linux, OK for master?  It will need
> some tweaking for release branches, is it in principle OK for them too
> (after testing)?

OK.

> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2024-01-17  Martin Jambor  
> 
>   PR tree-optimization/110422
>   * tree-sra.cc (scan_function): Disqualify bases of operands of asm
>   gotos.
> 
> gcc/testsuite/ChangeLog:
> 
> 2024-01-17  Martin Jambor  
> 
>   PR tree-optimization/110422
>   * gcc.dg/torture/pr110422.c: New test.
> ---
>  gcc/testsuite/gcc.dg/torture/pr110422.c | 10 +
>  gcc/tree-sra.cc | 29 -
>  2 files changed, 33 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr110422.c
> 
> diff --git a/gcc/testsuite/gcc.dg/torture/pr110422.c b/gcc/testsuite/gcc.dg/torture/pr110422.c
> new file mode 100644
> index 000..2e171a7a19e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr110422.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +
> +struct T { int x; };
> +int foo(void) {
> +  struct T v;
> +  asm goto("" : "+r"(v.x) : : : lab);
> +  return 0;
> +lab:
> +  return -5;
> +}
> diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
> index 6a1141b7377..f8e71ec48b9 100644
> --- a/gcc/tree-sra.cc
> +++ b/gcc/tree-sra.cc
> @@ -1559,15 +1559,32 @@ scan_function (void)
>   case GIMPLE_ASM:
> {
>   gasm *asm_stmt = as_a <gasm *> (stmt);
> - for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
> + if (stmt_ends_bb_p (asm_stmt)
> + && !single_succ_p (gimple_bb (asm_stmt)))
> {
> - t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
> - ret |= build_access_from_expr (t, asm_stmt, false);
> + for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
> +   {
> + t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
> + disqualify_base_of_expr (t, "OP of asm goto.");
> +   }
> + for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
> +   {
> + t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
> + disqualify_base_of_expr (t, "OP of asm goto.");
> +   }
> }
> - for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
> + else
> {
> - t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
> - ret |= build_access_from_expr (t, asm_stmt, true);
> + for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
> +   {
> + t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
> + ret |= build_access_from_expr (t, asm_stmt, false);
> +   }
> + for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
> +   {
> + t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
> + ret |= build_access_from_expr (t, asm_stmt, true);
> +   }
> }
> }
> break;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-17 Thread chenglulu

> gcc.dg/tree-ssa/scev-16.c is OK to move
> gcc.dg/pr104992.c should simply add -fno-tree-vectorize to the used
> options and remove the vect_* stuff


Hi Richard:

I have a question. I don't understand the purpose of adding 
'-fno-tree-vectorize' here.


Thanks!



[PATCH] i386: Default to -mcet-switch [PR104816]

2024-01-17 Thread Fangrui Song
When -fcf-protection=branch is used, with the current -mno-cet-switch
default, a NOTRACK indirect jump is generated for jump tables, which can
target a non-ENDBR instruction.  However, the overwhelming opinion is to
avoid NOTRACK (PR104816) to improve safety.  Projects such as the Linux
kernel and Xen even specify -fno-jump-tables to avoid NOTRACK.  Therefore,
let's change the default.

Note, for `default: __builtin_unreachable()`, LLVM AArch64 even made a
decision (https://reviews.llvm.org/D155485) to keep the range check,
which can otherwise be optimized out.  This reinforces the opinion that
people want protection for jump tables.

#define DO A(0) A(1) A(2) A(3) A(4) A(5) A(6) A(7) A(8) A(9) A(10) A(11) A(12) A(13)
#define A(i) void bar##i();
DO
#undef A
void ext();
void foo(int i) {
  switch (i) {
#define A(i) case i: bar##i(); break;
DO
// -mbranch-protection=bti causes Clang AArch64 to keep the i <= 13 range check
  default: __builtin_unreachable();
  }
  ext();
}

gcc/ChangeLog:

PR target/104816
* config/i386/i386.opt: Default to -mcet-switch.
* doc/invoke.texi: Update doc.

gcc/testsuite/ChangeLog:

* gcc.target/i386/cet-switch-1.c: Add -mno-cet-switch.
* gcc.target/i386/cet-switch-2.c: Remove -mcet-switch to check the
  default.
---
 gcc/config/i386/i386.opt |  2 +-
 gcc/doc/invoke.texi  | 19 +--
 gcc/testsuite/gcc.target/i386/cet-switch-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/cet-switch-2.c |  2 +-
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 5b4f1bff25f..0e168f3c07a 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1074,7 +1074,7 @@ Enable shadow stack built-in functions from Control-flow Enforcement
 Technology (CET).
 
 mcet-switch
-Target Var(flag_cet_switch) Init(0)
+Target Var(flag_cet_switch) Init(1)
 Turn on CET instrumentation for switch statements that use a jump table and
 an indirect jump.
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 16e31a3c6db..720be71f8fa 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1455,7 +1455,7 @@ See RS/6000 and PowerPC Options.
 -msse4a  -m3dnow  -m3dnowa  -mpopcnt  -mabm  -mbmi  -mtbm  -mfma4  -mxop
 -madx  -mlzcnt  -mbmi2  -mfxsr  -mxsave  -mxsaveopt  -mrtm  -mhle  -mlwp
 -mmwaitx  -mclzero  -mpku  -mthreads  -mgfni  -mvaes  -mwaitpkg
--mshstk -mmanual-endbr -mcet-switch -mforce-indirect-call
+-mshstk -mmanual-endbr -mno-cet-switch -mforce-indirect-call
 -mavx512vbmi2 -mavx512bf16 -menqcmd
 -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b  -mavx512vpopcntdq
 -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid
@@ -34886,16 +34886,15 @@ function attribute. This is useful when used with the option
 @option{-fcf-protection=branch} to control ENDBR insertion at the
 function entry.
 
+@opindex mno-cet-switch
 @opindex mcet-switch
-@item -mcet-switch
-By default, CET instrumentation is turned off on switch statements that
-use a jump table and indirect branch track is disabled.  Since jump
-tables are stored in read-only memory, this does not result in a direct
-loss of hardening.  But if the jump table index is attacker-controlled,
-the indirect jump may not be constrained by CET.  This option turns on
-CET instrumentation to enable indirect branch track for switch statements
-with jump tables which leads to the jump targets reachable via any indirect
-jumps.
+@item -mno-cet-switch
+When @option{-fcf-protection=branch} is enabled, by default, switch statements
+that use a jump table are instrumented to use ENDBR instructions and constrain
+the indirect jump with CET to protect against an attacker-controlled jump table
+index.  @option{-mno-cet-switch} generates a NOTRACK indirect jump and removes
+ENDBR instructions, which may make the jump table smaller at the cost of an
+unprotected indirect jump.
 
 @opindex mcall-ms2sysv-xlogues
 @opindex mno-call-ms2sysv-xlogues
diff --git a/gcc/testsuite/gcc.target/i386/cet-switch-1.c b/gcc/testsuite/gcc.target/i386/cet-switch-1.c
index afe5adc2f3d..4931c3ad1d2 100644
--- a/gcc/testsuite/gcc.target/i386/cet-switch-1.c
+++ b/gcc/testsuite/gcc.target/i386/cet-switch-1.c
@@ -1,6 +1,6 @@
 /* Verify that CET works.  */
 /* { dg-do compile } */
-/* { dg-options "-O -fcf-protection" } */
+/* { dg-options "-O -fcf-protection -mno-cet-switch" } */
 /* { dg-final { scan-assembler-times "endbr32" 1 { target ia32 } } } */
 /* { dg-final { scan-assembler-times "endbr64" 1 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-times "notrack jmp\[ \t]+\[*]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/cet-switch-2.c b/gcc/testsuite/gcc.target/i386/cet-switch-2.c
index 69ddc6fd5b7..11578d1a30c 100644
--- a/gcc/testsuite/gcc.target/i386/cet-switch-2.c
+++ b/gcc/testsuite/gcc.target/i386/cet-switch-2.c
@@ -1,6 +1,6 @@
 /* Verify 

Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-17 Thread Kewen.Lin
Hi David,

on 2024/1/18 09:27, David Edelsohn wrote:
> If the fixes remove the failures on AIX, then the patch to disable the tests 
> also can be reverted.
> 

Since I didn't see strub-unsupported*.c fail on ppc64 Linux, I wanted to make
sure it's related: I locally reverted your commit r14-6838 and my fix r14-7089
and expected those test cases to fail on AIX, but they passed.  Then I reset
the repo to r14-6275, which added those test cases, again expecting them to
fail, but they still passed.  Not sure if I missed something in the testing;
could you kindly double-check on your environment whether those test cases
started to fail from r14-6275, or from some other specific commit?  Or maybe
directly verify whether they pass on latest trunk with r14-6838 reverted.
Just to ensure the reverting matches our expectation.  Thanks in advance!

btw, the command I used to test on AIX is:
make check-gcc RUNTESTFLAGS="--target_board=unix'{-m64,-m32}' dg.exp=strub-unsupported*.c"

BR,
Kewen
 
> Thanks, David
> 
> 
> On Wed, Jan 17, 2024 at 8:06 PM Alexandre Oliva  > wrote:
> 
> David,
> 
> On Jan  7, 2024, "Kewen.Lin"  > wrote:
> 
> > As PR113100 shows, the unbiasing introduced by r14-6737 can
> > cause the scrubbing to overrun and screw some critical data
> > on stack like saved toc base consequently cause segfault on
> > Power.
> 
> I suppose this problem that Kewen fixed (thanks) was what caused you to
> install commit r14-6838.  According to posted test results, strub worked
> on AIX until Dec 20, when the fixes for sparc that broke strub on ppc
> went in.
> 
> I can't seem to find the email in which you posted the patch, and I'd
> have appreciated if you'd copied me.  I wouldn't have missed it for so
> long if you had.  Since I couldn't find that patch, I'm responding in
> this thread instead.
> 
> The r14-6838 patch is actually very very broken.  Disabling strub on a
> target is not a matter of changing only the testsuite.  Your additions
> to the tests even broke the strub-unsupported testcases, that tested
> exactly the feature that enables ports to disable strub in a way that
> informs users in case they attempt to use it.
> 
> I'd thus like to revert that patch.
> 
> Kewen's patch needs a little additional cleanup, that I'm preparing now,
> to restore fully-functioning strub on sparc32.
> 
> Please let me know in case you observe any other problems related with
> strub.  I'd be happy to fix them, but I can only do so once I'm aware of
> them.
> 
> In case the reversal or the upcoming cleanup has any negative impact,
> please make sure you let me know.
> 
> Thanks,
> 
> Happy GNU Year!
> 
> -- 
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/ 
> 
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive
> 


[committed] testsuite, rs6000: Adjust fold-vec-extract-char.p7.c [PR111850]

2024-01-17 Thread Kewen.Lin
Hi,

As PR101169 comment #c4 shows, previously the addi count
update on fold-vec-extract-char.p7.c covered a sub-optimal
code gen issue.  On trunk, pass fold-mem-offsets helps to
recover the best code sequence, so this patch is to
revert the count back to the original which matches the
optimal addi count.

Tested well on powerpc64-linux-gnu P8/P9,
powerpc64le-linux-gnu P9/P10 and powerpc-ibm-aix.

Pushed as r14-8201.

PR testsuite/111850

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/fold-vec-extract-char.p7.c: Update the
checking count of addi to 6.
---
 gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
index 29a8aa84db2..42599c214e4 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
@@ -11,7 +11,7 @@
 /* one extsb (extend sign-bit) instruction generated for each test against
unsigned types */

-/* { dg-final { scan-assembler-times {\maddi\M} 9 } } */
+/* { dg-final { scan-assembler-times {\maddi\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mli\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mstxvw4x\M|\mstvx\M|\mstxv\M} 6 } } */
 /* -m32 target uses rlwinm in place of rldicl. */
--
2.34.1


Re: [PATCH] c++: ICE when xobj is not the first parm [PR113389]

2024-01-17 Thread Jason Merrill

On 1/17/24 20:17, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
In grokdeclarator/cdk_function the comment says that the find_xobj_parm
lambda clears TREE_PURPOSE so that we can correctly detect an xobj that
is not the first parameter.  That's all good, but we should also clear
the TREE_PURPOSE once we've given the error, otherwise we crash later in
check_default_argument because the 'this' TREE_PURPOSE lacks a type.

PR c++/113389

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator) : Set TREE_PURPOSE to
NULL_TREE when emitting an error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-diagnostics10.C: New test.
---
  gcc/cp/decl.cc  | 1 +
  gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C | 8 
  2 files changed, 9 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 322e48dee2e..3e41fd4fa31 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -13391,6 +13391,7 @@ grokdeclarator (const cp_declarator *declarator,
  if (TREE_PURPOSE (parm) != this_identifier)
continue;
  bad_xobj_parm_encountered = true;
+ TREE_PURPOSE (parm) = NULL_TREE;
  gcc_rich_location bad_xobj_parm
(DECL_SOURCE_LOCATION (TREE_VALUE (parm)));
  error_at (&bad_xobj_parm,
diff --git a/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C b/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C
new file mode 100644
index 000..354823db166
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C
@@ -0,0 +1,8 @@
+// PR c++/113389
+// { dg-do compile { target c++23 } }
+
+struct A {
+  void foo(A, this A); // { dg-error "only the first parameter" }
+  void qux(A, this A,  // { dg-error "only the first parameter" }
+  this A); // { dg-error "only the first parameter" }
+};

base-commit: 4a8430c8c3abb1c2c14274105b3a621100f251a2




Re: [PATCH] hwasan: Check if Intel LAM_U57 is enabled

2024-01-17 Thread Hongtao Liu
On Wed, Jan 10, 2024 at 12:47 AM H.J. Lu  wrote:
>
> When -fsanitize=hwaddress is used, libhwasan will try to enable LAM_U57
> in the startup code.  Update the target check to enable hwaddress tests
> if LAM_U57 is enabled.  Also compile hwaddress tests with -mlam=u57 on
> x86-64 since hwasan requires LAM_U57 on x86-64.
I've tested it on a LAM-enabled SRF, and it passed all hwasan testcases
except those below:

FAIL: c-c++-common/hwasan/alloca-outside-caught.c   -O0  output pattern test
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O1  scan-assembler-times bl\s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O2  scan-assembler-times bl\s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O3 -g  scan-assembler-times bl\s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -Os  scan-assembler-times bl\s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times bl\s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times bl\s*__hwasan_tag_mismatch4 1
FAIL: c-c++-common/hwasan/vararray-outside-caught.c   -O0  output pattern test

Basically they're testcase issues; the testcases need to be adjusted
for x86. I'll commit a separate patch for those after this commit is
upstream.
I've also tested the patch on LAM-unsupported platforms; all
hwasan testcases show unsupported.
So the patch LGTM.

>
> * lib/hwasan-dg.exp (check_effective_target_hwaddress_exec):
> Return 1 if Intel LAM_U57 is enabled.
> (hwasan_init): Add -mlam=u57 on x86-64.
> ---
>  gcc/testsuite/lib/hwasan-dg.exp | 25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/lib/hwasan-dg.exp b/gcc/testsuite/lib/hwasan-dg.exp
> index e9c5ef6524d..76057502ee6 100644
> --- a/gcc/testsuite/lib/hwasan-dg.exp
> +++ b/gcc/testsuite/lib/hwasan-dg.exp
> @@ -44,11 +44,25 @@ proc check_effective_target_hwaddress_exec {} {
> #ifdef __cplusplus
> extern "C" {
> #endif
> +   extern int arch_prctl (int, unsigned long int *);
> extern int prctl(int, unsigned long, unsigned long, unsigned long, 
> unsigned long);
> #ifdef __cplusplus
> }
> #endif
> int main (void) {
> +   #ifdef __x86_64__
> +   # ifdef __LP64__
> +   #  define ARCH_GET_UNTAG_MASK 0x4001
> +   #  define LAM_U57_MASK (0x3fULL << 57)
> + unsigned long mask = 0;
> + if (arch_prctl(ARCH_GET_UNTAG_MASK, &mask) != 0)
> +   return 1;
> + if (mask != ~LAM_U57_MASK)
> +   return 1;
> + return 0;
> +   # endif
> + return 1;
> +   #else
> #define PR_SET_TAGGED_ADDR_CTRL 55
> #define PR_GET_TAGGED_ADDR_CTRL 56
> #define PR_TAGGED_ADDR_ENABLE (1UL << 0)
> @@ -58,6 +72,7 @@ proc check_effective_target_hwaddress_exec {} {
>   || !prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0))
> return 1;
>   return 0;
> +   #endif
> }
>  }] {
> return 0;
> @@ -102,6 +117,10 @@ proc hwasan_init { args } {
>
>  setenv HWASAN_OPTIONS "random_tags=0"
>
> +if [istarget x86_64-*-*] {
> +  set target_hwasan_flags "-mlam=u57"
> +}
> +
>  set link_flags ""
>  if ![is_remote host] {
> if [info exists TOOL_OPTIONS] {
> @@ -119,12 +138,12 @@ proc hwasan_init { args } {
>  if [info exists ALWAYS_CXXFLAGS] {
> set hwasan_saved_ALWAYS_CXXFLAGS $ALWAYS_CXXFLAGS
> set ALWAYS_CXXFLAGS [concat "{ldflags=$link_flags}" $ALWAYS_CXXFLAGS]
> -   set ALWAYS_CXXFLAGS [concat "{additional_flags=-fsanitize=hwaddress 
> --param hwasan-random-frame-tag=0 -g $include_flags}" $ALWAYS_CXXFLAGS]
> +   set ALWAYS_CXXFLAGS [concat "{additional_flags=-fsanitize=hwaddress 
> $target_hwasan_flags --param hwasan-random-frame-tag=0 -g $include_flags}" 
> $ALWAYS_CXXFLAGS]
>  } else {
> if [info exists TEST_ALWAYS_FLAGS] {
> -   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress --param 
> hwasan-random-frame-tag=0 -g $include_flags $TEST_ALWAYS_FLAGS"
> +   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress 
> $target_hwasan_flags --param hwasan-random-frame-tag=0 -g $include_flags 
> $TEST_ALWAYS_FLAGS"
> } else {
> -   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress --param 
> hwasan-random-frame-tag=0 -g $include_flags"
> +   set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=hwaddress 
> $target_hwasan_flags --param hwasan-random-frame-tag=0 -g $include_flags"
> }
>  }
>  }
> --
> 2.43.0
>


-- 
BR,
Hongtao


Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-01-17 Thread Greg McGary
On Tue, Jan 16, 2024 at 11:44 PM Richard Biener wrote:
> > On Tue, Jan 16, 2024 at 11:20 PM Greg McGary  wrote:
> > >
> > > The sign bit of a sign-extending load cannot be known until runtime,
> > > so don't attempt to simplify it in the combiner.
> >
> It feels like this papers over an issue downstream?

While the code comment is true, perhaps it obscures the primary intent,
which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is destined
to expand into a single memory-load instruction and no simplification is
possible, so why waste time with further analysis or transformation? There
are plenty of other conditions that also short circuit to "do nothing" and
this seems just as straightforward as those others. Efforts to catch this
further downstream add gratuitous complexity.

G


Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

2024-01-17 Thread Palmer Dabbelt

On Wed, 17 Jan 2024 19:19:58 PST (-0800), monk.chi...@sifive.com wrote:

Thanks for your advice!! I agree it should be fixed in the RISC-V backend
when expansion.


On Wed, Jan 17, 2024 at 10:37 PM Jeff Law  wrote:




On 1/17/24 05:14, Richard Biener wrote:
> On Wed, 17 Jan 2024, Monk Chiang wrote:
>
>> This allows the backend to generate movcc instructions, if target
>> machine has movcc pattern.
>>
>> branchless-cond.c needs to be updated since some target machines have
>> conditional move instructions, and the expression will not change to
>> branchless expression.
>
> While I agree this pattern should possibly be applied during RTL
> expansion or instruction selection, on x86, which also has movcc,
> the multiplication is cheaper.  So I don't think this is the way to
> go.
>
> I'd rather revert the change than trying to "fix" it this way?
WRT reverting -- the patch in question's sole purpose was to enable
branchless sequences for that very same code.  Reverting would regress
performance on a variety of micro-architectures.  IIUC, the issue is
that the SiFive part in question has a fusion which allows it to do the
branchy sequence cheaply.

ISTM this really needs to be addressed during expansion and most likely
with a RISC-V target twiddle for the micro-archs which have
short-forward-branch optimizations.


IIRC I ran into some of these middle-end interactions a year or two ago 
and determined that we'd need middle-end changes to get this working 
smoothly -- essentially replacing the expander checks for a MOVCC insn  
with some sort of costing.


Without that, we're just going to end up with some missed optimizations 
that favor one way or the other.




jeff



Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

2024-01-17 Thread Monk Chiang
Thanks for your advice!! I agree it should be fixed in the RISC-V backend
when expansion.


On Wed, Jan 17, 2024 at 10:37 PM Jeff Law  wrote:

>
>
> On 1/17/24 05:14, Richard Biener wrote:
> > On Wed, 17 Jan 2024, Monk Chiang wrote:
> >
> >> This allows the backend to generate movcc instructions, if target
> >> machine has movcc pattern.
> >>
> >> branchless-cond.c needs to be updated since some target machines have
> >> conditional move instructions, and the expression will not change to
> >> branchless expression.
> >
> > While I agree this pattern should possibly be applied during RTL
> > expansion or instruction selection, on x86, which also has movcc,
> > the multiplication is cheaper.  So I don't think this is the way to
> > go.
> >
> > I'd rather revert the change than trying to "fix" it this way?
> WRT reverting -- the patch in question's sole purpose was to enable
> branchless sequences for that very same code.  Reverting would regress
> performance on a variety of micro-architectures.  IIUC, the issue is
> that the SiFive part in question has a fusion which allows it to do the
> branchy sequence cheaply.
>
> ISTM this really needs to be addressed during expansion and most likely
> with a RISC-V target twiddle for the micro-archs which have
> short-forward-branch optimizations.
>
> jeff
>


[PATCH] libstdc++: Fix constexpr _Safe_iterator in C++20 mode

2024-01-17 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

Some _Safe_iterator member functions define a variable of non-literal
type __gnu_cxx::__scoped_lock, which automatically disqualifies them from
being constexpr in C++20 mode even if that code path is never constant
evaluated.  This restriction was lifted by P2242R3 for C++23, but we
need to work around it in C++20 mode.  To that end this patch defines
a pair of macros that encapsulate the lambda-based workaround mentioned
in that paper and uses them to make the functions valid C++20 constexpr
functions.  The augmented std::vector test element_access/constexpr.cc
now successfully compiles in C++20 mode with -D_GLIBCXX_DEBUG (and it
tests all modified member functions).

libstdc++-v3/ChangeLog:

* include/debug/safe_base.h (_Safe_sequence_base::_M_swap):
Remove _GLIBCXX20_CONSTEXPR.
* include/debug/safe_iterator.h 
(_GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN):
(_GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END): Define.
(_Safe_iterator::operator=): Use them around the code path that
defines a variable of type __gnu_cxx::__scoped_lock.
(_Safe_iterator::operator++): Likewise.
(_Safe_iterator::operator--): Likewise.
(_Safe_iterator::operator+=): Likewise.
(_Safe_iterator::operator-=): Likewise.
* testsuite/23_containers/vector/element_access/constexpr.cc
(test_iterators): Also test copy and move assignment.
* testsuite/std/ranges/adaptors/all.cc (test08) [_GLIBCXX_DEBUG]:
Use std::vector unconditionally.
---
 libstdc++-v3/include/debug/safe_base.h|  1 -
 libstdc++-v3/include/debug/safe_iterator.h| 48 ++-
 .../vector/element_access/constexpr.cc|  2 +
 .../testsuite/std/ranges/adaptors/all.cc  |  4 --
 4 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/libstdc++-v3/include/debug/safe_base.h 
b/libstdc++-v3/include/debug/safe_base.h
index 107fef3cb02..d5fbe4b1320 100644
--- a/libstdc++-v3/include/debug/safe_base.h
+++ b/libstdc++-v3/include/debug/safe_base.h
@@ -268,7 +268,6 @@ namespace __gnu_debug
  *  operation is complete all iterators that originally referenced
  *  one container now reference the other container.
  */
-_GLIBCXX20_CONSTEXPR
 void
 _M_swap(_Safe_sequence_base& __x) _GLIBCXX_USE_NOEXCEPT;
 
diff --git a/libstdc++-v3/include/debug/safe_iterator.h 
b/libstdc++-v3/include/debug/safe_iterator.h
index 1bc7c904ee0..929fd9b0ade 100644
--- a/libstdc++-v3/include/debug/safe_iterator.h
+++ b/libstdc++-v3/include/debug/safe_iterator.h
@@ -65,6 +65,20 @@
   _GLIBCXX_DEBUG_VERIFY_OPERANDS(_Lhs, _Rhs, __msg_distance_bad,   \
 __msg_distance_different)
 
+// This pair of macros helps with writing valid C++20 constexpr functions that
+// contain a non-constexpr code path that defines a non-literal variable, which
+// was otherwise disallowed until P2242R3 for C++23.  We use them below for
+// __gnu_cxx::__scoped_lock so that the containing functions are still
+// considered valid C++20 constexpr functions.
+
+#if __cplusplus >= 202002L && __cpp_constexpr < 202110L
+# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN [&]() -> void { do
+# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END while(false); }();
+#else
+# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN
+# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END
+#endif
+
 namespace __gnu_debug
 {
   /** Helper struct to deal with sequence offering a before_begin
@@ -266,11 +280,11 @@ namespace __gnu_debug
  ._M_iterator(__x, "other"));
 
if (this->_M_sequence && this->_M_sequence == __x._M_sequence)
- {
+ _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
__gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
base() = __x.base();
_M_version = __x._M_sequence->_M_version;
- }
+ } _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END
else
  {
_M_detach();
@@ -306,11 +320,11 @@ namespace __gnu_debug
  return *this;
 
if (this->_M_sequence && this->_M_sequence == __x._M_sequence)
- {
+ _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
__gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
base() = __x.base();
_M_version = __x._M_sequence->_M_version;
- }
+ } _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END
else
  {
_M_detach();
@@ -378,8 +392,10 @@ namespace __gnu_debug
_GLIBCXX_DEBUG_VERIFY(this->_M_incrementable(),
  _M_message(__msg_bad_inc)
  ._M_iterator(*this, "this"));
-   __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
-   ++base();
+   _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
+ __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
+ ++base();
+   

[COMMITTED] Document negative forms of -Wtsan and -Wxor-used-as-pow [PR110847]

2024-01-17 Thread Sandra Loosemore
These warnings are enabled by default, thus the manual should document the
-no form instead of the positive form.

gcc/ChangeLog
PR middle-end/110847
* doc/invoke.texi (Option Summary): Document negative forms of
-Wtsan and -Wxor-used-as-pow.
(Warning Options): Likewise.
---
 gcc/doc/invoke.texi | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a537be66736..4d43dda9839 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -410,7 +410,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wswitch  -Wno-switch-bool  -Wswitch-default  -Wswitch-enum
 -Wno-switch-outside-range  -Wno-switch-unreachable  -Wsync-nand
 -Wsystem-headers  -Wtautological-compare  -Wtrampolines  -Wtrigraphs
--Wtrivial-auto-var-init -Wtsan -Wtype-limits  -Wundef
+-Wtrivial-auto-var-init  -Wno-tsan  -Wtype-limits  -Wundef
 -Wuninitialized  -Wunknown-pragmas
 -Wunsuffixed-float-constants  -Wunused
 -Wunused-but-set-parameter  -Wunused-but-set-variable
@@ -424,7 +424,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wvector-operation-performance
 -Wvla  -Wvla-larger-than=@var{byte-size}  -Wno-vla-larger-than
 -Wvolatile-register-var  -Wwrite-strings
--Wxor-used-as-pow
+-Wno-xor-used-as-pow
 -Wzero-length-bounds}
 
 @item Static Analyzer Options
@@ -9090,14 +9090,13 @@ This warning is enabled by default.
 
 @opindex Wtsan
 @opindex Wno-tsan
-@item -Wtsan
-Warn about unsupported features in ThreadSanitizer.
+@item -Wno-tsan
+
+Disable warnings about unsupported features in ThreadSanitizer.
 
 ThreadSanitizer does not support @code{std::atomic_thread_fence} and
 can report false positives.
 
-This warning is enabled by default.
-
 @opindex Wtype-limits
 @opindex Wno-type-limits
 @item -Wtype-limits
@@ -10434,17 +10433,18 @@ and/or writes to register variables.  This warning is 
enabled by
 
 @opindex Wxor-used-as-pow
 @opindex Wno-xor-used-as-pow
-@item -Wxor-used-as-pow @r{(C, C++, Objective-C and Objective-C++ only)}
-Warn about uses of @code{^}, the exclusive or operator, where it appears
-the user meant exponentiation.  Specifically, the warning occurs when the
+@item -Wno-xor-used-as-pow @r{(C, C++, Objective-C and Objective-C++ only)}
+Disable warnings about uses of @code{^}, the exclusive or operator,
+where it appears the code meant exponentiation.
+Specifically, the warning occurs when the
 left-hand side is the decimal constant 2 or 10 and the right-hand side
 is also a decimal constant.
 
 In C and C++, @code{^} means exclusive or, whereas in some other languages
 (e.g. TeX and some versions of BASIC) it means exponentiation.
 
-This warning is enabled by default.  It can be silenced by converting one
-of the operands to hexadecimal.
+This warning can be silenced by converting one of the operands to
+hexadecimal as well as by compiling with @option{-Wno-xor-used-as-pow}.
 
 @opindex Wdisabled-optimization
 @opindex Wno-disabled-optimization
-- 
2.31.1



Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-17 Thread David Edelsohn
If the fixes remove the failures on AIX, then the patch to disable the
tests also can be reverted.

Thanks, David


On Wed, Jan 17, 2024 at 8:06 PM Alexandre Oliva  wrote:

> David,
>
> On Jan  7, 2024, "Kewen.Lin"  wrote:
>
> > As PR113100 shows, the unbiasing introduced by r14-6737 can
> > cause the scrubbing to overrun and screw some critical data
> > on stack like saved toc base consequently cause segfault on
> > Power.
>
> I suppose this problem that Kewen fixed (thanks) was what caused you to
> install commit r14-6838.  According to posted test results, strub worked
> on AIX until Dec 20, when the fixes for sparc that broke strub on ppc
> went in.
>
> I can't seem to find the email in which you posted the patch, and I'd
> have appreciated if you'd copied me.  I wouldn't have missed it for so
> long if you had.  Since I couldn't find that patch, I'm responding in
> this thread instead.
>
> The r14-6838 patch is actually very very broken.  Disabling strub on a
> target is not a matter of changing only the testsuite.  Your additions
> to the tests even broke the strub-unsupported testcases, that tested
> exactly the feature that enables ports to disable strub in a way that
> informs users in case they attempt to use it.
>
> I'd thus like to revert that patch.
>
> Kewen's patch needs a little additional cleanup, that I'm preparing now,
> to restore fully-functioning strub on sparc32.
>
> Please let me know in case you observe any other problems related with
> strub.  I'd be happy to fix them, but I can only do so once I'm aware of
> them.
>
> In case the reversal or the upcoming cleanup has any negative impact,
> please make sure you let me know.
>
> Thanks,
>
> Happy GNU Year!
>
> --
> Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive
>


[PATCH] c++: ICE when xobj is not the first parm [PR113389]

2024-01-17 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In grokdeclarator/cdk_function the comment says that the find_xobj_parm
lambda clears TREE_PURPOSE so that we can correctly detect an xobj that
is not the first parameter.  That's all good, but we should also clear
the TREE_PURPOSE once we've given the error, otherwise we crash later in
check_default_argument because the 'this' TREE_PURPOSE lacks a type.

PR c++/113389

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator) : Set TREE_PURPOSE to
NULL_TREE when emitting an error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-diagnostics10.C: New test.
---
 gcc/cp/decl.cc  | 1 +
 gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C | 8 
 2 files changed, 9 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 322e48dee2e..3e41fd4fa31 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -13391,6 +13391,7 @@ grokdeclarator (const cp_declarator *declarator,
  if (TREE_PURPOSE (parm) != this_identifier)
continue;
  bad_xobj_parm_encountered = true;
+ TREE_PURPOSE (parm) = NULL_TREE;
  gcc_rich_location bad_xobj_parm
(DECL_SOURCE_LOCATION (TREE_VALUE (parm)));
  error_at (&bad_xobj_parm,
diff --git a/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C 
b/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C
new file mode 100644
index 000..354823db166
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/explicit-obj-diagnostics10.C
@@ -0,0 +1,8 @@
+// PR c++/113389
+// { dg-do compile { target c++23 } }
+
+struct A {
+  void foo(A, this A); // { dg-error "only the first parameter" }
+  void qux(A, this A,  // { dg-error "only the first parameter" }
+  this A); // { dg-error "only the first parameter" }
+};

base-commit: 4a8430c8c3abb1c2c14274105b3a621100f251a2
-- 
2.43.0



[Committed V3] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Juzhe-Zhong
V3: Rebase to trunk and commit it.

This patch fixes SPEC2017 cam4 mismatch issue due to we miss has compatible 
check
for conflict vsetvl fusion.

Buggy assembler before this patch:

.L69:
vsetvli a5,s1,e8,mf4,ta,ma  -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i v1,0
vse8.v  v1,0(a5)
j   .L37
.L68:
vsetvli a5,s1,e8,mf4,ta,ma  -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,ma
addi a3,a5,8
vmv.v.i v1,0
vse8.v  v1,0(a5)
vse8.v  v1,0(a3)
addi a4,a4,-16
li  a3,8
bltu a4,a3,.L37
j   .L69
.L67:
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i v1,0
vse8.v  v1,0(a5)
addi a5,sp,56
vse8.v  v1,0(a5)
addi s4,sp,64
addi a3,sp,72
vse8.v  v1,0(s4)
vse8.v  v1,0(a3)
addi a4,a4,-32
li  a3,16
bltu a4,a3,.L36
j   .L68

After this patch:

.L63:
ble s1,zero,.L49
slli a4,s1,3
li  a3,32
addi a5,sp,48
bltu a4,a3,.L62
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i v1,0
vse8.v  v1,0(a5)
addi a5,sp,56
vse8.v  v1,0(a5)
addi s4,sp,64
addi a3,sp,72
vse8.v  v1,0(s4)
addi a4,a4,-32
addi a5,sp,80
vse8.v  v1,0(a3)
.L35:
li  a3,16
bltu a4,a3,.L36
addi a3,a5,8
vmv.v.i v1,0
addi a4,a4,-16
vse8.v  v1,0(a5)
addi a5,a5,16
vse8.v  v1,0(a3)
.L36:
li  a3,8
bltu a4,a3,.L37
vmv.v.i v1,0
vse8.v  v1,0(a5)

Tested on both RV32/RV64 no regression, Ok for trunk ?

PR target/113429

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): 
Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-4.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-5.c: Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 43 +++
 .../riscv/rvv/vsetvl/vlmax_conflict-4.c   |  5 +--
 .../riscv/rvv/vsetvl/vlmax_conflict-5.c   | 10 ++---
 3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 41d4b80648f..2067073185f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2254,6 +2254,22 @@ private:
 return true;
   }
 
+  bool has_compatible_reaching_vsetvl_p (vsetvl_info info)
+  {
+unsigned int index;
+sbitmap_iterator sbi;
+EXECUTE_IF_SET_IN_BITMAP (m_vsetvl_def_in[info.get_bb ()->index ()], 0,
+ index, sbi)
+  {
+   const auto prev_info = *m_vsetvl_def_exprs[index];
+   if (!prev_info.valid_p ())
+ continue;
+   if (m_dem.compatible_p (prev_info, info))
+ return true;
+  }
+return false;
+  }
+
  bool preds_all_same_avl_and_ratio_p (const vsetvl_info &curr_info)
   {
 gcc_assert (
@@ -3076,22 +3092,8 @@ pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
{
  vsetvl_info new_curr_info = curr_info;
  new_curr_info.set_bb (crtl->ssa->bb (eg->dest));
- bool has_compatible_p = false;
- unsigned int def_expr_index;
- sbitmap_iterator sbi2;
- EXECUTE_IF_SET_IN_BITMAP (
-   m_vsetvl_def_in[new_curr_info.get_bb ()->index ()], 0,
-   def_expr_index, sbi2)
-   {
- vsetvl_info &prev_info = *m_vsetvl_def_exprs[def_expr_index];
- if (!prev_info.valid_p ())
-   continue;
- if (m_dem.compatible_p (prev_info, new_curr_info))
-   {
- has_compatible_p = true;
- break;
-   }
-   }
+ bool has_compatible_p
+   = has_compatible_reaching_vsetvl_p (new_curr_info);
  if (!has_compatible_p)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3146,7 +3148,10 @@ pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
  else
{
  /* Cancel lift up if probabilities are equal.  */
- if (successors_probability_equal_p (eg->src))
+ if (successors_probability_equal_p (eg->src)
+ || (dest_block_info.probability
+   > src_block_info.probability
+ && !has_compatible_reaching_vsetvl_p (curr_info)))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -3154,8 +3159,8 @@ pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
   "  Reset bb %u:",
  

Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-17 Thread Alexandre Oliva
David,

On Jan  7, 2024, "Kewen.Lin"  wrote:

> As PR113100 shows, the unbiasing introduced by r14-6737 can
> cause the scrubbing to overrun and screw some critical data
> on stack like saved toc base consequently cause segfault on
> Power.

I suppose this problem that Kewen fixed (thanks) was what caused you to
install commit r14-6838.  According to posted test results, strub worked
on AIX until Dec 20, when the fixes for sparc that broke strub on ppc
went in.

I can't seem to find the email in which you posted the patch, and I'd
have appreciated if you'd copied me.  I wouldn't have missed it for so
long if you had.  Since I couldn't find that patch, I'm responding in
this thread instead.

The r14-6838 patch is actually very very broken.  Disabling strub on a
target is not a matter of changing only the testsuite.  Your additions
to the tests even broke the strub-unsupported testcases, that tested
exactly the feature that enables ports to disable strub in a way that
informs users in case they attempt to use it.

I'd thus like to revert that patch.

Kewen's patch needs a little additional cleanup, that I'm preparing now,
to restore fully-functioning strub on sparc32.

Please let me know in case you observe any other problems related with
strub.  I'd be happy to fix them, but I can only do so once I'm aware of
them.

In case the reversal or the upcoming cleanup has any negative impact,
please make sure you let me know.

Thanks,

Happy GNU Year!

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] modula2: Many powerpc platforms do _not_ have support for IEEE754 long double [PR111956]

2024-01-17 Thread Gaius Mulley


ok for master ?

Bootstrapped on power8 (cfarm135), power9 (cfarm120) and
x86_64-linux-gnu.

---

This patch corrects commit
r14-4149-g81d5ca0b9b8431f1bd7a5ec8a2c94f04bb0cf032 which assumed
all powerpc platforms would have IEEE754 long double.  The patch
ensures that cc1gm2 obtains the default IEEE754 long double availability
from the configure generated tm_defines.  The user command
line switches -mabi=ibmlongdouble and -mabi=ieeelongdouble are implemented
to override the configuration defaults.

gcc/m2/ChangeLog:

PR modula2/111956
* Make-lang.in (host_mc_longreal): Remove.
* configure: Regenerate.
* configure.ac (M2C_LONGREAL_FLOAT128): Remove.
(M2C_LONGREAL_PPC64LE): Remove.
* gm2-compiler/M2Options.def (SetIBMLongDouble): New procedure.
(GetIBMLongDouble): New procedure function.
(SetIEEELongDouble): New procedure.
(GetIEEELongDouble): New procedure function.
* gm2-compiler/M2Options.mod (SetIBMLongDouble): New procedure.
(GetIBMLongDouble): New procedure function.
(SetIEEELongDouble): New procedure.
(GetIEEELongDouble): New procedure function.
(InitializeLongDoubleFlags): New procedure called during
module block initialization.
* gm2-gcc/m2configure.cc: Remove duplicate includes.
(m2configure_M2CLongRealFloat128): Remove.
(m2configure_M2CLongRealIBM128): Remove.
(m2configure_M2CLongRealLongDouble): Remove.
(m2configure_M2CLongRealLongDoublePPC64LE): Remove.
(m2configure_TargetIEEEQuadDefault): New function.
* gm2-gcc/m2configure.def (M2CLongRealFloat128): Remove.
(M2CLongRealIBM128): Remove.
(M2CLongRealLongDouble): Remove.
(M2CLongRealLongDoublePPC64LE): Remove.
(TargetIEEEQuadDefault): New function.
* gm2-gcc/m2configure.h (m2configure_M2CLongRealFloat128): Remove.
(m2configure_M2CLongRealIBM128): Remove.
(m2configure_M2CLongRealLongDouble): Remove.
(m2configure_M2CLongRealLongDoublePPC64LE): Remove.
(m2configure_TargetIEEEQuadDefault): New function.
* gm2-gcc/m2options.h (M2Options_SetIBMLongDouble): New prototype.
(M2Options_GetIBMLongDouble): New prototype.
(M2Options_SetIEEELongDouble): New prototype.
(M2Options_GetIEEELongDouble): New prototype.
* gm2-gcc/m2type.cc (build_m2_long_real_node): Re-implement using
results of M2Options_GetIBMLongDouble and M2Options_GetIEEELongDouble.
* gm2-lang.cc (gm2_langhook_handle_option): Add case
OPT_mabi_ibmlongdouble and call M2Options_SetIBMLongDouble.
Add case OPT_mabi_ieeelongdouble and call M2Options_SetIEEELongDouble.
* gm2config.aci.in: Regenerate.
* gm2spec.cc (lang_specific_driver): Remove block defined by
M2C_LONGREAL_PPC64LE.
Remove case OPT_mabi_ibmlongdouble.
Remove case OPT_mabi_ieeelongdouble.

libgm2/ChangeLog:

PR modula2/111956
* Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* Makefile.in: Regenerate.
* libm2cor/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* libm2cor/Makefile.in: Regenerate.
* libm2iso/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* libm2iso/Makefile.in: Regenerate.
* libm2log/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* libm2log/Makefile.in: Regenerate.
* libm2min/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* libm2min/Makefile.in: Regenerate.
* libm2pim/Makefile.am (TARGET_LONGDOUBLE_ABI): Remove.
* libm2pim/Makefile.in: Regenerate.

---

diff --git a/gcc/m2/Make-lang.in b/gcc/m2/Make-lang.in
index d7bc7362bbf..45bfa933dca 100644
--- a/gcc/m2/Make-lang.in
+++ b/gcc/m2/Make-lang.in
@@ -98,9 +98,6 @@ GM2_PROG_DEP=gm2$(exeext) xgcc$(exeext) cc1gm2$(exeext)
 
 include m2/config-make
 
-# Determine if float128 should represent the Modula-2 type LONGREAL.
-host_mc_longreal := $(if $(strip $(filter 
powerpc64le%,$(host))),--longreal=__float128)
-
 LIBSTDCXX=../$(TARGET_SUBDIR)/libstdc++-v3/src/.libs/libstdc++.a
 
 PGE=m2/pge$(exeext)
@@ -474,8 +471,7 @@ MC_ARGS= --olang=c++ \
  -I$(srcdir)/m2/gm2-gcc \
  --quiet \
  $(MC_COPYRIGHT) \
- --gcc-config-system \
- $(host_mc_longreal)
+ --gcc-config-system
 
 MCDEPS=m2/boot-bin/mc$(exeext)
 
diff --git a/gcc/m2/configure b/gcc/m2/configure
index f62f3d8729c..46530970785 100755
--- a/gcc/m2/configure
+++ b/gcc/m2/configure
@@ -3646,24 +3646,6 @@ $as_echo "#define HAVE_OPENDIR 1" >>confdefs.h
 fi
 
 
-case $target in #(
-  powerpc64le*) :
-
-$as_echo "#define M2C_LONGREAL_FLOAT128 1" >>confdefs.h
- ;; #(
-  *) :
- ;;
-esac
-
-case $target in #(
-  powerpc64le*) :
-
-$as_echo "#define M2C_LONGREAL_PPC64LE 1" >>confdefs.h
- ;; #(
-  *) :
- ;;
-esac
-
 ac_config_headers="$ac_config_headers gm2config.aci"
 
 cat >confcache <<\_ACEOF
diff --git a/gcc/m2/configure.ac b/gcc/m2/configure.ac
index efcca628068..15be50936f7 

[COMMITTED] Re-alphabetize attribute tables in extend.texi.

2024-01-17 Thread Sandra Loosemore
These sections used to be alphabetized, but when I was working on the
fix for PR111659 I noticed documentation for some newer attributes had
been inserted at random places in the tables instead of maintaining
alphabetical order.  There's no change to content here, just moving
blocks of text around.

gcc/ChangeLog
* doc/extend.texi (Common Function Attributes): Re-alphabetize
the table.
(Common Variable Attributes): Likewise.
(Common Type Attributes): Likewise.
---
 gcc/doc/extend.texi | 857 ++--
 1 file changed, 430 insertions(+), 427 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 91f0b669b9e..d1893ad860c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3028,19 +3028,6 @@ types (@pxref{Variable Attributes}, @pxref{Type 
Attributes}.)
 The message attached to the attribute is affected by the setting of
 the @option{-fmessage-length} option.
 
-@cindex @code{unavailable} function attribute
-@item unavailable
-@itemx unavailable (@var{msg})
-The @code{unavailable} attribute results in an error if the function
-is used anywhere in the source file.  This is useful when identifying
-functions that have been removed from a particular variation of an
-interface.  Other than emitting an error rather than a warning, the
-@code{unavailable} attribute behaves in the same manner as
-@code{deprecated}.
-
-The @code{unavailable} attribute can also be used for variables and
-types (@pxref{Variable Attributes}, @pxref{Type Attributes}.)
-
 @cindex @code{error} function attribute
 @cindex @code{warning} function attribute
 @item error ("@var{message}")
@@ -3666,6 +3653,10 @@ This attribute locally overrides the 
@option{-fstack-limit-register}
 and @option{-fstack-limit-symbol} command-line options; it has the effect
 of disabling stack limit checking in the function it applies to.
 
+@cindex @code{no_stack_protector} function attribute
+@item no_stack_protector
+This attribute prevents stack protection code for the function.
+
 @cindex @code{noclone} function attribute
 @item noclone
 This function attribute prevents a function from being considered for
@@ -3761,63 +3752,6 @@ my_memcpy (void *dest, const void *src, size_t len)
 __attribute__((nonnull));
 @end smallexample
 
-@cindex @code{null_terminated_string_arg} function attribute
-@item null_terminated_string_arg
-@itemx null_terminated_string_arg (@var{N})
-The @code{null_terminated_string_arg} attribute may be applied to a
-function that takes a @code{char *} or @code{const char *} at
-referenced argument @var{N}.
-
-It indicates that the passed argument must be a C-style null-terminated
-string.  Specifically, the presence of the attribute implies that, if
-the pointer is non-null, the function may scan through the referenced
-buffer looking for the first zero byte.
-
-In particular, when the analyzer is enabled (via @option{-fanalyzer}),
-if the pointer is non-null, it will simulate scanning for the first
-zero byte in the referenced buffer, and potentially emit
-@option{-Wanalyzer-use-of-uninitialized-value}
-or @option{-Wanalyzer-out-of-bounds} on improperly terminated buffers.
-
-For example, given the following:
-
-@smallexample
-char *example_1 (const char *p)
-  __attribute__((null_terminated_string_arg (1)));
-@end smallexample
-
-the analyzer will check that any non-null pointers passed to the function
-are validly terminated.
-
-If the parameter must be non-null, it is appropriate to use both this
-attribute and the attribute @code{nonnull}, such as in:
-
-@smallexample
-extern char *example_2 (const char *p)
-  __attribute__((null_terminated_string_arg (1),
- nonnull (1)));
-@end smallexample
-
-See the @code{nonnull} attribute for more information and
-caveats.
-
-If the pointer argument is also referred to by an @code{access} attribute on 
the
-function with @var{access-mode} either @code{read_only} or @code{read_write}
-and the latter attribute has the optional @var{size-index} argument
-referring to a size argument, this expressses the maximum size of the access.
-For example, given:
-
-@smallexample
-extern char *example_fn (const char *p, size_t n)
-  __attribute__((null_terminated_string_arg (1),
- access (read_only, 1, 2),
- nonnull (1)));
-@end smallexample
-
-the analyzer will require the first parameter to be non-null, and either
-be validly null-terminated, or validly readable up to the size specified by
-the second parameter.
-
 @cindex @code{noplt} function attribute
 @item noplt
 The @code{noplt} attribute is the counterpart to option @option{-fno-plt}.
@@ -3896,6 +3830,63 @@ the standard C library can be guaranteed not to throw an 
exception
 with the notable exceptions of @code{qsort} and @code{bsearch} that
 take function pointer arguments.
 
+@cindex @code{null_terminated_string_arg} function attribute
+@item null_terminated_string_arg
+@itemx 

[COMMITTED] Clean up documentation for -Wstrict-flex-arrays [PR111659]

2024-01-17 Thread Sandra Loosemore
gcc/ChangeLog
PR middle-end/111659
* doc/extend.texi (Common Variable Attributes): Fix long lines
in documentation of strict_flex_array + other minor copy-editing.
Add a cross-reference to -Wstrict-flex-arrays.
* doc/invoke.texi (Option Summary): Fix whitespace in tables
before -fstrict-flex-arrays and -Wstrict-flex-arrays.
(C Dialect Options): Combine the docs for the two
-fstrict-flex-arrays forms into a single entry.  Note this option
is for C/C++ only.  Add a cross-reference to -Wstrict-flex-arrays.
(Warning Options): Note -Wstrict-flex-arrays is for C/C++ only.
Minor copy-editing.  Add cross references to the strict_flex_array
attribute and -fstrict-flex-arrays option.  Add note that this
option depends on -ftree-vrp.
---
 gcc/doc/extend.texi | 30 +++---
 gcc/doc/invoke.texi | 51 ++---
 2 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89e823629e3..91f0b669b9e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7790,18 +7790,24 @@ are treated as flexible array members. @var{level}=3 is 
the strictest level,
 only when the trailing array is declared as a flexible array member per C99
 standard onwards (@samp{[]}), it is treated as a flexible array member.
 
-There are two more levels in between 0 and 3, which are provided to support
-older codes that use GCC zero-length array extension (@samp{[0]}) or 
one-element
-array as flexible array members (@samp{[1]}):
-When @var{level} is 1, the trailing array is treated as a flexible array member
-when it is declared as either @samp{[]}, @samp{[0]}, or @samp{[1]};
-When @var{level} is 2, the trailing array is treated as a flexible array member
-when it is declared as either @samp{[]}, or @samp{[0]}.
-
-This attribute can be used with or without the @option{-fstrict-flex-arrays}.
-When both the attribute and the option present at the same time, the level of
-the strictness for the specific trailing array field is determined by the
-attribute.
+There are two more levels in between 0 and 3, which are provided to
+support older codes that use GCC zero-length array extension
+(@samp{[0]}) or one-element array as flexible array members
+(@samp{[1]}).  When @var{level} is 1, the trailing array is treated as
+a flexible array member when it is declared as either @samp{[]},
+@samp{[0]}, or @samp{[1]}; When @var{level} is 2, the trailing array
+is treated as a flexible array member when it is declared as either
+@samp{[]}, or @samp{[0]}.
+
+This attribute can be used with or without the
+@option{-fstrict-flex-arrays} command-line option.  When both the
+attribute and the option are present at the same time, the level of
+the strictness for the specific trailing array field is determined by
+the attribute.
+
+The @code{strict_flex_array} attribute interacts with the
+@option{-Wstrict-flex-arrays} option.  @xref{Warning Options}, for more
+information.
 
 @cindex @code{alloc_size} variable attribute
 @item alloc_size (@var{position})
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 43fd3c3a3cd..a537be66736 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -207,7 +207,7 @@ in the following sections.
 -fopenmp  -fopenmp-simd  -fopenmp-target-simd-clone@r{[}=@var{device-type}@r{]}
 -fpermitted-flt-eval-methods=@var{standard}
 -fplan9-extensions  -fsigned-bitfields  -funsigned-bitfields
--fsigned-char  -funsigned-char -fstrict-flex-arrays[=@var{n}]
+-fsigned-char  -funsigned-char  -fstrict-flex-arrays[=@var{n}]
 -fsso-struct=@var{endianness}}
 
 @item C++ Language Options
@@ -405,7 +405,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wstrict-aliasing=n  -Wstrict-overflow  -Wstrict-overflow=@var{n}
 -Wstring-compare
 -Wno-stringop-overflow -Wno-stringop-overread
--Wno-stringop-truncation -Wstrict-flex-arrays
+-Wno-stringop-truncation  -Wstrict-flex-arrays
 -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{|}malloc@r{]}
 -Wswitch  -Wno-switch-bool  -Wswitch-default  -Wswitch-enum
 -Wno-switch-outside-range  -Wno-switch-unreachable  -Wsync-nand
@@ -2945,22 +2945,22 @@ is always just like one of those two.
 
 @opindex fstrict-flex-arrays
 @opindex fno-strict-flex-arrays
-@item -fstrict-flex-arrays
-Control when to treat the trailing array of a structure as a flexible array
-member for the purpose of accessing the elements of such an array.
-The positive form is equivalent to @option{-fstrict-flex-arrays=3}, which is 
the
-strictest.  A trailing array is treated as a flexible array member only when it
-is declared as a flexible array member per C99 standard onwards.
-The negative form is equivalent to @option{-fstrict-flex-arrays=0}, which is 
the
-least strict.  All trailing arrays of structures are treated as flexible array
-members.
-
 @opindex fstrict-flex-arrays=@var{level}
-@item 

[COMMITTEDv2] aarch64: Fix aarch64_ldp_reg_operand predicate not to allow all subreg [PR113221]

2024-01-17 Thread Andrew Pinski
The problem here is that aarch64_ldp_reg_operand allows all subregs, even 
subregs of a lo_sum.
When LRA tries to fix that up, things break. So the fix is to change the 
check to only
allow REGs and subregs of REGs.

Note the tendency here would be to use register_operand, but that checks the 
mode of the register, and we need to allow mismatched modes for this predicate 
for now.

Committed as approved.
Built and tested for aarch64-linux-gnu with no regressions
(Also tested with the LD/ST pair pass back on).

PR target/113221

gcc/ChangeLog:

* config/aarch64/predicates.md (aarch64_ldp_reg_operand): For subreg,
only allow REG operands instead of allowing all.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr113221-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/predicates.md |  6 +-
 gcc/testsuite/gcc.c-torture/compile/pr113221-1.c | 12 
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr113221-1.c

diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 8a204e48bb5..b895f5dcb86 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -313,7 +313,11 @@ (define_predicate "pmode_plus_operator"
 
 (define_special_predicate "aarch64_ldp_reg_operand"
   (and
-(match_code "reg,subreg")
+(ior
+  (match_code "reg")
+  (and
+   (match_code "subreg")
+   (match_test "REG_P (SUBREG_REG (op))")))
 (match_test "aarch64_ldpstp_operand_mode_p (GET_MODE (op))")
 (ior
   (match_test "mode == VOIDmode")
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c 
b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
new file mode 100644
index 000..942fa5eea88
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
@@ -0,0 +1,12 @@
+/* { dg-options "-fno-move-loop-invariants -funroll-all-loops" } */
+/* PR target/113221 */
+/* This used to ICE after the `load/store pair fusion pass` was added
+   due to the predicate aarch64_ldp_reg_operand allowing too much. */
+
+
+void bar();
+void foo(int* b) {
+  for (;;)
+*b++ = (__SIZE_TYPE__)bar;
+}
+
-- 
2.39.3



Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin

2024-01-17 Thread François Dumont

Hi

Looks like a great finding to me, this is indeed a useless check, thanks!

Do you have any figures on the performance enhancement? It might help to 
get proper approval, as GCC is currently in dev stage 4, that is to say, 
normally only bug fixes.


François

On 17/01/2024 09:11, Huanghui Nie wrote:


Hi.

When implementing a hash table with reference to the C++ STL, I found 
that when the hash table in the C++ STL erases elements, if the first 
element erased is the begin element, the before-begin node is 
redundantly assigned. This creates unnecessary performance overhead.



First, let’s see the code implementation:

In _M_remove_bucket_begin, _M_before_begin._M_nxt is assigned when 
&_M_before_begin == _M_buckets[__bkt]. That also means 
_M_buckets[__bkt]->_M_nxt is assigned under some conditions.


_M_remove_bucket_begin is called by _M_erase and _M_extract_node:

 1. Case _M_erase a range: _M_remove_bucket_begin is called in a for
loop when __is_bucket_begin is true. And if __is_bucket_begin is
true and &_M_before_begin == _M_buckets[__bkt], __prev_n must be
&_M_before_begin. __prev_n->_M_nxt is always assigned in _M_erase.
That means _M_before_begin._M_nxt is always assigned, if
_M_remove_bucket_begin is called and &_M_before_begin ==
_M_buckets[__bkt]. So there’s no need to assign
_M_before_begin._M_nxt in _M_remove_bucket_begin.
 2. Other cases: _M_remove_bucket_begin is called when __prev_n ==
_M_buckets[__bkt]. And __prev_n->_M_nxt is always assigned in
_M_erase and _M_extract_node. That means _M_buckets[__bkt]->_M_nxt
is always assigned. So there's no need to assign
_M_buckets[__bkt]->_M_nxt in _M_remove_bucket_begin.

In summary, there’s no need to check &_M_before_begin == 
_M_buckets[__bkt] and assign _M_before_begin._M_nxt in 
_M_remove_bucket_begin.



Then let’s see the responsibility of each method:

The hash table in the C++ STL is composed of hash buckets and a node 
list. The _M_erase and _M_extract_node methods are responsible for 
updating the node list; the _M_remove_bucket_begin method only needs to 
update the hash buckets. The update of _M_before_begin belongs to the 
update of the node list. So _M_remove_bucket_begin doesn’t need to 
update _M_before_begin.



Existing tests listed below cover this change:

23_containers/unordered_set/allocator/copy.cc

23_containers/unordered_set/allocator/copy_assign.cc

23_containers/unordered_set/allocator/move.cc

23_containers/unordered_set/allocator/move_assign.cc

23_containers/unordered_set/allocator/swap.cc

23_containers/unordered_set/erase/1.cc

23_containers/unordered_set/erase/24061-set.cc

23_containers/unordered_set/modifiers/extract.cc

23_containers/unordered_set/operations/count.cc

23_containers/unordered_set/requirements/exception/basic.cc

23_containers/unordered_map/allocator/copy.cc

23_containers/unordered_map/allocator/copy_assign.cc

23_containers/unordered_map/allocator/move.cc

23_containers/unordered_map/allocator/move_assign.cc

23_containers/unordered_map/allocator/swap.cc

23_containers/unordered_map/erase/1.cc

23_containers/unordered_map/erase/24061-map.cc

23_containers/unordered_map/modifiers/extract.cc

23_containers/unordered_map/modifiers/move_assign.cc

23_containers/unordered_map/operations/count.cc

23_containers/unordered_map/requirements/exception/basic.cc


Regression tested on x86_64-pc-linux-gnu. Is it OK to commit?


---

ChangeLog:


libstdc++: hashtable: No need to update before begin node in 
_M_remove_bucket_begin



2024-01-16  Huanghui Nie


gcc/

* libstdc++-v3/include/bits/hashtable.h


---


diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h


index b48610036fa..6056639e663 100644

--- a/libstdc++-v3/include/bits/hashtable.h

+++ b/libstdc++-v3/include/bits/hashtable.h

@@ -872,13 +872,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

      if (!__next_n || __next_bkt != __bkt)

        {

          // Bucket is now empty

-         // First update next bucket if any

+         // Update next bucket if any

          if (__next_n)

            _M_buckets[__next_bkt] = _M_buckets[__bkt];

-         // Second update before begin node if necessary

-         if (&_M_before_begin == _M_buckets[__bkt])

-           _M_before_begin._M_nxt = __next_n;

          _M_buckets[__bkt] = nullptr;

        }

    }



[COMITTED 1/2] RISC-V: RVV: add toggle to control vsetvl pass behavior

2024-01-17 Thread Vineet Gupta
RVV requires VSET?VL? instructions to dynamically configure VLEN at
runtime. There's a custom pass to do that, which has a simple mode that
generates a VSETVL for each V insn and a lazy/optimal mode that uses
LCM dataflow to move VSETVLs around and identify/delete the redundant
ones.

Currently simple mode is the default for !optimize invocations, while
lazy mode is the default otherwise.

This patch allows simple mode to be forced via a toggle independent of
the optimization level. A lot of GCC developers are currently doing this
in some form in their local setups, since issues are expected in the
initial phase of autovec development. It makes sense to provide this
facility upstream. It could potentially also be used by distro builders
for quick workarounds of future autovec bugs.
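
With the toggle in place, either strategy can be requested on the command
line; a sketch of the invocations (the cross-compiler name is an
assumption, the --param is the one added by this patch):

```shell
# Force one vsetvl per vector instruction even at -O2:
riscv64-unknown-linux-gnu-gcc -O2 -march=rv64gcv \
    --param=vsetvl-strategy=simple -S vec.c

# Explicitly request the default LCM-based strategy:
riscv64-unknown-linux-gnu-gcc -O2 -march=rv64gcv \
    --param=vsetvl-strategy=optim -S vec.c
```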

gcc/ChangeLog:
* config/riscv/riscv.opt: New -param=vsetvl-strategy.
* config/riscv/riscv-opts.h: New enum vsetvl_strategy_enum.
* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::pre_global_vsetvl_info): Use vsetvl_strategy.
(pass_vsetvl::execute): Use vsetvl_strategy.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv-opts.h|  9 +
 gcc/config/riscv/riscv-vsetvl.cc |  2 +-
 gcc/config/riscv/riscv.opt   | 14 ++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index ff4406ab8eaf..ca57dddf1d9a 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -116,6 +116,15 @@ enum stringop_strategy_enum {
   STRATEGY_AUTO = STRATEGY_SCALAR | STRATEGY_VECTOR
 };
 
+/* Behavior of VSETVL Pass.  */
+enum vsetvl_strategy_enum {
+  /* Simple: Insert a vsetvl* instruction for each Vector instruction.  */
+  VSETVL_SIMPLE = 1,
+  /* Optimized: Run LCM dataflow analysis to reduce vsetvl* insns and
+ delete any redundant ones generated in the process.  */
+  VSETVL_OPT = 2
+};
+
 #define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && 
TARGET_64BIT))
 
 /* Bit of riscv_zvl_flags will set contintuly, N-1 bit will set if N-bit is
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index df7ed149388a..78a2f7b38faf 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3671,7 +3671,7 @@ pass_vsetvl::execute (function *)
   if (!has_vector_insn (cfun))
 return 0;
 
-  if (!optimize)
+  if (!optimize || vsetvl_strategy & VSETVL_SIMPLE)
 simple_vsetvl ();
   else
 lazy_vsetvl ();
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 44ed6d69da29..fd4f1a4df206 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -546,6 +546,20 @@ Target Undocumented Bool Var(riscv_vector_abi) Init(0)
 Enable the use of vector registers for function arguments and return value.
 This is an experimental switch and may be subject to change in the future.
 
+Enum
+Name(vsetvl_strategy) Type(enum vsetvl_strategy_enum)
+Valid arguments to -param=vsetvl-strategy=:
+
+EnumValue
+Enum(vsetvl_strategy) String(simple) Value(VSETVL_SIMPLE)
+
+EnumValue
+Enum(vsetvl_strategy) String(optim) Value(VSETVL_OPT)
+
+-param=vsetvl-strategy=
+Target Undocumented RejectNegative Joined Enum(vsetvl_strategy) 
Var(vsetvl_strategy) Init(VSETVL_OPT)
+-param=vsetvl-strategy=Set the optimization level of VSETVL 
insert pass.
+
 Enum
 Name(stringop_strategy) Type(enum stringop_strategy_enum)
 Valid arguments to -mstringop-strategy=:
-- 
2.34.1



[COMITTED 2/2] RISC-V: fix some vsetvl debug info in pass's Phase 2 code [NFC]

2024-01-17 Thread Vineet Gupta
When staring at VSETVL pass for PR/113429, spotted some minor
improvements.

1. For readability, remove some redundant condition checks in the Phase 2
   function earliest_fuse_vsetvl_info ().
2. Add iteration count in debug prints in same function.

gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (earliest_fuse_vsetvl_info):
Remove redundant checks in else condition for readability.
(earliest_fuse_vsetvl_info): Print iteration count in debug
prints.
(earliest_fuse_vsetvl_info): Fix misleading vsetvl info
dump details in certain cases.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv-vsetvl.cc | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 78a2f7b38faf..41d4b80648f6 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2343,7 +2343,7 @@ public:
   void compute_lcm_local_properties ();
 
   void fuse_local_vsetvl_info ();
-  bool earliest_fuse_vsetvl_info ();
+  bool earliest_fuse_vsetvl_info (int iter);
   void pre_global_vsetvl_info ();
   void emit_vsetvl ();
   void cleaup ();
@@ -2961,7 +2961,7 @@ pre_vsetvl::fuse_local_vsetvl_info ()
 
 
 bool
-pre_vsetvl::earliest_fuse_vsetvl_info ()
+pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
 {
   compute_avl_def_data ();
   compute_vsetvl_def_data ();
@@ -2984,7 +2984,8 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
-  fprintf (dump_file, "\n  Compute LCM earliest insert data:\n\n");
+  fprintf (dump_file, "\n  Compute LCM earliest insert data (lift 
%d):\n\n",
+  iter);
   fprintf (dump_file, "Expression List (%u):\n", num_exprs);
   for (unsigned i = 0; i < num_exprs; i++)
{
@@ -3032,7 +3033,7 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
-  fprintf (dump_file, "Fused global info result:\n");
+  fprintf (dump_file, "Fused global info result (lift %d):\n", iter);
 }
 
   bool changed = false;
@@ -3142,8 +3143,7 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
  if (src_block_info.has_info ())
src_block_info.probability += dest_block_info.probability;
}
- else if (src_block_info.has_info ()
-  && !m_dem.compatible_p (prev_info, curr_info))
+ else
{
  /* Cancel lift up if probabilities are equal.  */
  if (successors_probability_equal_p (eg->src))
@@ -3151,11 +3151,11 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file,
-  "  Change empty bb %u to from:",
+  "  Reset bb %u:",
   eg->src->index);
  prev_info.dump (dump_file, "");
  fprintf (dump_file,
-  "to (higher probability):");
+  "due to (same probability):");
  curr_info.dump (dump_file, "");
}
  src_block_info.set_empty_info ();
@@ -3170,7 +3170,7 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file,
-  "  Change empty bb %u to from:",
+  "  Change bb %u from:",
   eg->src->index);
  prev_info.dump (dump_file, "");
  fprintf (dump_file,
@@ -3627,7 +3627,7 @@ pass_vsetvl::lazy_vsetvl ()
 {
   if (dump_file)
fprintf (dump_file, "  Try lift up %d.\n\n", fused_count);
-  changed = pre.earliest_fuse_vsetvl_info ();
+  changed = pre.earliest_fuse_vsetvl_info (fused_count);
   fused_count += 1;
   } while (changed);
 
-- 
2.34.1



RE: [PATCH] aarch64: Fix aarch64_ldp_reg_operand predicate not to allow all subreg [PR113221]

2024-01-17 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Alex Coplan 
> Sent: Wednesday, January 17, 2024 12:59 AM
> To: Andrew Pinski (QUIC) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] aarch64: Fix aarch64_ldp_reg_operand predicate not to
> allow all subreg [PR113221]
> 
> Hi Andrew,
> 
> On 16/01/2024 19:29, Andrew Pinski wrote:
> > So the problem here is that aarch64_ldp_reg_operand will all subreg even
> subreg of lo_sum.
> > When LRA tries to fix that up, all things break. So the fix is to
> > change the check to only allow reg and subreg of regs.
> 
> Thanks a lot for tracking this down, I really appreciate having some help with
> the bug-fixing.  Sorry for not getting to it sooner myself, I'm working on
> PR113089 which ended up taking longer than expected to fix.
> 
> >
> > Note the tendancy here is to use register_operand but that checks the
> > mode of the register but we need to allow a mismatch modes for this
> predicate for now.
> 
> Yeah, due to the design of the patterns using special predicates we need to
> allow a mode mismatch with the contextual mode.
> 
> The patch broadly LGTM (although I can't approve), but I've left a couple of
> minor comments below.
> 
> >
> > Built and tested for aarch64-linux-gnu with no regressions (Also
> > tested with the LD/ST pair pass back on).
> >
> > PR target/113221
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/predicates.md (aarch64_ldp_reg_operand): For
> subreg,
> > only allow REG operands isntead of allowing all.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.c-torture/compile/pr113221-1.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/config/aarch64/predicates.md |  8 +++-
> >  gcc/testsuite/gcc.c-torture/compile/pr113221-1.c | 12 
> >  2 files changed, 19 insertions(+), 1 deletion(-)  create mode 100644
> > gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> >
> > diff --git a/gcc/config/aarch64/predicates.md
> > b/gcc/config/aarch64/predicates.md
> > index 8a204e48bb5..256268517d8 100644
> > --- a/gcc/config/aarch64/predicates.md
> > +++ b/gcc/config/aarch64/predicates.md
> > @@ -313,7 +313,13 @@ (define_predicate "pmode_plus_operator"
> >
> >  (define_special_predicate "aarch64_ldp_reg_operand"
> >(and
> > -(match_code "reg,subreg")
> > +(ior
> > +  (match_code "reg")
> > +  (and
> > +   (match_code "subreg")
> > +   (match_test "GET_CODE (SUBREG_REG (op)) == REG")
> 
> This could be just REG_P (SUBREG_REG (op)) in the match_test.
> 
> > +  )
> > +)
> 
> I think it would be more in keeping with the style in the rest of the file to 
> have
> the closing parens on the same line as the SUBREG_REG match_test.
> 
> >  (match_test "aarch64_ldpstp_operand_mode_p (GET_MODE (op))")
> >  (ior
> >(match_test "mode == VOIDmode") diff --git
> > a/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> > b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> > new file mode 100644
> > index 000..152a510786e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-options "-fno-move-loop-invariants -funroll-all-loops" } */
> 
> Does this need to be dg-additional-options?  Naively I would expect the dg-
> options clause to override the torture options (and potentially any options
> provided in RUNTESTFLAGS, e.g. to re-enable the ldp/stp pass).

I just checked my testsuite run, and the answer is no, it does not need 
to be dg-additional-options in this case.
dg-options does not override the torture options; rather, its options are 
placed after them.
As far as I understand it, dg-additional-options makes it easier to have 
different options added per target but in this case we don't need that.

Will update the patch with the rest of the changes and push it in the next 
few hours.
I did notice an issue with the testcase though: I need to cast to 
__SIZE_TYPE__ instead of long to allow it to work with targets that are 
neither ILP32 nor LP64. I will fix that too.

Thanks,
Andrew Pinski

> 
> Thanks again for the patch, and apologies for the oversight on my part: I'd
> missed that register_operand also checks the code inside the subreg.
> 
> Alex
> 
> > +/* PR target/113221 */
> > +/* This used to ICE after the `load/store pair fusion pass` was added
> > +   due to the predicate aarch64_ldp_reg_operand allowing too much. */
> > +
> > +
> > +void bar();
> > +void foo(int* b) {
> > +  for (;;)
> > +*b++ = (long)bar;
> > +}
> > +
> > --
> > 2.39.3
> >


Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-17 Thread H.J. Lu
On Wed, Jan 17, 2024 at 7:02 AM Richard Biener
 wrote:
>
> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
>  wrote:
> >
> > > On Jan 17, 2024, at 10:51, Richard Biener  
> > > wrote:
> > >
> > > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law  wrote:
> > >>
> > >>
> > >>
> > >> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> > >>> Hi Vladimir,
> > >>> Hi Jeff,
> > >>>
> > >>> Richard and Alexander have reviewed this patch and [I assume] have no
> > >>> further comments.  OK to merge?
> > >> I think the question is whether or not we're too late.  I know that
> > >> Richard S has held off on his late-combine pass and I'm holding off on
> > >> the ext-dce work due to the fact that we're well past stage1 close.
> > >>
> > >> I think the release managers ought to have the final say on this.
> > >
> > > I'm fine with this now, it doesn't change code generation.
> >
> > Thanks, Richard.
> >
> > I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit 
> > cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when 
> > stage1 opens.
>
> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
>
> gcc/fortran/f95-lang.o differs
>
> does n_mem_deps or n_inc_deps include debug insns?
>
> Richard.

FWIW, I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113456

-- 
H.J.


[PATCH] sra: Disqualify bases of operands of asm gotos

2024-01-17 Thread Martin Jambor
Hi,

PR 110422 shows that SRA can ICE because it assumes there is a single
outgoing edge from a block terminated with an asm goto.  We need that for
BB-terminating statements so that any adjustments they make to the
aggregates can be copied over to their replacements.  Because we can't
have that after ASM gotos, we need to punt.

Bootstrapped and tested on x86_64-linux, OK for master?  It will need
some tweaking for release branches, is it in principle OK for them too
(after testing)?

Thanks,

Martin


gcc/ChangeLog:

2024-01-17  Martin Jambor  

PR tree-optimization/110422
* tree-sra.cc (scan_function): Disqualify bases of operands of asm
gotos.

gcc/testsuite/ChangeLog:

2024-01-17  Martin Jambor  

PR tree-optimization/110422
* gcc.dg/torture/pr110422.c: New test.
---
 gcc/testsuite/gcc.dg/torture/pr110422.c | 10 +
 gcc/tree-sra.cc | 29 -
 2 files changed, 33 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110422.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110422.c 
b/gcc/testsuite/gcc.dg/torture/pr110422.c
new file mode 100644
index 000..2e171a7a19e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110422.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+struct T { int x; };
+int foo(void) {
+  struct T v;
+  asm goto("" : "+r"(v.x) : : : lab);
+  return 0;
+lab:
+  return -5;
+}
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 6a1141b7377..f8e71ec48b9 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1559,15 +1559,32 @@ scan_function (void)
case GIMPLE_ASM:
  {
gasm *asm_stmt = as_a  (stmt);
-   for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
+   if (stmt_ends_bb_p (asm_stmt)
+   && !single_succ_p (gimple_bb (asm_stmt)))
  {
-   t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
-   ret |= build_access_from_expr (t, asm_stmt, false);
+   for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
+   disqualify_base_of_expr (t, "OP of asm goto.");
+ }
+   for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
+   disqualify_base_of_expr (t, "OP of asm goto.");
+ }
  }
-   for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
+   else
  {
-   t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
-   ret |= build_access_from_expr (t, asm_stmt, true);
+   for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
+   ret |= build_access_from_expr (t, asm_stmt, false);
+ }
+   for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
+ {
+   t = TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
+   ret |= build_access_from_expr (t, asm_stmt, true);
+ }
  }
  }
  break;
-- 
2.43.0



Remove accidental hack in ipa_polymorphic_call_context::set_by_invariant

2024-01-17 Thread Jan Hubicka
Hi,
I managed to commit a hack setting offset to 0 in
ipa_polymorphic_call_context::set_by_invariant.  This makes it give up on 
multiple inheritance, but most likely won't give bad code since the other 
base will be of a different type.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* ipa-polymorphic-call.cc 
(ipa_polymorphic_call_context::set_by_invariant): Remove
accidental hack reseting offset.

diff --git a/gcc/ipa-polymorphic-call.cc b/gcc/ipa-polymorphic-call.cc
index 8667059abee..81de6d7fc33 100644
--- a/gcc/ipa-polymorphic-call.cc
+++ b/gcc/ipa-polymorphic-call.cc
@@ -766,7 +766,6 @@ ipa_polymorphic_call_context::set_by_invariant (tree cst,
   tree base;
 
   invalid = false;
-  off = 0;
   clear_outer_type (otr_type);
 
   if (TREE_CODE (cst) != ADDR_EXPR)


Fix handling of X86_TUNE_AVOID_512FMA_CHAINS

2024-01-17 Thread Jan Hubicka
Hi,
I have noticed a quite bad pasto (copy-and-paste error) in the handling of 
X86_TUNE_AVOID_512FMA_CHAINS.  At the moment it is ignored, and 
X86_TUNE_AVOID_256FMA_CHAINS controls 512-bit FMA too.
This patch fixes it; we may want to re-check how that works on AVX512 
machines.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

gcc/ChangeLog:

* config/i386/i386-options.cc (ix86_option_override_internal): Fix
handling of X86_TUNE_AVOID_512FMA_CHAINS.

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 3605c2c53fb..b6f634e9a32 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3248,7 +3248,7 @@ ix86_option_override_internal (bool main_args_p,
   = (cf_protection_level) (opts->x_flag_cf_protection | CF_SET);
 }
 
-  if (ix86_tune_features [X86_TUNE_AVOID_256FMA_CHAINS])
+  if (ix86_tune_features [X86_TUNE_AVOID_512FMA_CHAINS])
 SET_OPTION_IF_UNSET (opts, opts_set, param_avoid_fma_max_bits, 512);
   else if (ix86_tune_features [X86_TUNE_AVOID_256FMA_CHAINS])
 SET_OPTION_IF_UNSET (opts, opts_set, param_avoid_fma_max_bits, 256);


Re: Disable FMADD in chains for Zen4 and generic

2024-01-17 Thread Jan Hubicka
> Can we backport the patch(at least the generic part) to
> GCC11/GCC12/GCC13 release branch?

Yes, the periodic testers have picked up the change and, as far as I can
tell, there are no surprises.

Thanks,
Honza
> > > >
> > > >  /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 
> > > > 512bit or
> > > > smaller FMA chain.  */
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> 
> 
> 
> -- 
> BR,
> Hongtao


[PATCH] Avoid ICE on m68k -fzero-call-used-regs -fpic [PR110934]

2024-01-17 Thread Mikael Pettersson
PR110934 is a problem on m68k where -fzero-call-used-regs -fpic ICEs
when clearing an FP register.

The generic code generates an XFmode move of zero to that register,
which becomes an XFmode load from initialized data, which due to -fpic
uses a non-constant address, which the backend rejects.  The
zero-call-used-regs pass runs very late, after register allocation and
frame layout, and at that point we can't allow new uses of the PIC
register or new pseudos.

To clear an FP register on m68k it's enough to do the move in SFmode,
but the generic code can't be told to do that, so this patch updates
m68k to use its own TARGET_ZERO_CALL_USED_REGS.
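For reference, the same clearing behaviour can also be requested per
function via the attribute form GCC documents for this option; a minimal
sketch (the function and values are illustrative, not from the patch):

```c
#include <assert.h>

/* On return, call-used registers this function actually used are
   cleared.  The clearing is invisible to C-level semantics, which the
   runtime check in the usage below demonstrates; on m68k with FP
   registers in use this is exactly the path the patch reroutes through
   SFmode moves.  */
__attribute__ ((zero_call_used_regs ("used")))
double
scale (double x)
{
  return x * 2.0;	/* uses, and therefore clears, an FP register */
}
```

Compilers without support for the attribute merely warn and ignore it,
so the sketch stays portable.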

Bootstrapped and regression tested on m68k-linux-gnu.

Ok for master? (I don't have commit rights.)

gcc/

PR target/110934
* config/m68k/m68k.cc (m68k_zero_call_used_regs): New function.
(TARGET_ZERO_CALL_USED_REGS): Define.

gcc/testsuite/

PR target/110934
* gcc.target/m68k/pr110934.c: New test.
---
 gcc/config/m68k/m68k.cc  | 46 
 gcc/testsuite/gcc.target/m68k/pr110934.c |  9 +
 2 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/m68k/pr110934.c

diff --git a/gcc/config/m68k/m68k.cc b/gcc/config/m68k/m68k.cc
index e9325686b92..72a29d772ea 100644
--- a/gcc/config/m68k/m68k.cc
+++ b/gcc/config/m68k/m68k.cc
@@ -197,6 +197,7 @@ static bool m68k_modes_tieable_p (machine_mode, 
machine_mode);
 static machine_mode m68k_promote_function_mode (const_tree, machine_mode,
int *, const_tree, int);
 static void m68k_asm_final_postscan_insn (FILE *, rtx_insn *insn, rtx [], int);
+static HARD_REG_SET m68k_zero_call_used_regs (HARD_REG_SET 
need_zeroed_hardregs);
 
 /* Initialize the GCC target structure.  */
 
@@ -361,6 +362,9 @@ static void m68k_asm_final_postscan_insn (FILE *, rtx_insn 
*insn, rtx [], int);
 #undef TARGET_ASM_FINAL_POSTSCAN_INSN
 #define TARGET_ASM_FINAL_POSTSCAN_INSN m68k_asm_final_postscan_insn
 
+#undef TARGET_ZERO_CALL_USED_REGS
+#define TARGET_ZERO_CALL_USED_REGS m68k_zero_call_used_regs
+
 TARGET_GNU_ATTRIBUTES (m68k_attribute_table,
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
@@ -7166,4 +7170,46 @@ m68k_promote_function_mode (const_tree type, 
machine_mode mode,
   return mode;
 }
 
+/* Implement TARGET_ZERO_CALL_USED_REGS.  */
+
+static HARD_REG_SET
+m68k_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  rtx zero_fpreg = NULL_RTX;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+  {
+   rtx reg, zero;
+
+   if (INT_REGNO_P (regno))
+ {
+   reg = regno_reg_rtx[regno];
+   zero = CONST0_RTX (SImode);
+ }
+   else if (FP_REGNO_P (regno))
+ {
+   reg = gen_raw_REG (SFmode, regno);
+   if (zero_fpreg == NULL_RTX)
+ {
+   /* On the 040/060 clearing an FP reg loads a large
+  immediate.  To reduce code size use the first
+  cleared FP reg to clear remaining ones.  Don't do
+  this on cores which use fmovecr.  */
+   zero = CONST0_RTX (SFmode);
+   if (TUNE_68040_60)
+ zero_fpreg = reg;
+ }
+   else
+ zero = zero_fpreg;
+ }
+   else
+ gcc_unreachable ();
+
+   emit_move_insn (reg, zero);
+  }
+
+  return need_zeroed_hardregs;
+}
+
 #include "gt-m68k.h"
diff --git a/gcc/testsuite/gcc.target/m68k/pr110934.c 
b/gcc/testsuite/gcc.target/m68k/pr110934.c
new file mode 100644
index 000..8c21d46f660
--- /dev/null
+++ b/gcc/testsuite/gcc.target/m68k/pr110934.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-fzero-call-used-regs=used -fpic -O2" } */
+
+extern double clobber_fp0 (void);
+
+void foo (void)
+{
+  clobber_fp0 ();
+}
-- 
2.43.0



RE: [PATCH v2 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2024-01-17 Thread Kyrylo Tkachov
Hi Andre,

> -Original Message-
> From: Andre Vieira 
> Sent: Friday, January 5, 2024 5:52 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Stam Markianos-Wright
> 
> Subject: [PATCH v2 2/2] arm: Add support for MVE Tail-Predicated Low Overhead
> Loops
> 
> Respin after comments on first version.

I think I'm nitpicking some code style and implementation points rather than 
diving deep into the algorithms, I think those were okay last time I looked at 
this some time ago.

+/* Return true if INSN is a MVE instruction that is VPT-predicable, but in
+   its unpredicated form, or if it is predicated, but on a predicate other
+   than VPR_REG.  */
+
+static bool
+arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate (rtx_insn *insn,
+ rtx vpr_reg)
+{
+  rtx insn_vpr_reg_operand;
+  if (MVE_VPT_UNPREDICATED_INSN_P (insn)
+  || (MVE_VPT_PREDICATED_INSN_P (insn)
+ && (insn_vpr_reg_operand = arm_get_required_vpr_reg_param (insn))
+ && !rtx_equal_p (vpr_reg, insn_vpr_reg_operand)))
+return true;
+  else
+return false;
+}
+
+/* Return true if INSN is a MVE instruction that is VPT-predicable and is
+   predicated on VPR_REG.  */
+
+static bool
+arm_mve_vec_insn_is_predicated_with_this_predicate (rtx_insn *insn,
+   rtx vpr_reg)
+{
+  rtx insn_vpr_reg_operand;
+  if (MVE_VPT_PREDICATED_INSN_P (insn)
+  && (insn_vpr_reg_operand = arm_get_required_vpr_reg_param (insn))
+  && rtx_equal_p (vpr_reg, insn_vpr_reg_operand))
+return true;
+  else
+return false;
+}

These two functions seem to have an "if (condition) return true; else return 
false;" structure that we try to avoid. How about:
rtx insn_vpr_reg_operand = MVE_VPT_PREDICATED_INSN_P (insn)
  ? arm_get_required_vpr_reg_param (insn) : NULL_RTX;
return insn_vpr_reg_operand && rtx_equal_p (vpr_reg, insn_vpr_reg_operand);


+static bool
+arm_is_mve_across_vector_insn (rtx_insn* insn)
+{
+  df_ref insn_defs = NULL;
+  if (!MVE_VPT_PREDICABLE_INSN_P (insn))
+return false;
+
+  bool is_across_vector = false;
+  FOR_EACH_INSN_DEF (insn_defs, insn)
+if (!VALID_MVE_MODE (GET_MODE (DF_REF_REG (insn_defs)))
+   && !arm_get_required_vpr_reg_ret_val (insn))
+  is_across_vector = true;
+

You can just return true here immediately, no need to set is_across_vector

+  return is_across_vector;

... and you can return false here, avoiding the need for is_across_vector 
entirely
+}

+static bool
+arm_mve_check_reg_origin_is_num_elems (basic_block body, rtx reg, rtx 
vctp_step)
+{
+  /* Ok, we now know the loop starts from zero and increments by one.
+ Now just show that the max value of the counter came from an
+ appropriate ASHIFRT expr of the correct amount.  */
+  basic_block pre_loop_bb = body->prev_bb;
+  while (pre_loop_bb && BB_END (pre_loop_bb)
+&& !df_bb_regno_only_def_find (pre_loop_bb, REGNO (reg)))
+pre_loop_bb = pre_loop_bb->prev_bb;
+
+  df_ref counter_max_last_def = df_bb_regno_only_def_find (pre_loop_bb, REGNO 
(reg));
+  if (!counter_max_last_def)
+return false;
+  rtx counter_max_last_set = single_set (DF_REF_INSN (counter_max_last_def));
+  if (!counter_max_last_set)
+return false;
+
+  /* If we encounter a simple SET from a REG, follow it through.  */
+  if (REG_P (SET_SRC (counter_max_last_set)))
+return arm_mve_check_reg_origin_is_num_elems
+(pre_loop_bb->next_bb, SET_SRC (counter_max_last_set), vctp_step);
+
+  /* If we encounter a SET from an IF_THEN_ELSE where one of the operands is a
+ constant and the other is a REG, follow through to that REG.  */
+  if (GET_CODE (SET_SRC (counter_max_last_set)) == IF_THEN_ELSE
+  && REG_P (XEXP (SET_SRC (counter_max_last_set), 1))
+  && CONST_INT_P (XEXP (SET_SRC (counter_max_last_set), 2)))
+return arm_mve_check_reg_origin_is_num_elems
+(pre_loop_bb->next_bb, XEXP (SET_SRC (counter_max_last_set), 1), 
vctp_step);
+
+  if (GET_CODE (SET_SRC (counter_max_last_set)) == ASHIFTRT
+  && CONST_INT_P (XEXP (SET_SRC (counter_max_last_set), 1))
+  && ((1 << INTVAL (XEXP (SET_SRC (counter_max_last_set), 1)))
+  == abs (INTVAL (vctp_step

I'm a bit concerned here with using abs() for HOST_WIDE_INT values that are 
compared to other HOST_WIDE_INT values.
abs () will implicitly cast the argument and return an int. We should use the 
abs_hwi function defined in hwint.h. It may not cause problems in practice 
given the ranges involved, but better safe than sorry at this stage.
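The pitfall can be reproduced in plain C, with llabs standing in for
abs_hwi (abs_hwi itself is a GCC-internal helper from hwint.h); the
values here are illustrative:

```c
#include <stdlib.h>

/* abs () takes and returns int, so a 64-bit argument is (on the usual
   two's-complement targets) silently truncated modulo 2^32 before the
   absolute value is computed.  */
long long
truncated_abs (long long v)
{
  return abs (v);	/* v is converted to int first */
}

/* llabs () keeps the full width, as abs_hwi does for HOST_WIDE_INT.  */
long long
full_abs (long long v)
{
  return llabs (v);
}
```

With an argument of 2^32 + 1 the truncating version returns 1 while the
wide version returns the value itself, which is precisely why comparing
an abs () result against a HOST_WIDE_INT is unsafe.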

Looks decent to me otherwise, and an impressive piece of work, thanks.
I'd give Richard an opportunity to comment next week when he's back before 
committing though.
Thanks,
Kyrill


Re: [PATCH v1] Fix compare-debug bootstrap failure

2024-01-17 Thread Jakub Jelinek
On Wed, Jan 17, 2024 at 03:40:20PM +, Maxim Kuvyrkov wrote:
> ... caused by scheduler fix for PR96388 and PR111554.
> 
> This patch adjusts decision sched-deps.cc:find_inc() to use
> length of dependency lists sans any DEBUG_INSN instructions.
> 
> gcc/ChangeLog:
> 

Please mention
PR bootstrap/113445
here

>   * haifa-sched.cc (dep_list_size): Make global.
>   * sched-deps.cc (find_inc): Use instead of sd_lists_size().
>   * sched-int.h (dep_list_size): Declare.

and include some testcase from the PR into the testsuite.
Otherwise LGTM.

Jakub



Re: [PATCH] c++: address of NTTP object as targ [PR113242]

2024-01-17 Thread Jason Merrill

On 1/17/24 10:43, Patrick Palka wrote:

On Mon, 15 Jan 2024, Jason Merrill wrote:

On 1/5/24 11:50, Patrick Palka wrote:


invalid_tparm_referent_p was rejecting using the address of a class NTTP
object as a template argument, but this should be fine.


Hmm, I suppose so; https://eel.is/c++draft/temp#param-8 saying "No two
template parameter objects are template-argument-equivalent" suggests there
can be only one.  And clang/msvc allow it.


+   else if (VAR_P (decl) && !DECL_NTTP_OBJECT_P (decl)
+&& DECL_ARTIFICIAL (decl))


If now some artificial variables are OK and others are not, perhaps we should
enumerate them either way and abort if it's one we haven't specifically
considered.


Sounds good, like so?  Shall we backport this patch or the original
patch to the 13 branch?


Hmm, looks like this patch changes the non-checking default behavior 
from reject to accept; maybe just add a checking_assert (tinfo || fname) 
to your original patch?  OK with that change, for trunk and 13.



-- >8 --

Subject: [PATCH] c++: address of class NTTP object as targ [PR113242]

invalid_tparm_referent_p was rejecting using the address of a class NTTP
object as a template argument, but this should be fine.

This patch fixes this by refining the DECL_ARTIFICIAL rejection test to
check specifically for the kinds of artificial variables we want to
exclude.

PR c++/113242

gcc/cp/ChangeLog:

* pt.cc (invalid_tparm_referent_p) : Refine
DECL_ARTIFICIAL rejection test.  Assert that C++20 template
parameter objects are the only artificial variables we accept.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class61.C: New test.
---
  gcc/cp/pt.cc | 13 +++---
  gcc/testsuite/g++.dg/cpp2a/nontype-class61.C | 27 
  2 files changed, 37 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class61.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index b6117231de1..885c297450e 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7212,12 +7212,14 @@ invalid_tparm_referent_p (tree type, tree expr, 
tsubst_flags_t complain)
/* C++17: For a non-type template-parameter of reference or pointer
   type, the value of the constant expression shall not refer to (or
   for a pointer type, shall not be the address of):
-  * a subobject (4.5),
+  * a subobject (4.5), (relaxed in C++20)
   * a temporary object (15.2),
-  * a string literal (5.13.5),
+  * a string literal (5.13.5), (we diagnose this early in
+convert_nontype_argument)
   * the result of a typeid expression (8.2.8), or
   * a predefined __func__ variable (11.4.1).  */
-   else if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
+   else if (VAR_P (decl)
+&& (DECL_TINFO_P (decl) || DECL_FNAME_P (decl)))
  {
if (complain & tf_error)
  error ("the address of %qD is not a valid template argument",
@@ -7242,6 +7244,11 @@ invalid_tparm_referent_p (tree type, tree expr, 
tsubst_flags_t complain)
 decl);
return true;
  }
+
+   /* The only artificial variables we do accept are C++20
+  template parameter objects.   */
+   if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
+ gcc_checking_assert (DECL_NTTP_OBJECT_P (decl));
}
break;
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C

new file mode 100644
index 000..90805a05ecf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C
@@ -0,0 +1,27 @@
+// PR c++/113242
+// { dg-do compile { target c++20 } }
+
+struct wrapper {
+  int n;
+};
+
+template
+void f1() {
+  static_assert(X.n == 42);
+}
+
+template
+void f2() {
+  static_assert(X->n == 42);
+}
+
+template
+void g() {
+  f1();
+  f2<>();
+}
+
+int main() {
+  constexpr wrapper X = {42};
+  g();
+}




Re: [PATCH] c++/modules: Prevent overwriting arguments for duplicates [PR112588]

2024-01-17 Thread Jason Merrill

On 1/8/24 12:04, Patrick Palka wrote:

On Mon, 8 Jan 2024, Nathaniel Shead wrote:


On Sat, Jan 06, 2024 at 05:32:37PM -0500, Nathan Sidwell wrote:

I'm not sure about this; there was clearly a reason I did it the way it is,
but perhaps that reasoning became obsolete -- something about an existing
declaration and reading in a definition maybe?


So I took a bit of a closer look and this is actually a regression,
seemingly starting with r13-3134-g09df0d8b14dda6.  I haven't yet looked
more closely at the actual change to see whether it implies a
different fix.


Interesting..  FWIW I applied your patch to the gcc 12 release branch,
which doesn't have r13-3134, and there were no modules testsuite
regressions there either, which at least suggests that this maybe_dup
logic isn't directly related to the optimization that r13-3134 removed.

Your patch also seems to fix PR99244 (which AFAICT is not a regression)


It seems to me we always want the DECL_ARGUMENTS corresponding to the 
actual definition we're using, which since "installing" is true, is the 
new definition.  In duplicate_decls when we merge a new definition into 
an old declaration, we give the old declaration the new DECL_ARGUMENTS.


The patch is OK.


On 11/22/23 06:33, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
access.

-- >8 --

When merging duplicate instantiations of function templates, currently
read_function_def overwrites the arguments with that of the existing
duplicate. This is problematic, however, since this means that the
PARM_DECLs in the body of the function definition no longer match with
the PARM_DECLs in the argument list, which causes issues when it comes
to generating RTL.

There doesn't seem to be any reason to do this replacement, so this
patch removes that logic.

PR c++/112588

gcc/cp/ChangeLog:

* module.cc (trees_in::read_function_def): Don't overwrite
arguments.

gcc/testsuite/ChangeLog:

* g++.dg/modules/merge-16.h: New test.
* g++.dg/modules/merge-16_a.C: New test.
* g++.dg/modules/merge-16_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
   gcc/cp/module.cc  |  2 --
   gcc/testsuite/g++.dg/modules/merge-16.h   | 10 ++
   gcc/testsuite/g++.dg/modules/merge-16_a.C |  7 +++
   gcc/testsuite/g++.dg/modules/merge-16_b.C |  5 +
   4 files changed, 22 insertions(+), 2 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16.h
   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_a.C
   create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 4f5b6e2747a..2520ab659cc 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -11665,8 +11665,6 @@ trees_in::read_function_def (tree decl, tree 
maybe_template)
 DECL_RESULT (decl) = result;
 DECL_INITIAL (decl) = initial;
 DECL_SAVED_TREE (decl) = saved;
-  if (maybe_dup)
-   DECL_ARGUMENTS (decl) = DECL_ARGUMENTS (maybe_dup);
 if (context)
SET_DECL_FRIEND_CONTEXT (decl, context);
diff --git a/gcc/testsuite/g++.dg/modules/merge-16.h 
b/gcc/testsuite/g++.dg/modules/merge-16.h
new file mode 100644
index 000..fdb38551103
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/merge-16.h
@@ -0,0 +1,10 @@
+// PR c++/112588
+
+void f(int*);
+
+template 
+struct S {
+  void g(int n) { f(); }
+};
+
+template struct S;


If we use a partial specialization here instead (which would have disabled
the removed optimization, demonstrating how fragile/inconsistent it was)

   void f(int*);

   template 
   struct S { };

   template
   struct S {
 void g(int n) { f(); }
   };

   template struct S;

then the ICE appears earlier, since GCC 12 instead of 13.


diff --git a/gcc/testsuite/g++.dg/modules/merge-16_a.C 
b/gcc/testsuite/g++.dg/modules/merge-16_a.C
new file mode 100644
index 000..c243224c875
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/merge-16_a.C
@@ -0,0 +1,7 @@
+// PR c++/112588
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi merge16 }
+
+module;
+#include "merge-16.h"
+export module merge16;
diff --git a/gcc/testsuite/g++.dg/modules/merge-16_b.C 
b/gcc/testsuite/g++.dg/modules/merge-16_b.C
new file mode 100644
index 000..8c7b1f0511f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/merge-16_b.C
@@ -0,0 +1,5 @@
+// PR c++/112588
+// { dg-additional-options "-fmodules-ts" }
+
+#include "merge-16.h"
+import merge16;


--
Nathan Sidwell










Re: Add -falign-all-functions

2024-01-17 Thread Jan Hubicka
> On Wed, 17 Jan 2024, Jan Hubicka wrote:
> 
> > > 
> > > I meant the new option might be named -fmin-function-alignment=
> > > rather than -falign-all-functions because of how it should
> > > override all other options.
> > 
> > I was also pondering about both names.  -falign-all-functions has the
> > advantage that it is similar to all the other alignment flags that are
> > all called -falign-XXX
> > 
> > but both options are finte for me.
> > > 
> > > Otherwise is there an updated patch to look at?
> > 
> > I will prepare one.  So shall I drop the max-skip support for alignment
> > and rename the flag?
> 
> Yes.
OK, here is an updated version.
Bootstrapped/regtested on x86_64-linux, OK?

gcc/ChangeLog:

* common.opt (flimit-function-alignment): Reorder so file is
alphabetically ordered.
(fmin-function-alignment=): New flag.
* doc/invoke.texi (-fmin-function-alignment): Document
(-falign-jumps,-falign-labels): Document that this is an optimization
bypassed in cold code.
* varasm.cc (assemble_start_function): Honor -fmin-function-alignment.

diff --git a/gcc/common.opt b/gcc/common.opt
index 5f0a101bccb..6e85853f086 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1040,9 +1040,6 @@ Align the start of functions.
 falign-functions=
 Common RejectNegative Joined Var(str_align_functions) Optimization
 
-flimit-function-alignment
-Common Var(flag_limit_function_alignment) Optimization Init(0)
-
 falign-jumps
 Common Var(flag_align_jumps) Optimization
 Align labels which are only reached by jumping.
@@ -2277,6 +2274,10 @@ fmessage-length=
 Common RejectNegative Joined UInteger
 -fmessage-length=  Limit diagnostics to  characters per 
line.  0 suppresses line-wrapping.
 
+fmin-function-alignment=
+Common Joined RejectNegative UInteger Var(flag_min_function_alignment) 
Optimization
+Align the start of every function.
+
 fmodulo-sched
 Common Var(flag_modulo_sched) Optimization
 Perform SMS based modulo scheduling before the first scheduling pass.
@@ -2601,6 +2602,9 @@ starts and when the destructor finishes.
 flifetime-dse=
 Common Joined RejectNegative UInteger Var(flag_lifetime_dse) Optimization 
IntegerRange(0, 2)
 
+flimit-function-alignment
+Common Var(flag_limit_function_alignment) Optimization Init(0)
+
 flive-patching
 Common RejectNegative Alias(flive-patching=,inline-clone) Optimization
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 43fd3c3a3cd..456374d9446 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -546,6 +546,7 @@ Objective-C and Objective-C++ Dialects}.
 -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
 -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
 -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
+-fmin-function-alignment=[@var{n}]
 -fno-allocation-dce -fallow-store-data-races
 -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}]
 -fauto-inc-dec  -fbranch-probabilities
@@ -14177,6 +14178,9 @@ Align the start of functions to the next power-of-two 
greater than or
 equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
 least the first @var{m} bytes of the function can be fetched by the CPU
 without crossing an @var{n}-byte alignment boundary.
+This is an optimization of code performance and alignment is ignored for
+functions considered cold.  If alignment is required for all functions,
+use @option{-fmin-function-alignment}.
 
 If @var{m} is not specified, it defaults to @var{n}.
 
@@ -14240,6 +14244,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
 Align loops to a power-of-two boundary.  If the loops are executed
 many times, this makes up for any execution of the dummy padding
 instructions.
+This is an optimization of code performance and alignment is ignored for
+loops considered cold.
 
 If @option{-falign-labels} is greater than this value, then its value
 is used instead.
@@ -14262,6 +14268,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
 Align branch targets to a power-of-two boundary, for branch targets
 where the targets can only be reached by jumping.  In this case,
 no dummy operations need be executed.
+This is an optimization of code performance and alignment is ignored for
+jumps considered cold.
 
 If @option{-falign-labels} is greater than this value, then its value
 is used instead.
@@ -14275,6 +14283,14 @@ The maximum allowed @var{n} option value is 65536.
 
 Enabled at levels @option{-O2}, @option{-O3}.
 
+@opindex fmin-function-alignment=@var{n}
+@item -fmin-function-alignment
+Specify minimal alignment of functions to the next power-of-two greater than or
+equal to @var{n}.  Unlike @option{-falign-functions}, this alignment is
+applied to all functions, even those considered cold.  The alignment is
+also not affected by @option{-flimit-function-alignment}.
+
+
 @opindex fno-allocation-dce
 @item -fno-allocation-dce
 Do not remove unused C++ allocations in dead code elimination.
@@ -14371,7 +14387,7 @@ To use the 

Re: [PATCH v3] c++/modules: Fix handling of extern templates in modules [PR112820]

2024-01-17 Thread Jason Merrill

On 1/17/24 01:33, Nathaniel Shead wrote:

On Mon, Jan 15, 2024 at 06:10:55PM -0500, Jason Merrill wrote:

Under what circumstances does it make sense for CLASSTYPE_INTERFACE_ONLY to
be set in the context of modules, anyway?  We probably want to propagate it
for things in the global module so that various libstdc++ explicit
instantiations work the same with import std.

For a class imported from a named module, this ties into the earlier 
discussion about vtables and inlines that hasn't resolved yet in the ABI
committee.  But it's certainly significantly interface-like.  And I would
expect maybe_suppress_debug_info to suppress the debug info for such a class
on the assumption that the module unit has the needed debug info.

Jason



Here's another approach for this patch. This still only fixes the
specific issues in the PR, I think vtable handling etc. should wait till
stage 1 because it involves a lot of messing around in decl2.cc.

As mentioned in the commit message, after thinking more about it I don't
think we (in general) want to propagate CLASSTYPE_INTERFACE_ONLY, even
for declarations in the GMF. This makes sense to me because typically it
can only be accurately determined at the end of the TU, which we haven't
yet arrived at after importing. Consider, for instance, a polymorphic
class in the GMF with no key method seen so far, which we import from a
module and whose key method we then proceed to define later in this TU.


That sounds right for a module implementation unit or the GMF.


Bootstrapped and partially regtested on x86_64-pc-linux-gnu (so far only
modules.exp): OK for trunk if full regtesting passes?


Please add a reference to ABI issue 170 
(https://github.com/itanium-cxx-abi/cxx-abi/issues/170).  OK with that 
change if Nathan doesn't have any further comments this week.



-- >8 --

Currently, extern templates are detected by looking for the
DECL_EXTERNAL flag on a TYPE_DECL. However, this is incorrect:
TYPE_DECLs don't actually set this flag, and it happens to work by
coincidence due to TYPE_DECL_SUPPRESS_DEBUG happening to use the same
underlying bit. This however causes issues with other TYPE_DECLs that
also happen to have suppressed debug information.

Instead, this patch reworks the logic so CLASSTYPE_INTERFACE_ONLY is
always emitted into the module BMI and can then be used to check for an
extern template correctly.

Otherwise, for other declarations we want to redetermine this: even for
declarations from the GMF, we may change our mind on whether to import
or export depending on decisions made later in the TU after importing so
we shouldn't decide this now, or necessarily reuse what the module we'd
imported had decided.

PR c++/112820
PR c++/102607

gcc/cp/ChangeLog:

* module.cc (trees_out::lang_type_bools): Write interface_only
and interface_unknown.
(trees_in::lang_type_bools): Read the above flags.
(trees_in::decl_value): Reset CLASSTYPE_INTERFACE_* except for
extern templates.
(trees_in::read_class_def): Remove buggy extern template
handling.

gcc/testsuite/ChangeLog:

* g++.dg/modules/debug-2_a.C: New test.
* g++.dg/modules/debug-2_b.C: New test.
* g++.dg/modules/debug-2_c.C: New test.
* g++.dg/modules/debug-3_a.C: New test.
* g++.dg/modules/debug-3_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc | 36 +---
  gcc/testsuite/g++.dg/modules/debug-2_a.C |  9 ++
  gcc/testsuite/g++.dg/modules/debug-2_b.C |  8 ++
  gcc/testsuite/g++.dg/modules/debug-2_c.C |  9 ++
  gcc/testsuite/g++.dg/modules/debug-3_a.C |  8 ++
  gcc/testsuite/g++.dg/modules/debug-3_b.C |  9 ++
  6 files changed, 63 insertions(+), 16 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_b.C
  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_c.C
  create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 350ad15dc62..efc1d532a6e 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5806,10 +5806,8 @@ trees_out::lang_type_bools (tree t)
  
WB ((lang->gets_delete >> 0) & 1);

WB ((lang->gets_delete >> 1) & 1);
-  // Interfaceness is recalculated upon reading.  May have to revisit?
-  // How do dllexport and dllimport interact across a module?
-  // lang->interface_only
-  // lang->interface_unknown
+  WB (lang->interface_only);
+  WB (lang->interface_unknown);
WB (lang->contains_empty_class_p);
WB (lang->anon_aggr);
WB (lang->non_zero_init);
@@ -5877,9 +5875,8 @@ trees_in::lang_type_bools (tree t)
v = b () << 0;
v |= b () << 1;
lang->gets_delete = v;
-  // lang->interface_only
-  // lang->interface_unknown
-  lang->interface_unknown = true; // Redetermine interface
+  RB (lang->interface_only);
+  RB 

Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-17 Thread Maxim Kuvyrkov
> On Jan 17, 2024, at 19:05, Maxim Kuvyrkov  wrote:
> 
>> On Jan 17, 2024, at 19:02, Richard Biener  wrote:
>> 
>> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
>>  wrote:
>>> 
 On Jan 17, 2024, at 10:51, Richard Biener  
 wrote:
 
 On Tue, Jan 16, 2024 at 3:52 PM Jeff Law  wrote:
> 
> 
> 
> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
>> Hi Vladimir,
>> Hi Jeff,
>> 
>> Richard and Alexander have reviewed this patch and [I assume] have no
>> further comments.  OK to merge?
> I think the question is whether or not we're too late.  I know that
> Richard S has held off on his late-combine pass and I'm holding off on
> the ext-dce work due to the fact that we're well past stage1 close.
> 
> I think the release managers ought to have the final say on this.
 
 I'm fine with this now, it doesn't change code generation.
>>> 
>>> Thanks, Richard.
>>> 
>>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit 
>>> cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when 
>>> stage1 opens.
>> 
>> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
>> 
>> gcc/fortran/f95-lang.o differs
>> 
do n_mem_deps or n_inc_deps include debug insns?
> 
> Thanks, investigating.

Hi Richard,

Yes, both n_mem_deps and n_inc_deps include debug insns; I posted a patch for 
this in https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643267.html .  
Testing it now.

If you prefer, I can revert the fix for PR96388 and PR111554.

Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org




Re: [PATCH] c++: address of NTTP object as targ [PR113242]

2024-01-17 Thread Patrick Palka
On Mon, 15 Jan 2024, Jason Merrill wrote:

> On 1/5/24 11:50, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > for trunk and perhaps 13?
> > 
> > -- >8 --
> > 
> > invalid_tparm_referent_p was rejecting using the address of a class NTTP
> > object as a template argument, but this should be fine.
> 
> Hmm, I suppose so; https://eel.is/c++draft/temp#param-8 saying "No two
> template parameter objects are template-argument-equivalent" suggests there
> can be only one.  And clang/msvc allow it.
> 
> > PR c++/113242
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (invalid_tparm_referent_p) : Suppress
> > DECL_ARTIFICIAL rejection test for class NTTP objects.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/nontype-class61.C: New test.
> > ---
> >   gcc/cp/pt.cc |  3 ++-
> >   gcc/testsuite/g++.dg/cpp2a/nontype-class61.C | 27 
> >   2 files changed, 29 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class61.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 154ac76cb65..8c7d178328d 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -7219,7 +7219,8 @@ invalid_tparm_referent_p (tree type, tree expr,
> > tsubst_flags_t complain)
> >* a string literal (5.13.5),
> >* the result of a typeid expression (8.2.8), or
> >* a predefined __func__ variable (11.4.1).  */
> > -   else if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
> > +   else if (VAR_P (decl) && !DECL_NTTP_OBJECT_P (decl)
> > +&& DECL_ARTIFICIAL (decl))
> 
> If now some artificial variables are OK and others are not, perhaps we should
> enumerate them either way and abort if it's one we haven't specifically
> considered.

Sounds good, like so?  Shall we backport this patch or the original
patch to the 13 branch?

-- >8 --

Subject: [PATCH] c++: address of class NTTP object as targ [PR113242]

invalid_tparm_referent_p was rejecting using the address of a class NTTP
object as a template argument, but this should be fine.

This patch fixes this by refining the DECL_ARTIFICIAL rejection test to
check specifically for the kinds of artificial variables we want to
exclude.

PR c++/113242

gcc/cp/ChangeLog:

* pt.cc (invalid_tparm_referent_p) : Refine
DECL_ARTIFICIAL rejection test.  Assert that C++20 template
parameter objects are the only artificial variables we accept.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class61.C: New test.
---
 gcc/cp/pt.cc | 13 +++---
 gcc/testsuite/g++.dg/cpp2a/nontype-class61.C | 27 
 2 files changed, 37 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class61.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index b6117231de1..885c297450e 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7212,12 +7212,14 @@ invalid_tparm_referent_p (tree type, tree expr, 
tsubst_flags_t complain)
/* C++17: For a non-type template-parameter of reference or pointer
   type, the value of the constant expression shall not refer to (or
   for a pointer type, shall not be the address of):
-  * a subobject (4.5),
+  * a subobject (4.5), (relaxed in C++20)
   * a temporary object (15.2),
-  * a string literal (5.13.5),
+  * a string literal (5.13.5), (we diagnose this early in
+convert_nontype_argument)
   * the result of a typeid expression (8.2.8), or
   * a predefined __func__ variable (11.4.1).  */
-   else if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
+   else if (VAR_P (decl)
+&& (DECL_TINFO_P (decl) || DECL_FNAME_P (decl)))
  {
if (complain & tf_error)
  error ("the address of %qD is not a valid template argument",
@@ -7242,6 +7244,11 @@ invalid_tparm_referent_p (tree type, tree expr, 
tsubst_flags_t complain)
 decl);
return true;
  }
+
+   /* The only artificial variables we do accept are C++20
+  template parameter objects.   */
+   if (VAR_P (decl) && DECL_ARTIFICIAL (decl))
+ gcc_checking_assert (DECL_NTTP_OBJECT_P (decl));
   }
   break;
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C
new file mode 100644
index 000..90805a05ecf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class61.C
@@ -0,0 +1,27 @@
+// PR c++/113242
+// { dg-do compile { target c++20 } }
+
+struct wrapper {
+  int n;
+};
+
+template
+void f1() {
+  static_assert(X.n == 42);
+}
+
+template
+void f2() {
+  static_assert(X->n == 42);
+}
+
+template
+void g() {
+  f1();
+  f2<>();
+}
+
+int main() {
+  constexpr wrapper X = {42};
+  g();
+}
-- 
2.43.0.367.g186b115d30



[PATCH v1] Fix compare-debug bootstrap failure

2024-01-17 Thread Maxim Kuvyrkov
... caused by the scheduler fix for PR96388 and PR111554.

This patch adjusts the decision in sched-deps.cc:find_inc() to use the
length of dependency lists sans any DEBUG_INSN instructions.

gcc/ChangeLog:

* haifa-sched.cc (dep_list_size): Make global.
* sched-deps.cc (find_inc): Use instead of sd_lists_size().
* sched-int.h (dep_list_size): Declare.
---
 gcc/haifa-sched.cc | 8 ++--
 gcc/sched-deps.cc  | 6 +++---
 gcc/sched-int.h| 2 ++
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 49ee589aed7..1bc610f9a5f 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -1560,8 +1560,7 @@ contributes_to_priority_p (dep_t dep)
 }
 
 /* Compute the number of nondebug deps in list LIST for INSN.  */
-
-static int
+int
 dep_list_size (rtx_insn *insn, sd_list_types_def list)
 {
   sd_iterator_def sd_it;
@@ -1571,6 +1570,11 @@ dep_list_size (rtx_insn *insn, sd_list_types_def list)
   if (!MAY_HAVE_DEBUG_INSNS)
 return sd_lists_size (insn, list);
 
+  /* TODO: We should split normal and debug insns into separate SD_LIST_*
+ sub-lists, and then we'll be able to use something like
+ sd_lists_size(insn, list & SD_LIST_NON_DEBUG)
+ instead of walking dependencies below.  */
+
   FOR_EACH_DEP (insn, list, sd_it, dep)
 {
   if (DEBUG_INSN_P (DEP_CON (dep)))
diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 0615007c560..5034e664e5e 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -4791,7 +4791,7 @@ find_inc (struct mem_inc_info *mii, bool backwards)
   sd_iterator_def sd_it;
   dep_t dep;
   sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
-  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
+  int n_mem_deps = dep_list_size (mii->mem_insn, mem_deps);
 
   sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
  while (sd_iterator_cond (&sd_it, &dep))
@@ -4808,12 +4808,12 @@ find_inc (struct mem_inc_info *mii, bool backwards)
   if (backwards)
{
  inc_cand = pro;
- n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
+ n_inc_deps = dep_list_size (inc_cand, SD_LIST_BACK);
}
   else
{
  inc_cand = con;
- n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
+ n_inc_deps = dep_list_size (inc_cand, SD_LIST_FORW);
}
 
   /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index ab784fe0d17..4df092013e9 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -1677,6 +1677,8 @@ extern void sd_copy_back_deps (rtx_insn *, rtx_insn *, 
bool);
 extern void sd_delete_dep (sd_iterator_def);
 extern void sd_debug_lists (rtx, sd_list_types_def);
 
+extern int dep_list_size (rtx_insn *, sd_list_types_def);
+
 /* Macros and declarations for scheduling fusion.  */
 #define FUSION_MAX_PRIORITY (INT_MAX)
 extern bool sched_fusion;
-- 
2.34.1



Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-17 Thread Maxim Kuvyrkov
> On Jan 17, 2024, at 19:02, Richard Biener  wrote:
> 
> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
>  wrote:
>> 
>>> On Jan 17, 2024, at 10:51, Richard Biener  
>>> wrote:
>>> 
>>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law  wrote:
 
 
 
 On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> Hi Vladimir,
> Hi Jeff,
> 
> Richard and Alexander have reviewed this patch and [I assume] have no
> further comments.  OK to merge?
 I think the question is whether or not we're too late.  I know that
 Richard S has held off on his late-combine pass and I'm holding off on
 the ext-dce work due to the fact that we're well past stage1 close.
 
 I think the release managers ought to have the final say on this.
>>> 
>>> I'm fine with this now, it doesn't change code generation.
>> 
>> Thanks, Richard.
>> 
>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit 
>> cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when 
>> stage1 opens.
> 
> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
> 
> gcc/fortran/f95-lang.o differs
> 
> does n_mem_deps or n_inc_deps include debug insns?

Thanks, investigating.

--
Maxim Kuvyrkov
https://www.linaro.org



Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2024-01-17 Thread Richard Biener
On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
 wrote:
>
> > On Jan 17, 2024, at 10:51, Richard Biener  
> > wrote:
> >
> > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law  wrote:
> >>
> >>
> >>
> >> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> >>> Hi Vladimir,
> >>> Hi Jeff,
> >>>
> >>> Richard and Alexander have reviewed this patch and [I assume] have no
> >>> further comments.  OK to merge?
> >> I think the question is whether or not we're too late.  I know that
> >> Richard S has held off on his late-combine pass and I'm holding off on
> >> the ext-dce work due to the fact that we're well past stage1 close.
> >>
> >> I think the release managers ought to have the final say on this.
> >
> > I'm fine with this now, it doesn't change code generation.
>
> Thanks, Richard.
>
> I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit 
> cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when 
> stage1 opens.

This seems to have caused a compare-debug bootstrap issue on x86_64-linux,

gcc/fortran/f95-lang.o differs

does n_mem_deps or n_inc_deps include debug insns?

Richard.

> Regards,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>


[PATCH] aarch64: Check the ldp/stp policy model correctly when mem ops are reversed.

2024-01-17 Thread Manos Anagnostakis
The current ldp/stp policy framework implementation was missing cases where
the memory operands were reversed. Therefore the call to the framework function
is moved after the lower-mem check, with the suitable parameters. This also
removes the mode argument of aarch64_operands_ok_for_ldpstp, which becomes
unused and triggers a warning on bootstrap.

gcc/ChangeLog:

* config/aarch64/aarch64-ldpstp.md: Remove unused mode.
* config/aarch64/aarch64-protos.h (aarch64_operands_ok_for_ldpstp):
Likewise.
* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
Call on framework moved later.

Signed-off-by: Manos Anagnostakis 
Co-Authored-By: Manolis Tsamis 
---
 gcc/config/aarch64/aarch64-ldpstp.md | 22 +++---
 gcc/config/aarch64/aarch64-protos.h  |  2 +-
 gcc/config/aarch64/aarch64.cc| 18 +-
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldpstp.md 
b/gcc/config/aarch64/aarch64-ldpstp.md
index b668fa8e2a6..b7c0bf05cd1 100644
--- a/gcc/config/aarch64/aarch64-ldpstp.md
+++ b/gcc/config/aarch64/aarch64-ldpstp.md
@@ -23,7 +23,7 @@
(match_operand:GPI 1 "memory_operand" ""))
(set (match_operand:GPI 2 "register_operand" "")
(match_operand:GPI 3 "memory_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, true);
@@ -35,7 +35,7 @@
(match_operand:GPI 1 "aarch64_reg_or_zero" ""))
(set (match_operand:GPI 2 "memory_operand" "")
(match_operand:GPI 3 "aarch64_reg_or_zero" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, false)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, false);
@@ -47,7 +47,7 @@
(match_operand:GPF 1 "memory_operand" ""))
(set (match_operand:GPF 2 "register_operand" "")
(match_operand:GPF 3 "memory_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, true);
@@ -59,7 +59,7 @@
(match_operand:GPF 1 "aarch64_reg_or_fp_zero" ""))
(set (match_operand:GPF 2 "memory_operand" "")
(match_operand:GPF 3 "aarch64_reg_or_fp_zero" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, false)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, false);
@@ -71,7 +71,7 @@
(match_operand:DREG 1 "memory_operand" ""))
(set (match_operand:DREG2 2 "register_operand" "")
(match_operand:DREG2 3 "memory_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, true);
@@ -83,7 +83,7 @@
(match_operand:DREG 1 "register_operand" ""))
(set (match_operand:DREG2 2 "memory_operand" "")
(match_operand:DREG2 3 "register_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, false)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, false);
@@ -96,7 +96,7 @@
(set (match_operand:VQ2 2 "register_operand" "")
(match_operand:VQ2 3 "memory_operand" ""))]
   "TARGET_FLOAT
-   && aarch64_operands_ok_for_ldpstp (operands, true, mode)
+   && aarch64_operands_ok_for_ldpstp (operands, true)
&& (aarch64_tune_params.extra_tuning_flags
& AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
   [(const_int 0)]
@@ -111,7 +111,7 @@
(set (match_operand:VQ2 2 "memory_operand" "")
(match_operand:VQ2 3 "register_operand" ""))]
   "TARGET_FLOAT
-   && aarch64_operands_ok_for_ldpstp (operands, false, mode)
+   && aarch64_operands_ok_for_ldpstp (operands, false)
&& (aarch64_tune_params.extra_tuning_flags
& AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
   [(const_int 0)]
@@ -128,7 +128,7 @@
(sign_extend:DI (match_operand:SI 1 "memory_operand" "")))
(set (match_operand:DI 2 "register_operand" "")
(sign_extend:DI (match_operand:SI 3 "memory_operand" "")))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, SImode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, true, SIGN_EXTEND);
@@ -140,7 +140,7 @@
(zero_extend:DI (match_operand:SI 1 "memory_operand" "")))
(set (match_operand:DI 2 "register_operand" "")
(zero_extend:DI (match_operand:SI 3 "memory_operand" "")))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, SImode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(const_int 0)]
 {
   aarch64_finish_ldpstp_peephole (operands, true, ZERO_EXTEND);
@@ -162,7 +162,7 @@
(match_operand:DSX 1 "aarch64_reg_zero_or_fp_zero" ""))

Re: [PATCH] sra: Partial fix for BITINT_TYPEs [PR113120]

2024-01-17 Thread Jakub Jelinek
On Wed, Jan 17, 2024 at 03:46:44PM +0100, Martin Jambor wrote:
> > Note, it would be good if we were able to punt on the optimization
> > (but this code doesn't seem to be able to punt, so it needs to be done
> > somewhere earlier) at least in cases where building it would be invalid.
> > E.g. right now BITINT_TYPE can support precisions up to 65535 (inclusive),
> > but 65536 will not work anymore (we can't have > 16-bit TYPE_PRECISION).
> > I've tried to replace 513 with 65532 in the testcase and it didn't ICE,
> > so maybe it ran into some other SRA limit.
> 
> Thank you very much for the patch.  Regarding punting, did you mean for
> all BITINT_TYPEs or just for big ones, like you did when you fixed PR
> 113330 (thanks for that too) or something entirely else?

I meant what I did in PR113330, but still wonder if we really need to use
a root->size which is a multiple of BITS_PER_UNIT (or words or whatever it
actually is), at least on little endian if the _BitInt starts at the start
of a memory.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113408#c1
for more details, wonder if it just couldn't use _BitInt(713) in there
directly rather than _BitInt(768).

Jakub



Re: [PATCH 1/4] rtl-ssa: Run finalize_new_accesses forwards [PR113070]

2024-01-17 Thread Alex Coplan
On 17/01/2024 07:42, Jeff Law wrote:
> 
> 
> On 1/13/24 08:43, Alex Coplan wrote:
> > The next patch in this series exposes an interface for creating new uses
> > in RTL-SSA.  The intent is that new user-created uses can consume new
> > user-created defs in the same change group.  This is so that we can
> > correctly update uses of memory when inserting a new store pair insn in
> > the aarch64 load/store pair fusion pass (the affected uses need to
> > consume the new store pair insn).
> > 
> > As it stands, finalize_new_accesses is called as part of the backwards
> > insn placement loop within change_insns, but if we want new uses to be
> > able to depend on new defs in the same change group, we need
> > finalize_new_accesses to be called on earlier insns first.  This is so
> > that when we process temporary uses and turn them into permanent uses,
> > we can follow the last_def link on the temporary def to ensure we end up
> > with a permanent use consuming a permanent def.
> > 
> > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> > 
> > Thanks,
> > Alex
> > 
> > gcc/ChangeLog:
> > 
> > PR target/113070
> > * rtl-ssa/changes.cc (function_info::change_insns): Split out the call
> > to finalize_new_accesses from the backwards placement loop, run it
> > forwards in a separate loop.
> So just to be explicit -- given this is adjusting the rtl-ssa
> infrastructure, I was going to let Richard S. own the review side -- he
> knows that code better than I.

Yeah, that's fine, thanks.  Richard is away this week but back on Monday, so
hopefully he can take a look at it then.

Alex

> 
> Jeff


Re: [PATCH] sra: Partial fix for BITINT_TYPEs [PR113120]

2024-01-17 Thread Martin Jambor
Hi,
On Wed, Jan 10 2024, Jakub Jelinek wrote:
> Hi!
>
> As changed in other parts of the compiler, using
> build_nonstandard_integer_type is not appropriate for arbitrary precisions,
> especially if the precision comes from a BITINT_TYPE or something based on
> that, build_nonstandard_integer_type relies on some integral mode being
> supported that can support the precision.
>
> The following patch uses build_bitint_type instead for BITINT_TYPE
> precisions.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Note, it would be good if we were able to punt on the optimization
> (but this code doesn't seem to be able to punt, so it needs to be done
> somewhere earlier) at least in cases where building it would be invalid.
> E.g. right now BITINT_TYPE can support precisions up to 65535 (inclusive),
> but 65536 will not work anymore (we can't have > 16-bit TYPE_PRECISION).
> I've tried to replace 513 with 65532 in the testcase and it didn't ICE,
> so maybe it ran into some other SRA limit.

Thank you very much for the patch.  Regarding punting, did you mean for
all BITINT_TYPEs or just for big ones, like you did when you fixed PR
113330 (thanks for that too) or something entirely else?

Martin

>
> 2024-01-10  Jakub Jelinek  
>
>   PR tree-optimization/113120
>   * tree-sra.cc (analyze_access_subtree): For BITINT_TYPE
>   with root->size TYPE_PRECISION don't build anything new.
>   Otherwise, if root->type is a BITINT_TYPE, use build_bitint_type
>   rather than build_nonstandard_integer_type.
>
>   * gcc.dg/bitint-63.c: New test.


Re: [PATCH 1/4] rtl-ssa: Run finalize_new_accesses forwards [PR113070]

2024-01-17 Thread Jeff Law




On 1/13/24 08:43, Alex Coplan wrote:

The next patch in this series exposes an interface for creating new uses
in RTL-SSA.  The intent is that new user-created uses can consume new
user-created defs in the same change group.  This is so that we can
correctly update uses of memory when inserting a new store pair insn in
the aarch64 load/store pair fusion pass (the affected uses need to
consume the new store pair insn).

As it stands, finalize_new_accesses is called as part of the backwards
insn placement loop within change_insns, but if we want new uses to be
able to depend on new defs in the same change group, we need
finalize_new_accesses to be called on earlier insns first.  This is so
that when we process temporary uses and turn them into permanent uses,
we can follow the last_def link on the temporary def to ensure we end up
with a permanent use consuming a permanent def.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR target/113070
* rtl-ssa/changes.cc (function_info::change_insns): Split out the call
to finalize_new_accesses from the backwards placement loop, run it
forwards in a separate loop.
So just to be explicit -- given this is adjusting the rtl-ssa 
infrastructure, I was going to let Richard S. own the review side -- he 
knows that code better than I.


Jeff


Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

2024-01-17 Thread Jeff Law




On 1/17/24 05:14, Richard Biener wrote:

On Wed, 17 Jan 2024, Monk Chiang wrote:


This allows the backend to generate movcc instructions, if the target
machine has a movcc pattern.

branchless-cond.c needs to be updated since some target machines have
conditional move instructions, and the expression will not be changed to
a branchless expression.


While I agree this pattern should possibly be applied during RTL
expansion or instruction selection, on x86, which also has movcc,
the multiplication is cheaper.  So I don't think this is the way to go.

I'd rather revert the change than trying to "fix" it this way?
WRT reverting -- the patch in question's sole purpose was to enable 
branchless sequences for that very same code.  Reverting would regress 
performance on a variety of micro-architectures.  IIUC, the issue is 
that the SiFive part in question has a fusion which allows it to do the 
branchy sequence cheaply.


ISTM this really needs to be addressed during expansion and most likely 
with a RISC-V target twiddle for the micro-archs which have 
short-forward-branch optimizations.


jeff


[PATCH] aarch64: Fix eh_return for -mtrack-speculation [PR112987]

2024-01-17 Thread Szabolcs Nagy
A recent commit introduced a conditional branch in eh_return epilogues
that is not compatible with speculation tracking:

  commit 426fddcbdad6746fe70e031f707fb07f55dfb405
  Author: Szabolcs Nagy 
  CommitDate: 2023-11-27 15:52:48 +

  aarch64: Use br instead of ret for eh_return

gcc/ChangeLog:

PR target/112987
* config/aarch64/aarch64.cc (aarch64_expand_epilogue): Use
explicit compare and separate jump with speculation tracking.
---
 gcc/config/aarch64/aarch64.cc | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e6bd3fd0bb4..e6de62dc02a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -9879,7 +9879,17 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall)
 is just as correct as retaining the CFA from the body
 of the function.  Therefore, do nothing special.  */
   rtx label = gen_label_rtx ();
-  rtx x = gen_rtx_EQ (VOIDmode, EH_RETURN_TAKEN_RTX, const0_rtx);
+  rtx x;
+  if (aarch64_track_speculation)
+   {
+ /* Emit an explicit compare, so cc can be tracked.  */
+ rtx cc_reg = aarch64_gen_compare_reg (EQ,
+   EH_RETURN_TAKEN_RTX,
+   const0_rtx);
+ x = gen_rtx_EQ (GET_MODE (cc_reg), cc_reg, const0_rtx);
+   }
+  else
+   x = gen_rtx_EQ (VOIDmode, EH_RETURN_TAKEN_RTX, const0_rtx);
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   rtx jump = emit_jump_insn (gen_rtx_SET (pc_rtx, x));
-- 
2.25.1



Re: [PATCH V2] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Robin Dapp
OK.

Regards
 Robin



[PATCH V2] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Juzhe-Zhong
This patch fixes a SPEC2017 cam4 mismatch issue caused by a missing
compatibility check for conflict vsetvl fusion.

Buggy assembler before this patch:

.L69:
vsetvli a5,s1,e8,mf4,ta,ma  -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i v1,0
vse8.v  v1,0(a5)
j   .L37
.L68:
vsetvli a5,s1,e8,mf4,ta,ma  -> buggy vsetvl
vsetivli zero,8,e8,mf2,ta,ma
addi a3,a5,8
vmv.v.i v1,0
vse8.v  v1,0(a5)
vse8.v  v1,0(a3)
addi a4,a4,-16
li  a3,8
bltu a4,a3,.L37
j   .L69
.L67:
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i v1,0
vse8.v  v1,0(a5)
addi a5,sp,56
vse8.v  v1,0(a5)
addi s4,sp,64
addi a3,sp,72
vse8.v  v1,0(s4)
vse8.v  v1,0(a3)
addi a4,a4,-32
li  a3,16
bltu a4,a3,.L36
j   .L68

After this patch:

.L63:
ble s1,zero,.L49
slli a4,s1,3
li  a3,32
addi a5,sp,48
bltu a4,a3,.L62
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.i v1,0
vse8.v  v1,0(a5)
addi a5,sp,56
vse8.v  v1,0(a5)
addi s4,sp,64
addi a3,sp,72
vse8.v  v1,0(s4)
addi a4,a4,-32
addi a5,sp,80
vse8.v  v1,0(a3)
.L35:
li  a3,16
bltu a4,a3,.L36
addi a3,a5,8
vmv.v.i v1,0
addi a4,a4,-16
vse8.v  v1,0(a5)
addi a5,a5,16
vse8.v  v1,0(a3)
.L36:
li  a3,8
bltu a4,a3,.L37
vmv.v.i v1,0
vse8.v  v1,0(a5)

Tested on both RV32/RV64 no regression, Ok for trunk ?

PR target/113429

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): 
Fix conflict vsetvl fusion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-4.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-5.c: Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 39 +++
 .../riscv/rvv/vsetvl/vlmax_conflict-4.c   |  5 +--
 .../riscv/rvv/vsetvl/vlmax_conflict-5.c   | 10 ++---
 3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index df7ed149388..76e3d2eb471 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2254,6 +2254,22 @@ private:
 return true;
   }
 
+  bool has_compatible_reaching_vsetvl_p (vsetvl_info info)
+  {
+unsigned int index;
+sbitmap_iterator sbi;
+EXECUTE_IF_SET_IN_BITMAP (m_vsetvl_def_in[info.get_bb ()->index ()], 0,
+ index, sbi)
+  {
+   const auto prev_info = *m_vsetvl_def_exprs[index];
+   if (!prev_info.valid_p ())
+ continue;
+   if (m_dem.compatible_p (prev_info, info))
+ return true;
+  }
+return false;
+  }
+
   bool preds_all_same_avl_and_ratio_p (const vsetvl_info &curr_info)
   {
 gcc_assert (
@@ -3075,22 +3091,8 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
{
  vsetvl_info new_curr_info = curr_info;
  new_curr_info.set_bb (crtl->ssa->bb (eg->dest));
- bool has_compatible_p = false;
- unsigned int def_expr_index;
- sbitmap_iterator sbi2;
- EXECUTE_IF_SET_IN_BITMAP (
-   m_vsetvl_def_in[new_curr_info.get_bb ()->index ()], 0,
-   def_expr_index, sbi2)
-   {
- vsetvl_info &prev_info = *m_vsetvl_def_exprs[def_expr_index];
- if (!prev_info.valid_p ())
-   continue;
- if (m_dem.compatible_p (prev_info, new_curr_info))
-   {
- has_compatible_p = true;
- break;
-   }
-   }
+ bool has_compatible_p
+   = has_compatible_reaching_vsetvl_p (new_curr_info);
  if (!has_compatible_p)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3146,7 +3148,10 @@ pre_vsetvl::earliest_fuse_vsetvl_info ()
   && !m_dem.compatible_p (prev_info, curr_info))
{
  /* Cancel lift up if probabilities are equal.  */
- if (successors_probability_equal_p (eg->src))
+ if (successors_probability_equal_p (eg->src)
+ || (dest_block_info.probability
+   > src_block_info.probability
+ && !has_compatible_reaching_vsetvl_p (curr_info)))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
{
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-4.c 

Re: Add -falign-all-functions

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jan Hubicka wrote:

> > 
> > I meant the new option might be named -fmin-function-alignment=
> > rather than -falign-all-functions because of how it should
> > override all other options.
> 
> I was also pondering about both names.  -falign-all-functions has the
> advantage that it is similar to all the other alignment flags that are
> all called -falign-XXX
> 
but both options are fine for me.
> > 
> > Otherwise is there an updated patch to look at?
> 
> I will prepare one.  So shall I drop the max-skip support for alignment
> and rename the flag?

Yes.

> Honza
> > 
> > Richard.
> > 
> > > > -flimit-function-alignment should not have an effect on it
> > > > and even very small functions should be aligned.
> > > 
> > > I write that it is not affected by limit-function-alignment
> > > @opindex falign-all-functions=@var{n}
> > > @item -falign-all-functions
> > > Specify minimal alignment for function entry. Unlike 
> > > @option{-falign-functions}
> > > this alignment is applied also to all functions (even those considered 
> > > cold).
> > > The alignment is also not affected by @option{-flimit-function-alignment}
> > > 
> > > Because indeed that would break the atomicity of updates.
> > 
> > 
> > 
> > > Honza
> > > > 
> > > > Richard.
> > > > 
> > > > > +}
> > > > > +
> > > > >/* Handle a user-specified function alignment.
> > > > >   Note that we still need to align to DECL_ALIGN, as above,
> > > > >   because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at 
> > > > > all.  */
> > > > > 
> > > > 
> > > > -- 
> > > > Richard Biener 
> > > > SUSE Software Solutions Germany GmbH,
> > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG 
> > > > Nuernberg)
> > > 
> > 
> > -- 
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-17 Thread Michael Matz
Hello,

On Wed, 17 Jan 2024, Ajit Agarwal wrote:

> > first is even, since OOmode is only ok for even vsx register and its
> > size makes it take two consecutive vsx registers.
> > 
> > Hi Peter, is my understanding correct?
> > 
> 
> I tried all the combinations in the past; the RA is not allocating 
> sequential registers. I don't see any such code in the RA that generates 
> sequential registers.

See HARD_REGNO_NREGS.  If you form a pseudo of a mode that's larger than a 
native-sized hardreg (and the target is correctly set up) then the RA will 
allocate the correct number of hardregs (consecutively) for this pseudo.  
This is what Kewen was referring to by mentioning the OOmode for the new 
hypothetical pseudo.  The individual parts of such pseudo will then need 
to use subreg to access them.

So, when you work before RA you simply will transform this (I'm going to 
use SImode and DImode for demonstration):

   (set (reg:SI x) (mem:SI (addr)))
   (set (reg:SI y) (mem:SI (addr+4)))
   ...
   ( ...use1... (reg:SI x))
   ( ...use2... (reg:SI y))

into this:

   (set (reg:DI z) (mem:DI (addr)))
   ...
   ( ...use1... (subreg:SI (reg:DI z) 0))
   ( ...use2... (subreg:SI (reg:DI z) 4))

For this to work the target needs to accept the (subreg...) in certain 
operands of instruction patterns, which I assume was what Kewen also 
referred to.  The register allocator will then assign hardregs X and X+1 
to the pseudo-reg 'z'.  (Assuming that DImode is okay for hardreg X, and 
HARD_REGNO_NREGS says that it needs two hardregs to hold DImode).

It will also replace the subregs by their appropriate concrete hardreg.

It seems your problems stem from trying to place your new pass somewhere 
within the register-allocation pipeline, rather than simply completely 
before.


Ciao,
Michael.


Re: [PATCH] RISC-V: Add has compatible check for conflict vsetvl fusion

2024-01-17 Thread Robin Dapp
Hi Juzhe,

the change itself is OK but I don't think we should add binary
files like this.  Even if not ideal, if you want to go forward
IMHO let's skip the test for now and add it at a (not much) later
time.

> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/fortran/spec2017_cam4/ppgrid.mod 
> b/gcc/testsuite/gcc.target/riscv/rvv/fortran/spec2017_cam4/ppgrid.mod
> new file mode 100644
> index 
> ..cb021390ccd758e75c3ad11b33da93e5fba9dd25
> GIT binary patch
> literal 296
> zcmV+@0oVQ?iwFP!01J#p3PlGTRhVT6q@2zl{=@`s-tfVf)tq@kHSA}j8Hz3S;
> z@Yh?$U?jR7j0cyt>GwAM+UHHbPVT~3#av=jq`S4ohpx6+k%JCBiloxd?>fb@DmEy~
> zRh6Yz%d*TqwV7`iu`C;Z(McQF#DqT#2lPd+lGk1SMnM}A6Hp9cSqmNq{B|nvAn#@P
> zCGE%0NIhX~f}
> z%QY-XvEF`Xig?UtLYWiJK(`p`0sj4z|eo?bN0F=wJgq8=k>8<#-V;obgE;
> u

Re: Add -falign-all-functions

2024-01-17 Thread Jan Hubicka
> 
> I meant the new option might be named -fmin-function-alignment=
> rather than -falign-all-functions because of how it should
> override all other options.

I was also pondering about both names.  -falign-all-functions has the
advantage that it is similar to all the other alignment flags that are
all called -falign-XXX

but both options are fine for me.
> 
> Otherwise is there an updated patch to look at?

I will prepare one.  So shall I drop the max-skip support for alignment
and rename the flag?

Honza
> 
> Richard.
> 
> > > -flimit-function-alignment should not have an effect on it
> > > and even very small functions should be aligned.
> > 
> > I write that it is not affected by limit-function-alignment
> > @opindex falign-all-functions=@var{n}
> > @item -falign-all-functions
> > Specify minimal alignment for function entry. Unlike 
> > @option{-falign-functions}
> > this alignment is applied also to all functions (even those considered 
> > cold).
> > The alignment is also not affected by @option{-flimit-function-alignment}
> > 
> > Because indeed that would break the atomicity of updates.
> 
> 
> 
> > Honza
> > > 
> > > Richard.
> > > 
> > > > +}
> > > > +
> > > >/* Handle a user-specified function alignment.
> > > >   Note that we still need to align to DECL_ALIGN, as above,
> > > >   because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at 
> > > > all.  */
> > > > 
> > > 
> > > -- 
> > > Richard Biener 
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> > 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] lower-bitint: Avoid overlap between destinations and sources in libgcc calls [PR113421]

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled because the bitint lowering emits a
>   .MULBITINT (, 1024, , 1024, , 1024);
> call.  The bug is in the overlap between the destination and source; that is
> something the libgcc routines don't handle: they use the source arrays
> during the entire algorithm which computes the destination array(s).
> For the mapping of SSA_NAMEs to VAR_DECLs the code already supports that
> correctly, but the checking whether a load from memory can be used directly
> without a temporary even when earlier we decided to merge the
> multiplication/division/modulo etc. with a store didn't.
> 
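The overlap hazard being fixed can be sketched in plain C (this is a toy model, not the libgcc code): a schoolbook limb multiply that zeroes and then accumulates into its destination destroys its own source when destination and source alias, so an in-place update must first copy the aliased operand into a temporary, which is what the lowering arranges here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 2x2-limb schoolbook multiply.  It zeroes DST and then reads A and B
   while accumulating into DST, so DST must not alias A or B -- the same
   constraint the libgcc _BitInt helpers place on their limb arrays.  */
static void
limb_mul (uint32_t *dst, const uint32_t *a, const uint32_t *b)
{
  memset (dst, 0, 4 * sizeof *dst);
  for (int i = 0; i < 2; i++)
    {
      uint64_t carry = 0;
      for (int j = 0; j < 2; j++)
        {
          uint64_t t = (uint64_t) a[i] * b[j] + dst[i + j] + carry;
          dst[i + j] = (uint32_t) t;
          carry = t >> 32;
        }
      dst[i + 2] += (uint32_t) carry;
    }
}

/* What an in-place "a *= x" must do: copy the aliased source into a
   temporary limb array before calling the multiply.  */
static void
limb_mul_in_place (uint32_t *acc, const uint32_t *b)
{
  uint32_t tmp[2];
  memcpy (tmp, acc, sizeof tmp);   /* break the overlap */
  limb_mul (acc, tmp, b);
}
```

Calling `limb_mul (buf, buf, b)` directly zeroes the source before any limb is read and yields 0 instead of the product, which mirrors the miscompile in the testcase.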
> The following patch implements that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-01-17  Jakub Jelinek  
> 
>   PR tree-optimization/113421
>   * gimple-lower-bitint.cc (stmt_needs_operand_addr): Adjust function
>   comment.
>   (bitint_dom_walker::before_dom_children): Add g temporary to simplify
>   formatting.  Start at vop rather than cvop even if stmt is a store
>   and needs_operand_addr.
> 
>   * gcc.dg/torture/bitint-50.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-01-16 12:32:56.617721208 +0100
> +++ gcc/gimple-lower-bitint.cc2024-01-16 17:33:04.046476302 +0100
> @@ -5455,7 +5455,8 @@ vuse_eq (ao_ref *, tree vuse1, void *dat
>  
>  /* Return true if STMT uses a library function and needs to take
> address of its inputs.  We need to avoid bit-fields in those
> -   cases.  */
> +   cases.  Similarly, we need to avoid overlap between destination
> +   and source limb arrays.  */
>  
>  bool
>  stmt_needs_operand_addr (gimple *stmt)
> @@ -5574,7 +5575,8 @@ bitint_dom_walker::before_dom_children (
> else if (!bitmap_bit_p (m_loads, SSA_NAME_VERSION (s)))
>   continue;
>  
> -   tree rhs1 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s));
> +   gimple *g = SSA_NAME_DEF_STMT (s);
> +   tree rhs1 = gimple_assign_rhs1 (g);
> if (needs_operand_addr
> && TREE_CODE (rhs1) == COMPONENT_REF
> && DECL_BIT_FIELD_TYPE (TREE_OPERAND (rhs1, 1)))
> @@ -5596,15 +5598,14 @@ bitint_dom_walker::before_dom_children (
>  
> ao_ref ref;
> ao_ref_init (, rhs1);
> -   tree lvop = gimple_vuse (SSA_NAME_DEF_STMT (s));
> +   tree lvop = gimple_vuse (g);
> unsigned limit = 64;
> tree vuse = cvop;
> if (vop != cvop
> && is_gimple_assign (stmt)
> && gimple_store_p (stmt)
> -   && !operand_equal_p (lhs,
> -gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s)),
> -0))
> +   && (needs_operand_addr
> +   || !operand_equal_p (lhs, gimple_assign_rhs1 (g), 0)))
>   vuse = vop;
> if (vuse != lvop
> && walk_non_aliased_vuses (, vuse, false, vuse_eq,
> --- gcc/testsuite/gcc.dg/torture/bitint-50.c.jj   2024-01-16 
> 17:35:16.084622119 +0100
> +++ gcc/testsuite/gcc.dg/torture/bitint-50.c  2024-01-16 17:35:06.701753879 
> +0100
> @@ -0,0 +1,31 @@
> +/* PR tree-optimization/113421 */
> +/* { dg-do run { target bitint } } */
> +/* { dg-options "-std=c23 -pedantic-errors" } */
> +/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 1024
> +unsigned _BitInt(1024) a = -5wb;
> +
> +__attribute__((noipa)) void
> +foo (unsigned _BitInt(1024) x)
> +{
> +  a *= x;
> +}
> +#else
> +int a = 30;
> +
> +void
> +foo (int)
> +{
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +  foo (-6wb);
> +  if (a != 30wb)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: Add -falign-all-functions

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jan Hubicka wrote:

> > > +falign-all-functions
> > > +Common Var(flag_align_all_functions) Optimization
> > > +Align the start of functions.
> > 
> > all functions
> > 
> > or maybe "of every function."?
> 
> Fixed, thanks!
> > > +@opindex falign-all-functions=@var{n}
> > > +@item -falign-all-functions
> > > +Specify minimal alignment for function entry. Unlike 
> > > @option{-falign-functions}
> > > +this alignment is applied also to all functions (even those considered 
> > > cold).
> > > +The alignment is also not affected by @option{-flimit-function-alignment}
> > > +
> > 
> > For functions with two entries (like on powerpc), which entry does this
> > apply to?  I suppose the external ABI entry, not the local one?  But
> > how does this then help to align the patchable entry (the common
> > local entry should be aligned?).  Should we align _both_ entries?
> 
> To be honest I did not really know we actually would like to patch
> alternative entry points.
> The function alignment is always produced before the start of the function,
> so the first entry point wins and the other entry point is not aligned.
> 
> Aligning later labels needs to go through the label align code, since
> theoretically some targets need to do relaxation over it.
> 
> In final.cc we do not apply function alignment to those labels.
> I guess this makes sense because if we align for performance, we
> probably do not want the alternate entry point to be aligned since it
> appears close to the original one.  I can add that to compute_alignment:
> test if label is alternative entry point and add alignment.
> I wonder if that is a desired behaviour though and is this code
> path even used?
> 
> I know this was originally added to support i386 register passing
> conventions and stack alignment via alternative entry point, but it was
> never really used that way.  Also there was plan to support Fortran
> alternative entry point.
> 
> Looking at what rs6000 does, it seems to not use the RTL representation
> of alternative entry points.  It seems that:
>  1) call assemble_start_functions which
> a) outputs function alignment
> b) outputs start label
> c) calls print_patchable_function_entry
>  2) call final_start_functions which calls output_function_prologue.
> In rs6000 there is a second call to
> rs6000_print_patchable_function_entry
> So there is no target-independent place where alignment can be added,
> so I would say it is up to rs6000 maintainers to decide what is right
> here :)

Fair enough ...

> > 
> > >  @opindex falign-labels
> > >  @item -falign-labels
> > >  @itemx -falign-labels=@var{n}
> > > @@ -14240,6 +14250,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> > >  Align loops to a power-of-two boundary.  If the loops are executed
> > >  many times, this makes up for any execution of the dummy padding
> > >  instructions.
> > > +This is an optimization of code performance and alignment is ignored for
> > > +loops considered cold.
> > >  
> > >  If @option{-falign-labels} is greater than this value, then its value
> > >  is used instead.
> > > @@ -14262,6 +14274,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> > >  Align branch targets to a power-of-two boundary, for branch targets
> > >  where the targets can only be reached by jumping.  In this case,
> > >  no dummy operations need be executed.
> > > +This is an optimization of code performance and alignment is ignored for
> > > +jumps considered cold.
> > >  
> > >  If @option{-falign-labels} is greater than this value, then its value
> > >  is used instead.
> > > @@ -14371,7 +14385,7 @@ To use the link-time optimizer, @option{-flto} 
> > > and optimization
> > >  options should be specified at compile time and during the final link.
> > >  It is recommended that you compile all the files participating in the
> > >  same link with the same options and also specify those options at
> > > -link time.  
> > > +link time.
> > >  For example:
> > >  
> > >  @smallexample
> > > diff --git a/gcc/flags.h b/gcc/flags.h
> > > index e4bafa310d6..ecf4fb9e846 100644
> > > --- a/gcc/flags.h
> > > +++ b/gcc/flags.h
> > > @@ -89,6 +89,7 @@ public:
> > >align_flags x_align_jumps;
> > >align_flags x_align_labels;
> > >align_flags x_align_functions;
> > > +  align_flags x_align_all_functions;
> > >  };
> > >  
> > >  extern class target_flag_state default_target_flag_state;
> > > @@ -98,10 +99,11 @@ extern class target_flag_state 
> > > *this_target_flag_state;
> > >  #define this_target_flag_state (_target_flag_state)
> > >  #endif
> > >  
> > > -#define align_loops   (this_target_flag_state->x_align_loops)
> > > -#define align_jumps   (this_target_flag_state->x_align_jumps)
> > > -#define align_labels  (this_target_flag_state->x_align_labels)
> > > -#define align_functions   (this_target_flag_state->x_align_functions)
> > > +#define align_loops  (this_target_flag_state->x_align_loops)
> > > +#define align_jumps  

Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2024-01-17 Thread Lipeng Zhu




On 1/3/2024 5:14 PM, Lipeng Zhu wrote:



On 2023/12/21 19:42, Thomas Schwinge wrote:

Hi!

On 2023-12-13T21:52:29+0100, I wrote:

On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:

On 2023/12/12 1:45, H.J. Lu wrote:
On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  
wrote:

On 2023/12/9 23:23, Jakub Jelinek wrote:

On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:

This patch tries to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the
percentage to step into the insert_unit function is around 30%, in
most instances, we can get the unit in the phase of reading the
unit_cache or unit_root tree. So split the read/write phase by
rwlock would be an approach to make it more parallel.

BTW, the IPC metrics can gain around 9x in our test server with
220 cores. The benchmark we used is
https://github.com/rwesson/NEAT



Ok for trunk, thanks.



Thanks! Looking forward to landing to trunk.



Pushed for you.



I've just filed 
"'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' 
execution test timeouts".

Would you be able to look into that?


See my update in there.


Grüße
  Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 
201, 80634 München; Gesellschaft mit beschränkter Haftung; 
Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: 
München; Registergericht München, HRB 106955




Updated in https://gcc.gnu.org/PR113005. Could you help to verify if the 
draft patch would fix the execution test timeout issue on your side?




Hi Thomas,

Any feedback from your side?

Regards,
Lipeng Zhu





Re: Fix merging of value predictors

2024-01-17 Thread Jan Hubicka
> 
> Please fill in what has changed, both for predict-18.c and predict.{cc,def}
> changes.

Sorry, I re-generated the patch after fixing some typos and forgot to
copy over the changelog.
> 
> > @@ -2613,24 +2658,40 @@ expr_expected_value_1 (tree type, tree op0, enum 
> > tree_code code,
> >   if (!nop1)
> > nop1 = op1;
> >  }
> > +  /* We already checked if folding one of arguments to constant is good
> > +enough.  Consequently failing to fold both means that we will not
> > +succeed determinging the value.  */
> 
> s/determinging/determining/
Fixed.  I am re-testing the following and will commit if it succeeds (on
x86_64-linux)

2024-01-17  Jan Hubicka  
Jakub Jelinek  

PR tree-optimization/110852

gcc/ChangeLog:

* predict.cc (expr_expected_value_1): Fix profile merging of PHI and
binary operations
(get_predictor_value): Handle PRED_COMBINED_VALUE_PREDICTIONS and
PRED_COMBINED_VALUE_PREDICTIONS_PHI
* predict.def (PRED_COMBINED_VALUE_PREDICTIONS): New predictor.
(PRED_COMBINED_VALUE_PREDICTIONS_PHI): New predictor.

gcc/testsuite/ChangeLog:

* gcc.dg/predict-18.c: Update template to expect combined value 
predictor.
* gcc.dg/predict-23.c: New test.
* gcc.dg/tree-ssa/predict-1.c: New test.
* gcc.dg/tree-ssa/predict-2.c: New test.
* gcc.dg/tree-ssa/predict-3.c: New test.

diff --git a/gcc/predict.cc b/gcc/predict.cc
index 84cbe3ffc61..c1c48bf3df1 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -2404,44 +2404,78 @@ expr_expected_value_1 (tree type, tree op0, enum 
tree_code code,
   if (!bitmap_set_bit (visited, SSA_NAME_VERSION (op0)))
return NULL;
 
-  if (gimple_code (def) == GIMPLE_PHI)
+  if (gphi *phi = dyn_cast  (def))
{
  /* All the arguments of the PHI node must have the same constant
 length.  */
- int i, n = gimple_phi_num_args (def);
- tree val = NULL, new_val;
+ int i, n = gimple_phi_num_args (phi);
+ tree val = NULL;
+ bool has_nonzero_edge = false;
+
+ /* If we already proved that given edge is unlikely, we do not need
+to handle merging of the probabilities.  */
+ for (i = 0; i < n && !has_nonzero_edge; i++)
+   {
+ tree arg = PHI_ARG_DEF (phi, i);
+ if (arg == PHI_RESULT (phi))
+   continue;
+ profile_count cnt = gimple_phi_arg_edge (phi, i)->count ();
+ if (!cnt.initialized_p () || cnt.nonzero_p ())
+   has_nonzero_edge = true;
+   }
 
  for (i = 0; i < n; i++)
{
- tree arg = PHI_ARG_DEF (def, i);
+ tree arg = PHI_ARG_DEF (phi, i);
  enum br_predictor predictor2;
 
- /* If this PHI has itself as an argument, we cannot
-determine the string length of this argument.  However,
-if we can find an expected constant value for the other
-PHI args then we can still be sure that this is
-likely a constant.  So be optimistic and just
-continue with the next argument.  */
- if (arg == PHI_RESULT (def))
+ /* Skip self-referring parameters, since they does not change
+expected value.  */
+ if (arg == PHI_RESULT (phi))
continue;
 
+ /* Skip edges which we already predicted as executing
+zero times.  */
+ if (has_nonzero_edge)
+   {
+ profile_count cnt = gimple_phi_arg_edge (phi, i)->count ();
+ if (cnt.initialized_p () && !cnt.nonzero_p ())
+   continue;
+   }
  HOST_WIDE_INT probability2;
- new_val = expr_expected_value (arg, visited, ,
-);
+ tree new_val = expr_expected_value (arg, visited, ,
+ );
+ /* If we know nothing about value, give up.  */
+ if (!new_val)
+   return NULL;
 
- /* It is difficult to combine value predictors.  Simply assume
-that later predictor is weaker and take its prediction.  */
- if (*predictor < predictor2)
+ /* If this is a first edge, trust its prediction.  */
+ if (!val)
{
+ val = new_val;
  *predictor = predictor2;
  *probability = probability2;
+ continue;
}
- if (!new_val)
-   return NULL;
- if (!val)
-   val = new_val;
- else if (!operand_equal_p (val, new_val, false))
+ /* If there are two different values, give up.  */
+ if (!operand_equal_p (val, new_val, false))
return NULL;
+
+ 

Re: Add -falign-all-functions

2024-01-17 Thread Jan Hubicka
> > +falign-all-functions
> > +Common Var(flag_align_all_functions) Optimization
> > +Align the start of functions.
> 
> all functions
> 
> or maybe "of every function."?

Fixed, thanks!
> > +@opindex falign-all-functions=@var{n}
> > +@item -falign-all-functions
> > +Specify minimal alignment for function entry. Unlike 
> > @option{-falign-functions}
> > +this alignment is applied also to all functions (even those considered 
> > cold).
> > +The alignment is also not affected by @option{-flimit-function-alignment}
> > +
> 
> For functions with two entries (like on powerpc), which entry does this
> apply to?  I suppose the external ABI entry, not the local one?  But
> how does this then help to align the patchable entry (the common
> local entry should be aligned?).  Should we align _both_ entries?

To be honest I did not really know we actually would like to patch
alternative entry points.
The function alignment is always produced before the start of the function,
so the first entry point wins and the other entry point is not aligned.

Aligning later labels needs to go through the label align code, since
theoretically some targets need to do relaxation over it.

In final.cc we do not apply function alignment to those labels.
I guess this makes sense because if we align for performance, we
probably do not want the alternate entry point to be aligned since it
appears close to the original one.  I can add that to compute_alignment:
test if label is alternative entry point and add alignment.
I wonder if that is a desired behaviour though and is this code
path even used?

I know this was originally added to support i386 register passing
conventions and stack alignment via alternative entry point, but it was
never really used that way.  Also there was plan to support Fortran
alternative entry point.

Looking at what rs6000 does, it seems to not use the RTL representation
of alternative entry points.  It seems that:
 1) call assemble_start_functions which
a) outputs function alignment
b) outputs start label
c) calls print_patchable_function_entry
 2) call final_start_functions which calls output_function_prologue.
In rs6000 there is a second call to
rs6000_print_patchable_function_entry
So there is no target-independent place where alignment can be added,
so I would say it is up to rs6000 maintainers to decide what is right
here :)
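On targets with a single entry point, the effect being proposed for every function can already be forced per function with GCC's existing aligned attribute; a small sketch (the 64-byte figure and the function name are illustrative assumptions, and the address check assumes a conventional target such as x86_64 where function pointers carry no mode bits):

```c
#include <assert.h>
#include <stdint.h>

/* Per-function analogue of the discussed flag: force this function's
   entry to a 64-byte boundary, as -falign-functions=64 would do
   globally (and the new option would do even for cold functions).  */
__attribute__ ((aligned (64), noinline)) int
hot_entry (int x)
{
  return x + 1;
}
```

The point of the new option is precisely that, unlike the attribute or -falign-functions, it applies unconditionally, including to functions the profile considers cold.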
> 
> >  @opindex falign-labels
> >  @item -falign-labels
> >  @itemx -falign-labels=@var{n}
> > @@ -14240,6 +14250,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> >  Align loops to a power-of-two boundary.  If the loops are executed
> >  many times, this makes up for any execution of the dummy padding
> >  instructions.
> > +This is an optimization of code performance and alignment is ignored for
> > +loops considered cold.
> >  
> >  If @option{-falign-labels} is greater than this value, then its value
> >  is used instead.
> > @@ -14262,6 +14274,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> >  Align branch targets to a power-of-two boundary, for branch targets
> >  where the targets can only be reached by jumping.  In this case,
> >  no dummy operations need be executed.
> > +This is an optimization of code performance and alignment is ignored for
> > +jumps considered cold.
> >  
> >  If @option{-falign-labels} is greater than this value, then its value
> >  is used instead.
> > @@ -14371,7 +14385,7 @@ To use the link-time optimizer, @option{-flto} and 
> > optimization
> >  options should be specified at compile time and during the final link.
> >  It is recommended that you compile all the files participating in the
> >  same link with the same options and also specify those options at
> > -link time.  
> > +link time.
> >  For example:
> >  
> >  @smallexample
> > diff --git a/gcc/flags.h b/gcc/flags.h
> > index e4bafa310d6..ecf4fb9e846 100644
> > --- a/gcc/flags.h
> > +++ b/gcc/flags.h
> > @@ -89,6 +89,7 @@ public:
> >align_flags x_align_jumps;
> >align_flags x_align_labels;
> >align_flags x_align_functions;
> > +  align_flags x_align_all_functions;
> >  };
> >  
> >  extern class target_flag_state default_target_flag_state;
> > @@ -98,10 +99,11 @@ extern class target_flag_state *this_target_flag_state;
> >  #define this_target_flag_state (_target_flag_state)
> >  #endif
> >  
> > -#define align_loops (this_target_flag_state->x_align_loops)
> > -#define align_jumps (this_target_flag_state->x_align_jumps)
> > -#define align_labels(this_target_flag_state->x_align_labels)
> > -#define align_functions (this_target_flag_state->x_align_functions)
> > +#define align_loops(this_target_flag_state->x_align_loops)
> > +#define align_jumps(this_target_flag_state->x_align_jumps)
> > +#define align_labels   (this_target_flag_state->x_align_labels)
> > +#define align_functions(this_target_flag_state->x_align_functions)
> > +#define align_all_functions 

Re: Fix merging of value predictors

2024-01-17 Thread Jakub Jelinek
On Wed, Jan 17, 2024 at 01:45:18PM +0100, Jan Hubicka wrote:
> Hi,
> expr_expected_value is doing some guesswork when it is merging two or more
> independent value predictions either in PHI node or in binary operation.
> Since we do not know how the predictions interact with each other, we can
> not really merge the values precisely.
> 
> The previous logic merged the prediction and picked the later predictor
> (since predict.def is sorted by reliability). This however leads to troubles
> with __builtin_expect_with_probability since it is special cased as a
> predictor
> with custom probabilities.  If this predictor is downgraded to something else,
> we ICE since we have a prediction given by a predictor that is not expected
> to have a custom probability.
> 
> This patch fixes it by inventing new predictors
> PRED_COMBINED_VALUE_PREDICTIONS
> and PRED_COMBINED_VALUE_PREDICTIONS_PHI which also allow custom values but
> are considered less reliable than __builtin_expect_with_probability (they
> are combined by DS theory rather than by first match).  This is less likely
> to lead to very stupid decisions if combining does not work as expected.
> 
> I also updated the code to be a bit more careful about merging values and not
> downgrade the precision when unnecessary (as tested by the new testcases).
> 
> Bootstrapped/regtested x86_64-linux, will commit it tomorrow if there are
> no complaints.
> 
> 2024-01-17  Jan Hubicka 
>   Jakub Jelinek 

2 spaces before < rather than 1.
> 
>   PR tree-optimization/110852
> 
> gcc/ChangeLog:
> 
>   * predict.cc (expr_expected_value_1):
>   (get_predictor_value):
>   * predict.def (PRED_COMBINED_VALUE_PREDICTIONS):
>   (PRED_COMBINED_VALUE_PREDICTIONS_PHI):
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/predict-18.c:

Please fill in what has changed, both for predict-18.c and predict.{cc,def}
changes.

> @@ -2613,24 +2658,40 @@ expr_expected_value_1 (tree type, tree op0, enum 
> tree_code code,
> if (!nop1)
>   nop1 = op1;
>}
> +  /* We already checked if folding one of arguments to constant is good
> +  enough.  Consequently failing to fold both means that we will not
> +  succeed determinging the value.  */

s/determinging/determining/

Otherwise LGTM.

Jakub



Fix merging of value predictors

2024-01-17 Thread Jan Hubicka
Hi,
expr_expected_value is doing some guesswork when it is merging two or more
independent value predictions either in PHI node or in binary operation.
Since we do not know how the predictions interact with each other, we can
not really merge the values precisely.

The previous logic merged the prediction and picked the later predictor
(since predict.def is sorted by reliability). This however leads to troubles
with __builtin_expect_with_probability since it is special cased as a predictor
with custom probabilities.  If this predictor is downgraded to something else,
we ICE since we have a prediction given by a predictor that is not expected
to have a custom probability.

This patch fixes it by inventing new predictors PRED_COMBINED_VALUE_PREDICTIONS
and PRED_COMBINED_VALUE_PREDICTIONS_PHI which also allow custom values but
are considered less reliable than __builtin_expect_with_probability (they
are combined by DS theory rather than by first match).  This is less likely
to lead to very stupid decisions if combining does not work as expected.

I also updated the code to be a bit more careful about merging values and not
downgrade the precision when unnecessary (as tested by the new testcases).

Bootstrapped/regtested x86_64-linux, will commit it tomorrow if there are
no complaints.
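For reference, the builtin at the heart of the PR can be exercised as below (a minimal sketch; the function and values are made up for illustration): it returns its first argument unchanged but annotates the branch with a user-supplied probability, which predict.cc must carry as a custom value rather than one of the fixed predictor strengths.

```c
#include <assert.h>

/* __builtin_expect_with_probability (cond, expected, prob) behaves like
   __builtin_expect but attaches the caller's own probability (90% here)
   to the branch prediction.  */
static int
process (int x)
{
  if (__builtin_expect_with_probability (x == 0, 1, 0.9))
    return 1;      /* expected hot path */
  return x * 2;    /* cold path */
}
```

Downgrading such a prediction to an ordinary predictor is exactly what previously tripped the ICE, since ordinary predictors have no slot for a custom probability.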

2024-01-17  Jan Hubicka 
Jakub Jelinek 

PR tree-optimization/110852

gcc/ChangeLog:

* predict.cc (expr_expected_value_1):
(get_predictor_value):
* predict.def (PRED_COMBINED_VALUE_PREDICTIONS):
(PRED_COMBINED_VALUE_PREDICTIONS_PHI):

gcc/testsuite/ChangeLog:

* gcc.dg/predict-18.c:
* gcc.dg/predict-23.c: New test.
* gcc.dg/tree-ssa/predict-1.c: New test.
* gcc.dg/tree-ssa/predict-2.c: New test.
* gcc.dg/tree-ssa/predict-3.c: New test.

diff --git a/gcc/predict.cc b/gcc/predict.cc
index 84cbe3ffc61..f9d73c5eb1a 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -2404,44 +2404,78 @@ expr_expected_value_1 (tree type, tree op0, enum 
tree_code code,
   if (!bitmap_set_bit (visited, SSA_NAME_VERSION (op0)))
return NULL;
 
-  if (gimple_code (def) == GIMPLE_PHI)
+  if (gphi *phi = dyn_cast  (def))
{
  /* All the arguments of the PHI node must have the same constant
 length.  */
- int i, n = gimple_phi_num_args (def);
- tree val = NULL, new_val;
+ int i, n = gimple_phi_num_args (phi);
+ tree val = NULL;
+ bool has_nonzero_edge = false;
+
+ /* If we already proved that given edge is unlikely, we do not need
+to handle merging of the probabilities.  */
+ for (i = 0; i < n && !has_nonzero_edge; i++)
+   {
+ tree arg = PHI_ARG_DEF (phi, i);
+ if (arg == PHI_RESULT (phi))
+   continue;
+ profile_count cnt = gimple_phi_arg_edge (phi, i)->count ();
+ if (!cnt.initialized_p () || cnt.nonzero_p ())
+   has_nonzero_edge = true;
+   }
 
  for (i = 0; i < n; i++)
{
- tree arg = PHI_ARG_DEF (def, i);
+ tree arg = PHI_ARG_DEF (phi, i);
  enum br_predictor predictor2;
 
- /* If this PHI has itself as an argument, we cannot
-determine the string length of this argument.  However,
-if we can find an expected constant value for the other
-PHI args then we can still be sure that this is
-likely a constant.  So be optimistic and just
-continue with the next argument.  */
- if (arg == PHI_RESULT (def))
+ /* Skip self-referring parameters, since they does not change
+expected value.  */
+ if (arg == PHI_RESULT (phi))
continue;
 
+ /* Skip edges which we already predicted as executing
+zero times.  */
+ if (has_nonzero_edge)
+   {
+ profile_count cnt = gimple_phi_arg_edge (phi, i)->count ();
+ if (cnt.initialized_p () && !cnt.nonzero_p ())
+   continue;
+   }
  HOST_WIDE_INT probability2;
- new_val = expr_expected_value (arg, visited, ,
-);
+ tree new_val = expr_expected_value (arg, visited, ,
+ );
+ /* If we know nothing about value, give up.  */
+ if (!new_val)
+   return NULL;
 
- /* It is difficult to combine value predictors.  Simply assume
-that later predictor is weaker and take its prediction.  */
- if (*predictor < predictor2)
+ /* If this is a first edge, trust its prediction.  */
+ if (!val)
{
+ val = new_val;
  *predictor = predictor2;
  

Re: [PATCH] gimple-ssa-warn-access: Cast huge params to sizetype before using them in maybe_check_access_sizes [PR113410]

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> When a VLA is created with some very high precision size expression
> (say __int128, or _BitInt(65535) etc.), we cast it to sizetype, because
> we can't have arrays longer than what can be expressed in sizetype.
> 
> But the maybe_check_access_sizes code when trying to determine ranges
> wasn't doing this but was using fixed buffers for the sizes.  While
> __int128 could still be handled (fit into the buffers), obviously
> arbitrary _BitInt parameter ranges can't, they can be in the range of
> up to almost 20KB per number.  It doesn't make sense to print such
> ranges though, no array can be larger than sizetype precision, and
> ranger's range_of_expr can handle NOP_EXPRs/CONVERT_EXPRs wrapping a
> PARM_DECL just fine, so the following patch just casts the excessively
> large counters for the range determination purposes to sizetype.
> 
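The narrowing the patch performs can be modeled directly in C (a sketch on the usual assumption that sizetype matches size_t; the helper name is made up): no object can be larger than what sizetype expresses, so truncating a wider size expression loses nothing the warning code could act on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What fold_convert (sizetype, access_nelts) amounts to for a
   wider-than-sizetype bound such as an __int128 or _BitInt parameter:
   keep only the low size_t bits.  */
static size_t
clamp_to_sizetype (unsigned __int128 n)
{
  return (size_t) n;
}
```

Values that already fit pass through unchanged; only the (unrepresentable as an object size anyway) high bits are dropped.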
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> 2024-01-17  Jakub Jelinek  
> 
>   PR middle-end/113410
>   * gimple-ssa-warn-access.cc (pass_waccess::maybe_check_access_sizes):
>   If access_nelts is integral with larger precision than sizetype,
>   fold_convert it to sizetype.
> 
>   * gcc.dg/bitint-72.c: New test.
> 
> --- gcc/gimple-ssa-warn-access.cc.jj  2024-01-03 11:51:30.087751231 +0100
> +++ gcc/gimple-ssa-warn-access.cc 2024-01-16 19:25:35.408958088 +0100
> @@ -3406,6 +3406,15 @@ pass_waccess::maybe_check_access_sizes (
>else
>   access_nelts = rwm->get (sizidx)->size;
>  
> +  /* If access_nelts is e.g. a PARM_DECL with larger precision than
> +  sizetype, such as __int128 or _BitInt(34123) parameters,
> +  cast it to sizetype.  */
> +  if (access_nelts
> +   && INTEGRAL_TYPE_P (TREE_TYPE (access_nelts))
> +   && (TYPE_PRECISION (TREE_TYPE (access_nelts))
> +   > TYPE_PRECISION (sizetype)))
> + access_nelts = fold_convert (sizetype, access_nelts);
> +
>/* Format the value or range to avoid an explosion of messages.  */
>char sizstr[80];
>tree sizrng[2] = { size_zero_node, build_all_ones_cst (sizetype) };
> --- gcc/testsuite/gcc.dg/bitint-72.c.jj   2024-01-16 19:31:33.839938120 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-72.c  2024-01-16 19:31:06.000328741 +0100
> @@ -0,0 +1,16 @@
> +/* PR middle-end/113410 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 905
> +void bar (_BitInt(905) n, int[n]);
> +#else
> +void bar (int n, int[n]);
> +#endif
> +
> +void
> +foo (int n)
> +{
> +  int buf[n];
> +  bar (n, buf);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Monk Chiang wrote:

> This allows the backend to generate movcc instructions, if target
> machine has movcc pattern.
> 
> branchless-cond.c needs to be updated since some target machines have
> conditional move instructions, and the expression will not change to a
> branchless expression.
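The two shapes the thread is about can be written out explicitly (a sketch mirroring the f4 testcase below): the branchy conditional and the branchless form that the match.pd pattern produces by multiplying with the 0/1 bit.

```c
#include <assert.h>

/* Conditional form: selects z | y when the low bit of x is set.  */
static unsigned
branchy (unsigned x, unsigned y, unsigned z)
{
  return ((x & 1) != 0) ? z | y : y;
}

/* Branchless form produced by the pattern: (x & 1) is 0 or 1, so the
   multiply yields either 0 or z, and or-ing with y gives the same
   result without a branch.  */
static unsigned
branchless (unsigned x, unsigned y, unsigned z)
{
  return ((x & 1) * z) | y;
}
```

The question under review is not whether the two are equivalent (they are), but which one to emit when the target has a cheap conditional move.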

While I agree this pattern should possibly be applied during RTL
expansion or instruction selection, on x86, which also has movcc,
the multiplication is cheaper.  So I don't think this is the way to go.

I'd rather revert the change than trying to "fix" it this way?

Thanks,
Richard.

> gcc/ChangeLog:
>   PR target/113095
>   * match.pd (`(zero_one == 0) ? y : z  y`,
>   `(zero_one != 0) ? z  y : y`): Do not match to branchless
>   expression, if target machine has conditional move pattern.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
> ---
>  gcc/match.pd  | 30 +--
>  .../gcc.dg/tree-ssa/branchless-cond.c |  6 ++--
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index e42ecaf9ec7..a1f90b1cd41 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4231,7 +4231,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (INTEGRAL_TYPE_P (type)
> && TYPE_PRECISION (type) > 1
> && (INTEGRAL_TYPE_P (TREE_TYPE (@0
> -   (op (mult (convert:type @0) @2) @1
> +   (with {
> +  bool can_movecc_p = false;
> +  if (can_conditionally_move_p (TYPE_MODE (type)))
> + can_movecc_p = true;
> +
> +  /* Some target only support word_mode for movcc pattern, if type can
> +  extend to word_mode then use conditional move. Even if there is a
> +  extend instruction, the cost is lower than branchless.  */
> +  if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
> +   && can_conditionally_move_p (word_mode))
> + can_movecc_p = true;
> +}
> +(if (!can_movecc_p)
> + (op (mult (convert:type @0) @2) @1))
>  
>  /* (zero_one != 0) ? z  y : y -> ((typeof(y))zero_one * z)  y */
>  (for op (bit_xor bit_ior plus)
> @@ -4243,7 +4256,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (INTEGRAL_TYPE_P (type)
> && TYPE_PRECISION (type) > 1
> && (INTEGRAL_TYPE_P (TREE_TYPE (@0
> -   (op (mult (convert:type @0) @2) @1
> +   (with {
> +  bool can_movecc_p = false;
> +  if (can_conditionally_move_p (TYPE_MODE (type)))
> + can_movecc_p = true;
> +
> +  /* Some target only support word_mode for movcc pattern, if type can
> +  extend to word_mode then use conditional move. Even if there is a
> +  extend instruction, the cost is lower than branchless.  */
> +  if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
> +   && can_conditionally_move_p (word_mode))
> + can_movecc_p = true;
> +}
> +(if (!can_movecc_p)
> + (op (mult (convert:type @0) @2) @1))
>  
>  /* ?: Value replacement. */
>  /* a == 0 ? b : b + a  -> b + a */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> index e063dc4bb5f..c002ed97364 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> @@ -21,6 +21,6 @@ int f4(unsigned int x, unsigned int y, unsigned int z)
>return ((x & 1) != 0) ? z | y : y;
>  }
>  
> -/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
> -/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
> -/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" { xfail { 
> "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* 
> nds32*-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-times " & " 4 "optimized" { xfail { 
> "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* 
> nds32*-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-not "if " "optimized" { xfail { "aarch64*-*-* 
> alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[committed v5] libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]

2024-01-17 Thread Jonathan Wakely
Here's the final version that I pushed.

Tested aarch64-linux, x86_64-linux.
commit df0a668b784556fe4317317d58961652d93d53de
Author: Jonathan Wakely 
Date:   Mon Jan 15 15:42:50 2024

libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]

This is another C++26 change, approved in Varna 2023. We require a new
static array of data that is extracted from the IANA Character Sets
database. A new Python script to generate a header from the IANA CSV
file is added.

The text_encoding class is basically just a pointer to an {ID,name} pair
in the static array. The aliases view is also just the same pointer (or
empty), and the view's iterator moves forwards and backwards in the
array while the array elements have the same ID (or to one element
further, for a past-the-end iterator).

Because those iterators refer to a global array that never goes out of
scope, there's no reason they should ever produce undefined behaviour
or indeterminate values.  They should either have well-defined
behaviour, or abort. The overhead of ensuring those properties is pretty
low, so seems worth it.

This means that an aliases_view iterator should never be able to access
out-of-bounds. A non-value-initialized iterator always points to an
element of the static array even when not dereferenceable (the array has
unreachable entries at the start and end, which means that even a
past-the-end iterator for the last encoding in the array still points to
valid memory).  Dereferencing an iterator can always return a valid
array element, or "" for a non-dereferenceable iterator (but doing so
will abort when assertions are enabled).  In the language being proposed
for C++26, dereferencing an invalid iterator erroneously returns "".
Attempting to increment/decrement past the last/first element in the
view is erroneously a no-op, so aborts when assertions are enabled, and
doesn't change value otherwise.

Similarly, constructing a std::text_encoding with an invalid id (one
that doesn't have the value of an enumerator) erroneously behaves the
same as constructing with id::unknown, or aborts with assertions
enabled.

libstdc++-v3/ChangeLog:

PR libstdc++/113318
* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
(GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/Makefile.am: Add new subdirectory.
* src/Makefile.in: Regenerate.
* src/c++26/Makefile.am: New file.
* src/c++26/Makefile.in: New file.
* src/c++26/text_encoding.cc: New file.
* src/experimental/Makefile.am: Include c++26 convenience
library.
* src/experimental/Makefile.in: Regenerate.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/22_locale/locale/encoding.cc: New test.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/cons.cc: New test.
* testsuite/std/text_encoding/members.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.

Reviewed-by: Ulrich Drepper 
Reviewed-by: Patrick Palka 

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index e7cbf0fcf96..f9ba7ef744b 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -49,7 +49,7 @@ AC_DEFUN([GLIBCXX_CONFIGURE], [
   # Keep these sync'd with the list in Makefile.am.  The first provides an
   # expandable list at autoconf time; the second provides an expandable list
   # (i.e., shell variable) at configure time.
-  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 
src/c++17 src/c++20 src/c++23 src/filesystem src/libbacktrace src/experimental 
doc po testsuite python])
+  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 
src/c++17 src/c++20 src/c++23 src/c++26 src/filesystem src/libbacktrace 
src/experimental doc po testsuite python])
   SUBDIRS='glibcxx_SUBDIRS'
 
   # These need to be absolute paths, yet at the same time need to
@@ -5821,6 +5821,34 @@ AC_LANG_SAVE
   AC_LANG_RESTORE
 ])
 
+dnl
+dnl 

Re: [PATCH] lower-bitint: Fix up VIEW_CONVERT_EXPR handling [PR113408]

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> Unlike NOP_EXPR/CONVERT_EXPR which are GIMPLE_UNARY_RHS, VIEW_CONVERT_EXPR
> is GIMPLE_SINGLE_RHS and so gimple_assign_rhs1 contains the operand wrapped
> in VIEW_CONVERT_EXPR tree.
> 
> So, to handle it like other casts we need to look through it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-01-17  Jakub Jelinek  
> 
>   PR tree-optimization/113408
>   * gimple-lower-bitint.cc (bitint_large_huge::handle_stmt): For
>   VIEW_CONVERT_EXPR, pass TREE_OPERAND (rhs1, 0) rather than rhs1
>   to handle_cast.
> 
>   * gcc.dg/bitint-71.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-01-15 17:34:00.0 +0100
> +++ gcc/gimple-lower-bitint.cc2024-01-16 12:32:56.617721208 +0100
> @@ -1975,9 +1975,12 @@ bitint_large_huge::handle_stmt (gimple *
>   case INTEGER_CST:
> return handle_operand (gimple_assign_rhs1 (stmt), idx);
>   CASE_CONVERT:
> - case VIEW_CONVERT_EXPR:
> return handle_cast (TREE_TYPE (gimple_assign_lhs (stmt)),
> gimple_assign_rhs1 (stmt), idx);
> + case VIEW_CONVERT_EXPR:
> +   return handle_cast (TREE_TYPE (gimple_assign_lhs (stmt)),
> +   TREE_OPERAND (gimple_assign_rhs1 (stmt), 0),
> +   idx);
>   default:
> break;
>   }
> --- gcc/testsuite/gcc.dg/bitint-71.c.jj   2024-01-16 12:38:16.679239526 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-71.c  2024-01-16 12:37:24.724967020 +0100
> @@ -0,0 +1,18 @@
> +/* PR tree-optimization/113408 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -O2" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 713
> +struct A { _BitInt(713) b; } g;
> +#else
> +struct A { _BitInt(49) b; } g;
> +#endif
> +int f;
> +
> +void
> +foo (void)
> +{
> +  struct A j = g;
> +  if (j.b)
> +f = 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] ipa-strub: Fix handling of _BitInt returns [PR113406]

2024-01-17 Thread Richard Biener
On Wed, 17 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> Seems pass_ipa_strub::execute contains a copy of the expand_thunk
> code I've changed for _BitInt in r14-6805 PR112941 - larger _BitInts
> are aggregate_value_p even when they are is_gimple_reg_type.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk?

OK.

> 2024-01-17  Jakub Jelinek  
> 
>   PR middle-end/113406
>   * ipa-strub.cc (pass_ipa_strub::execute): Check aggregate_value_p
>   regardless of whether is_gimple_reg_type (restype) or not.
> 
>   * gcc.dg/bitint-70.c: New test.
> 
> --- gcc/ipa-strub.cc.jj   2024-01-03 11:51:28.374775006 +0100
> +++ gcc/ipa-strub.cc  2024-01-16 10:51:03.987463928 +0100
> @@ -3174,21 +3174,16 @@ pass_ipa_strub::execute (function *)
>  resdecl,
>  build_int_cst (TREE_TYPE (resdecl), 0));
> }
> - else if (!is_gimple_reg_type (restype))
> + else if (aggregate_value_p (resdecl, TREE_TYPE (thunk_fndecl)))
> {
> - if (aggregate_value_p (resdecl, TREE_TYPE (thunk_fndecl)))
> -   {
> - restmp = resdecl;
> + restmp = resdecl;
>  
> - if (VAR_P (restmp))
> -   {
> - add_local_decl (cfun, restmp);
> - BLOCK_VARS (DECL_INITIAL (current_function_decl))
> -   = restmp;
> -   }
> + if (VAR_P (restmp))
> +   {
> + add_local_decl (cfun, restmp);
> + BLOCK_VARS (DECL_INITIAL (current_function_decl))
> +   = restmp;
> }
> - else
> -   restmp = create_tmp_var (restype, "retval");
> }
>   else
> restmp = create_tmp_reg (restype, "retval");
> --- gcc/testsuite/gcc.dg/bitint-70.c.jj   2024-01-16 11:01:48.300524130 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-70.c  2024-01-16 11:01:19.456924333 +0100
> @@ -0,0 +1,14 @@
> +/* PR middle-end/113406 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -fstrub=internal" } */
> +/* { dg-require-effective-target strub } */
> +
> +#if __BITINT_MAXWIDTH__ >= 146
> +_BitInt(146)
> +#else
> +_BitInt(16)
> +#endif
> +foo (void)
> +{
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] libstdc++: Do not use CTAD for _Utf32_view alias template

2024-01-17 Thread Jonathan Wakely
On Tue, 16 Jan 2024 at 21:28, Jonathan Wakely wrote:
>
> Tested aarch64-linux. I plan to push this to fix an error when using
> trunk with Clang.

Pushed.

>
> -- >8 --
>
> We were relying on P1814R0 (CTAD for alias templates) which isn't
> supported by Clang. We can just not use CTAD and provide an explicit
> template argument list for _Utf32_view.
>
> Ideally we'd define a deduction guide for _Grapheme_cluster_view that
> uses views::all_t to properly convert non-views to views, but all_t is
> defined in <ranges> and we don't want to include all of that in
> <bits/unicode.h>. So just make it require a view for now, which can be
> cheaply copied.
>
> Although it's not needed yet, it would also be more correct to
> specialize enable_borrowed_range for the views in <bits/unicode.h>.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/unicode.h (_Grapheme_cluster_view): Require view.
> Do not use CTAD for _Utf32_view.
> (__format_width, __truncate): Do not use CTAD.
> (enable_borrowed_range<_Utf_view>): Define specialization.
> (enable_borrowed_range<_Grapheme_cluster_view>): Likewise.
> ---
>  libstdc++-v3/include/bits/unicode.h | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/unicode.h 
> b/libstdc++-v3/include/bits/unicode.h
> index f1b2b359bdf..d35c83d0090 100644
> --- a/libstdc++-v3/include/bits/unicode.h
> +++ b/libstdc++-v3/include/bits/unicode.h
> @@ -714,15 +714,15 @@ inline namespace __v15_1_0
>};
>
>// Split a range into extended grapheme clusters.
> -  template
> +  template requires ranges::view<_View>
>  class _Grapheme_cluster_view
>  : public ranges::view_interface<_Grapheme_cluster_view<_View>>
>  {
>  public:
>
>constexpr
> -  _Grapheme_cluster_view(const _View& __v)
> -  : _M_begin(_Utf32_view(__v).begin())
> +  _Grapheme_cluster_view(_View __v)
> +  : _M_begin(_Utf32_view<_View>(std::move(__v)).begin())
>{ }
>
>constexpr auto begin() const { return _M_begin; }
> @@ -946,7 +946,7 @@ inline namespace __v15_1_0
>  {
>if (__s.empty()) [[unlikely]]
> return 0;
> -  _Grapheme_cluster_view __gc(__s);
> +  _Grapheme_cluster_view> __gc(__s);
>auto __it = __gc.begin();
>const auto __end = __gc.end();
>size_t __n = __it.width();
> @@ -964,7 +964,7 @@ inline namespace __v15_1_0
>if (__s.empty()) [[unlikely]]
> return 0;
>
> -  _Grapheme_cluster_view __gc(__s);
> +  _Grapheme_cluster_view> __gc(__s);
>auto __it = __gc.begin();
>const auto __end = __gc.end();
>size_t __n = __it.width();
> @@ -1058,6 +1058,19 @@ inline namespace __v15_1_0
>
>  } // namespace __unicode
>
> +namespace ranges
> +{
> +  template
> +inline constexpr bool
> +enable_borrowed_range>
> +  = enable_borrowed_range<_Range>;
> +
> +  template
> +inline constexpr bool
> +enable_borrowed_range>
> +  = enable_borrowed_range<_Range>;
> +} // namespace ranges
> +
>  _GLIBCXX_END_NAMESPACE_VERSION
>  } // namespace std
>  #endif // C++20
> --
> 2.43.0
>



[PATCH] libgcc: fix SEH C++ rethrow semantics [PR113337]

2024-01-17 Thread Matteo Italia
SEH _Unwind_Resume_or_Rethrow invokes abort directly if
_Unwind_RaiseException doesn't manage to find a handler for the rethrown
exception; this is incorrect, as in this case std::terminate should be
invoked, allowing an application-provided terminate handler to handle
the situation instead of straight crashing the application through
abort.

The bug can be demonstrated with this simple test case:
===
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <exception>

static void custom_terminate_handler() {
fprintf(stderr, "custom_terminate_handler invoked\n");
std::exit(1);
}

int main(int argc, char *argv[]) {
std::set_terminate(custom_terminate_handler);
if (argc < 2) return 1;
const char *mode = argv[1];
fprintf(stderr, "%s\n", mode);
if (strcmp(mode, "throw") == 0) {
throw std::exception();
} else if (strcmp(mode, "rethrow") == 0) {
try {
throw std::exception();
} catch (...) {
throw;
}
} else {
return 1;
}
return 0;
}
===

On all gcc builds with non-SEH exceptions, this will print
"custom_terminate_handler invoked" both if launched as ./a.out throw or
as ./a.out rethrow; on SEH builds instead it will work as expected only
with ./a.exe throw, but will crash with the "built-in" abort message
with ./a.exe rethrow.

This patch fixes the problem, forwarding back the error code to the
caller (__cxa_rethrow), that calls std::terminate if
_Unwind_Resume_or_Rethrow returns.

The change makes the code path coherent with SEH _Unwind_RaiseException,
and with the generic _Unwind_Resume_or_Rethrow from libgcc/unwind.inc
(used for SjLj and Dw2 exception backend).

libgcc/ChangeLog:

* unwind-seh.c (_Unwind_Resume_or_Rethrow): forward
_Unwind_RaiseException return code back to caller instead of
calling abort, allowing __cxa_rethrow to invoke std::terminate
in case of uncaught rethrown exception
---
 libgcc/unwind-seh.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgcc/unwind-seh.c b/libgcc/unwind-seh.c
index 8ef0257b616..f1b8f5a8519 100644
--- a/libgcc/unwind-seh.c
+++ b/libgcc/unwind-seh.c
@@ -395,9 +395,9 @@ _Unwind_Reason_Code
 _Unwind_Resume_or_Rethrow (struct _Unwind_Exception *exc)
 {
   if (exc->private_[0] == 0)
-_Unwind_RaiseException (exc);
-  else
-_Unwind_ForcedUnwind_Phase2 (exc);
+return _Unwind_RaiseException (exc);
+
+  _Unwind_ForcedUnwind_Phase2 (exc);
   abort ();
 }
 
-- 
2.34.1



Re: [PATCH 01/14] c++: Implement __is_integral built-in trait

2024-01-17 Thread Joseph Myers
On Wed, 17 Jan 2024, Jonathan Wakely wrote:

> So we can remove the dependency on __STRICT_ISO__ for 128-bit integer
> types, and implementing std::is_integral with a built-in seems like
> the perfect time to do that. But that seems like stage 1 material, as
> we need to go through the library and see what needs to change.

As noted on IRC, for C23 there would also be library issues in making 
__int128 an extended integer type.  If it's an extended integer type, then 
C23 would require <stdint.h> to define int128_t, uint128_t, int_least128_t 
and uint_least128_t, along with the macros INT128_WIDTH, UINT128_WIDTH, 
INT_LEAST128_WIDTH, UINT_LEAST128_WIDTH (trivial), and INT128_C and 
UINT128_C (require an integer constant suffix), and INT128_MAX, 
INT128_MIN, UINT128_MAX, INT_LEAST128_MAX, INT_LEAST128_MIN, 
UINT_LEAST128_MAX (most simply defined using an integer constant suffix, 
though don't strictly require one).  And <inttypes.h> would have to define 
all the printf and scanf format macros for int128_t, uint128_t, 
int_least128_t and uint_least128_t - so library support would be needed 
for those (the format macros themselves should probably expand to "w128d" 
and similar, a C23 feature already supported for narrower types by glibc 
and by GCC format checking, rather than inventing new features there).

So because an extended integer type (without padding bits) in C23 is 
expected to have all the library support from <stdint.h> and <inttypes.h>, 
you need integer constant suffixes and printf/scanf support before you can 
declare __int128 an extended integer type for C23.

(If adding printf and scanf support for int128_t to glibc, it probably 
makes sense to add user-visible functions such as strtoi128 at the same 
time - no such functions are in the standard, but something like them 
would be needed internally as part of the scanf implementation, and it's 
likely they would be useful for users as well.)

-- 
Joseph S. Myers
josmy...@redhat.com



[committed] testsuite: Add testcase for already fixed PR [PR110251]

2024-01-17 Thread Jakub Jelinek
Hi!

This testcase started to hang at -O3 with r13-4208 and got fixed
with r14-2097.

Regtested on x86_64-linux and i686-linux, committed to trunk as obvious.

2024-01-17  Jakub Jelinek  

PR tree-optimization/110251
* gcc.c-torture/compile/pr110251.c: New test.

--- gcc/testsuite/gcc.c-torture/compile/pr110251.c.jj   2024-01-16 
20:39:50.605210933 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr110251.c  2024-01-16 
20:39:43.568310057 +0100
@@ -0,0 +1,27 @@
+/* PR tree-optimization/110251 */
+
+int a, b;
+signed char c;
+
+int
+foo (int e)
+{
+  if (e >= 'a')
+return e;
+}
+
+int
+bar (unsigned short e)
+{
+  for (; e; a++)
+e &= e - 1;
+}
+
+void
+baz (void)
+{
+  while (c < 1)
+;
+  for (; bar (c - 1); b = foo (c))
+;
+}

Jakub



Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin

2024-01-17 Thread Huanghui Nie
I'm sorry for CCing the gcc@ list. I'll wait for your review results there.
Thanks.

2024年1月17日(水) 16:18 Jonathan Wakely :

>
>
> On Wed, 17 Jan 2024, 08:14 Huanghui Nie via Gcc,  wrote:
>
>> Thanks. Done.
>>
>
> And don't CC the main gcc@ list, that's not for patch discussion. And if
> you CC the right list, you don't need to CC the individual maintainers.
>
> Anyway, it's on the right list now so we'll review it there, thanks.
>
>
>
>> 2024年1月17日(水) 12:39 Sam James :
>>
>> >
>> > Huanghui Nie  writes:
>> >
>> > > Hi.
>> >
>> > Please CC the libstdc++ LM for libstdc++ patches, per
>> >
>> >
>> https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html#list.patches
>> > .
>> >
>> > > [...]
>> >
>> >
>>
>


[PATCH] gimple-ssa-warn-access: Cast huge params to sizetype before using them in maybe_check_access_sizes [PR113410]

2024-01-17 Thread Jakub Jelinek
Hi!

When a VLA is created with some very high precision size expression
(say __int128, or _BitInt(65535) etc.), we cast it to sizetype, because
we can't have arrays longer than what can be expressed in sizetype.

But the maybe_check_access_sizes code when trying to determine ranges
wasn't doing this but was using fixed buffers for the sizes.  While
__int128 could still be handled (fit into the buffers), obviously
arbitrary _BitInt parameter ranges can't; they can take up to almost
20KB per number.  It doesn't make sense to print such
ranges though, no array can be larger than sizetype precision, and
ranger's range_of_expr can handle NOP_EXPRs/CONVERT_EXPRs wrapping a
PARM_DECL just fine, so the following patch just casts the excessively
large counters for the range determination purposes to sizetype.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-17  Jakub Jelinek  

PR middle-end/113410
* gimple-ssa-warn-access.cc (pass_waccess::maybe_check_access_sizes):
If access_nelts is integral with larger precision than sizetype,
fold_convert it to sizetype.

* gcc.dg/bitint-72.c: New test.

--- gcc/gimple-ssa-warn-access.cc.jj2024-01-03 11:51:30.087751231 +0100
+++ gcc/gimple-ssa-warn-access.cc   2024-01-16 19:25:35.408958088 +0100
@@ -3406,6 +3406,15 @@ pass_waccess::maybe_check_access_sizes (
   else
access_nelts = rwm->get (sizidx)->size;
 
+  /* If access_nelts is e.g. a PARM_DECL with larger precision than
+sizetype, such as __int128 or _BitInt(34123) parameters,
+cast it to sizetype.  */
+  if (access_nelts
+ && INTEGRAL_TYPE_P (TREE_TYPE (access_nelts))
+ && (TYPE_PRECISION (TREE_TYPE (access_nelts))
+ > TYPE_PRECISION (sizetype)))
+   access_nelts = fold_convert (sizetype, access_nelts);
+
   /* Format the value or range to avoid an explosion of messages.  */
   char sizstr[80];
   tree sizrng[2] = { size_zero_node, build_all_ones_cst (sizetype) };
--- gcc/testsuite/gcc.dg/bitint-72.c.jj 2024-01-16 19:31:33.839938120 +0100
+++ gcc/testsuite/gcc.dg/bitint-72.c2024-01-16 19:31:06.000328741 +0100
@@ -0,0 +1,16 @@
+/* PR middle-end/113410 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-std=c23" } */
+
+#if __BITINT_MAXWIDTH__ >= 905
+void bar (_BitInt(905) n, int[n]);
+#else
+void bar (int n, int[n]);
+#endif
+
+void
+foo (int n)
+{
+  int buf[n];
+  bar (n, buf);
+}

Jakub



[committed] Fix comment typos

2024-01-17 Thread Jakub Jelinek
Hi!

When looking at PR113410, I found a comment typo and just searched for
the same typo elsewhere and found some typos in the comments which had
that typo as well.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to
trunk as obvious.

2024-01-17  Jakub Jelinek  

* tree-into-ssa.cc (pass_build_ssa::gate): Fix comment typo,
funcions -> functions, and use were instead of was.
* gengtype.cc (dump_typekind): Fix comment typos, funcion -> function
and guaranteee -> guarantee.
* attribs.h (struct attr_access): Fix comment typo funcion -> function.

--- gcc/tree-into-ssa.cc.jj 2024-01-03 11:51:34.128695146 +0100
+++ gcc/tree-into-ssa.cc2024-01-16 18:57:38.136438943 +0100
@@ -2499,7 +2499,7 @@ public:
   /* opt_pass methods: */
   bool gate (function *fun) final override
 {
-  /* Do nothing for funcions that was produced already in SSA form.  */
+  /* Do nothing for functions that were produced already in SSA form.  */
   return !(fun->curr_properties & PROP_ssa);
 }
 
--- gcc/gengtype.cc.jj  2024-01-03 11:51:23.314845233 +0100
+++ gcc/gengtype.cc 2024-01-16 18:56:57.383009291 +0100
@@ -4718,8 +4718,8 @@ write_roots (pair_p variables, bool emit
 }
 
 /* Prints not-as-ugly version of a typename of T to OF.  Trades the uniquness
-   guaranteee for somewhat increased readability.  If name conflicts do happen,
-   this funcion will have to be adjusted to be more like
+   guarantee for somewhat increased readability.  If name conflicts do happen,
+   this function will have to be adjusted to be more like
output_mangled_typename.  */
 
 #define INDENT 2
--- gcc/attribs.h.jj2024-01-03 11:51:24.200832936 +0100
+++ gcc/attribs.h   2024-01-16 19:08:27.507350364 +0100
@@ -324,7 +324,7 @@ struct attr_access
  in TREE_VALUE and their positions in the argument list (stored
  in TREE_PURPOSE).  Each expression may be a PARM_DECL or some
  other DECL (for ordinary variables), or an EXPR for other
- expressions (e.g., funcion calls).  */
+ expressions (e.g., function calls).  */
   tree size;
 
   /* The zero-based position of each of the formal function arguments.

Jakub



RE: [PATCH] aarch64: Fix aarch64_ldp_reg_operand predicate not to allow all subreg [PR113221]

2024-01-17 Thread Kyrylo Tkachov



> -Original Message-
> From: Andrew Pinski 
> Sent: Wednesday, January 17, 2024 3:29 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Alex Coplan ; Andrew Pinski
> 
> Subject: [PATCH] aarch64: Fix aarch64_ldp_reg_operand predicate not to allow
> all subreg [PR113221]
> 
> So the problem here is that aarch64_ldp_reg_operand will allow any
> subreg, even a subreg of a lo_sum.
> When LRA tries to fix that up, all things break. So the fix is to change the 
> check to
> only
> allow reg and subreg of regs.
> 
> Note the tendency here is to use register_operand, but that checks the
> mode of the register, and we need to allow mismatched modes for this
> predicate for now.
> 
> Built and tested for aarch64-linux-gnu with no regressions
> (Also tested with the LD/ST pair pass back on).

Ok with the comments from Alex addressed.
Thanks,
Kyrill

> 
>   PR target/113221
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/predicates.md (aarch64_ldp_reg_operand): For subreg,
>   only allow REG operands instead of allowing all.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.c-torture/compile/pr113221-1.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/predicates.md |  8 +++-
>  gcc/testsuite/gcc.c-torture/compile/pr113221-1.c | 12 
>  2 files changed, 19 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> 
> diff --git a/gcc/config/aarch64/predicates.md
> b/gcc/config/aarch64/predicates.md
> index 8a204e48bb5..256268517d8 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -313,7 +313,13 @@ (define_predicate "pmode_plus_operator"
> 
>  (define_special_predicate "aarch64_ldp_reg_operand"
>(and
> -(match_code "reg,subreg")
> +(ior
> +  (match_code "reg")
> +  (and
> +   (match_code "subreg")
> +   (match_test "GET_CODE (SUBREG_REG (op)) == REG")
> +  )
> +)
>  (match_test "aarch64_ldpstp_operand_mode_p (GET_MODE (op))")
>  (ior
>(match_test "mode == VOIDmode")
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> new file mode 100644
> index 000..152a510786e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr113221-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-options "-fno-move-loop-invariants -funroll-all-loops" } */
> +/* PR target/113221 */
> +/* This used to ICE after the `load/store pair fusion pass` was added
> +   due to the predicate aarch64_ldp_reg_operand allowing too much. */
> +
> +
> +void bar();
> +void foo(int* b) {
> +  for (;;)
> +*b++ = (long)bar;
> +}
> +
> --
> 2.39.3



[PATCH] libstdc++: Update baseline symbols for riscv64-linux

2024-01-17 Thread Andreas Schwab
* config/abi/post/riscv64-linux-gnu/baseline_symbols.txt: Update.
---
 .../abi/post/riscv64-linux-gnu/baseline_symbols.txt  | 9 +
 1 file changed, 9 insertions(+)

diff --git 
a/libstdc++-v3/config/abi/post/riscv64-linux-gnu/baseline_symbols.txt 
b/libstdc++-v3/config/abi/post/riscv64-linux-gnu/baseline_symbols.txt
index 5ee7f5a0460..a37a0b9a0c9 100644
--- a/libstdc++-v3/config/abi/post/riscv64-linux-gnu/baseline_symbols.txt
+++ b/libstdc++-v3/config/abi/post/riscv64-linux-gnu/baseline_symbols.txt
@@ -497,7 +497,12 @@ FUNC:_ZNKSt11__timepunctIwE7_M_daysEPPKw@@GLIBCXX_3.4
 FUNC:_ZNKSt11__timepunctIwE8_M_am_pmEPPKw@@GLIBCXX_3.4
 FUNC:_ZNKSt11__timepunctIwE9_M_monthsEPPKw@@GLIBCXX_3.4
 FUNC:_ZNKSt11logic_error4whatEv@@GLIBCXX_3.4
+FUNC:_ZNKSt12__basic_fileIcE13native_handleEv@@GLIBCXX_3.4.32
 FUNC:_ZNKSt12__basic_fileIcE7is_openEv@@GLIBCXX_3.4
+FUNC:_ZNKSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE1EEcvbEv@@GLIBCXX_3.4.31
+FUNC:_ZNKSt12__shared_ptrINSt10filesystem4_DirELN9__gnu_cxx12_Lock_policyE1EEcvbEv@@GLIBCXX_3.4.31
+FUNC:_ZNKSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE1EEcvbEv@@GLIBCXX_3.4.31
+FUNC:_ZNKSt12__shared_ptrINSt10filesystem7__cxx114_DirELN9__gnu_cxx12_Lock_policyE1EEcvbEv@@GLIBCXX_3.4.31
 FUNC:_ZNKSt12bad_weak_ptr4whatEv@@GLIBCXX_3.4.15
 FUNC:_ZNKSt12future_error4whatEv@@GLIBCXX_3.4.14
 FUNC:_ZNKSt12strstreambuf6pcountEv@@GLIBCXX_3.4
@@ -3210,6 +3215,7 @@ 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_disposeEv@@GLIBCX
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_replaceEmmPKcm@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_S_compareEmm@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_M_capacityEm@@GLIBCXX_3.4.21
+FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE11_S_allocateERS3_m@@GLIBCXX_3.4.32
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC1EPcOS3_@@GLIBCXX_3.4.23
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC1EPcRKS3_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC2EPcOS3_@@GLIBCXX_3.4.23
@@ -3362,6 +3368,7 @@ 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_M_disposeEv@@GLIBCX
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_M_replaceEmmPKwm@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_S_compareEmm@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE11_M_capacityEm@@GLIBCXX_3.4.21
+FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE11_S_allocateERS3_m@@GLIBCXX_3.4.32
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC1EPwOS3_@@GLIBCXX_3.4.23
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC1EPwRKS3_@@GLIBCXX_3.4.21
 
FUNC:_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE12_Alloc_hiderC2EPwOS3_@@GLIBCXX_3.4.23
@@ -4523,6 +4530,7 @@ FUNC:__cxa_allocate_exception@@CXXABI_1.3
 FUNC:__cxa_bad_cast@@CXXABI_1.3
 FUNC:__cxa_bad_typeid@@CXXABI_1.3
 FUNC:__cxa_begin_catch@@CXXABI_1.3
+FUNC:__cxa_call_terminate@@CXXABI_1.3.15
 FUNC:__cxa_call_unexpected@@CXXABI_1.3
 FUNC:__cxa_current_exception_type@@CXXABI_1.3
 FUNC:__cxa_deleted_virtual@@CXXABI_1.3.6
@@ -4566,6 +4574,7 @@ OBJECT:0:CXXABI_1.3.11
 OBJECT:0:CXXABI_1.3.12
 OBJECT:0:CXXABI_1.3.13
 OBJECT:0:CXXABI_1.3.14
+OBJECT:0:CXXABI_1.3.15
 OBJECT:0:CXXABI_1.3.2
 OBJECT:0:CXXABI_1.3.3
 OBJECT:0:CXXABI_1.3.4
-- 
2.43.0


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH] lower-bitint: Avoid overlap between destinations and sources in libgcc calls [PR113421]

2024-01-17 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled because the bitint lowering emits a
  .MULBITINT (, 1024, , 1024, , 1024);
call.  The bug is in the overlap between the destination and source, that is
something the libgcc routines don't handle, they use the source arrays
during the entire algorithms which computes the destination array(s).
For the mapping of SSA_NAMEs to VAR_DECLs the code already supports that
correctly, but the checking whether a load from memory can be used directly
without a temporary even when earlier we decided to merge the
multiplication/division/modulo etc. with a store didn't.

The following patch implements that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-17  Jakub Jelinek  

PR tree-optimization/113421
* gimple-lower-bitint.cc (stmt_needs_operand_addr): Adjust function
comment.
(bitint_dom_walker::before_dom_children): Add g temporary to simplify
formatting.  Start at vop rather than cvop even if stmt is a store
and needs_operand_addr.

* gcc.dg/torture/bitint-50.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-01-16 12:32:56.617721208 +0100
+++ gcc/gimple-lower-bitint.cc  2024-01-16 17:33:04.046476302 +0100
@@ -5455,7 +5455,8 @@ vuse_eq (ao_ref *, tree vuse1, void *dat
 
 /* Return true if STMT uses a library function and needs to take
address of its inputs.  We need to avoid bit-fields in those
-   cases.  */
+   cases.  Similarly, we need to avoid overlap between destination
+   and source limb arrays.  */
 
 bool
 stmt_needs_operand_addr (gimple *stmt)
@@ -5574,7 +5575,8 @@ bitint_dom_walker::before_dom_children (
  else if (!bitmap_bit_p (m_loads, SSA_NAME_VERSION (s)))
continue;
 
- tree rhs1 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s));
+ gimple *g = SSA_NAME_DEF_STMT (s);
+ tree rhs1 = gimple_assign_rhs1 (g);
  if (needs_operand_addr
  && TREE_CODE (rhs1) == COMPONENT_REF
  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (rhs1, 1)))
@@ -5596,15 +5598,14 @@ bitint_dom_walker::before_dom_children (
 
  ao_ref ref;
	  ao_ref_init (&ref, rhs1);
- tree lvop = gimple_vuse (SSA_NAME_DEF_STMT (s));
+ tree lvop = gimple_vuse (g);
  unsigned limit = 64;
  tree vuse = cvop;
  if (vop != cvop
  && is_gimple_assign (stmt)
  && gimple_store_p (stmt)
- && !operand_equal_p (lhs,
-  gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s)),
-  0))
+ && (needs_operand_addr
+ || !operand_equal_p (lhs, gimple_assign_rhs1 (g), 0)))
vuse = vop;
  if (vuse != lvop
	  && walk_non_aliased_vuses (&ref, vuse, false, vuse_eq,
--- gcc/testsuite/gcc.dg/torture/bitint-50.c.jj	2024-01-16 17:35:16.084622119 +0100
+++ gcc/testsuite/gcc.dg/torture/bitint-50.c	2024-01-16 17:35:06.701753879 +0100
@@ -0,0 +1,31 @@
+/* PR tree-optimization/113421 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 1024
+unsigned _BitInt(1024) a = -5wb;
+
+__attribute__((noipa)) void
+foo (unsigned _BitInt(1024) x)
+{
+  a *= x;
+}
+#else
+int a = 30;
+
+void
+foo (int)
+{
+}
+#endif
+
+int
+main ()
+{
+  foo (-6wb);
+  if (a != 30wb)
+__builtin_abort ();
+  return 0;
+}

Jakub



Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-17 Thread chenglulu



On 2024/1/17 at 5:50 PM, Xi Ruoyao wrote:

On Wed, 2024-01-17 at 17:38 +0800, chenglulu wrote:

On 2024/1/13 at 9:05 PM, Xi Ruoyao wrote:

On Sat, 2024-01-13 at 15:01 +0800, chenglulu wrote:

On 2024/1/12 at 7:42 PM, Xi Ruoyao wrote:

On Fri, 2024-01-12 at 09:46 +0800, chenglulu wrote:


I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS:
we need a target hook to tell the generic code
UNSPEC_LA_PCREL_64_PART{1,2} are just a wrapper around symbols, or we'll
see millions lines of messages like

../../gcc/gcc/tree.h:4171:1: note: non-delegitimized UNSPEC
UNSPEC_LA_PCREL_64_PART1 (42) found in variable location

I built GCC with -mcmodel=extreme in BOOT_CFLAGS, but I haven't reproduced
the problem you mentioned.

   $ ../configure --host=loongarch64-linux-gnu 
--target=loongarch64-linux-gnu --build=loongarch64-linux-gnu \
   --with-arch=loongarch64 --with-abi=lp64d --enable-tls 
--enable-languages=c,c++,fortran,lto --enable-plugin \
   --disable-multilib --disable-host-shared --enable-bootstrap 
--enable-checking=release
   $ make BOOT_FLAGS="-mcmodel=extreme"

What did I do wrong?:-(

BOOT_CFLAGS, not BOOT_FLAGS :).


This is so strange. My compilation here stopped due to syntax errors, and I
still haven't reproduced the message you mentioned about
UNSPEC_LA_PCREL_64_PART1.

I used:

../gcc/configure --with-system-zlib --disable-fixincludes \
   --enable-default-ssp --enable-default-pie \
   --disable-werror --disable-multilib \
   --prefix=/home/xry111/gcc-dev

and then

make STAGE1_{C,CXX}FLAGS="-O2 -g" -j8 \
   BOOT_{C,CXX}FLAGS="-O2 -g -mcmodel=extreme" &| tee gcc-build.log

I guess "-g" is needed to reproduce the issue, as the messages are
produced during DWARF generation.


I have reproduced this problem, and it can be solved by adding a hook.

Unfortunately, when testing SPEC2006 403.gcc with '-mcmodel=extreme
-mexplicit-relocs=always', an error occurs.  The other benchmarks have not
been tested yet.

I debugged it roughly; the problem appears to be that the address used by
the instruction 'ldx.d $r12, $r25, $r6' is wrong.

Wrong assembly:

   5826 pcalau12i   $r13,%got_pc_hi20(recog_data)
   5827 addi.d  $r12,$r0,%got_pc_lo12(recog_data)
   5828 lu32i.d $r12,%got64_pc_lo20(recog_data)
   5829 lu52i.d $r12,$r12,%got64_pc_hi12(recog_data)
   5830 ldx.d   $r12,$r13,$r12
   5831 ld.b    $r8,$r12,997
   5832 .loc 1 829 18 discriminator 1 view .LVU1527
   5833 ble $r8,$r0,.L476
   5834 ld.d    $r6,$r3,16
   5835 ld.d    $r9,$r3,88
   5836 .LBB189 = .
   5837 .loc 1 839 24 view .LVU1528
   5838 alsl.d  $r7,$r19,$r19,2
   5839 ldx.d   $r12,$r25,$r6
   5840 addi.d  $r17,$r3,120
   5841 .LBE189 = .
   5842 .loc 1 829 18 discriminator 1 view .LVU1529
   5843 or  $r13,$r0,$r0
   5844 addi.d  $r4,$r12,992

Assembly that works fine using macros:

3040 la.global   $r12,$r13,recog_data
3041 ld.b    $r9,$r12,997
3042 ble $r9,$r0,.L475
3043 alsl.d  $r5,$r16,$r16,2
3044 la.global   $r15,$r17,recog_data
3045 addi.d  $r4,$r12,992
3046 addi.d  $r18,$r3,48
3047 or  $r13,$r0,$r0

Comparing the assembly, we can see that lines 5844 and 3045 serve the
same purpose, but the base-address-register optimization at line 5844
is wrong.

regrename.c.283r.loop2_init:

(insn 6 497 2741 34 (set (reg:DI 180 [ ivtmp.713D.15724 ])
  (const_int 0 [0])) "regrename.c":829:18 discrim 1 156
{*movdi_64bit}
(nil))
(insn 2741 6 2744 34 (parallel [
  (set (reg:DI 1502)
  (unspec:DI [
  (symbol_ref:DI ("recog_data") [flags 0xc0]
)
  ] UNSPEC_LA_PCREL_64_PART1))
  (set (reg/f:DI 1479)
  (unspec:DI [
  (symbol_ref:DI ("recog_data") [flags 0xc0]
)
  ] UNSPEC_LA_PCREL_64_PART2))
  ]) -1
   (expr_list:REG_UNUSED (reg/f:DI 1479)
(nil)))
(insn 2744 2741 2745 34 (set (reg/f:DI 1503)
  (mem:DI (plus:DI (reg/f:DI 1479)
  (reg:DI 1502)) [0  S8 A8])) 156 {*movdi_64bit}
   (expr_list:REG_EQUAL (symbol_ref:DI ("recog_data") [flags 0xc0]
)
(nil)))


Virtual register 1479 is used in insn 2744, but register 1479 was
marked with a REG_UNUSED note on the previous instruction.

The attached file is the wrong file.
The compilation command is as follows:

$ ./gcc/cc1 -fpreprocessed regrename.i -quiet -dp -dumpbase regrename.c
-dumpbase-ext .c -mno-relax -mabi=lp64d -march=loongarch64 -mfpu=64
-msimd=lasx -mcmodel=extreme -mtune=loongarch64 -g3 -O2
-Wno-int-conversion -Wno-implicit-int -Wno-implicit-function-declaration
-Wno-incompatible-pointer-types -version -o regrename.s
-mexplicit-relocs=always -fdump-rtl-all-all

I've seen some "guality" test 

[committed] openmp: Add OpenMP _BitInt support [PR113409]

2024-01-17 Thread Jakub Jelinek
Hi!

The following patch adds support for _BitInt iterators in OpenMP canonical
loops, with the preexisting limitation that, when not using compile-time
static scheduling, the iterators in the library are at most unsigned long
long or signed long, so in the runtime/dynamic/guided etc. cases one can't
iterate over more than those types can represent (as is already the case
for e.g. __int128 iterators).  The testcase also covers linear/reduction
clauses for them.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2024-01-17  Jakub Jelinek  

PR middle-end/113409
* omp-general.cc (omp_adjust_for_condition): Handle BITINT_TYPE like
INTEGER_TYPE.
(omp_extract_for_data): Use build_bitint_type rather than
build_nonstandard_integer_type if either iter_type or loop->v type
is BITINT_TYPE.
* omp-expand.cc (expand_omp_for_generic,
expand_omp_taskloop_for_outer, expand_omp_taskloop_for_inner): Handle
BITINT_TYPE like INTEGER_TYPE.

* testsuite/libgomp.c/bitint-1.c: New test.

--- gcc/omp-general.cc.jj   2024-01-04 09:10:56.590914073 +0100
+++ gcc/omp-general.cc  2024-01-16 16:08:15.160663134 +0100
@@ -115,7 +115,8 @@ omp_adjust_for_condition (location_t loc
 
 case NE_EXPR:
   gcc_assert (TREE_CODE (step) == INTEGER_CST);
-  if (TREE_CODE (TREE_TYPE (v)) == INTEGER_TYPE)
+  if (TREE_CODE (TREE_TYPE (v)) == INTEGER_TYPE
+ || TREE_CODE (TREE_TYPE (v)) == BITINT_TYPE)
{
  if (integer_onep (step))
*cond_code = LT_EXPR;
@@ -409,6 +410,7 @@ omp_extract_for_data (gomp_for *for_stmt
   loop->v = gimple_omp_for_index (for_stmt, i);
   gcc_assert (SSA_VAR_P (loop->v));
   gcc_assert (TREE_CODE (TREE_TYPE (loop->v)) == INTEGER_TYPE
+ || TREE_CODE (TREE_TYPE (loop->v)) == BITINT_TYPE
  || TREE_CODE (TREE_TYPE (loop->v)) == POINTER_TYPE);
   var = TREE_CODE (loop->v) == SSA_NAME ? SSA_NAME_VAR (loop->v) : loop->v;
   loop->n1 = gimple_omp_for_initial (for_stmt, i);
@@ -479,9 +481,17 @@ omp_extract_for_data (gomp_for *for_stmt
  else if (i == 0
   || TYPE_PRECISION (iter_type)
  < TYPE_PRECISION (TREE_TYPE (loop->v)))
-   iter_type
- = build_nonstandard_integer_type
- (TYPE_PRECISION (TREE_TYPE (loop->v)), 1);
+   {
+ if (TREE_CODE (iter_type) == BITINT_TYPE
+ || TREE_CODE (TREE_TYPE (loop->v)) == BITINT_TYPE)
+   iter_type
+ = build_bitint_type (TYPE_PRECISION (TREE_TYPE (loop->v)),
+  1);
+ else
+   iter_type
+ = build_nonstandard_integer_type
+   (TYPE_PRECISION (TREE_TYPE (loop->v)), 1);
+   }
}
   else if (iter_type != long_long_unsigned_type_node)
{
@@ -747,7 +757,8 @@ omp_extract_for_data (gomp_for *for_stmt
  if (t && integer_zerop (t))
count = build_zero_cst (long_long_unsigned_type_node);
  else if ((i == 0 || count != NULL_TREE)
-  && TREE_CODE (TREE_TYPE (loop->v)) == INTEGER_TYPE
+  && (TREE_CODE (TREE_TYPE (loop->v)) == INTEGER_TYPE
+  || TREE_CODE (TREE_TYPE (loop->v)) == BITINT_TYPE)
   && TREE_CONSTANT (loop->n1)
   && TREE_CONSTANT (loop->n2)
   && TREE_CODE (loop->step) == INTEGER_CST)
--- gcc/omp-expand.cc.jj2024-01-03 11:51:39.095626210 +0100
+++ gcc/omp-expand.cc   2024-01-16 13:17:47.367928336 +0100
@@ -4075,7 +4075,7 @@ expand_omp_for_generic (struct omp_regio
 
   /* See if we need to bias by LLONG_MIN.  */
   if (fd->iter_type == long_long_unsigned_type_node
-  && TREE_CODE (type) == INTEGER_TYPE
+  && (TREE_CODE (type) == INTEGER_TYPE || TREE_CODE (type) == BITINT_TYPE)
   && !TYPE_UNSIGNED (type)
   && fd->ordered == 0)
 {
@@ -7191,7 +7191,7 @@ expand_omp_taskloop_for_outer (struct om
 
   /* See if we need to bias by LLONG_MIN.  */
   if (fd->iter_type == long_long_unsigned_type_node
-  && TREE_CODE (type) == INTEGER_TYPE
+  && (TREE_CODE (type) == INTEGER_TYPE || TREE_CODE (type) == BITINT_TYPE)
   && !TYPE_UNSIGNED (type))
 {
   tree n1, n2;
@@ -7352,7 +7352,7 @@ expand_omp_taskloop_for_inner (struct om
 
   /* See if we need to bias by LLONG_MIN.  */
   if (fd->iter_type == long_long_unsigned_type_node
-  && TREE_CODE (type) == INTEGER_TYPE
+  && (TREE_CODE (type) == INTEGER_TYPE || TREE_CODE (type) == BITINT_TYPE)
   && !TYPE_UNSIGNED (type))
 {
   tree n1, n2;
--- libgomp/testsuite/libgomp.c/bitint-1.c.jj	2024-01-16 13:47:24.880153301 +0100
+++ libgomp/testsuite/libgomp.c/bitint-1.c	2024-01-16 16:04:43.242609845 +0100
@@ -0,0 +1,65 @@
+/* PR middle-end/113409 */
+/* { dg-do run { target bitint } } */
+
+extern void abort 

[PATCH] match: Do not select to branchless expression when target has movcc pattern [PR113095]

2024-01-17 Thread Monk Chiang
This allows the backend to generate movcc instructions if the target
machine has a movcc pattern.

branchless-cond.c needs to be updated since some target machines have
conditional move instructions, so on those the expression will not be
changed to the branchless form.

gcc/ChangeLog:
PR target/113095
* match.pd (`(zero_one == 0) ? y : z <op> y`,
`(zero_one != 0) ? z <op> y : y`): Do not match to branchless
expression, if target machine has conditional move pattern.

gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
---
 gcc/match.pd  | 30 +--
 .../gcc.dg/tree-ssa/branchless-cond.c |  6 ++--
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index e42ecaf9ec7..a1f90b1cd41 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4231,7 +4231,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
&& TYPE_PRECISION (type) > 1
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
-   (op (mult (convert:type @0) @2) @1
+   (with {
+  bool can_movecc_p = false;
+  if (can_conditionally_move_p (TYPE_MODE (type)))
+   can_movecc_p = true;
+
+  /* Some targets only support word_mode for the movcc pattern; if the
+     type can be extended to word_mode then use a conditional move.
+     Even if an extend instruction is needed, its cost is lower than
+     the branchless sequence.  */
+  if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
+ && can_conditionally_move_p (word_mode))
+   can_movecc_p = true;
+}
+(if (!can_movecc_p)
+ (op (mult (convert:type @0) @2) @1))
 
 /* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
 (for op (bit_xor bit_ior plus)
@@ -4243,7 +4256,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
&& TYPE_PRECISION (type) > 1
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
-   (op (mult (convert:type @0) @2) @1
+   (with {
+  bool can_movecc_p = false;
+  if (can_conditionally_move_p (TYPE_MODE (type)))
+   can_movecc_p = true;
+
+  /* Some targets only support word_mode for the movcc pattern; if the
+     type can be extended to word_mode then use a conditional move.
+     Even if an extend instruction is needed, its cost is lower than
+     the branchless sequence.  */
+  if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
+ && can_conditionally_move_p (word_mode))
+   can_movecc_p = true;
+}
+(if (!can_movecc_p)
+ (op (mult (convert:type @0) @2) @1))
 
 /* ?: Value replacement. */
 /* a == 0 ? b : b + a  -> b + a */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
index e063dc4bb5f..c002ed97364 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -21,6 +21,6 @@ int f4(unsigned int x, unsigned int y, unsigned int z)
   return ((x & 1) != 0) ? z | y : y;
 }
 
-/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times " & " 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
-- 
2.40.1


