Re: [RFC] Summary of libgomp failures for offloading to nvptx from AArch64

2024-07-29 Thread Richard Biener via Gcc
On Mon, Jul 29, 2024 at 1:35 PM Prathamesh Kulkarni
 wrote:
>
>
>
> > -----Original Message-----
> > From: Richard Biener 
> > Sent: Friday, July 26, 2024 6:51 PM
> > To: Prathamesh Kulkarni 
> > Cc: gcc@gcc.gnu.org
> > Subject: Re: [RFC] Summary of libgomp failures for offloading to nvptx
> > from AArch64
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Thu, Jul 25, 2024 at 3:36 PM Prathamesh Kulkarni via Gcc
> >  wrote:
> > >
> > > Hi,
> > > I am working on enabling offloading to nvptx from an AArch64 host. As
> > > mentioned on the wiki
> > > (https://gcc.gnu.org/wiki/Offloading#Running_.27make_check.27),
> > > I ran make check-target-libgomp on an AArch64 host (and no GPU) with
> > > the following results:
> > >
> > > === libgomp Summary ===
> > >
> > > # of expected passes            14568
> > > # of unexpected failures        1023
> > > # of expected failures          309
> > > # of untested testcases         54
> > > # of unresolved testcases       992
> > > # of unsupported tests          644
> > >
> > > It seems the majority of the tests fail due to the following 4 issues:
> > >
> > > * Compiling a minimal test-case:
> > >
> > > int main()
> > > {
> > >   int x;
> > >   #pragma omp target map (to: x)
> > >   {
> > > x = 0;
> > >   }
> > >   return x;
> > > }
> > >
> > > Compiling with -fopenmp -foffload=nvptx-none results in the following
> > > issues:
> > >
> > > (1) Differing values of NUM_POLY_INT_COEFFS between host and
> > > accelerator, which results in the following ICE:
> > >
> > > 0x1a6e0a7 pp_quoted_string
> > > ../../gcc/gcc/pretty-print.cc:2277
> > >  0x1a6ffb3 pp_format(pretty_printer*, text_info*, urlifier const*)
> > > ../../gcc/gcc/pretty-print.cc:1634
> > >  0x1a4a3f3 diagnostic_context::report_diagnostic(diagnostic_info*)
> > > ../../gcc/gcc/diagnostic.cc:1612
> > > 0x1a4a727 diagnostic_impl
> > > ../../gcc/gcc/diagnostic.cc:1775
> > >  0x1a4e20b fatal_error(unsigned int, char const*, ...)
> > > ../../gcc/gcc/diagnostic.cc:2218
> > >  0xb3088f lto_input_mode_table(lto_file_decl_data*)
> > >  ../../gcc/gcc/lto-streamer-in.cc:2121
> > >  0x6f5cdf lto_file_finalize
> > > ../../gcc/gcc/lto/lto-common.cc:2285
> > >  0x6f5cdf lto_create_files_from_ids
> > > ../../gcc/gcc/lto/lto-common.cc:2309
> > >  0x6f5cdf lto_file_read
> > > ../../gcc/gcc/lto/lto-common.cc:2364
> > >  0x6f5cdf read_cgraph_and_symbols(unsigned int, char const**)
> > > ../../gcc/gcc/lto/lto-common.cc:2812
> > >  0x6cfb93 lto_main()
> > > ../../gcc/gcc/lto/lto.cc:658
> > >
> > > This is already tracked in https://gcc.gnu.org/PR96265 (and related
> > > PR's)
> > >
> > > Streaming out mode_table:
> > > mode = SI, mclass = 2, size = 4, prec = 32
> > > mode = DI, mclass = 2, size = 8, prec = 64
> > >
> > > Streaming in mode_table (in lto_input_mode_table):
> > > mclass = 2, size = 4, prec = 0
> > > (and then calculates the correct mode value by iterating over all
> > > modes of mclass starting from narrowest mode)
> > >
> > > The issue is that the value for prec is not getting streamed-in
> > > correctly for SImode as seen above. While streaming out from AArch64
> > > host, it is 32, but while streaming in for nvptx, it is 0. This
> > > happens because of differing values of NUM_POLY_INT_COEFFS between
> > > the AArch64 and nvptx backends.
> > >
> > > Since NUM_POLY_INT_COEFFS is 2 for aarch64, the streamed-out values
> > > for mode, precision would be <4, 0> and <32, 0> respectively
> > > (streamed-out in bp_pack_poly_value). Both zeros come from coeffs[1]
> > > of size and prec. While streaming in however, NUM_POLY_INT_COEFFS is
> > > 1 for nvptx, and thus it incorrectly treats <4, 0> as size and
> > > precision respectively, which is why precision gets streamed in as 0,
> > > and thus it encounters the above ICE.
> > >
> > > Supporting non-VLA code with offloading:
> > >
> > > In the general case, it's hard to support offloading for arbitrary
> > > poly_ints when NUM_POLY_INT_COEFFS differs for host and accelerator.
> > > For example, it's not possible to represent a degree-2 poly_int like
> > > 4 + 4x (as-is) on an accelerator with NUM_POLY_INT_COEFFS == 1.
> > >
> > > However, IIUC, we can support offloading for a restricted set of
> > > poly_ints whose degree <= accel's NUM_POLY_INT_COEFFS, since they can
> > > be represented on the accelerator? For a hypothetical example, if host
> > > NUM_POLY_INT_COEFFS == 3 and accel NUM_POLY_INT_COEFFS == 2, then I
> > > suppose we could represent a degree-2 poly_int on the accelerator, but
> > > not a degree-3 poly_int like 3+4x+5x^2?
> > >
> > > Based on that, I have come up with the following approach in the
> > > attached "quick-and-dirty" patch (p-163-2.diff):
> > > Stream-out host NUM_POLY_INT_COEFFS, and while streaming-in during
> > > lto1, compare it with the accelerator's NUM_POLY_INT_COEFFS as follows:
> > >
> > > Stream in host_num_poly_int_coeffs;
> > > if (host_num_poly_int_coeffs == NUM_POLY_INT_COEFFS)

[gcc r14-10520] tree-optimization/116057 - wrong code with CCP and vector CTORs

2024-07-29 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:a7f1b00ed69810ce7f000d385a60e148d0228d48

commit r14-10520-ga7f1b00ed69810ce7f000d385a60e148d0228d48
Author: Richard Biener 
Date:   Wed Jul 24 13:16:35 2024 +0200

tree-optimization/116057 - wrong code with CCP and vector CTORs

The following fixes an issue with CCPs likely_value when faced with
a vector CTOR containing undef SSA names and constants.  This should
be classified as CONSTANT and not UNDEFINED.

PR tree-optimization/116057
* tree-ssa-ccp.cc (likely_value): Also walk CTORs in stmt
operands to look for constants.

* gcc.dg/torture/pr116057.c: New testcase.

(cherry picked from commit 1ea551514b9c285d801ac5ab8d78b22483ff65af)

Diff:
---
 gcc/testsuite/gcc.dg/torture/pr116057.c | 20 
 gcc/tree-ssa-ccp.cc | 11 +++
 2 files changed, 31 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/torture/pr116057.c b/gcc/testsuite/gcc.dg/torture/pr116057.c
new file mode 100644
index 000000000000..a7021c8e746e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116057.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-additional-options "-Wno-psabi" } */
+
+#define vect8 __attribute__((vector_size(8)))
+
+vect8 int __attribute__((noipa))
+f(int a)
+{
+  int b;
+  vect8 int t={1,1};
+  if(a) return t;
+  return (vect8 int){0, b};
+}
+
+int main ()
+{
+  if (f(0)[0] != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index cc78ff20bb81..62f36367060e 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -762,6 +762,17 @@ likely_value (gimple *stmt)
continue;
   if (is_gimple_min_invariant (op))
has_constant_operand = true;
+  else if (TREE_CODE (op) == CONSTRUCTOR)
+   {
+ unsigned j;
+ tree val;
+ FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (op), j, val)
+   if (CONSTANT_CLASS_P (val))
+ {
+   has_constant_operand = true;
+   break;
+ }
+   }
 }
 
   if (has_constant_operand)


Re: GCC 14.2 Release Candidate available from gcc.gnu.org

2024-07-29 Thread Richard Biener via Gcc
On Sun, Jul 28, 2024 at 11:46 PM Jason Merrill via Gcc  wrote:
>
> Since the RC I've fixed a few 14/15 C++ regressions with extremely safe
> patches, and wonder what you think about pushing them to the branch at this
> point:
>
> 115583, 115986, 115561
>
> Sorry these came so late.

Those are all for C++20 or later (non-default, experimental?).

If we pick a few extra fixes we might want to do RC2 today and delay
14.2 a few days.

Richard.

> Jason
>
> On Tue, Jul 23, 2024 at 8:51 AM Jakub Jelinek via Gcc 
> wrote:
>
> > The first release candidate for GCC 14.2 is available from
> >
> >  https://gcc.gnu.org/pub/gcc/snapshots/14.2.0-RC-20240723/
> >  ftp://gcc.gnu.org/pub/gcc/snapshots/14.2.0-RC-20240723/
> >
> > and shortly from its mirrors.  It has been generated from git commit
> > r14-10504-ga544898f6dd6a16.
> >
> > I have so far bootstrapped and tested the release candidate on
> > x86_64-linux.
> > Please test it and report any issues to bugzilla.
> >
> > If all goes well, we'd like to release 14.2 on Tuesday, Jul 30th.
> >
> >


Re: [RFC] Summary of libgomp failures for offloading to nvptx from AArch64

2024-07-26 Thread Richard Biener via Gcc
On Thu, Jul 25, 2024 at 3:36 PM Prathamesh Kulkarni via Gcc
 wrote:
>
> Hi,
> I am working on enabling offloading to nvptx from an AArch64 host. As mentioned
> on the wiki (https://gcc.gnu.org/wiki/Offloading#Running_.27make_check.27),
> I ran make check-target-libgomp on an AArch64 host (and no GPU) with the
> following results:
>
> === libgomp Summary ===
>
> # of expected passes            14568
> # of unexpected failures        1023
> # of expected failures          309
> # of untested testcases         54
> # of unresolved testcases       992
> # of unsupported tests          644
>
> It seems the majority of the tests fail due to the following 4 issues:
>
> * Compiling a minimal test-case:
>
> int main()
> {
>   int x;
>   #pragma omp target map (to: x)
>   {
> x = 0;
>   }
>   return x;
> }
>
> Compiling with -fopenmp -foffload=nvptx-none results in the following issues:
>
> (1) Differing values of NUM_POLY_INT_COEFFS between host and accelerator, 
> which results in the following ICE:
>
> 0x1a6e0a7 pp_quoted_string
> ../../gcc/gcc/pretty-print.cc:2277
>  0x1a6ffb3 pp_format(pretty_printer*, text_info*, urlifier const*)
> ../../gcc/gcc/pretty-print.cc:1634
>  0x1a4a3f3 diagnostic_context::report_diagnostic(diagnostic_info*)
> ../../gcc/gcc/diagnostic.cc:1612
>  0x1a4a727 diagnostic_impl
> ../../gcc/gcc/diagnostic.cc:1775
>  0x1a4e20b fatal_error(unsigned int, char const*, ...)
> ../../gcc/gcc/diagnostic.cc:2218
>  0xb3088f lto_input_mode_table(lto_file_decl_data*)
>  ../../gcc/gcc/lto-streamer-in.cc:2121
>  0x6f5cdf lto_file_finalize
> ../../gcc/gcc/lto/lto-common.cc:2285
>  0x6f5cdf lto_create_files_from_ids
> ../../gcc/gcc/lto/lto-common.cc:2309
>  0x6f5cdf lto_file_read
> ../../gcc/gcc/lto/lto-common.cc:2364
>  0x6f5cdf read_cgraph_and_symbols(unsigned int, char const**)
> ../../gcc/gcc/lto/lto-common.cc:2812
>  0x6cfb93 lto_main()
> ../../gcc/gcc/lto/lto.cc:658
>
> This is already tracked in https://gcc.gnu.org/PR96265 (and related PR's)
>
> Streaming out mode_table:
> mode = SI, mclass = 2, size = 4, prec = 32
> mode = DI, mclass = 2, size = 8, prec = 64
>
> Streaming in mode_table (in lto_input_mode_table):
> mclass = 2, size = 4, prec = 0
> (and then calculates the correct mode value by iterating over all modes of 
> mclass starting from narrowest mode)
>
> The issue is that the value for prec is not getting streamed-in correctly for 
> SImode as seen above. While streaming out from AArch64 host,
> it is 32, but while streaming in for nvptx, it is 0. This happens because of 
> differing values of NUM_POLY_INT_COEFFS between AArch64 and nvptx backend.
>
> Since NUM_POLY_INT_COEFFS is 2 for aarch64, the streamed-out values for mode, 
> precision would be <4, 0> and <32, 0>
> respectively (streamed-out in bp_pack_poly_value). Both zeros come from 
> coeffs[1] of size and prec. While streaming in however,
> NUM_POLY_INT_COEFFS is 1 for nvptx, and thus it incorrectly treats <4, 0> as 
> size and precision respectively, which is why precision
> gets streamed in as 0, and thus it encounters the above ICE.
>
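The mismatch can be reduced to a few lines.  A minimal, self-contained
sketch, assuming a plain word stream in place of GCC's bitpack stream
(all names here are illustrative, not GCC APIs):

#include <cstddef>
#include <cstdio>
#include <vector>

/* The writer packs HOST_COEFFS words per poly_int (aarch64: 2); the
   reader pops ACCEL_COEFFS words per poly_int (nvptx: 1).  */
static const int HOST_COEFFS = 2;
static const int ACCEL_COEFFS = 1;

static void
pack_poly (std::vector<int> &stream, const int *coeffs)
{
  for (int i = 0; i < HOST_COEFFS; ++i)
    stream.push_back (coeffs[i]);
}

int
main ()
{
  std::vector<int> stream;
  const int size[HOST_COEFFS] = { 4, 0 };   /* SImode size: 4 + 0x  */
  const int prec[HOST_COEFFS] = { 32, 0 };  /* SImode precision: 32 + 0x  */
  pack_poly (stream, size);
  pack_poly (stream, prec);                 /* stream is now [4, 0, 32, 0]  */

  /* The accelerator-side reader advances ACCEL_COEFFS words per value.  */
  std::size_t pos = 0;
  int in_size = stream[pos]; pos += ACCEL_COEFFS;  /* reads 4: correct  */
  int in_prec = stream[pos]; pos += ACCEL_COEFFS;  /* reads 0: coeffs[1] of size!  */
  std::printf ("size = %d, prec = %d\n", in_size, in_prec);
  return 0;
}
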
> Supporting non-VLA code with offloading:
>
> In the general case, it's hard to support offloading for arbitrary poly_ints 
> when NUM_POLY_INT_COEFFS differs for host and accelerator.
> For example, it's not possible to represent a degree-2 poly_int like 4 + 4x 
> (as-is) on an accelerator with NUM_POLY_INT_COEFFS == 1.
>
> However, IIUC, we can support offloading for a restricted set of poly_ints
> whose degree <= accel's NUM_POLY_INT_COEFFS, since they can be
> represented on the accelerator? For a hypothetical example, if host
> NUM_POLY_INT_COEFFS == 3 and accel NUM_POLY_INT_COEFFS == 2, then I suppose
> we could represent a degree-2 poly_int on the accelerator, but not a
> degree-3 poly_int like 3+4x+5x^2?
>
> Based on that, I have come up with the following approach in the attached
> "quick-and-dirty" patch (p-163-2.diff):
> Stream-out host NUM_POLY_INT_COEFFS, and while streaming-in during lto1, 
> compare it with the accelerator's NUM_POLY_INT_COEFFS as follows:
>
> Stream in host_num_poly_int_coeffs;
> if (host_num_poly_int_coeffs == NUM_POLY_INT_COEFFS) // NUM_POLY_INT_COEFFS 
> represents accelerator's value here.
> {
> /* Both are equal, proceed to unpacking NUM_POLY_INT_COEFFS words from 
> bitstream.  */
> }
> else if (host_num_poly_int_coeffs < NUM_POLY_INT_COEFFS)
> {
> /* Unpack host_num_poly_int_coeffs words and zero out remaining higher 
> coeffs (similar to zero-extension).  */
> }
> else
> {
> /* Unpack host_num_poly_int_coeffs words and ensure that degree of 
> streamed-out poly_int <= NUM_POLY_INT_COEFFS.  */
> }
>
> For example, with host NUM_POLY_INT_COEFFS == 2 and accel NUM_POLY_INT_COEFFS 
> == 1, this will allow streaming of "degree-1" poly_ints
> like 4+0x (which will degenerate to constant 4), but give an error for 
> streaming a degree-2 poly_int like 4+4x.
>
> Following this 
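
A minimal sketch of that acceptance rule over a plain word stream, with
illustrative names: coefficients the accelerator can hold are copied
(missing ones stay zero, the zero-extension case), and any nonzero
coefficient beyond the accelerator's count is rejected.

#include <cstddef>
#include <vector>

/* Returns false when the streamed poly_int has a degree the accelerator
   cannot represent; OUT receives accel_coeffs coefficients.  */
static bool
stream_in_poly (const std::vector<long> &words, std::size_t &pos,
                int host_coeffs, int accel_coeffs, std::vector<long> &out)
{
  out.assign (accel_coeffs, 0);
  for (int i = 0; i < host_coeffs; ++i)
    {
      long c = words[pos++];
      if (i < accel_coeffs)
        out[i] = c;        /* representable coefficient  */
      else if (c != 0)
        return false;      /* e.g. 4 + 4x with accel_coeffs == 1  */
    }
  return true;
}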

[gcc r15-2315] tree-optimization/116083 - improve behavior when SLP discovery limit is reached

2024-07-25 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:66240bfc1cc9c1f1b5a9d0ebf92be70a9ab1be5c

commit r15-2315-g66240bfc1cc9c1f1b5a9d0ebf92be70a9ab1be5c
Author: Richard Biener 
Date:   Thu Jul 25 13:39:49 2024 +0200

tree-optimization/116083 - improve behavior when SLP discovery limit is reached

The following avoids some useless work when the SLP discovery limit
is reached, for example allocating a node to cache the failure
and starting discovery on split store groups when analyzing BBs.

It does not address the issue in the PR, which is a gratuitous budget
for discovery when the store group size approaches the number of
overall statements.

PR tree-optimization/116083
* tree-vect-slp.cc (vect_build_slp_tree): Do not allocate
a discovery fail node when we reached the discovery limit.
(vect_build_slp_instance): Terminate early when the
discovery limit is reached.

Diff:
---
 gcc/tree-vect-slp.cc | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 55ae496cbb2d..5f0d9e51c325 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1751,13 +1751,6 @@ vect_build_slp_tree (vec_info *vinfo,
   return NULL;
 }
 
-  /* Seed the bst_map with a stub node to be filled by vect_build_slp_tree_2
- so we can pick up backedge destinations during discovery.  */
-  slp_tree res = new _slp_tree;
-  SLP_TREE_DEF_TYPE (res) = vect_internal_def;
-  SLP_TREE_SCALAR_STMTS (res) = stmts;
-  bst_map->put (stmts.copy (), res);
-
   /* Single-lane SLP doesn't have the chance of run-away, do not account
  it to the limit.  */
   if (stmts.length () > 1)
@@ -1767,18 +1760,19 @@ vect_build_slp_tree (vec_info *vinfo,
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
 "SLP discovery limit exceeded\n");
- /* Mark the node invalid so we can detect those when still in use
-as backedge destinations.  */
- SLP_TREE_SCALAR_STMTS (res) = vNULL;
- SLP_TREE_DEF_TYPE (res) = vect_uninitialized_def;
- res->failed = XNEWVEC (bool, group_size);
- memset (res->failed, 0, sizeof (bool) * group_size);
  memset (matches, 0, sizeof (bool) * group_size);
  return NULL;
}
   --*limit;
 }
 
+  /* Seed the bst_map with a stub node to be filled by vect_build_slp_tree_2
+ so we can pick up backedge destinations during discovery.  */
+  slp_tree res = new _slp_tree;
+  SLP_TREE_DEF_TYPE (res) = vect_internal_def;
+  SLP_TREE_SCALAR_STMTS (res) = stmts;
+  bst_map->put (stmts.copy (), res);
+
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
 "starting SLP discovery for node %p\n", (void *) res);
@@ -3363,6 +3357,10 @@ vect_build_slp_instance (vec_info *vinfo,
 /* ???  We need stmt_info for group splitting.  */
 stmt_vec_info stmt_info_)
 {
+  /* If there's no budget left bail out early.  */
+  if (*limit == 0)
+return false;
+
   if (kind == slp_inst_kind_ctor)
 {
   if (dump_enabled_p ())
@@ -3520,7 +3518,7 @@ vect_build_slp_instance (vec_info *vinfo,
 
   stmt_vec_info stmt_info = stmt_info_;
   /* Try to break the group up into pieces.  */
-  if (kind == slp_inst_kind_store)
+  if (*limit > 0 && kind == slp_inst_kind_store)
 {
   /* ???  We could delay all the actual splitting of store-groups
 until after SLP discovery of the original group completed.


[gcc r15-2312] doc: Document -O1 as the preferred level for large machine-generated code

2024-07-25 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:634eae5ec3f3af2c4f6221d3ed2cf78d7f5c47f0

commit r15-2312-g634eae5ec3f3af2c4f6221d3ed2cf78d7f5c47f0
Author: Sam James 
Date:   Tue Jul 23 15:06:10 2024 +0100

doc: Document -O1 as the preferred level for large machine-generated code

At -O1, the intention is that we compile things in a "reasonable" amount
of time (ditto memory use). In particular, we try to especially avoid
optimizations which scale poorly on pathological cases, as is the case
for large machine-generated code.

Recommend -O1 for large machine-generated code, as has been informally
done on bugs for a while now.

This applies (broadly speaking) for both large machine-generated functions
but also to a lesser extent repetitive small-but-still-not-tiny functions
from a generator program.

gcc/ChangeLog:
PR middle-end/114855
* doc/invoke.texi (Optimize options): Mention machine-generated
code for -O1.

Diff:
---
 gcc/doc/invoke.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e0a641213ae4..9fb0925ed292 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12560,6 +12560,11 @@ With @option{-O}, the compiler tries to reduce code size and execution
 time, without performing any optimizations that take a great deal of
 compilation time.
 
+@option{-O} is the recommended optimization level for large machine-generated
+code as a sensible balance between time taken to compile and memory use:
+higher optimization levels perform optimizations with greater algorithmic
+complexity than at @option{-O}.
+
 @c Note that in addition to the default_options_table list in opts.cc,
 @c several optimization flags default to true but control optimization
 @c passes that are explicitly disabled at -O0.


[gcc r15-2311] tree-optimization/116081 - typedef vs. non-typedef in vectorization

2024-07-25 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:3f578dbac726d47b043b82606c47e5676c5d6a14

commit r15-2311-g3f578dbac726d47b043b82606c47e5676c5d6a14
Author: Richard Biener 
Date:   Thu Jul 25 12:46:30 2024 +0200

tree-optimization/116081 - typedef vs. non-typedef in vectorization

The following fixes the code generation difference when using
a typedef for the scalar type.  The issue is using a pointer
equality test for an INTEGER_CST which fails when the types
are different variants.
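
A hypothetical reproducer shape (PR116081 has the real one; this sketch
only shows where a type variant enters):

typedef int T;

/* The same reduction twice; only the spelling of the element type
   differs.  With the typedef the scalar type is a variant of 'int',
   so an INTEGER_CST built for it is a distinct tree node: a pointer
   compare against the neutral element then fails where
   operand_equal_p still matches.  */
int sum_plain (int *a)
{
  int s = 0;
  for (int i = 0; i < 8; ++i)
    s += a[i];
  return s;
}

T sum_typedef (T *a)
{
  T s = 0;
  for (int i = 0; i < 8; ++i)
    s += a[i];
  return s;
}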

PR tree-optimization/116081
* tree-vect-loop.cc (get_initial_defs_for_reduction):
Use operand_equal_p for comparing the element with the
neutral op.

Diff:
---
 gcc/tree-vect-loop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7d628efa60f..856ce491c3ec 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5652,7 +5652,7 @@ get_initial_defs_for_reduction (loop_vec_info loop_vinfo,
  init = gimple_build_vector_from_val (_seq, vector_type,
   neutral_op);
  int k = nunits;
- while (k > 0 && elts[k - 1] == neutral_op)
+ while (k > 0 && operand_equal_p (elts[k - 1], neutral_op))
k -= 1;
  while (k > 0)
{


[gcc r15-2303] tree-optimization/116079 - store motion and clobbers

2024-07-25 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:3bf05516d9ffea2a39939b656f0e51052000653e

commit r15-2303-g3bf05516d9ffea2a39939b656f0e51052000653e
Author: Richard Biener 
Date:   Thu Jul 25 08:58:42 2024 +0200

tree-optimization/116079 - store motion and clobbers

When we move a store out of an inner loop and remove a clobber in
the process, analysis of the inner loop can run into the clobber
via the meta-data and crash when accessing its basic-block.  The
following avoids this by clearing the VDEF which is how it identifies
already processed stores.

PR tree-optimization/116079
* tree-ssa-loop-im.cc (hoist_memory_references): Clear
VDEF of elided clobbers.

* gcc.dg/torture/pr116079.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/torture/pr116079.c | 20 
 gcc/tree-ssa-loop-im.cc |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/torture/pr116079.c b/gcc/testsuite/gcc.dg/torture/pr116079.c
new file mode 100644
index 000000000000..e9120969d918
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116079.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+char g_132;
+int g_701, g_1189, func_24___trans_tmp_15, func_24_l_2691;
+long func_24___trans_tmp_9;
+int *func_24_l_2684;
+void func_24() {
+  for (; g_1189;) {
+g_132 = 0;
+for (; g_132 < 6; ++g_132) {
+  func_24___trans_tmp_9 = *func_24_l_2684 = func_24_l_2691;
+  g_701 = 4;
+  for (; g_701; g_701 -= 1) {
+int l_2748[4];
+int si2 = l_2748[3];
+func_24___trans_tmp_15 = si2;
+  }
+}
+  }
+}
diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index c53efbb8d597..ccc56dc42f61 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -2880,6 +2880,7 @@ hoist_memory_references (class loop *loop, bitmap mem_refs,
  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
  unlink_stmt_vdef (stmt);
  release_defs (stmt);
+ gimple_set_vdef (stmt, NULL_TREE);
 gsi_remove (&gsi, true);
}
 
@@ -3062,6 +3063,7 @@ hoist_memory_references (class loop *loop, bitmap mem_refs,
   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
   unlink_stmt_vdef (stmt);
   release_defs (stmt);
+  gimple_set_vdef (stmt, NULL_TREE);
   gsi_remove (&gsi, true);
 }


[gcc r15-2302] tree-optimization/116081 - typedef vs. non-typedef in vectorization

2024-07-25 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:cfd3f06b4c65e15d4f6af8bd4862b835efd61a72

commit r15-2302-gcfd3f06b4c65e15d4f6af8bd4862b835efd61a72
Author: Richard Biener 
Date:   Thu Jul 25 08:34:20 2024 +0200

tree-optimization/116081 - typedef vs. non-typedef in vectorization

The following addresses a behavioral difference in vector type
analysis for typedef vs. non-typedef.  It doesn't fix the issue
at hand but avoids a spurious difference in the dumps.

PR tree-optimization/116081
* tree-vect-stmts.cc (vect_get_vector_types_for_stmt):
Properly compare types.

Diff:
---
 gcc/tree-vect-stmts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d717704f57cc..20cae83e8206 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -14903,7 +14903,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 vector size per vectorization).  */
   scalar_type = vect_get_smallest_scalar_type (stmt_info,
   TREE_TYPE (vectype));
-  if (scalar_type != TREE_TYPE (vectype))
+  if (!types_compatible_p (scalar_type, TREE_TYPE (vectype)))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,


[gcc r15-2296] Maintain complex constraint vector order during PTA solving

2024-07-25 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:09de976f9bcab1d3018d5461ea2abb8a47f20528

commit r15-2296-g09de976f9bcab1d3018d5461ea2abb8a47f20528
Author: Richard Biener 
Date:   Tue Jul 23 14:05:47 2024 +0200

Maintain complex constraint vector order during PTA solving

There's a FIXME comment in the PTA constraint solver that the vector
of complex constraints can get unsorted which can lead to duplicate
entries piling up during node unification.  The following fixes this
with the assumption that delayed updates to constraints are uncommon
(otherwise re-sorting the whole vector would be more efficient).
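
The maintenance idea in isolation, as a minimal sketch over std::vector
rather than GCC's vec API:

#include <algorithm>
#include <cstddef>
#include <vector>

/* Re-position v[j] after its key changed to NEW_VAL, keeping V sorted
   and free of duplicates.  Cheap as long as updates are uncommon.  */
static void
update_sorted (std::vector<int> &v, std::size_t j, int new_val)
{
  v.erase (v.begin () + j);
  auto it = std::lower_bound (v.begin (), v.end (), new_val);
  if (it == v.end () || *it != new_val)  /* insert unless already present  */
    v.insert (it, new_val);
}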

* tree-ssa-structalias.cc (constraint_equal): Take const
reference to constraints.
(constraint_vec_find): Similar.
(solve_graph): Keep constraint vector sorted and verify
sorting with checking.

Diff:
---
 gcc/tree-ssa-structalias.cc | 73 +
 1 file changed, 61 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 65f9132a94fd..a32ef1d5cc0c 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -902,7 +902,7 @@ constraint_less (const constraint_t &a, const constraint_t &b)
 /* Return true if two constraints A and B are equal.  */
 
 static bool
-constraint_equal (struct constraint a, struct constraint b)
+constraint_equal (const constraint &a, const constraint &b)
 {
   return constraint_expr_equal (a.lhs, b.lhs)
 && constraint_expr_equal (a.rhs, b.rhs);
@@ -913,7 +913,7 @@ constraint_equal (struct constraint a, struct constraint b)
 
 static constraint_t
 constraint_vec_find (vec<constraint_t> vec,
-struct constraint lookfor)
+constraint &lookfor)
 {
   unsigned int place;
   constraint_t found;
@@ -2806,10 +2806,8 @@ solve_graph (constraint_graph_t graph)
 better visitation order in the next iteration.  */
  while (bitmap_clear_bit (changed, i))
{
- unsigned int j;
- constraint_t c;
  bitmap solution;
- vec<constraint_t> complex = graph->complex[i];
+ vec<constraint_t> &complex = graph->complex[i];
  varinfo_t vi = get_varinfo (i);
  bool solution_empty;
 
@@ -2845,23 +2843,73 @@ solve_graph (constraint_graph_t graph)
  solution_empty = bitmap_empty_p (solution);
 
  /* Process the complex constraints */
+ hash_set<constraint_t> *cvisited = nullptr;
+ if (flag_checking)
+   cvisited = new hash_set<constraint_t>;
  bitmap expanded_pts = NULL;
- FOR_EACH_VEC_ELT (complex, j, c)
+ for (unsigned j = 0; j < complex.length (); ++j)
{
- /* XXX: This is going to unsort the constraints in
-some cases, which will occasionally add duplicate
-constraints during unification.  This does not
-affect correctness.  */
- c->lhs.var = find (c->lhs.var);
- c->rhs.var = find (c->rhs.var);
+ constraint_t c = complex[j];
+ /* At unification time only the directly involved nodes
+will get their complex constraints updated.  Update
+our complex constraints now but keep the constraint
+vector sorted and clear of duplicates.  Also make
+sure to evaluate each prevailing constraint only once.  */
+ unsigned int new_lhs = find (c->lhs.var);
+ unsigned int new_rhs = find (c->rhs.var);
+ if (c->lhs.var != new_lhs || c->rhs.var != new_rhs)
+   {
+ constraint tem = *c;
+ tem.lhs.var = new_lhs;
+ tem.rhs.var = new_rhs;
+ unsigned int place
+   = complex.lower_bound (&tem, constraint_less);
+ c->lhs.var = new_lhs;
+ c->rhs.var = new_rhs;
+ if (place != j)
+   {
+ complex.ordered_remove (j);
+ if (j < place)
+   --place;
+ if (place < complex.length ())
+   {
+ if (constraint_equal (*complex[place], *c))
+   {
+ j--;
+ continue;
+   }
+ else
+   complex.safe_insert (place, c);
+   }
+ else
+   complex.quick_push (c);
+ if (place > j)
+   {
+ j--;
+ continue;
+   }
+   }
+   }
 
 

[gcc r15-2255] tree-optimization/116057 - wrong code with CCP and vector CTORs

2024-07-24 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:1ea551514b9c285d801ac5ab8d78b22483ff65af

commit r15-2255-g1ea551514b9c285d801ac5ab8d78b22483ff65af
Author: Richard Biener 
Date:   Wed Jul 24 13:16:35 2024 +0200

tree-optimization/116057 - wrong code with CCP and vector CTORs

The following fixes an issue with CCPs likely_value when faced with
a vector CTOR containing undef SSA names and constants.  This should
be classified as CONSTANT and not UNDEFINED.

PR tree-optimization/116057
* tree-ssa-ccp.cc (likely_value): Also walk CTORs in stmt
operands to look for constants.

* gcc.dg/torture/pr116057.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/torture/pr116057.c | 20 
 gcc/tree-ssa-ccp.cc | 11 +++
 2 files changed, 31 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/torture/pr116057.c b/gcc/testsuite/gcc.dg/torture/pr116057.c
new file mode 100644
index 000000000000..a7021c8e746e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116057.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-additional-options "-Wno-psabi" } */
+
+#define vect8 __attribute__((vector_size(8)))
+
+vect8 int __attribute__((noipa))
+f(int a)
+{
+  int b;
+  vect8 int t={1,1};
+  if(a) return t;
+  return (vect8 int){0, b};
+}
+
+int main ()
+{
+  if (f(0)[0] != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index de83d26d311a..44711018e0ef 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -762,6 +762,17 @@ likely_value (gimple *stmt)
continue;
   if (is_gimple_min_invariant (op))
has_constant_operand = true;
+  else if (TREE_CODE (op) == CONSTRUCTOR)
+   {
+ unsigned j;
+ tree val;
+ FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (op), j, val)
+   if (CONSTANT_CLASS_P (val))
+ {
+   has_constant_operand = true;
+   break;
+ }
+   }
 }
 
   if (has_constant_operand)


Re: Preparation for this weeks call

2024-07-24 Thread Richard Biener via Gcc
On Wed, Jul 24, 2024 at 9:59 AM Thor Preimesberger
 wrote:
>
> Sure - we actually already emit json in optinfo-emit-json.cc, and there are 
> implementations of json and of pretty-printing/dumping it out also. I got a 
> hacky version of our current raw dump working with json objects, but using 
> the functions and data structures in tree-dump.* "as is" would require 
> further processing of the dump's output (and, I think, some further 
> modification) in order to make linking related nodes possible. This seems 
> expensive computationally, so I'm currently re-implementing 
> the function dump_generic_nodes from tree-pretty-print.cc so that it emits 
> JSON instead. tree-dump.cc doesn't currently handle all the current tree 
> codes in dequeue_and_dump. This approach does unfortunately lead us to having 
> another spot in the code base that needs to be synced with the tree data 
> structure. (I didn't take a long look at tree-streamer* to see if anything 
> there would be helpful, but it looks like we'd have to enumerate over the 
> codes anyways to get JSON output.)
>
> No patch yet - I'll submit one once the JSON dumping is ready, and others 
> that process it as appropriate. I've been pushing to get as much done before 
> Wednesday, so I'll have to get around to pushing onto the git fork what I've 
> done so far tomorrow / later today.

I'd like to actually see some of the progress here, even if
incomplete; even if there are two versions (tree-dump and
dump_generic_nodes).

> I wanna emphasize that I started a bit late on this since my academic term 
> didn't end until around a month after the coding period began. We've already 
> extended the project to accommodate that, but it places the midpoint 
> review at three weeks into a project that is expected to take 12 weeks. We can 
> still address the rate of progress if you still feel trepidatious about the 
> pace of things.

There are 8 weeks left until final code submission, so we're 1/3rd
into the project.  That would be 4 weeks, but I understand
that it's more like three on your side.  At this point there should be
a clear path forward, thus the design should be set.  It's
a bit unfortunate that there's nothing written down in code or e-mail
from your side that shows progress there - I hope we
can sort this out later today.

Richard.

>
> Thor
>
>
>
> On Mon, Jul 22, 2024 at 3:26 AM Richard Biener  
> wrote:
> >
> > Hi,
> >
> > we're having our bi-weekly call this Wednesday;  I'd like to see you write a
> > summary of what you were doing for the first half of the project and post
> > that to me and the GCC mailing list.  Please also send, if appropriate, a
> > patch that shows what you have done so far.
> >
> > Thanks a lot,
> > Richard.


[gcc r15-2223] tree-optimization/116002 - PTA solving slow with degenerate graph

2024-07-23 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:15d3b2dab9182eff036a604169b5e6f4ab3b2a40

commit r15-2223-g15d3b2dab9182eff036a604169b5e6f4ab3b2a40
Author: Richard Biener 
Date:   Tue Jul 23 10:29:58 2024 +0200

tree-optimization/116002 - PTA solving slow with degenerate graph

When the constraint graph consists of N nodes with only complex
constraints and no copy edges we have to be lucky to arrive at
a constraint solving order that requires the optimal number of
iterations.  What happens in the testcase is that we bottle-neck
on computing the visitation order but propagate changes only
very slowly.  Luckily the testcase complex constraints are
all copy-with-offset and those do provide a way to order
visitation.  The following adds this which reduces the iteration
count to one.
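
The idea in miniature, as a sketch without the GCC data structures: a
constraint x = y + offset is treated like a copy edge from y to x when
computing the visitation order, so y's solution is complete before x is
visited.

#include <utility>
#include <vector>

struct pta_graph
{
  std::vector<std::vector<int>> succ;  /* explicit copy edges  */
  /* complex[n] holds (lhs, rhs == n) copy-with-offset constraints.  */
  std::vector<std::vector<std::pair<int, int>>> complex;
};

static void
topo_visit (const pta_graph &g, std::vector<int> &order,
            std::vector<bool> &visited, int n)
{
  if (visited[n])
    return;
  visited[n] = true;
  for (int k : g.succ[n])
    topo_visit (g, order, visited, k);
  /* Also follow copy-with-offset constraints as implicit edges.  */
  for (const std::pair<int, int> &c : g.complex[n])
    topo_visit (g, order, visited, c.first);
  order.push_back (n);
}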

PR tree-optimization/116002
* tree-ssa-structalias.cc (topo_visit): Also consider
SCALAR = SCALAR complex constraints as edges.

Diff:
---
 gcc/tree-ssa-structalias.cc | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 330e64e65da1..65f9132a94fd 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -1908,6 +1908,18 @@ topo_visit (constraint_graph_t graph, vec<unsigned> &topo_order,
  topo_visit (graph, topo_order, visited, k);
   }
 
+  /* Also consider copy with offset complex constraints as implicit edges.  */
+  for (auto c : graph->complex[n])
+{
+  /* Constraints are ordered so that SCALAR = SCALAR appear first.  */
+  if (c->lhs.type != SCALAR || c->rhs.type != SCALAR)
+   break;
+  gcc_checking_assert (c->rhs.var == n);
+  unsigned k = find (c->lhs.var);
+  if (!bitmap_bit_p (visited, k))
+   topo_visit (graph, topo_order, visited, k);
+}
+
   topo_order.quick_push (n);
 }


[gcc r15-2218] [v2] rtl-optimization/116002 - cselib hash is bad

2024-07-23 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:44e065a52fa6069d6c8cacebc8f876840d278dd0

commit r15-2218-g44e065a52fa6069d6c8cacebc8f876840d278dd0
Author: Richard Biener 
Date:   Fri Jul 19 16:23:51 2024 +0200

[v2] rtl-optimization/116002 - cselib hash is bad

The following addresses the bad hash function of cselib which uses
integer plus for merging.  This causes a huge number of collisions
for the testcase in the PR and thus very large compile-time.

The following rewrites it to use inchash, eliding duplicate mixing
of RTX code and mode in some cases and more consistently avoiding
a return value of zero as well as treating zero as fatal.  An
important part is to preserve mixing of hashes of commutative
operators as commutative.
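
Why integer '+' is a weak combiner can be seen in isolation (a sketch
unrelated to the actual cselib inputs): addition lets value shift freely
between the contributions, so distinct inputs collide whenever their
sums match.

#include <cstdio>

/* Combining two hash contributions by addition: (10, 20) and (11, 19)
   produce the same value, and so do all other pairs with equal sum.  */
static unsigned
bad_combine (unsigned a, unsigned b)
{
  return a + b;
}

int
main ()
{
  std::printf ("%u %u\n", bad_combine (10, 20), bad_combine (11, 19));
  return 0;
}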

For cselib_hash_plus_const_int this removes the apparent attempt
of making sure to hash the same as a PLUS as cselib_hash_rtx makes
sure to dispatch to cselib_hash_plus_const_int consistently.

This reduces compile-time for the testcase in the PR from unknown
to 22s and for a reduced testcase from 73s to 9s.  There's another
pending patchset to improve the speed of inchash mixing, but it's
not in the profile for this testcase (PTA pops up now).

The generated code is equal.  I've also compared cc1 builds
with and without the patch and they are now comparing equal
after retaining commutative hashing for commutative operators.

PR rtl-optimization/116002
* cselib.cc (cselib_hash_rtx): Use inchash to get proper mixing.
Consistently avoid a zero return value when hashing successfully.
Consistently treat a zero hash value from recursing as fatal.
Use hashval_t where appropriate.
(cselib_hash_plus_const_int): Likewise.
(new_cselib_val): Use hashval_t.
(cselib_lookup_1): Likewise.

Diff:
---
 gcc/cselib.cc | 224 --
 1 file changed, 122 insertions(+), 102 deletions(-)

diff --git a/gcc/cselib.cc b/gcc/cselib.cc
index cbaab7d515cc..7beaca424244 100644
--- a/gcc/cselib.cc
+++ b/gcc/cselib.cc
@@ -51,7 +51,7 @@ static void unchain_one_value (cselib_val *);
 static void unchain_one_elt_list (struct elt_list **);
 static void unchain_one_elt_loc_list (struct elt_loc_list **);
 static void remove_useless_values (void);
-static unsigned int cselib_hash_rtx (rtx, int, machine_mode);
+static hashval_t cselib_hash_rtx (rtx, int, machine_mode);
 static cselib_val *new_cselib_val (unsigned int, machine_mode, rtx);
 static void add_mem_for_addr (cselib_val *, cselib_val *, rtx);
 static cselib_val *cselib_lookup_mem (rtx, int);
@@ -1244,7 +1244,7 @@ cselib_redundant_set_p (rtx set)
 /* Helper function for cselib_hash_rtx.  Arguments like for cselib_hash_rtx,
except that it hashes (plus:P x c).  */
 
-static unsigned int
+static hashval_t
 cselib_hash_plus_const_int (rtx x, HOST_WIDE_INT c, int create,
machine_mode memmode)
 {
@@ -1266,14 +1266,13 @@ cselib_hash_plus_const_int (rtx x, HOST_WIDE_INT c, int create,
   if (c == 0)
 return e->hash;
 
-  unsigned hash = (unsigned) PLUS + (unsigned) GET_MODE (x);
-  hash += e->hash;
-  unsigned int tem_hash = (unsigned) CONST_INT + (unsigned) VOIDmode;
-  tem_hash += ((unsigned) CONST_INT << 7) + (unsigned HOST_WIDE_INT) c;
-  if (tem_hash == 0)
-tem_hash = (unsigned int) CONST_INT;
-  hash += tem_hash;
-  return hash ? hash : 1 + (unsigned int) PLUS;
+  inchash::hash hash;
+  hash.add_int (PLUS);
+  hash.add_int (GET_MODE (x));
+  hash.merge_hash (e->hash);
+  hash.add_hwi (c);
+
+  return hash.end () ? hash.end () : 1 + (unsigned int) PLUS;
 }
 
 /* Hash an rtx.  Return 0 if we couldn't hash the rtx.
@@ -1298,7 +1297,7 @@ cselib_hash_plus_const_int (rtx x, HOST_WIDE_INT c, int create,
If the mode is important in any context, it must be checked specifically
in a comparison anyway, since relying on hash differences is unsafe.  */
 
-static unsigned int
+static hashval_t
 cselib_hash_rtx (rtx x, int create, machine_mode memmode)
 {
   cselib_val *e;
@@ -1306,10 +1305,11 @@ cselib_hash_rtx (rtx x, int create, machine_mode memmode)
   int i, j;
   enum rtx_code code;
   const char *fmt;
-  unsigned int hash = 0;
+  inchash::hash hash;
 
   code = GET_CODE (x);
-  hash += (unsigned) code + (unsigned) GET_MODE (x);
+  hash.add_int (code);
+  hash.add_int (GET_MODE (x));
 
   switch (code)
 {
@@ -1326,19 +1326,16 @@ cselib_hash_rtx (rtx x, int create, machine_mode memmode)
   return e->hash;
 
 case DEBUG_EXPR:
-  hash += ((unsigned) DEBUG_EXPR << 7)
- + DEBUG_TEMP_UID (DEBUG_EXPR_TREE_DECL (x));
-  return hash ? hash : (unsigned int) DEBUG_EXPR;
+  hash.add_int (DEBUG_TEMP_UID (DEBUG_EXPR_TREE_DECL (x)));
+  return hash.end () ? hash.end() : (unsigned int) DEBUG_EXPR;
 
 case DEBUG_IMPLICIT_PTR:
-  hash += ((unsigned) DEBUG_IMPLICIT_PTR << 

[gcc r12-10636] Fixup unaligned load/store cost for znver4

2024-07-23 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:f78eb9524bd97679c8baa47a62e82147272719ae

commit r12-10636-gf78eb9524bd97679c8baa47a62e82147272719ae
Author: Richard Biener 
Date:   Mon Jul 15 13:01:24 2024 +0200

Fixup unaligned load/store cost for znver4

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4.  The following makes
the unaligned costs equal to the aligned costs.

This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there.  But it makes it qualify
as a regression fix.

PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.

(cherry picked from commit 1e3aa9c9278db69d4bdb661a750a7268789188d6)

Diff:
---
 gcc/config/i386/x86-tune-costs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index f105d57cae79..d58827888994 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1894,8 +1894,8 @@ struct processor_costs znver4_cost = {
   in 32bit, 64bit, 128bit, 256bit and 512bit */
   {8, 8, 8, 12, 12},   /* cost of storing SSE register
   in 32bit, 64bit, 128bit, 256bit and 512bit */
-  {6, 6, 6, 6, 6}, /* cost of unaligned loads.  */
-  {8, 8, 8, 8, 8}, /* cost of unaligned stores.  */
+  {6, 6, 10, 10, 12},  /* cost of unaligned loads.  */
+  {8, 8, 8, 12, 12},   /* cost of unaligned stores.  */
   2, 2, 2, /* cost of moving XMM,YMM,ZMM register.  */
   6,   /* cost of moving SSE register to integer.  */


[gcc r13-8936] Fixup unaligned load/store cost for znver4

2024-07-23 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:b35276655e6767a6e037e58edfa4738317498337

commit r13-8936-gb35276655e6767a6e037e58edfa4738317498337
Author: Richard Biener 
Date:   Mon Jul 15 13:01:24 2024 +0200

Fixup unaligned load/store cost for znver4

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4.  The following makes
the unaligned costs equal to the aligned costs.

This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there.  But it makes it qualify
as a regression fix.

PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.

(cherry picked from commit 1e3aa9c9278db69d4bdb661a750a7268789188d6)

Diff:
---
 gcc/config/i386/x86-tune-costs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 4f7a67ca5c5e..14c5507a601f 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1924,8 +1924,8 @@ struct processor_costs znver4_cost = {
   in 32bit, 64bit, 128bit, 256bit and 512bit */
   {8, 8, 8, 12, 12},   /* cost of storing SSE register
   in 32bit, 64bit, 128bit, 256bit and 512bit */
-  {6, 6, 6, 6, 6}, /* cost of unaligned loads.  */
-  {8, 8, 8, 8, 8}, /* cost of unaligned stores.  */
+  {6, 6, 10, 10, 12},  /* cost of unaligned loads.  */
+  {8, 8, 8, 12, 12},   /* cost of unaligned stores.  */
   2, 2, 2, /* cost of moving XMM,YMM,ZMM register.  */
   6,   /* cost of moving SSE register to integer.  */


[gcc r15-2195] Fix hash of WIDEN_*_EXPR

2024-07-22 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:a8e61cd71f0cda04d583722ca4c66301358b01e1

commit r15-2195-ga8e61cd71f0cda04d583722ca4c66301358b01e1
Author: Richard Biener 
Date:   Mon Jul 22 11:07:28 2024 +0200

Fix hash of WIDEN_*_EXPR

We're hashing operand 2 into the temporary hash 'two', which
add_commutative has already consumed, instead of into the overall
hash state.

* fold-const.cc (operand_compare::hash_operand): Fix hash
of WIDEN_*_EXPR.

Diff:
---
 gcc/fold-const.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 83c32dd10d4a..8908e7381e72 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -4123,7 +4123,7 @@ operand_compare::hash_operand (const_tree t, inchash::hash &hstate,
hash_operand (TREE_OPERAND (t, 0), one, flags);
hash_operand (TREE_OPERAND (t, 1), two, flags);
hstate.add_commutative (one, two);
-   hash_operand (TREE_OPERAND (t, 2), two, flags);
+   hash_operand (TREE_OPERAND (t, 2), hstate, flags);
return;
  }


[gcc r15-2194] constify inchash

2024-07-22 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:1e32a8be69d3f91e45193a0a1aa0dcae7ebe0acb

commit r15-2194-g1e32a8be69d3f91e45193a0a1aa0dcae7ebe0acb
Author: Richard Biener 
Date:   Mon Jul 22 11:09:03 2024 +0200

constify inchash

The following constifies parts of inchash.

* inchash.h (inchash::end): Make const.
(inchash::merge): Take const reference hash argument.
(inchash::add_commutative): Likewise.

Diff:
---
 gcc/inchash.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/inchash.h b/gcc/inchash.h
index e88f9b5eac18..82f50eb0f258 100644
--- a/gcc/inchash.h
+++ b/gcc/inchash.h
@@ -46,7 +46,7 @@ class hash
   }
 
   /* End incremential hashing and provide the final value.  */
-  hashval_t end ()
+  hashval_t end () const
   {
 return val;
   }
@@ -109,7 +109,7 @@ class hash
   }
 
   /* Hash in state from other inchash OTHER.  */
-  void merge (hash &other)
+  void merge (const hash &other)
   {
 merge_hash (other.val);
   }
@@ -136,7 +136,7 @@ class hash
  based on their value. This is useful for hashing commutative
  expressions, so that A+B and B+A get the same hash.  */
 
-  void add_commutative (hash &a, hash &b)
+  void add_commutative (const hash &a, const hash &b)
   {
 if (a.end() > b.end())
   {


GCC 11.5 Released

2024-07-19 Thread Richard Biener via Gcc
The GNU Compiler Collection version 11.5 has been released.

GCC 11.5 is a bug-fix release from the GCC 11 branch
containing important fixes for regressions and serious bugs in
GCC 11.4 with more than 157 bugs fixed since the previous release.

This is also the last release from the GCC 11 branch, GCC continues
to be maintained on the GCC 12, GCC 13 and GCC 14 branches and the
development trunk.

This release is available from the FTP servers listed here:

  https://gcc.gnu.org/pub/gcc/releases/gcc-11.5.0/
  https://gcc.gnu.org/mirrors.html

Please do not contact me directly regarding questions or comments
about this release.  Instead, use the resources available from
http://gcc.gnu.org.

As always, a vast number of people contributed to this GCC release
-- far too many to thank them individually!


[gcc r15-2151] Close GCC 11 branch

2024-07-19 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:a589d3bfe5026ae8ed842bde48b6a5fd74690cf2

commit r15-2151-ga589d3bfe5026ae8ed842bde48b6a5fd74690cf2
Author: Richard Biener 
Date:   Fri Jul 19 07:58:28 2024 +0200

Close GCC 11 branch

Remove gcc-11 branch from updating and snapshot generating

contrib/
* gcc-changelog/git_update_version.py: Remove gcc-11 branch.

maintainer-scripts/
* crontab: Remove entry for gcc-11 branch.

Diff:
---
 contrib/gcc-changelog/git_update_version.py | 2 +-
 maintainer-scripts/crontab  | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/contrib/gcc-changelog/git_update_version.py b/contrib/gcc-changelog/git_update_version.py
index c69a3a6897a6..ec06fc965f8a 100755
--- a/contrib/gcc-changelog/git_update_version.py
+++ b/contrib/gcc-changelog/git_update_version.py
@@ -80,7 +80,7 @@ def prepend_to_changelog_files(repo, folder, git_commit, add_to_git):
 repo.git.add(full_path)
 
 
-active_refs = ['master', 'releases/gcc-11',
+active_refs = ['master',
'releases/gcc-12', 'releases/gcc-13', 'releases/gcc-14']
 
 parser = argparse.ArgumentParser(description='Update DATESTAMP and generate '
diff --git a/maintainer-scripts/crontab b/maintainer-scripts/crontab
index 322778ab23f8..7bb73625bd88 100644
--- a/maintainer-scripts/crontab
+++ b/maintainer-scripts/crontab
@@ -1,7 +1,6 @@
 16  0 * * * sh /home/gccadmin/scripts/update_version_git
 50  0 * * * sh /home/gccadmin/scripts/update_web_docs_git
 55  0 * * * sh /home/gccadmin/scripts/update_web_docs_libstdcxx_git
-32 22 * * 3 sh /home/gccadmin/scripts/gcc_release -s 11:releases/gcc-11 -l -d /sourceware/snapshot-tmp/gcc all
 32 22 * * 4 sh /home/gccadmin/scripts/gcc_release -s 12:releases/gcc-12 -l -d /sourceware/snapshot-tmp/gcc all
 32 22 * * 5 sh /home/gccadmin/scripts/gcc_release -s 13:releases/gcc-13 -l -d /sourceware/snapshot-tmp/gcc all
 32 22 * * 6 sh /home/gccadmin/scripts/gcc_release -s 14:releases/gcc-14 -l -d /sourceware/snapshot-tmp/gcc all


GCC 11.5 Released

2024-07-19 Thread Richard Biener via gcc-announce
The GNU Compiler Collection version 11.5 has been released.

GCC 11.5 is a bug-fix release from the GCC 11 branch
containing important fixes for regressions and serious bugs in
GCC 11.4 with more than 157 bugs fixed since the previous release.

This is also the last release from the GCC 11 branch, GCC continues
to be maintained on the GCC 12, GCC 13 and GCC 14 branches and the
development trunk.

This release is available from the FTP servers listed here:

  https://gcc.gnu.org/pub/gcc/releases/gcc-11.5.0/
  https://gcc.gnu.org/mirrors.html

Please do not contact me directly regarding questions or comments
about this release.  Instead, use the resources available from
http://gcc.gnu.org.

As always, a vast number of people contributed to this GCC release
-- far too many to thank them individually!


gcc-wwwdocs branch master updated. d186078d0c62049753676d16bb0e317572a7e82c

2024-07-19 Thread Richard Biener via Gcc-cvs-wwwdocs
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  d186078d0c62049753676d16bb0e317572a7e82c (commit)
  from  ca1edf11223f1ff1ac0f9a4ba7d4532f03d55086 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit d186078d0c62049753676d16bb0e317572a7e82c
Author: Richard Biener 
Date:   Fri Jul 19 08:24:04 2024 +0200

Update main page for the GCC 11.5 release and branch closing.

diff --git a/htdocs/index.html b/htdocs/index.html
index a75656b1..44396224 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -55,6 +55,10 @@ mission statement.
 News
 
 
+GCC 11.5 released
+[2024-07-19]
+
+
 GCC 12.4 released
 [2024-06-20]
 
@@ -162,10 +166,10 @@ More news? Let ger...@pfeifer.com know!
   (regression fixes  docs only).
   
   https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedshort_desc_type=regexpshort_desc=%5C%5B(%5B%200-9.%2F%5D*%5B%20%2F%5D)*14%5B%20%2F%5D%5B%200-9.%2F%5D*%5BRr%5Degression%20*%5C%5Dtarget_milestone=11.5target_milestone=12.5target_milestone=13.4target_milestone=14.2known_to_fail_type=allwordssubstrknown_to_work_type=allwordssubstrlong_desc_type=allwordssubstrlong_desc=bug_file_loc_type=allwordssubstrbug_file_loc=gcchost_type=allwordssubstrgcchost=gcctarget_type=allwordssubstrgcctarget=gccbuild_type=allwordssubstrgccbuild=keywords_type=allwordskeywords=bug_status=UNCONFIRMEDbug_status=NEWbug_status=ASSIGNEDbug_status=SUSPENDEDbug_status=WAITINGbug_status=REOPENEDpriority=P1priority=P2priority=P3emailtype1=substringemail1=emailtype2=substringemail2=bugidtype=includebug_id=votes=chfieldf
 
rom=chfieldto=Nowchfieldvalue=cmdtype=doitorder=Reuse+same+sort+as+last+timefield0-0-0=nooptype0-0-0=noopvalue0-0-0=">Serious
+  
href="https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedshort_desc_type=regexpshort_desc=%5C%5B(%5B%200-9.%2F%5D*%5B%20%2F%5D)*14%5B%20%2F%5D%5B%200-9.%2F%5D*%5BRr%5Degression%20*%5C%5Dtarget_milestone=12.5target_milestone=13.4target_milestone=14.2known_to_fail_type=allwordssubstrknown_to_work_type=allwordssubstrlong_desc_type=allwordssubstrlong_desc=bug_file_loc_type=allwordssubstrbug_file_loc=gcchost_type=allwordssubstrgcchost=gcctarget_type=allwordssubstrgcctarget=gccbuild_type=allwordssubstrgccbuild=keywords_type=allwordskeywords=bug_status=UNCONFIRMEDbug_status=NEWbug_status=ASSIGNEDbug_status=SUSPENDEDbug_status=WAITINGbug_status=REOPENEDpriority=P1priority=P2priority=P3emailtype1=substringemail1=emailtype2=substringemail2=bugidtype=includebug_id=votes=chfieldfrom=chfieldto=Now
 
;chfieldvalue=cmdtype=doitorder=Reuse+same+sort+as+last+timefield0-0-0=nooptype0-0-0=noopvalue0-0-0=">Serious
   regressions.
   https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedshort_desc_type=regexpshort_desc=%5C%5B(%5B%200-9.%2F%5D*%5B%20%2F%5D)*14%5B%20%2F%5D%5B%200-9.%2F%5D*%5BRr%5Degression%20*%5C%5Dtarget_milestone=11.5target_milestone=12.5target_milestone=13.4target_milestone=14.2known_to_fail_type=allwordssubstrknown_to_work_type=allwordssubstrlong_desc_type=allwordssubstrlong_desc=bug_file_loc_type=allwordssubstrbug_file_loc=gcchost_type=allwordssubstrgcchost=gcctarget_type=allwordssubstrgcctarget=gccbuild_type=allwordssubstrgccbuild=keywords_type=allwordskeywords=bug_status=UNCONFIRMEDbug_status=NEWbug_status=ASSIGNEDbug_status=SUSPENDEDbug_status=WAITINGbug_status=REOPENEDemailtype1=substringemail1=emailtype2=substringemail2=bugidtype=includebug_id=votes=chfieldfrom=chfieldto=Nowchfieldvalue=cmd
 
type=doitorder=Reuse+same+sort+as+last+timefield0-0-0=nooptype0-0-0=noopvalue0-0-0=">All
+  
href="https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedshort_desc_type=regexpshort_desc=%5C%5B(%5B%200-9.%2F%5D*%5B%20%2F%5D)*14%5B%20%2F%5D%5B%200-9.%2F%5D*%5BRr%5Degression%20*%5C%5Dtarget_milestone=12.5target_milestone=13.4target_milestone=14.2known_to_fail_type=allwordssubstrknown_to_work_type=allwordssubstrlong_desc_type=allwordssubstrlong_desc=bug_file_loc_type=allwordssubstrbug_file_loc=gcchost_type=allwordssubstrgcchost=gcctarget_type=allwordssubstrgcctarget=gccbuild_type=allwordssubstrgccbuild=keywords_type=allwordskeywords=bug_status=UNCONFIRMEDbug_status=NEWbug_status=ASSIGNEDbug_status=SUSPENDEDbug_status=WAITINGbug_status=REOPENEDemailtype1=substringemail1=emailtype2=substringemail2=bugidtype=includebug_id=votes=chfieldfrom=chfieldto=Nowchfieldvalue=cmdtype=doitorder=Reuse+
 
same+sort+as+last+timefield0-0-0=nooptype0-0-0=noopvalue0-0-0=">All
   regressions.
   
 
@@ -180,10 +184,10 @@ More news? Let ger...@pfeifer.com know!
   (regression fixes  docs only).
   
   

gcc-wwwdocs branch master updated. ca1edf11223f1ff1ac0f9a4ba7d4532f03d55086

2024-07-19 Thread Richard Biener via Gcc-cvs-wwwdocs
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  ca1edf11223f1ff1ac0f9a4ba7d4532f03d55086 (commit)
  from  9f8c1b6b5945a5dfe2a6e17fa2c066550333ee97 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit ca1edf11223f1ff1ac0f9a4ba7d4532f03d55086
Author: Richard Biener 
Date:   Fri Jul 19 08:10:35 2024 +0200

Update for the GCC 11.5 release

diff --git a/htdocs/develop.html b/htdocs/develop.html
index effb6047..e98b442e 100644
--- a/htdocs/develop.html
+++ b/htdocs/develop.html
@@ -690,7 +690,9 @@ stages of development, branch points, and releases:
|  \
|   v
|   GCC 11.4 release (2023-05-29)
-   +-- GCC 12 branch created ---+
+   |\
+   | v
+   +-- GCC 12 branch created ---+  GCC 11.5 release (2024-07-19)
| \
|  v
   GCC 13 Stage 1 (starts 2022-04-28)   GCC 12.1 release (2022-05-06)
diff --git a/htdocs/releases.html b/htdocs/releases.html
index 15854b11..a10cc58d 100644
--- a/htdocs/releases.html
+++ b/htdocs/releases.html
@@ -33,6 +33,7 @@ releases and an alternative view of the release history.
 
 
 ReleaseRelease date
+GCC 11.5 July 19, 2024
 GCC 12.4 June 20, 2024
 GCC 13.3 May 21, 2024
 GCC 14.1 May 6, 2024

---

Summary of changes:
 htdocs/develop.html  | 4 +++-
 htdocs/releases.html | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)


hooks/post-receive
-- 
gcc-wwwdocs


gcc-wwwdocs branch master updated. 9f8c1b6b5945a5dfe2a6e17fa2c066550333ee97

2024-07-19 Thread Richard Biener via Gcc-cvs-wwwdocs
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  9f8c1b6b5945a5dfe2a6e17fa2c066550333ee97 (commit)
  from  24e35757dcf718697fe2dce121e6d4ae8cdb6d14 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit 9f8c1b6b5945a5dfe2a6e17fa2c066550333ee97
Author: Richard Biener 
Date:   Fri Jul 19 08:09:16 2024 +0200

Update online docs for GCC 11.5 release

diff --git a/htdocs/onlinedocs/11.5.0/index.html b/htdocs/onlinedocs/11.5.0/index.html
new file mode 100644
index 00000000..81b590ac
--- /dev/null
+++ b/htdocs/onlinedocs/11.5.0/index.html
@@ -0,0 +1,90 @@
+
+
+
+
+
+<title>GCC 11.5 manuals</title>
+<link rel="stylesheet" type="text/css" href="https://gcc.gnu.org/gcc.css">
+
+
+
+
+<h1>11.5 manuals</h1>
+  <ul>
+<li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gcc/">GCC
+ 11.5 Manual</a> (<a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gcc.pdf">also
+ in PDF</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gcc.ps.gz">PostScript</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gcc-html.tar.gz">an
+ HTML tarball</a>)</li>
+<li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gfortran/">GCC
+ 11.5 GNU Fortran Manual</a> (<a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gfortran.pdf">also
+ in PDF</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gfortran.ps.gz">PostScript</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gfortran-html.tar.gz">an
+ HTML tarball</a>)</li>
+<li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/cpp/">GCC
+ 11.5 CPP Manual</a> (<a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/cpp.pdf">also
+ in PDF</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/cpp.ps.gz">PostScript</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/cpp-html.tar.gz">an
+ HTML tarball</a>)</li>
+<li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gnat_rm/">GCC
+ 11.5 GNAT Reference Manual</a> (<a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gnat_rm.pdf">also
+ in PDF</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gnat_rm.ps.gz">PostScript</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gnat_rm-html.tar.gz">an
+ HTML tarball</a>)</li>
+<li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gnat_ugn/">GCC
+ 11.5 GNAT User's Guide</a> (<a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gnat_ugn.pdf">also
+ in PDF</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gnat_ugn.ps.gz">PostScript</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gnat_ugn-html.tar.gz">an
+ HTML tarball</a>)</li>
+<li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libstdc++/manual/">GCC
+ 11.5 Standard C++ Library Manual</a>  (<a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libstdc++-manual.pdf.gz">also
+ in PDF</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libstdc++-manual.xml.gz">XML</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libstdc++-manual-html.tar.gz">an
+ HTML tarball</a>)</li>
+<li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libstdc++/api/">GCC
+ 11.5 Standard C++ Library Reference Manual</a>  (<a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libstdc++-api.pdf.gz">also
+ in PDF</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libstdc++-api.xml.gz">XML GPL</a> or
+ <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libstdc++-api-gfdl.xml.gz">XML GFDL</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libstdc++-api-html.tar.gz">an
+ HTML tarball</a>)</li>
+   <li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gccgo/">GCCGO 11.5 Manual</a> (<a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gccgo.pdf">also in
+   PDF</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gccgo.ps.gz">PostScript</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/gccgo-html.tar.gz">an
+   HTML tarball</a>)</li>
+<li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libgomp/">GCC 11.5
+ GNU Offloading and Multi Processing Runtime Library Manual</a> (<a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libgomp.pdf">also in
+ PDF</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libgomp.ps.gz">PostScript</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libgomp-html.tar.gz">an
+ HTML tarball</a>)</li>
+<li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libquadmath/">GCC 11.5
+ Quad-Precision Math Library Manual</a> (<a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libquadmath.pdf">also in
+ PDF</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libquadmath.ps.gz">PostScript</a> or <a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/libquadmath-html.tar.gz">an
+ HTML tarball</a>)</li>
+<li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/jit/">GCC 11.5 JIT
+ Library</a></li>
+<li><a href="https://gcc.gnu.org/onlinedocs/gcc-11.5.0/docs-sources.tar.gz">Texinfo
+ sources of all the GCC 11.5 manuals</a></li>
+  </ul>
+
diff --git a/htdocs/onlinedocs/index.html b/htdocs/onlinedocs/index.html
index 862f9e32..8df1611e 100644
--- a/htdocs/onlinedocs/index.html
+++ b/htdocs/onlinedocs/index.html
@@ -286,83 +286,83 @@
 
 
 
-  GCC 11.4 

[gcc] Created tag 'releases/gcc-11.5.0'

2024-07-18 Thread Richard Biener via Gcc-cvs
The signed tag 'releases/gcc-11.5.0' was created pointing to:

 5cc4c42a0d4d... Update ChangeLog and version files for release

Tagger: Richard Biener 
Date: Fri Jul 19 05:53:40 2024 +

GCC 11.5.0 release


[gcc r11-11584] Update ChangeLog and version files for release

2024-07-18 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:5cc4c42a0d4de08715c2eef8715ad5b2e92a23b6

commit r11-11584-g5cc4c42a0d4de08715c2eef8715ad5b2e92a23b6
Author: Richard Biener 
Date:   Fri Jul 19 05:53:33 2024 +

Update ChangeLog and version files for release

Diff:
---
 ChangeLog | 4 
 c++tools/ChangeLog| 4 
 config/ChangeLog  | 4 
 contrib/ChangeLog | 4 
 contrib/header-tools/ChangeLog| 4 
 contrib/reghunt/ChangeLog | 4 
 contrib/regression/ChangeLog  | 4 
 fixincludes/ChangeLog | 4 
 gcc/BASE-VER  | 2 +-
 gcc/ChangeLog | 4 
 gcc/ada/ChangeLog | 4 
 gcc/analyzer/ChangeLog| 4 
 gcc/brig/ChangeLog| 4 
 gcc/c-family/ChangeLog| 4 
 gcc/c/ChangeLog   | 4 
 gcc/cp/ChangeLog  | 4 
 gcc/d/ChangeLog   | 4 
 gcc/fortran/ChangeLog | 4 
 gcc/go/ChangeLog  | 4 
 gcc/jit/ChangeLog | 4 
 gcc/lto/ChangeLog | 4 
 gcc/objc/ChangeLog| 4 
 gcc/objcp/ChangeLog   | 4 
 gcc/po/ChangeLog  | 4 
 gcc/testsuite/ChangeLog   | 4 
 gnattools/ChangeLog   | 4 
 gotools/ChangeLog | 4 
 include/ChangeLog | 4 
 intl/ChangeLog| 4 
 libada/ChangeLog  | 4 
 libatomic/ChangeLog   | 4 
 libbacktrace/ChangeLog| 4 
 libcc1/ChangeLog  | 4 
 libcody/ChangeLog | 4 
 libcpp/ChangeLog  | 4 
 libcpp/po/ChangeLog   | 4 
 libdecnumber/ChangeLog| 4 
 libffi/ChangeLog  | 4 
 libgcc/ChangeLog  | 4 
 libgcc/config/avr/libf7/ChangeLog | 4 
 libgcc/config/libbid/ChangeLog| 4 
 libgfortran/ChangeLog | 4 
 libgomp/ChangeLog | 4 
 libhsail-rt/ChangeLog | 4 
 libiberty/ChangeLog   | 4 
 libitm/ChangeLog  | 4 
 libobjc/ChangeLog | 4 
 liboffloadmic/ChangeLog   | 4 
 libphobos/ChangeLog   | 4 
 libquadmath/ChangeLog | 4 
 libsanitizer/ChangeLog| 4 
 libssp/ChangeLog  | 4 
 libstdc++-v3/ChangeLog| 4 
 libvtv/ChangeLog  | 4 
 lto-plugin/ChangeLog  | 4 
 maintainer-scripts/ChangeLog  | 4 
 zlib/ChangeLog| 4 
 57 files changed, 225 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 48d69d2bf1c4..ccc548944b16 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2024-07-19  Release Manager
+
+   * GCC 11.5.0 released.
+
 2023-05-29  Release Manager
 
* GCC 11.4.0 released.
diff --git a/c++tools/ChangeLog b/c++tools/ChangeLog
index 46505bf853b6..a4cfdae87fdb 100644
--- a/c++tools/ChangeLog
+++ b/c++tools/ChangeLog
@@ -1,3 +1,7 @@
+2024-07-19  Release Manager
+
+   * GCC 11.5.0 released.
+
 2023-05-29  Release Manager
 
* GCC 11.4.0 released.
diff --git a/config/ChangeLog b/config/ChangeLog
index 28b8a737fb3e..6bf7fc63c7a8 100644
--- a/config/ChangeLog
+++ b/config/ChangeLog
@@ -1,3 +1,7 @@
+2024-07-19  Release Manager
+
+   * GCC 11.5.0 released.
+
 2023-05-29  Release Manager
 
* GCC 11.4.0 released.
diff --git a/contrib/ChangeLog b/contrib/ChangeLog
index 73ecfe5ddc7a..bb0f18e30f97 100644
--- a/contrib/ChangeLog
+++ b/contrib/ChangeLog
@@ -1,3 +1,7 @@
+2024-07-19  Release Manager
+
+   * GCC 11.5.0 released.
+
 2023-05-29  Release Manager
 
* GCC 11.4.0 released.
diff --git a/contrib/header-tools/ChangeLog b/contrib/header-tools/ChangeLog
index c6bf489533e0..e64c34b9792c 100644
--- a/contrib/header-tools/ChangeLog
+++ b/contrib/header-tools/ChangeLog
@@ -1,3 +1,7 @@
+2024-07-19  Release Manager
+
+   * GCC 11.5.0 released.
+
 2023-05-29  Release Manager
 
* GCC 11.4.0 released.
diff --git a/contrib/reghunt/ChangeLog b/contrib/reghunt/ChangeLog
index 8c438a39a115..c5aab3bc981b 100644
--- a/contrib/reghunt/ChangeLog
+++ b/contrib/reghunt/ChangeLog
@@ -1,3 +1,7 @@
+2024-07-19  Release Manager
+
+   * GCC 11.5.0 released.
+
 2023-05-29  Release Manager
 
* GCC 11.4.0 released.
diff --git a/contrib/regression/ChangeLog b/contrib/regression/ChangeLog
index 5412df7fab59..ad7c629493fb 100644
--- a/contrib/regression/ChangeLog
+++ b/contrib/regression/ChangeLog
@@ -1,3 +1,7 @@
+2024-07-19  Release Manager
+
+   * GCC 11.5.0 released.
+
 2023-05-29  Release Manager
 
* GCC 11.4.0 released.
diff --git a/fixincludes/ChangeLog b/fixincludes/ChangeLog
index aa3cdfbd4ec9..f22552738ab2 100644
--- a/fixincludes/ChangeLog

gcc-wwwdocs branch master updated. 24e35757dcf718697fe2dce121e6d4ae8cdb6d14

2024-07-18 Thread Richard Biener via Gcc-cvs-wwwdocs
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  24e35757dcf718697fe2dce121e6d4ae8cdb6d14 (commit)
  from  2904e761f8d32d8c04f704e3f937480a396f5b57 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit 24e35757dcf718697fe2dce121e6d4ae8cdb6d14
Author: Richard Biener 
Date:   Fri Jul 19 07:49:14 2024 +0200

GCC 11.5 release preparations.

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 3737af5b..e010cd08 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -1154,5 +1154,15 @@ are not listed here).
   
 
 
+
+
+<h2 id="11.5">GCC 11.5</h2>
+
+<p>This is the <a href="https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=RESOLVED&amp;resolution=FIXED&amp;target_milestone=11.5">list
+of problem reports (PRs)</a> from GCC's bug tracking system that are
+known to be fixed in the 11.5 release. This list might not be
+complete (that is, it is possible that some PRs that have been fixed
+are not listed here).</p>
+
 
 
diff --git a/htdocs/gcc-11/index.html b/htdocs/gcc-11/index.html
index 681da6a1..c2bde497 100644
--- a/htdocs/gcc-11/index.html
+++ b/htdocs/gcc-11/index.html
@@ -11,17 +11,23 @@
 
 GCC 11 Release Series
 
-May 29, 2023
+July 19, 2024
 
-The GCC developers are pleased to announce the release of GCC 11.4.
+The GCC developers are pleased to announce the release of GCC 11.5.
 
 This release is a bug-fix release, containing fixes for regressions in
-GCC 11.3 relative to previous releases of GCC.
+GCC 11.4 relative to previous releases of GCC.
 
 Release History
 
 
 
+GCC 11.5
+July 19, 2024
+(changes,
+ <a href="https://gcc.gnu.org/onlinedocs/11.5.0/">documentation</a>)
+
+
 GCC 11.4
 May 29, 2023
 (changes,

---

Summary of changes:
 htdocs/gcc-11/changes.html | 10 ++
 htdocs/gcc-11/index.html   | 12 +---
 2 files changed, 19 insertions(+), 3 deletions(-)


hooks/post-receive
-- 
gcc-wwwdocs


Re: tsvc test iteration count during check-gcc

2024-07-18 Thread Richard Biener via Gcc



> On 18.07.2024 at 16:20, Joern Wolfgang Rennecke wrote:
> 
> The tsvc tests take just too long on simulators, particularly if there is 
> little or no vectorization of the test because of compiler limitations, 
> target limitations, or the chosen options.  Having
> 151 tests time out at a quarter of an hour is not fun, and making the time 
> out go away by upping the timeout might make for better looking results, but 
> not for better turn-around times.
> 
> So, I thought to just change the iteration count (which is currently defined
> as 1 in tsvc.h, resulting in billions of operations for a single test) to 
> something small, like 10.
> 
> This requires new expected results, but they were pretty straightforward to
> auto-generate.  The lack of a separate number for s3111 caused me some 
> puzzlement, but it can indeed share a value with s3.
> 
> But then if I want to specifically change the iteration count for simulators, 
> I have to change 151 individual test files to add another 
> dg-additional-options stanza. I can leave the job to grep / bash / ed,
> but then I get 151 locally changed files, which is a pain to merge.
> So I wonder if tsvc.h shouldn't really default to a low iteration count.
> Is there actually any reason to run the regression tests with an iteration 
> count of 1 on any host?

Only laziness of not generating new expected outcomes.  So I’m fine with 
lowering them.

Richard 

> I mean, if you wanted to get some regression check on performance, you'd
> really want something more exact than checking that wall clock time doesn't
> exceed whatever timeout is set.  You could set a ulimit for cpu time and
> fine-tune that for a proper benchmark regression test - but for
> the purposes of an ordinary gcc regression test, you generally just want the
> optimizations performed (as in the dump file tests already present) and the
> computation performed correctly.  And for these, it makes little
> difference how many iterations you use for the test, as long as you convince 
> GCC that the code is 'hot'.
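
For illustration, a guarded default in tsvc.h along the following lines
would let a single -D flag (e.g. from a simulator board file) override the
count without touching 151 individual test files.  This is only a sketch;
the TSVC_ITERATIONS macro name and the value 10 are illustrative, not the
actual tsvc.h contents:

/* Hypothetical guard for the tsvc.h iteration count: simulator runs
   pass -DTSVC_ITERATIONS=10, native runs can keep a larger default.  */
#ifndef TSVC_ITERATIONS
#define TSVC_ITERATIONS 10
#endif

#include <stdio.h>

int main (void)
{
  float checksum = 0.f;
  for (int it = 0; it < TSVC_ITERATIONS; ++it)
    for (int i = 0; i < 256; ++i)
      checksum += (float) i * 0.5f;
  /* The expected results then depend only on TSVC_ITERATIONS and can
     be auto-generated once per chosen count.  */
  printf ("%f\n", (double) checksum);
  return 0;
}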


[gcc r15-2139] middle-end/115641 - invalid address construction

2024-07-18 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:3670c70c561656a19f6bff36dd229f18120af127

commit r15-2139-g3670c70c561656a19f6bff36dd229f18120af127
Author: Richard Biener 
Date:   Thu Jul 18 13:35:33 2024 +0200

middle-end/115641 - invalid address construction

fold_truth_andor_1 via make_bit_field_ref builds an address of
a CALL_EXPR which isn't valid GENERIC and later causes an ICE.
The following simply avoids the folding for f ().a != 1 || f ().b != 2
as it is a premature optimization anyway.  The alternative would
have been to build a TARGET_EXPR around the call.  To get this far
f () has to be const as otherwise the two calls are not semantically
equivalent for the optimization.

PR middle-end/115641
* fold-const.cc (decode_field_reference): If the inner
reference isn't something we can take the address of, fail.

* gcc.dg/torture/pr115641.c: New testcase.

Diff:
---
 gcc/fold-const.cc   |  3 +++
 gcc/testsuite/gcc.dg/torture/pr115641.c | 29 +
 2 files changed, 32 insertions(+)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 710d697c0217..6179a09f9c0a 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -5003,6 +5003,9 @@ decode_field_reference (location_t loc, tree *exp_, 
HOST_WIDE_INT *pbitsize,
   || *pbitsize < 0
   || offset != 0
   || TREE_CODE (inner) == PLACEHOLDER_EXPR
+  /* We eventually want to build a larger reference and need to take
+the address of this.  */
+  || (!REFERENCE_CLASS_P (inner) && !DECL_P (inner))
   /* Reject out-of-bound accesses (PR79731).  */
   || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
  && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
diff --git a/gcc/testsuite/gcc.dg/torture/pr115641.c 
b/gcc/testsuite/gcc.dg/torture/pr115641.c
new file mode 100644
index ..65fb09ca64fc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr115641.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+
+typedef struct {
+  char hours, day, month;
+  short year;
+} T;
+
+T g (void)
+{
+  T now;
+  now.hours = 1;
+  now.day = 2;
+  now.month = 3;
+  now.year = 4;
+  return now;
+}
+
+__attribute__((const)) T f (void)
+{
+  T virk = g ();
+  return virk;
+}
+
+int main ()
+{
+  if (f ().hours != 1 || f ().day != 2 || f ().month != 3 || f ().year != 4)
+__builtin_abort ();
+  return 0;
+}


[gcc r14-10459] c++: ICE with __has_unique_object_representations [PR115476]

2024-07-18 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:c314867fc06d475e3c2ace32032e0d72e3915b55

commit r14-10459-gc314867fc06d475e3c2ace32032e0d72e3915b55
Author: Marek Polacek 
Date:   Mon Jun 17 17:53:12 2024 -0400

c++: ICE with __has_unique_object_representations [PR115476]

Here we started to ICE with r13-25: in check_trait_type, for "X[]" we
return true here:

  if (kind == 1 && TREE_CODE (type) == ARRAY_TYPE && !TYPE_DOMAIN (type))
return true; // Array of unknown bound. Don't care about completeness.

and then end up crashing in record_has_unique_obj_representations:

4836  if (cur != wi::to_offset (sz))

because sz is null.


https://eel.is/c++draft/type.traits#tab:meta.unary.prop-row-47-column-3-sentence-1
says that the preconditions for __has_unique_object_representations are:
"T shall be a complete type, cv void, or an array of unknown bound" and
that "For an array type T, the same result as
has_unique_object_representations_v>" so T[]
should be treated as T.  So we should use kind==2 for the trait.

PR c++/115476

gcc/cp/ChangeLog:

* semantics.cc (finish_trait_expr)
<case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS>: Move below to call
check_trait_type with kind==2.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/has-unique-obj-representations4.C: New test.

(cherry picked from commit fc382a373e6824bb998007d1dcb0805b0cf4b8e8)

Diff:
---
 gcc/cp/semantics.cc  |  2 +-
 .../g++.dg/cpp1z/has-unique-obj-representations4.C   | 16 
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index ec741c0b203d..edb8947e4c2f 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12744,7 +12744,6 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
 case CPTK_HAS_NOTHROW_COPY:
 case CPTK_HAS_TRIVIAL_COPY:
 case CPTK_HAS_TRIVIAL_DESTRUCTOR:
-case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
   if (!check_trait_type (type1))
return error_mark_node;
   break;
@@ -12754,6 +12753,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
 case CPTK_IS_STD_LAYOUT:
 case CPTK_IS_TRIVIAL:
 case CPTK_IS_TRIVIALLY_COPYABLE:
+case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
   if (!check_trait_type (type1, /* kind = */ 2))
return error_mark_node;
   break;
diff --git a/gcc/testsuite/g++.dg/cpp1z/has-unique-obj-representations4.C 
b/gcc/testsuite/g++.dg/cpp1z/has-unique-obj-representations4.C
new file mode 100644
index ..d6949dc7005e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/has-unique-obj-representations4.C
@@ -0,0 +1,16 @@
+// PR c++/115476
+// { dg-do compile { target c++11 } }
+
+struct X;
+static_assert(__has_unique_object_representations(X), "");   // { dg-error 
"invalid use of incomplete type" }
+static_assert(__has_unique_object_representations(X[]), "");  // { dg-error 
"invalid use of incomplete type" }
+static_assert(__has_unique_object_representations(X[1]), "");  // { dg-error 
"invalid use of incomplete type" }
+static_assert(__has_unique_object_representations(X[][1]), "");  // { dg-error 
"invalid use of incomplete type" }
+
+struct X {
+  int x;
+};
+static_assert(__has_unique_object_representations(X), "");
+static_assert(__has_unique_object_representations(X[]), "");
+static_assert(__has_unique_object_representations(X[1]), "");
+static_assert(__has_unique_object_representations(X[][1]), "");


[gcc r14-10458] i386: PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.

2024-07-18 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:a4c9ade72885f9cf72c873d110545e4e3c2c7805

commit r14-10458-ga4c9ade72885f9cf72c873d110545e4e3c2c7805
Author: Roger Sayle 
Date:   Fri Jun 7 14:03:20 2024 +0100

i386: PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.

This patch addresses PR target/115351, which is a code quality regression
on x86 when passing floating point complex numbers.  The ABI considers
these arguments to have TImode, requiring interunit moves to place the
FP values (which are actually passed in SSE registers) into the upper
and lower parts of a TImode pseudo, and then similar moves back again
before they can be used.

The cause of the regression is that changes in how TImode initialization
is represented in RTL now prevents the RTL optimizers from eliminating
these redundant moves.  The specific cause is that the *concatditi3
pattern, (zext(hi)<<64)|zext(lo), has an inappropriately high (default)
rtx_cost, preventing fwprop1 from propagating it.  This pattern just
sets the hipart and lopart of a double-word register, typically two
instructions (less if reload can allocate things appropriately) but
the current ix86_rtx_costs actually returns INSN_COSTS(13), i.e. 52.

propagating insn 5 into insn 6, replacing:
(set (reg:TI 110)
(ior:TI (and:TI (reg:TI 110)
(const_wide_int 0x0))
(ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0))
(const_int 64 [0x40]
successfully matched this instruction to *concatditi3_3:
(set (reg:TI 110)
(ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 
]) 0))
(const_int 64 [0x40]))
(zero_extend:TI (subreg:DI (reg:DF 111 [ zD.2796 ]) 0
change not profitable (cost 50 -> cost 52)

This issue is resolved by having ix86_rtx_costs return more reasonable
values for these (place-holder) patterns.
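
As a hedged illustration (a sketch, not the committed pr115351.C
testcase), the affected shape is a function taking or returning a
_Complex double: the ABI passes it in SSE registers but treats it as one
TImode value built from two DImode halves, so the concat/insert patterns
above must look cheap for fwprop to eliminate the redundant moves.

/* Sketch: the real and imaginary parts of z round-trip through a
   TImode pseudo via *concatditi3 / *insvti_highpart on x86-64.  */
double _Complex
scale (double _Complex z)
{
  return 2.0 * z;
}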

2024-06-07  Roger Sayle  

gcc/ChangeLog
PR target/115351
* config/i386/i386.cc (ix86_rtx_costs): Provide estimates for
the *concatditi3 and *insvti_highpart patterns, about two insns.

gcc/testsuite/ChangeLog
PR target/115351
* g++.target/i386/pr115351.C: New test case.

(cherry picked from commit fb3e4c549d16d5050e10114439ad77149f33c597)

Diff:
---
 gcc/config/i386/i386.cc  | 43 
 gcc/testsuite/g++.target/i386/pr115351.C | 19 ++
 2 files changed, 62 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3827e2b61fe4..35a282433892 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21865,6 +21865,49 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
outer_code_i, int opno,
}
  *total = ix86_vec_cost (mode, cost->sse_op);
}
+  else if (TARGET_64BIT
+  && mode == TImode
+  && GET_CODE (XEXP (x, 0)) == ASHIFT
+  && GET_CODE (XEXP (XEXP (x, 0), 0)) == ZERO_EXTEND
+  && GET_MODE (XEXP (XEXP (XEXP (x, 0), 0), 0)) == DImode
+  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+  && INTVAL (XEXP (XEXP (x, 0), 1)) == 64
+  && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+  && GET_MODE (XEXP (XEXP (x, 1), 0)) == DImode)
+   {
+ /* *concatditi3 is cheap.  */
+ rtx op0 = XEXP (XEXP (XEXP (x, 0), 0), 0);
+ rtx op1 = XEXP (XEXP (x, 1), 0);
+ *total = (SUBREG_P (op0) && GET_MODE (SUBREG_REG (op0)) == DFmode)
+  ? COSTS_N_INSNS (1)/* movq.  */
+  : set_src_cost (op0, DImode, speed);
+ *total += (SUBREG_P (op1) && GET_MODE (SUBREG_REG (op1)) == DFmode)
+   ? COSTS_N_INSNS (1)/* movq.  */
+   : set_src_cost (op1, DImode, speed);
+ return true;
+   }
+  else if (TARGET_64BIT
+  && mode == TImode
+  && GET_CODE (XEXP (x, 0)) == AND
+  && REG_P (XEXP (XEXP (x, 0), 0))
+  && CONST_WIDE_INT_P (XEXP (XEXP (x, 0), 1))
+  && CONST_WIDE_INT_NUNITS (XEXP (XEXP (x, 0), 1)) == 2
+  && CONST_WIDE_INT_ELT (XEXP (XEXP (x, 0), 1), 0) == -1
+  && CONST_WIDE_INT_ELT (XEXP (XEXP (x, 0), 1), 1) == 0
+  && GET_CODE (XEXP (x, 1)) == ASHIFT
+  && GET_CODE (XEXP (XEXP (x, 1), 0)) == ZERO_EXTEND
+  && GET_MODE (XEXP (XEXP (XEXP (x, 1), 0), 0)) == DImode
+  && CONST_INT_P (XEXP (XEXP (x, 1), 1))
+  && INTVAL (XEXP (XEXP (x, 1), 1)) == 64)
+   {
+ /* *insvti_highpart is cheap.  */
+ rtx op = XEXP (XEXP (XEXP (x, 1), 0), 0);
+ *total = COSTS_N_INSNS (1) + 1;
+ *total += (SUBREG_P (op) && GET_MODE (SUBREG_REG (op)) == DFmode)
+   ? COSTS_N_INSNS (1)/* movq. 

[gcc r14-10457] analyzer: fix ICE seen with -fsanitize=undefined [PR114899]

2024-07-18 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:b0452ed2fdc8bd9d5e911b6ea3166e4cd4be5256

commit r14-10457-gb0452ed2fdc8bd9d5e911b6ea3166e4cd4be5256
Author: David Malcolm 
Date:   Wed May 15 18:40:56 2024 -0400

analyzer: fix ICE seen with -fsanitize=undefined [PR114899]

gcc/analyzer/ChangeLog:
PR analyzer/114899
* access-diagram.cc
(written_svalue_spatial_item::get_label_string): Bulletproof
against SSA_NAME_VAR being null.

gcc/testsuite/ChangeLog:
PR analyzer/114899
* c-c++-common/analyzer/out-of-bounds-diagram-pr114899.c: New test.

Signed-off-by: David Malcolm 
(cherry picked from commit 1779e22150b917e28e959623c819ef943fab02df)

Diff:
---
 gcc/analyzer/access-diagram.cc|  3 ++-
 .../analyzer/out-of-bounds-diagram-pr114899.c | 15 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/gcc/analyzer/access-diagram.cc b/gcc/analyzer/access-diagram.cc
index 500480b68328..8d7461fe381d 100644
--- a/gcc/analyzer/access-diagram.cc
+++ b/gcc/analyzer/access-diagram.cc
@@ -1632,7 +1632,8 @@ protected:
 if (rep_tree)
   {
if (TREE_CODE (rep_tree) == SSA_NAME)
- rep_tree = SSA_NAME_VAR (rep_tree);
+ if (tree var = SSA_NAME_VAR (rep_tree))
+   rep_tree = var;
switch (TREE_CODE (rep_tree))
  {
  default:
diff --git 
a/gcc/testsuite/c-c++-common/analyzer/out-of-bounds-diagram-pr114899.c 
b/gcc/testsuite/c-c++-common/analyzer/out-of-bounds-diagram-pr114899.c
new file mode 100644
index ..14ba540d4ec2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/out-of-bounds-diagram-pr114899.c
@@ -0,0 +1,15 @@
+/* Verify we don't ICE generating out-of-bounds diagram.  */
+
+/* { dg-additional-options " -fsanitize=undefined 
-fdiagnostics-text-art-charset=unicode" } */
+
+int * a() {
+  int *b = (int *)__builtin_malloc(sizeof(int));
+  int *c = b - 1;
+  ++*c;
+  return b;
+}
+
+/* We don't care about the exact diagram, just that we don't ICE.  */
+
+/* { dg-allow-blank-lines-in-output 1 } */
+/* { dg-prune-output ".*" } */


[gcc r14-10456] Fix points_to_local_or_readonly_memory_p wrt TARGET_MEM_REF

2024-07-18 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:0b7ec50ae2959153650c0b3dc134c8872ff9fcfc

commit r14-10456-g0b7ec50ae2959153650c0b3dc134c8872ff9fcfc
Author: Jan Hubicka 
Date:   Thu May 16 15:33:55 2024 +0200

Fix points_to_local_or_readonly_memory_p wrt TARGET_MEM_REF

TARGET_MEM_REF can be used to offset a constant base into a memory object (to
produce a lea instruction).  This confuses
points_to_local_or_readonly_memory_p
which treats the constant address as a base of the access.

Bootstrapped/regtested x86_64-linux, committed.
Honza

gcc/ChangeLog:

PR ipa/113787
* ipa-fnsummary.cc (points_to_local_or_readonly_memory_p): Do not
look into TARGET_MEM_REFS with constant operand 0.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr113787.c: New test.

(cherry picked from commit 96d53252aefcbc2fe419c4c3b4bcd3fc03d4d187)

Diff:
---
 gcc/ipa-fnsummary.cc   |  4 ++-
 gcc/testsuite/gcc.c-torture/execute/pr113787.c | 38 ++
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index dff40cd8aa51..f2937df0292d 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -2644,7 +2644,9 @@ points_to_local_or_readonly_memory_p (tree t)
return true;
   return !ptr_deref_may_alias_global_p (t, false);
 }
-  if (TREE_CODE (t) == ADDR_EXPR)
+  if (TREE_CODE (t) == ADDR_EXPR
+  && (TREE_CODE (TREE_OPERAND (t, 0)) != TARGET_MEM_REF
+ || TREE_CODE (TREE_OPERAND (TREE_OPERAND (t, 0), 0)) != INTEGER_CST))
 return refs_local_or_readonly_memory_p (TREE_OPERAND (t, 0));
   return false;
 }
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr113787.c 
b/gcc/testsuite/gcc.c-torture/execute/pr113787.c
new file mode 100644
index ..702b6c35fc68
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr113787.c
@@ -0,0 +1,38 @@
+void foo(int x, int y, int z, int d, int *buf)
+{
+  for(int i = z; i < y-z; ++i)
+for(int j = 0; j < d; ++j)
+  /* buf[x(i+1) + j] = buf[x(i+1)-j-1] */
+  buf[i*x+(x-z+j)] = buf[i*x+(x-z-1-j)];
+}
+
+void bar(int x, int y, int z, int d, int *buf)
+{
+  for(int i = 0; i < d; ++i)
+for(int j = z; j < x-z; ++j)
+  /* buf[j+(y+i)*x] = buf[j+(y-1-i)*x] */
+  buf[j+(y-z+i)*x] = buf[j+(y-z-1-i)*x];
+}
+
+__attribute__((noipa))
+void baz(int x, int y, int d, int *buf)
+{
+  foo(x, y, 0, d, buf);
+  bar(x, y, 0, d, buf);
+}
+
+int main(void)
+{
+  int a[] = { 1, 2, 3 };
+  baz (1, 2, 1, a);
+  /* foo does:
+ buf[1] = buf[0];
+ buf[2] = buf[1];
+
+ bar does:
+ buf[2] = buf[1]; (no-op)
+ so we should have { 1, 1, 1 }.  */
+  for (int i = 0; i < 3; i++)
+if (a[i] != 1)
+  __builtin_abort ();
+}


[gcc r14-10455] PR tree-optimization/113673: Avoid load merging when potentially trapping.

2024-07-18 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:0f593e4cd82eeced5d7666ae6752f238c7dbd7f6

commit r14-10455-g0f593e4cd82eeced5d7666ae6752f238c7dbd7f6
Author: Roger Sayle 
Date:   Mon Jun 24 15:34:03 2024 +0100

PR tree-optimization/113673: Avoid load merging when potentially trapping.

This patch fixes PR tree-optimization/113673, a P2 ice-on-valid regression
caused by load merging of (ptr[0]<<8)+ptr[1] when -ftrapv has been
specified.  When the operator is | or ^ this is safe, but for addition
of signed integer types, a trap may be generated/required, so merging this
idiom into a single non-trapping instruction is inappropriate, confusing
the compiler by transforming a basic block with an exception edge into one
without.

This revision implements Richard Biener's feedback to add an early check
for stmt_can_throw_internal (cfun, stmt) to prevent transforming in the
presence of any statement that could trap, not just overflow on addition.
The one other tweak included in this patch is to mark the local function
find_bswap_or_nop_load as static ensuring that it isn't called from outside
this file, and guaranteeing that it is dominated by stmt_can_throw_internal
checking.

2024-06-24  Roger Sayle  
Richard Biener  

gcc/ChangeLog
PR tree-optimization/113673
* gimple-ssa-store-merging.cc (find_bswap_or_nop_load): Make static.
(find_bswap_or_nop_1): Avoid transformations (load merging) when
stmt_can_throw_internal indicates that a statement can trap.

gcc/testsuite/ChangeLog
PR tree-optimization/113673
* g++.dg/pr113673.C: New test case.

(cherry picked from commit d8b05aef77443e1d3d8f3f5d2c56ac49a503fee3)

Diff:
---
 gcc/gimple-ssa-store-merging.cc |  6 --
 gcc/testsuite/g++.dg/pr113673.C | 14 ++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-ssa-store-merging.cc b/gcc/gimple-ssa-store-merging.cc
index cb0cb5f42f65..7dba4a7a781f 100644
--- a/gcc/gimple-ssa-store-merging.cc
+++ b/gcc/gimple-ssa-store-merging.cc
@@ -363,7 +363,7 @@ init_symbolic_number (struct symbolic_number *n, tree src)
the answer. If so, REF is that memory source and the base of the memory area
accessed and the offset of the access from that base are recorded in N.  */
 
-bool
+static bool
 find_bswap_or_nop_load (gimple *stmt, tree ref, struct symbolic_number *n)
 {
   /* Leaf node is an array or component ref. Memorize its base and
@@ -610,7 +610,9 @@ find_bswap_or_nop_1 (gimple *stmt, struct symbolic_number 
*n, int limit)
   gimple *rhs1_stmt, *rhs2_stmt, *source_stmt1;
   enum gimple_rhs_class rhs_class;
 
-  if (!limit || !is_gimple_assign (stmt))
+  if (!limit
+  || !is_gimple_assign (stmt)
+  || stmt_can_throw_internal (cfun, stmt))
 return NULL;
 
   rhs1 = gimple_assign_rhs1 (stmt);
diff --git a/gcc/testsuite/g++.dg/pr113673.C b/gcc/testsuite/g++.dg/pr113673.C
new file mode 100644
index ..11489777f5b9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr113673.C
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fnon-call-exceptions -ftrapv" } */
+
+struct s { ~s(); };
+void
+h (unsigned char *data, int c)
+{
+  s a1;
+  while (c)
+{
+  int m = *data++ << 8;
+  m += *data++;
+}
+}


[gcc r15-2134] gimple-fold: consistent dump of builtin call simplifications

2024-07-18 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:cee56fe0ba757cae17dcc4be216cea88be76e740

commit r15-2134-gcee56fe0ba757cae17dcc4be216cea88be76e740
Author: Rubin Gerritsen 
Date:   Tue Jul 16 21:11:24 2024 +0200

gimple-fold: consistent dump of builtin call simplifications

Previously only simplifications of the `__st[xrp]cpy_chk`
were dumped. Now all call replacement simplifications are
dumped.

Examples of statements with corresponding dumpfile entries:

`printf("mystr\n");`:
  optimized: simplified printf to __builtin_puts
`printf("%c", 'a');`:
  optimized: simplified printf to __builtin_putchar
`printf("%s\n", "mystr");`:
  optimized: simplified printf to __builtin_puts

The below test suites passed for this patch
* The x86 bootstrap test.
* Manual testing with some small example code manually
  examining dump logs, outputting the lines mentioned above.
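
For instance, a small translation unit like the one below, compiled with
-O2 -fopt-info-optimized (the existing GCC option that prints the
"optimized:" remarks quoted above), should show one entry per simplified
call; the file name is of course arbitrary:

/* gcc -O2 -fopt-info-optimized printf-fold.c */
#include <stdio.h>

int main (void)
{
  printf ("mystr\n");        /* expected: simplified to __builtin_puts */
  printf ("%c", 'a');        /* expected: simplified to __builtin_putchar */
  printf ("%s\n", "mystr");  /* expected: simplified to __builtin_puts */
  return 0;
}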

gcc/ChangeLog:

* gimple-fold.cc (dump_transformation): Moved definition.
(replace_call_with_call_and_fold): Calls dump_transformation.
(gimple_fold_builtin_stxcpy_chk): Removes call to
dump_transformation, now in replace_call_with_call_and_fold.
(gimple_fold_builtin_stxncpy_chk): Removes call to
dump_transformation, now in replace_call_with_call_and_fold.

Signed-off-by: Rubin Gerritsen 

Diff:
---
 gcc/gimple-fold.cc | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index ed9508e4c912..c20102f73f59 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -802,6 +802,15 @@ gimplify_and_update_call_from_tree (gimple_stmt_iterator 
*si_p, tree expr)
   gsi_replace_with_seq_vops (si_p, stmts);
 }
 
+/* Print a message in the dump file recording transformation of FROM to TO.  */
+
+static void
+dump_transformation (gcall *from, gcall *to)
+{
+  if (dump_enabled_p ())
+dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to %T\n",
+gimple_call_fn (from), gimple_call_fn (to));
+}
 
 /* Replace the call at *GSI with the gimple value VAL.  */
 
@@ -835,6 +844,7 @@ static void
 replace_call_with_call_and_fold (gimple_stmt_iterator *gsi, gimple *repl)
 {
   gimple *stmt = gsi_stmt (*gsi);
+  dump_transformation (as_a <gcall *> (stmt), as_a <gcall *> (repl));
   gimple_call_set_lhs (repl, gimple_call_lhs (stmt));
   gimple_set_location (repl, gimple_location (stmt));
   gimple_move_vops (repl, stmt);
@@ -3090,16 +3100,6 @@ gimple_fold_builtin_memory_chk (gimple_stmt_iterator 
*gsi,
   return true;
 }
 
-/* Print a message in the dump file recording transformation of FROM to TO.  */
-
-static void
-dump_transformation (gcall *from, gcall *to)
-{
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to %T\n",
-gimple_call_fn (from), gimple_call_fn (to));
-}
-
 /* Fold a call to the __st[rp]cpy_chk builtin.
DEST, SRC, and SIZE are the arguments to the call.
IGNORE is true if return value can be ignored.  FCODE is the BUILT_IN_*
@@ -3189,7 +3189,6 @@ gimple_fold_builtin_stxcpy_chk (gimple_stmt_iterator *gsi,
 return false;
 
   gcall *repl = gimple_build_call (fn, 2, dest, src);
-  dump_transformation (stmt, repl);
   replace_call_with_call_and_fold (gsi, repl);
   return true;
 }
@@ -3235,7 +3234,6 @@ gimple_fold_builtin_stxncpy_chk (gimple_stmt_iterator 
*gsi,
 return false;
 
   gcall *repl = gimple_build_call (fn, 3, dest, src, len);
-  dump_transformation (stmt, repl);
   replace_call_with_call_and_fold (gsi, repl);
   return true;
 }


[gcc r15-2133] tree-optimization/104515 - store motion and clobbers

2024-07-18 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:8c67dc40459e3d72e8169b099cc8c5dbdb759da3

commit r15-2133-g8c67dc40459e3d72e8169b099cc8c5dbdb759da3
Author: Richard Biener 
Date:   Wed Jul 17 10:22:47 2024 +0200

tree-optimization/104515 - store motion and clobbers

The following addresses an old regression when end-of-object/storage
clobbers were introduced.  In particular when there's an end-of-object
clobber in a loop but no corresponding begin-of-object we can still
perform store motion of may-aliased refs when we re-issue the
end-of-object/storage on the exits but elide it from the loop.  This
should be the safest way to deal with this considering stack-slot
sharing and it should not cause missed dead store eliminations given
DSE can now follow multiple paths in case there are multiple exits.

Note when the clobber is re-materialized only on one exit but not
on another we are erring on the side of removing the clobber on
such a path.  This should be OK (removing clobbers is always OK).

Note there's no corresponding code to handle begin-of-object/storage
during the hoisting part of loads that are part of a store motion
optimization, so this only enables stored-only store motion or cases
without such clobber inside the loop.

PR tree-optimization/104515
* tree-ssa-loop-im.cc (execute_sm_exit): Add clobbers_to_prune
parameter and handle re-materializing of clobbers.
(sm_seq_valid_bb): end-of-storage/object clobbers are OK inside
an ordered sequence of stores.
(sm_seq_push_down): Refuse to push down clobbers.
(hoist_memory_references): Prune clobbers from the loop body
we re-materialized on an exit.

* g++.dg/opt/pr104515.C: New testcase.

Diff:
---
 gcc/testsuite/g++.dg/opt/pr104515.C | 18 
 gcc/tree-ssa-loop-im.cc | 86 ++---
 2 files changed, 89 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/g++.dg/opt/pr104515.C 
b/gcc/testsuite/g++.dg/opt/pr104515.C
new file mode 100644
index ..f5455a45aa63
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr104515.C
@@ -0,0 +1,18 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -fdump-tree-lim2-details" }
+
+using T = int;
+struct Vec {
+  T* end;
+};
+void pop_back_many(Vec& v, unsigned n)
+{
+  for (unsigned i = 0; i < n; ++i) {
+--v.end;
+//  The end-of-object clobber prevented store motion of v
+v.end->~T();
+  }
+}
+
+// { dg-final { scan-tree-dump "Executing store motion of v" "lim2" } }
+// { dg-final { scan-tree-dump "Re-issueing dependent" "lim2" } }
diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index 61c6339bc351..c53efbb8d597 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -2368,7 +2368,8 @@ struct seq_entry
 static void
 execute_sm_exit (class loop *loop, edge ex, vec<seq_entry> &seq,
		 hash_map<im_mem_ref *, sm_aux *> &aux_map, sm_kind kind,
-		 edge &append_cond_position, edge &append_cond_fallthru)
+		 edge &append_cond_position, edge &append_cond_fallthru,
+		 bitmap clobbers_to_prune)
 {
   /* Sink the stores to exit from the loop.  */
   for (unsigned i = seq.length (); i > 0; --i)
@@ -2377,15 +2378,35 @@ execute_sm_exit (class loop *loop, edge ex, 
vec<seq_entry> &seq,
   if (seq[i-1].second == sm_other)
{
  gcc_assert (kind == sm_ord && seq[i-1].from != NULL_TREE);
- if (dump_file && (dump_flags & TDF_DETAILS))
+ gassign *store;
+ if (ref->mem.ref == error_mark_node)
{
- fprintf (dump_file, "Re-issueing dependent store of ");
- print_generic_expr (dump_file, ref->mem.ref);
- fprintf (dump_file, " from loop %d on exit %d -> %d\n",
-  loop->num, ex->src->index, ex->dest->index);
+ tree lhs = gimple_assign_lhs (ref->accesses_in_loop[0].stmt);
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Re-issueing dependent ");
+ print_generic_expr (dump_file, unshare_expr (seq[i-1].from));
+ fprintf (dump_file, " of ");
+ print_generic_expr (dump_file, lhs);
+ fprintf (dump_file, " from loop %d on exit %d -> %d\n",
+  loop->num, ex->src->index, ex->dest->index);
+   }
+ store = gimple_build_assign (unshare_expr (lhs),
+  unshare_expr (seq[i-1].from));
+ bitmap_set_bit (clobbers_to_prune, seq[i-1].first);
+   }
+ else
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Re-issueing dependent store of ");
+ print_generic_expr (dump_file, ref->mem.ref);
+ fprintf (dump_file, " from loop %d on exit %d -> %d\n",
+ 

[gcc r15-2093] tree-optimization/115959 - ICE with SLP condition reduction

2024-07-17 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:24689b84b8ec0c74c2b9a72ec4fb467069806bda

commit r15-2093-g24689b84b8ec0c74c2b9a72ec4fb467069806bda
Author: Richard Biener 
Date:   Wed Jul 17 11:42:13 2024 +0200

tree-optimization/115959 - ICE with SLP condition reduction

The following fixes how during reduction epilogue generation we
gather conditional compares for condition reductions, thereby
following the reduction chain via STMT_VINFO_REDUC_IDX.  The issue
is that SLP nodes for COND_EXPRs can have either three or four
children dependent on whether we have legacy GENERIC expressions
in the transitional pattern GIMPLE for the COND_EXPR condition.

PR tree-optimization/115959
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Get at the REDUC_IDX child in a safer way for COND_EXPR
nodes.

* gcc.dg/vect/pr115959.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115959.c | 14 ++
 gcc/tree-vect-loop.cc| 10 +++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115959.c 
b/gcc/testsuite/gcc.dg/vect/pr115959.c
new file mode 100644
index ..181d55220182
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115959.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+int a;
+_Bool *b;
+void f()
+{
+  int t = a;
+  for (int e = 0; e < 2048; e++)
+{
+  if (!b[e])
+   t = 0;
+}
+  a = t;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b8124a321280..a464bc8607c2 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6090,9 +6090,13 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
(std::make_pair (gimple_assign_rhs1 (vec_stmt),
 STMT_VINFO_REDUC_IDX (cond_info) == 2));
}
- /* ???  We probably want to have REDUC_IDX on the SLP node?  */
- cond_node = SLP_TREE_CHILDREN
-   (cond_node)[STMT_VINFO_REDUC_IDX (cond_info)];
+ /* ???  We probably want to have REDUC_IDX on the SLP node?
+We have both three and four children COND_EXPR nodes
+dependent on whether the comparison is still embedded
+as GENERIC.  So work backwards.  */
+ int slp_reduc_idx = (SLP_TREE_CHILDREN (cond_node).length () - 3
+  + STMT_VINFO_REDUC_IDX (cond_info));
+ cond_node = SLP_TREE_CHILDREN (cond_node)[slp_reduc_idx];
}
}
   else


[gcc r14-10444] vect: Merge loop mask and cond_op mask in fold-left reduction [PR115382].

2024-07-17 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:bf64404280a90715d1228edef0d5756e81635a64

commit r14-10444-gbf64404280a90715d1228edef0d5756e81635a64
Author: Robin Dapp 
Date:   Fri Jun 7 14:36:41 2024 +0200

vect: Merge loop mask and cond_op mask in fold-left reduction [PR115382].

Currently we discard the cond-op mask when the loop is fully masked
which causes wrong code in
gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
when compiled with
-O3 -march=cascadelake --param vect-partial-vector-usage=2.

This patch ANDs both masks.
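
As an illustration (a sketch, not the
vect-cond-reduc-in-order-2-signed-zero.c test itself), the affected shape
is an in-order conditional reduction: with partial vectors a lane may be
accumulated only if it is both inside the loop mask and selected by the
condition, hence the AND of the two masks.

/* Fold-left reduction: without -ffast-math the additions must stay in
   source order, and the a[i] > 0.0 condition masks the lanes.  */
double
cond_sum (const double *a, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    if (a[i] > 0.0)
      sum += a[i];
  return sum;
}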

gcc/ChangeLog:

PR tree-optimization/115382

* tree-vect-loop.cc (vectorize_fold_left_reduction): Use
prepare_vec_mask.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
Remove static of prepare_vec_mask.
* tree-vectorizer.h (prepare_vec_mask): Export.

(cherry picked from commit 2b438a0d2aa80f051a09b245a58f643540d4004b)

Diff:
---
 gcc/tree-vect-loop.cc  | 10 +-
 gcc/tree-vect-stmts.cc |  2 +-
 gcc/tree-vectorizer.h  |  3 +++
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index feed73585921..acc6b75fb170 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7188,7 +7188,15 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   tree len = NULL_TREE;
   tree bias = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-   mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, 
i);
+   {
+ tree loop_mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
+  vec_num, vectype_in, i);
+ if (is_cond_op)
+   mask = prepare_vec_mask (loop_vinfo, TREE_TYPE (loop_mask),
+loop_mask, vec_opmask[i], gsi);
+ else
+   mask = loop_mask;
+   }
   else if (is_cond_op)
mask = vec_opmask[0];
   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ecf6f7459634..a25ac53a4cd3 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1643,7 +1643,7 @@ check_load_store_for_partial_vectors (loop_vec_info 
loop_vinfo, tree vectype,
MASK_TYPE is the type of both masks.  If new statements are needed,
insert them before GSI.  */
 
-static tree
+tree
 prepare_vec_mask (loop_vec_info loop_vinfo, tree mask_type, tree loop_mask,
  tree vec_mask, gimple_stmt_iterator *gsi)
 {
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index db44d730b702..c076cb648f4b 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2495,6 +2495,9 @@ extern void vect_free_slp_tree (slp_tree);
 extern bool compatible_calls_p (gcall *, gcall *);
 extern int vect_slp_child_index_for_operand (const gimple *, int op, bool);
 
+extern tree prepare_vec_mask (loop_vec_info, tree, tree, tree,
+ gimple_stmt_iterator *);
+
 /* In tree-vect-patterns.cc.  */
 extern void
 vect_mark_pattern_stmts (vec_info *, stmt_vec_info, gimple *, tree);


[gcc r14-10443] tree-optimization/115868 - ICE with .MASK_CALL in simdclone

2024-07-17 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:c58bede01c06c84f0b36881fafd1e5d6456a38f4

commit r14-10443-gc58bede01c06c84f0b36881fafd1e5d6456a38f4
Author: Richard Biener 
Date:   Thu Jul 11 09:56:56 2024 +0200

tree-optimization/115868 - ICE with .MASK_CALL in simdclone

The following adjusts mask recording which didn't take into account
that we can merge call arguments from two vectors like

  _50 = {vect_d_1.253_41, vect_d_1.254_43};
  _51 = VIEW_CONVERT_EXPR(mask__19.257_49);
  _52 = (unsigned int) _51;
  _53 = _Z3bazd.simdclone.7 (_50, _52);
  _54 = BIT_FIELD_REF <_53, 256, 0>;
  _55 = BIT_FIELD_REF <_53, 256, 256>;

The testcase g++.dg/vect/pr68762-2.cc exercises this on x86_64 with
partial vector usage enabled and AVX512 support.

PR tree-optimization/115868
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Correctly
compute the number of mask copies required for 
vect_record_loop_mask.

(cherry picked from commit abf3964711f05b6858d9775c3595ec2b45483e14)

Diff:
---
 gcc/tree-vect-stmts.cc | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index eed5c7d821cb..ecf6f7459634 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4317,9 +4317,14 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
case SIMD_CLONE_ARG_TYPE_MASK:
  if (loop_vinfo
  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
-   vect_record_loop_mask (loop_vinfo,
-  &LOOP_VINFO_MASKS (loop_vinfo),
-  ncopies, vectype, op);
+   {
+ unsigned nmasks
+   = exact_div (ncopies * bestn->simdclone->simdlen,
+TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
+ vect_record_loop_mask (loop_vinfo,
+&LOOP_VINFO_MASKS (loop_vinfo),
+nmasks, vectype, op);
+   }
 
  break;
}


[gcc r14-10440] tree-optimization/115841 - reduction epilogue placement issue

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:59ed01d5e3d2b0e59163d3248bdba9f1e35de599

commit r14-10440-g59ed01d5e3d2b0e59163d3248bdba9f1e35de599
Author: Richard Biener 
Date:   Tue Jul 16 11:53:17 2024 +0200

tree-optimization/115841 - reduction epilogue placement issue

When emitting the compensation to the vectorized main loop for
a vector reduction value to be re-used in the vectorized epilogue
we fail to place it in the correct block when the main loop is
known to be entered (no loop_vinfo->main_loop_edge) but the
epilogue is not (a loop_vinfo->skip_this_loop_edge).  The code
currently disregards this situation.

With the recent znver4 cost fix I couldn't trigger this situation
with the testcase but I adjusted it so it could eventually trigger
on other targets.

PR tree-optimization/115841
* tree-vect-loop.cc (vect_transform_cycle_phi): Correctly
place the partial vector reduction for the accumulator
re-use when the main loop cannot be skipped but the
epilogue can.

* gcc.dg/vect/pr115841.c: New testcase.

(cherry picked from commit 016c947b02e79a5c0c0c2d4ad5cb71aa04db3efd)

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115841.c | 42 
 gcc/tree-vect-loop.cc|  7 +++---
 2 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115841.c 
b/gcc/testsuite/gcc.dg/vect/pr115841.c
new file mode 100644
index ..aa5c66004a03
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115841.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast -fcommon -fvect-cost-model=dynamic --param 
vect-partial-vector-usage=1" } */
+/* { dg-additional-options "-mavx512vl" { target avx512vl } } */
+
+/* To trigger the bug costing needs to determine that aligning the A170
+   accesses with a prologue is good and there should be a vectorized
+   epilogue with a smaller vector size, re-using the vector accumulator
+   from the vectorized main loop that's statically known to execute
+   but the epilogue loop is not.  */
+
+static unsigned char xl[192];
+unsigned char A170[192*3];
+
+void jerate (unsigned char *, unsigned char *);
+float foo (unsigned n)
+{
+  jerate (xl, A170);
+
+  unsigned i = 32;
+  int kr = 1;
+  float sfn11s = 0.f;
+  float sfn12s = 0.f;
+  do
+{
+  int krm1 = kr - 1;
+  long j = krm1;
+  float a = (*(float(*)[n])A170)[j];
+  float b = (*(float(*)[n])xl)[j];
+  float c = a * b;
+  float d = c * 6.93149983882904052734375e-1f;
+  float e = (*(float(*)[n])A170)[j+48];
+  float f = (*(float(*)[n])A170)[j+96];
+  float g = d * e;
+  sfn11s = sfn11s + g;
+  float h = f * d;
+  sfn12s = sfn12s + h;
+  kr++;
+}
+  while (--i != 0);
+  float tem = sfn11s + sfn12s;
+  return tem;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 832399f7e9d7..feed73585921 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8880,14 +8880,15 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
  /* And the reduction could be carried out using a different sign.  */
  if (!useless_type_conversion_p (vectype_out, TREE_TYPE (def)))
def = gimple_convert (, vectype_out, def);
- if (loop_vinfo->main_loop_edge)
+ edge e;
+ if ((e = loop_vinfo->main_loop_edge)
+ || (e = loop_vinfo->skip_this_loop_edge))
{
  /* While we'd like to insert on the edge this will split
 blocks and disturb bookkeeping, we also will eventually
 need this on the skip edge.  Rely on sinking to
 fixup optimal placement and insert in the pred.  */
- gimple_stmt_iterator gsi
-   = gsi_last_bb (loop_vinfo->main_loop_edge->src);
+ gimple_stmt_iterator gsi = gsi_last_bb (e->src);
  /* Insert before a cond that eventually skips the
 epilogue.  */
  if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))


[gcc r14-10439] tree-optimization/115843 - fix wrong-code with fully-masked loop and peeling

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:06829e593d2e5611e7924624cb8228795691e2b7

commit r14-10439-g06829e593d2e5611e7924624cb8228795691e2b7
Author: Richard Biener 
Date:   Mon Jul 15 13:50:58 2024 +0200

tree-optimization/115843 - fix wrong-code with fully-masked loop and peeling

When AVX512 uses a fully masked loop and peeling we fail to create the
correct initial loop mask when the mask is composed of multiple
components in some cases.  The following fixes this by properly applying
the bias for the component to the shift amount.

PR tree-optimization/115843
* tree-vect-loop-manip.cc
(vect_set_loop_condition_partial_vectors_avx512): Properly
bias the shift of the initial mask for alignment peeling.

* gcc.dg/vect/pr115843.c: New testcase.

(cherry picked from commit a177be05f6952c3f7e62186d2e138d96c475b81a)

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115843.c | 41 
 gcc/tree-vect-loop-manip.cc  |  8 +--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115843.c 
b/gcc/testsuite/gcc.dg/vect/pr115843.c
new file mode 100644
index ..3dbb6c792788
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115843.c
@@ -0,0 +1,41 @@
+/* { dg-additional-options "-mavx512f --param vect-partial-vector-usage=2" { 
target avx512f_runtime } } */
+
+#include "tree-vect.h"
+
+typedef __UINT64_TYPE__ BITBOARD;
+BITBOARD KingPressureMask1[64], KingSafetyMask1[64];
+
+void __attribute__((noinline))
+foo()
+{
+  for (int i = 0; i < 64; i++)
+{
+  if ((i & 7) == 0)
+   KingPressureMask1[i] = KingSafetyMask1[i + 1];
+  else if ((i & 7) == 7)
+   KingPressureMask1[i] = KingSafetyMask1[i - 1];
+  else
+   KingPressureMask1[i] = KingSafetyMask1[i];
+}
+}
+
+BITBOARD verify[64]
+  = {1, 1, 2, 3, 4, 5, 6, 6, 9, 9, 10, 11, 12, 13, 14, 14, 17, 17, 18, 19,
+20, 21, 22, 22, 25, 25, 26, 27, 28, 29, 30, 30, 33, 33, 34, 35, 36, 37, 38,
+38, 41, 41, 42, 43, 44, 45, 46, 46, 49, 49, 50, 51, 52, 53, 54, 54, 57, 57,
+58, 59, 60, 61, 62, 62};
+
+int main()
+{
+  check_vect ();
+
+#pragma GCC novector
+  for (int i = 0; i < 64; ++i)
+KingSafetyMask1[i] = i;
+  foo ();
+#pragma GCC novector
+  for (int i = 0; i < 64; ++i)
+if (KingPressureMask1[i] != verify[i])
+  __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 43c7881c640d..1ece4a58bd50 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1149,10 +1149,14 @@ vect_set_loop_condition_partial_vectors_avx512 (class 
loop *loop,
  /* ???  But when the shift amount isn't constant this requires
 a round-trip to GRPs.  We could apply the bias to either
 side of the compare instead.  */
- tree shift = gimple_build (_seq, MULT_EXPR,
+ tree shift = gimple_build (_seq, MINUS_EXPR,
 TREE_TYPE (niters_skip), niters_skip,
 build_int_cst (TREE_TYPE (niters_skip),
-   
rgc.max_nscalars_per_iter));
+   bias));
+ shift = gimple_build (_seq, MULT_EXPR,
+   TREE_TYPE (niters_skip), shift,
+   build_int_cst (TREE_TYPE (niters_skip),
+  rgc.max_nscalars_per_iter));
  init_ctrl = gimple_build (_seq, LSHIFT_EXPR,
TREE_TYPE (init_ctrl),
init_ctrl, shift);


[gcc r14-10436] tree-optimization/115867 - ICE with simdcall vectorization in masked loop

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:ca275b68ef11d7d70bff8d7426e45b3734b3

commit r14-10436-gca275b68ef11d7d70bff8d7426e45b3734b3
Author: Richard Biener 
Date:   Thu Jul 11 10:18:55 2024 +0200

tree-optimization/115867 - ICE with simdcall vectorization in masked loop

When only a loop mask is to be supplied for the inbranch arg to a
simd function we fail to handle integer mode masks correctly.  We
need to guess the number of elements represented by it.  This assumes
that excess arguments are all for masks; I wasn't able to create
a simdclone with more than one integer mode mask argument.

The gcc.dg/vect/vect-simd-clone-20.c exercises this with -mavx512vl
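
For context, a hedged sketch (not the vect-simd-clone-20.c test itself)
of the kind of inbranch simd call that supplies such a mask argument;
with an AVX512 scalar integer mask mode the clone's mask arrives in an
integer mode and the number of lanes it stands for has to be guessed as
described above.

/* The call to f under the c[i] condition is vectorized through the
   inbranch clone, which takes the active-lane mask as an extra
   argument; build with -fopenmp-simd (or -fopenmp) so the clone is
   generated and used.  */
#pragma omp declare simd inbranch
int f (int x);

void
use (int *restrict out, const int *restrict in,
     const int *restrict c, int n)
{
  for (int i = 0; i < n; i++)
    if (c[i])
      out[i] = f (in[i]);
}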

PR tree-optimization/115867
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Properly
guess the number of mask elements for integer mode masks.

(cherry picked from commit 4f4478f0f31263997bfdc4159f90e58dd79b38f9)

Diff:
---
 gcc/tree-vect-stmts.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 21e8fe98e44a..eed5c7d821cb 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4716,7 +4716,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
  SIMD_CLONE_ARG_TYPE_MASK);
 
  tree masktype = bestn->simdclone->args[mask_i].vector_type;
- callee_nelements = TYPE_VECTOR_SUBPARTS (masktype);
+ if (SCALAR_INT_MODE_P (bestn->simdclone->mask_mode))
+   /* Guess the number of lanes represented by masktype.  */
+   callee_nelements = exact_div (bestn->simdclone->simdlen,
+ bestn->simdclone->nargs - nargs);
+ else
+   callee_nelements = TYPE_VECTOR_SUBPARTS (masktype);
  o = vector_unroll_factor (nunits, callee_nelements);
  for (m = j * o; m < (j + 1) * o; m++)
{


[gcc r14-10435] Fixup unaligned load/store cost for znver5

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:4a04110ec8388b6540380cfedbe50af1b29e3e36

commit r14-10435-g4a04110ec8388b6540380cfedbe50af1b29e3e36
Author: Richard Biener 
Date:   Tue Jul 16 10:45:27 2024 +0200

Fixup unaligned load/store cost for znver5

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply copied from the bogus znver4 costs.  The following makes
the unaligned costs equal to the aligned costs like in the fixed znver4
version.

* config/i386/x86-tune-costs.h (znver5_cost): Update unaligned
load and store cost from the aligned costs.

(cherry picked from commit 896393791ee34ffc176c87d232dfee735db3aaab)

Diff:
---
 gcc/config/i386/x86-tune-costs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index d0168eebdc15..8348ab8230ad 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -2060,8 +2060,8 @@ struct processor_costs znver5_cost = {
   in 32bit, 64bit, 128bit, 256bit and 
512bit */
   {8, 8, 8, 12, 12},   /* cost of storing SSE register
   in 32bit, 64bit, 128bit, 256bit and 
512bit */
-  {6, 6, 6, 6, 6}, /* cost of unaligned loads.  */
-  {8, 8, 8, 8, 8}, /* cost of unaligned stores.  */
+  {6, 6, 10, 10, 12},  /* cost of unaligned loads.  */
+  {8, 8, 8, 12, 12},   /* cost of unaligned stores.  */
   2, 2, 2, /* cost of moving XMM,YMM,ZMM
   register.  */
   6,   /* cost of moving SSE register to 
integer.  */


[gcc r14-10438] tree-optimization/115701 - fix maybe_duplicate_ssa_info_at_copy

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:e01012c459c931ae39558b019107226c232fa4d1

commit r14-10438-ge01012c459c931ae39558b019107226c232fa4d1
Author: Richard Biener 
Date:   Sun Jun 30 11:34:43 2024 +0200

tree-optimization/115701 - fix maybe_duplicate_ssa_info_at_copy

The following restricts copying of points-to info from defs that
might be in regions invoking UB and are never executed.

PR tree-optimization/115701
* tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy):
Only copy info from within the same BB.

* gcc.dg/torture/pr115701.c: New testcase.

(cherry picked from commit b77f17c5feec9614568bf2dee7f7d811465ee4a5)

Diff:
---
 gcc/testsuite/gcc.dg/torture/pr115701.c | 22 ++
 gcc/tree-ssanames.cc| 22 --
 2 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr115701.c 
b/gcc/testsuite/gcc.dg/torture/pr115701.c
new file mode 100644
index ..9b7c34b23d78
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr115701.c
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+/* IPA PTA disables local PTA recompute after IPA.  */
+/* { dg-additional-options "-fipa-pta" } */
+
+int a, c, d;
+static int b;
+int main()
+{
+  int *e = , **f = 
+  while (1) {
+int **g, ***h = 
+if (c)
+  *g = e;
+else if (!b)
+  break;
+*e = **g;
+e = 
+  }
+  if (e != )
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index 5ad7d117bd33..6c2525900abf 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -763,25 +763,19 @@ duplicate_ssa_name_range_info (tree name, tree src)
 void
 maybe_duplicate_ssa_info_at_copy (tree dest, tree src)
 {
+  /* While points-to info is flow-insensitive we have to avoid copying
+ info from not executed regions invoking UB to dominating defs.  */
+  if (gimple_bb (SSA_NAME_DEF_STMT (src))
+  != gimple_bb (SSA_NAME_DEF_STMT (dest)))
+return;
+
   if (POINTER_TYPE_P (TREE_TYPE (dest))
   && SSA_NAME_PTR_INFO (dest)
   && ! SSA_NAME_PTR_INFO (src))
-{
-  duplicate_ssa_name_ptr_info (src, SSA_NAME_PTR_INFO (dest));
-  /* Points-to information is cfg insensitive,
-but VRP might record context sensitive alignment
-info, non-nullness, etc.  So reset context sensitive
-info if the two SSA_NAMEs aren't defined in the same
-basic block.  */
-  if (gimple_bb (SSA_NAME_DEF_STMT (src))
- != gimple_bb (SSA_NAME_DEF_STMT (dest)))
-   reset_flow_sensitive_info (src);
-}
+duplicate_ssa_name_ptr_info (src, SSA_NAME_PTR_INFO (dest));
   else if (INTEGRAL_TYPE_P (TREE_TYPE (dest))
   && SSA_NAME_RANGE_INFO (dest)
-  && ! SSA_NAME_RANGE_INFO (src)
-  && (gimple_bb (SSA_NAME_DEF_STMT (src))
-  == gimple_bb (SSA_NAME_DEF_STMT (dest
+  && ! SSA_NAME_RANGE_INFO (src))
 duplicate_ssa_name_range_info (src, dest);
 }


[gcc r14-10437] tree-optimization/115701 - factor out maybe_duplicate_ssa_info_at_copy

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:6f74a5f5dc12bc337068f0f6a554d72604488959

commit r14-10437-g6f74a5f5dc12bc337068f0f6a554d72604488959
Author: Richard Biener 
Date:   Sun Jun 30 11:28:11 2024 +0200

tree-optimization/115701 - factor out maybe_duplicate_ssa_info_at_copy

The following factors out the code that preserves SSA info of the LHS
of a SSA copy LHS = RHS when LHS is about to be eliminated to RHS.

PR tree-optimization/115701
* tree-ssanames.h (maybe_duplicate_ssa_info_at_copy): Declare.
* tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy): New
function, split out from ...
* tree-ssa-copy.cc (fini_copy_prop): ... here.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt): ...
and here.

(cherry picked from commit b5c64b413fd5bc03a1a8ef86d005892071e42cbe)

Diff:
---
 gcc/tree-ssa-copy.cc  | 32 ++--
 gcc/tree-ssa-sccvn.cc | 21 ++---
 gcc/tree-ssanames.cc  | 28 
 gcc/tree-ssanames.h   |  3 ++-
 4 files changed, 34 insertions(+), 50 deletions(-)

diff --git a/gcc/tree-ssa-copy.cc b/gcc/tree-ssa-copy.cc
index bb88472304c2..9c9ec47adcaa 100644
--- a/gcc/tree-ssa-copy.cc
+++ b/gcc/tree-ssa-copy.cc
@@ -527,38 +527,10 @@ fini_copy_prop (void)
  || copy_of[i].value == var)
continue;
 
-  /* In theory the points-to solution of all members of the
- copy chain is their intersection.  For now we do not bother
-to compute this but only make sure we do not lose points-to
-information completely by setting the points-to solution
-of the representative to the first solution we find if
-it doesn't have one already.  */
+  /* Duplicate points-to and range info appropriately.  */
   if (copy_of[i].value != var
  && TREE_CODE (copy_of[i].value) == SSA_NAME)
-   {
- basic_block copy_of_bb
-   = gimple_bb (SSA_NAME_DEF_STMT (copy_of[i].value));
- basic_block var_bb = gimple_bb (SSA_NAME_DEF_STMT (var));
- if (POINTER_TYPE_P (TREE_TYPE (var))
- && SSA_NAME_PTR_INFO (var)
- && !SSA_NAME_PTR_INFO (copy_of[i].value))
-   {
- duplicate_ssa_name_ptr_info (copy_of[i].value,
-  SSA_NAME_PTR_INFO (var));
- /* Points-to information is cfg insensitive,
-but [E]VRP might record context sensitive alignment
-info, non-nullness, etc.  So reset context sensitive
-info if the two SSA_NAMEs aren't defined in the same
-basic block.  */
- if (var_bb != copy_of_bb)
-   reset_flow_sensitive_info (copy_of[i].value);
-   }
- else if (!POINTER_TYPE_P (TREE_TYPE (var))
-  && SSA_NAME_RANGE_INFO (var)
-  && !SSA_NAME_RANGE_INFO (copy_of[i].value)
-  && var_bb == copy_of_bb)
-   duplicate_ssa_name_range_info (copy_of[i].value, var);
-   }
+   maybe_duplicate_ssa_info_at_copy (var, copy_of[i].value);
 }
 
   class copy_folder copy_folder;
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 02c3bd5f5381..0b5c638df455 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -6871,27 +6871,10 @@ eliminate_dom_walker::eliminate_stmt (basic_block b, 
gimple_stmt_iterator *gsi)
 
   /* If this now constitutes a copy duplicate points-to
 and range info appropriately.  This is especially
-important for inserted code.  See tree-ssa-copy.cc
-for similar code.  */
+important for inserted code.  */
   if (sprime
  && TREE_CODE (sprime) == SSA_NAME)
-   {
- basic_block sprime_b = gimple_bb (SSA_NAME_DEF_STMT (sprime));
- if (POINTER_TYPE_P (TREE_TYPE (lhs))
- && SSA_NAME_PTR_INFO (lhs)
- && ! SSA_NAME_PTR_INFO (sprime))
-   {
- duplicate_ssa_name_ptr_info (sprime,
-  SSA_NAME_PTR_INFO (lhs));
- if (b != sprime_b)
-   reset_flow_sensitive_info (sprime);
-   }
- else if (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
-  && SSA_NAME_RANGE_INFO (lhs)
-  && ! SSA_NAME_RANGE_INFO (sprime)
-  && b == sprime_b)
-   duplicate_ssa_name_range_info (sprime, lhs);
-   }
+   maybe_duplicate_ssa_info_at_copy (lhs, sprime);
 
   /* Inhibit the use of an inserted PHI on a loop header when
 the address of the memory reference is a simple induction
diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index 1753a421a0ba..5ad7d117bd33 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -757,6 +757,34 @@ duplicate_ssa_name_range_info (tree name, tree src)
 }
 }
 
+/* For a SSA copy DEST = SRC duplicate SSA info present on DEST to SRC
+   to 

[gcc r14-10434] Fixup unaligned load/store cost for znver4

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:d702a957753caf020cb550d143e9e9a62f79e9f5

commit r14-10434-gd702a957753caf020cb550d143e9e9a62f79e9f5
Author: Richard Biener 
Date:   Mon Jul 15 13:01:24 2024 +0200

Fixup unaligned load/store cost for znver4

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4.  The following makes
the unaligned costs equal to the aligned costs.

This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there.  But it makes it qualify
as a regression fix.

PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.

(cherry picked from commit 1e3aa9c9278db69d4bdb661a750a7268789188d6)

Diff:
---
 gcc/config/i386/x86-tune-costs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index d34b5cc2..d0168eebdc15 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1924,8 +1924,8 @@ struct processor_costs znver4_cost = {
   in 32bit, 64bit, 128bit, 256bit and 
512bit */
   {8, 8, 8, 12, 12},   /* cost of storing SSE register
   in 32bit, 64bit, 128bit, 256bit and 
512bit */
-  {6, 6, 6, 6, 6}, /* cost of unaligned loads.  */
-  {8, 8, 8, 8, 8}, /* cost of unaligned stores.  */
+  {6, 6, 10, 10, 12},  /* cost of unaligned loads.  */
+  {8, 8, 8, 12, 12},   /* cost of unaligned stores.  */
   2, 2, 2, /* cost of moving XMM,YMM,ZMM
   register.  */
   6,   /* cost of moving SSE register to 
integer.  */


[gcc r15-2065] tree-optimization/115841 - reduction epilogue placement issue

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:016c947b02e79a5c0c0c2d4ad5cb71aa04db3efd

commit r15-2065-g016c947b02e79a5c0c0c2d4ad5cb71aa04db3efd
Author: Richard Biener 
Date:   Tue Jul 16 11:53:17 2024 +0200

tree-optimization/115841 - reduction epilogue placement issue

When emitting the compensation to the vectorized main loop for
a vector reduction value to be re-used in the vectorized epilogue
we fail to place it in the correct block when the main loop is
known to be entered (no loop_vinfo->main_loop_edge) but the
epilogue is not (a loop_vinfo->skip_this_loop_edge).  The code
currently disregards this situation.

With the recent znver4 cost fix I couldn't trigger this situation
with the testcase but I adjusted it so it could eventually trigger
on other targets.

PR tree-optimization/115841
* tree-vect-loop.cc (vect_transform_cycle_phi): Correctly
place the partial vector reduction for the accumulator
re-use when the main loop cannot be skipped but the
epilogue can.

* gcc.dg/vect/pr115841.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115841.c | 42 
 gcc/tree-vect-loop.cc|  7 +++---
 2 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115841.c 
b/gcc/testsuite/gcc.dg/vect/pr115841.c
new file mode 100644
index ..aa5c66004a03
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115841.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast -fcommon -fvect-cost-model=dynamic --param vect-partial-vector-usage=1" } */
+/* { dg-additional-options "-mavx512vl" { target avx512vl } } */
+
+/* To trigger the bug costing needs to determine that aligning the A170
+   accesses with a prologue is good and there should be a vectorized
+   epilogue with a smaller vector size, re-using the vector accumulator
+   from the vectorized main loop that's statically known to execute
+   but the epilogue loop is not.  */
+
+static unsigned char xl[192];
+unsigned char A170[192*3];
+
+void jerate (unsigned char *, unsigned char *);
+float foo (unsigned n)
+{
+  jerate (xl, A170);
+
+  unsigned i = 32;
+  int kr = 1;
+  float sfn11s = 0.f;
+  float sfn12s = 0.f;
+  do
+{
+  int krm1 = kr - 1;
+  long j = krm1;
+  float a = (*(float(*)[n])A170)[j];
+  float b = (*(float(*)[n])xl)[j];
+  float c = a * b;
+  float d = c * 6.93149983882904052734375e-1f;
+  float e = (*(float(*)[n])A170)[j+48];
+  float f = (*(float(*)[n])A170)[j+96];
+  float g = d * e;
+  sfn11s = sfn11s + g;
+  float h = f * d;
+  sfn12s = sfn12s + h;
+  kr++;
+}
+  while (--i != 0);
+  float tem = sfn11s + sfn12s;
+  return tem;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a64b5082bd18..b8124a321280 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9026,14 +9026,15 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
  /* And the reduction could be carried out using a different sign.  */
  if (!useless_type_conversion_p (vectype_out, TREE_TYPE (def)))
def = gimple_convert (&stmts, vectype_out, def);
- if (loop_vinfo->main_loop_edge)
+ edge e;
+ if ((e = loop_vinfo->main_loop_edge)
+ || (e = loop_vinfo->skip_this_loop_edge))
{
  /* While we'd like to insert on the edge this will split
 blocks and disturb bookkeeping, we also will eventually
 need this on the skip edge.  Rely on sinking to
 fixup optimal placement and insert in the pred.  */
- gimple_stmt_iterator gsi
-   = gsi_last_bb (loop_vinfo->main_loop_edge->src);
+ gimple_stmt_iterator gsi = gsi_last_bb (e->src);
  /* Insert before a cond that eventually skips the
 epilogue.  */
  if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))


[gcc r11-11578] Fixup unaligned load/store cost for znver4

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:bcb2a35a0c04417c407a97d9ff05c2af1d6d1b8d

commit r11-11578-gbcb2a35a0c04417c407a97d9ff05c2af1d6d1b8d
Author: Richard Biener 
Date:   Mon Jul 15 13:01:24 2024 +0200

Fixup unaligned load/store cost for znver4

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4.  The following makes
the unaligned costs equal to the aligned costs.

This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there.  But it makes it qualify
as a regression fix.

PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.

(cherry picked from commit 1e3aa9c9278db69d4bdb661a750a7268789188d6)

Diff:
---
 gcc/config/i386/x86-tune-costs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 48100d104156..58dd711864c8 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1894,8 +1894,8 @@ struct processor_costs znver4_cost = {
   in 32bit, 64bit, 128bit, 256bit and 
512bit */
   {8, 8, 8, 12, 12},   /* cost of storing SSE register
   in 32bit, 64bit, 128bit, 256bit and 
512bit */
-  {6, 6, 6, 6, 6}, /* cost of unaligned loads.  */
-  {8, 8, 8, 8, 8}, /* cost of unaligned stores.  */
+  {6, 6, 10, 10, 12},  /* cost of unaligned loads.  */
+  {8, 8, 8, 12, 12},   /* cost of unaligned stores.  */
   2, 2, 2, /* cost of moving XMM,YMM,ZMM
   register.  */
   6,   /* cost of moving SSE register to 
integer.  */


[gcc r15-2059] Fixup unaligned load/store cost for znver5

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:896393791ee34ffc176c87d232dfee735db3aaab

commit r15-2059-g896393791ee34ffc176c87d232dfee735db3aaab
Author: Richard Biener 
Date:   Tue Jul 16 10:45:27 2024 +0200

Fixup unaligned load/store cost for znver5

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply copied from the bogus znver4 costs.  The following makes
the unaligned costs equal to the aligned costs like in the fixed znver4
version.

* config/i386/x86-tune-costs.h (znver5_cost): Update unaligned
load and store cost from the aligned costs.

Diff:
---
 gcc/config/i386/x86-tune-costs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 2ac75c35aee6..769f334e5318 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -2060,8 +2060,8 @@ struct processor_costs znver5_cost = {
   in 32bit, 64bit, 128bit, 256bit and 
512bit */
   {8, 8, 8, 12, 12},   /* cost of storing SSE register
   in 32bit, 64bit, 128bit, 256bit and 
512bit */
-  {6, 6, 6, 6, 6}, /* cost of unaligned loads.  */
-  {8, 8, 8, 8, 8}, /* cost of unaligned stores.  */
+  {6, 6, 10, 10, 12},  /* cost of unaligned loads.  */
+  {8, 8, 8, 12, 12},   /* cost of unaligned stores.  */
   2, 2, 2, /* cost of moving XMM,YMM,ZMM
   register.  */
   6,   /* cost of moving SSE register to 
integer.  */


[gcc r15-2055] tree-optimization/115843 - fix wrong-code with fully-masked loop and peeling

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:a177be05f6952c3f7e62186d2e138d96c475b81a

commit r15-2055-ga177be05f6952c3f7e62186d2e138d96c475b81a
Author: Richard Biener 
Date:   Mon Jul 15 13:50:58 2024 +0200

tree-optimization/115843 - fix wrong-code with fully-masked loop and peeling

When AVX512 uses a fully masked loop and peeling we fail to create the
correct initial loop mask when the mask is composed of multiple
components in some cases.  The following fixes this by properly applying
the bias for the component to the shift amount.
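
As a rough sketch of the change (names from the diff below; the
concrete numbers are mine): the initial mask used to be shifted by

  shift = niters_skip * rgc.max_nscalars_per_iter

which ignores the per-component bias.  The fix applies the bias first:

  shift = (niters_skip - bias) * rgc.max_nscalars_per_iter

so e.g. with niters_skip = 3, bias = 1 and max_nscalars_per_iter = 4
the shift drops from 12 to 8 lanes.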

PR tree-optimization/115843
* tree-vect-loop-manip.cc
(vect_set_loop_condition_partial_vectors_avx512): Properly
bias the shift of the initial mask for alignment peeling.

* gcc.dg/vect/pr115843.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115843.c | 41 
 gcc/tree-vect-loop-manip.cc  |  8 +--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115843.c 
b/gcc/testsuite/gcc.dg/vect/pr115843.c
new file mode 100644
index ..3dbb6c792788
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115843.c
@@ -0,0 +1,41 @@
+/* { dg-additional-options "-mavx512f --param vect-partial-vector-usage=2" { target avx512f_runtime } } */
+
+#include "tree-vect.h"
+
+typedef __UINT64_TYPE__ BITBOARD;
+BITBOARD KingPressureMask1[64], KingSafetyMask1[64];
+
+void __attribute__((noinline))
+foo()
+{
+  for (int i = 0; i < 64; i++)
+{
+  if ((i & 7) == 0)
+   KingPressureMask1[i] = KingSafetyMask1[i + 1];
+  else if ((i & 7) == 7)
+   KingPressureMask1[i] = KingSafetyMask1[i - 1];
+  else
+   KingPressureMask1[i] = KingSafetyMask1[i];
+}
+}
+
+BITBOARD verify[64]
+  = {1, 1, 2, 3, 4, 5, 6, 6, 9, 9, 10, 11, 12, 13, 14, 14, 17, 17, 18, 19,
+20, 21, 22, 22, 25, 25, 26, 27, 28, 29, 30, 30, 33, 33, 34, 35, 36, 37, 38,
+38, 41, 41, 42, 43, 44, 45, 46, 46, 49, 49, 50, 51, 52, 53, 54, 54, 57, 57,
+58, 59, 60, 61, 62, 62};
+
+int main()
+{
+  check_vect ();
+
+#pragma GCC novector
+  for (int i = 0; i < 64; ++i)
+KingSafetyMask1[i] = i;
+  foo ();
+#pragma GCC novector
+  for (int i = 0; i < 64; ++i)
+if (KingPressureMask1[i] != verify[i])
+  __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index ac13873cd88d..57dbcbe862cd 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1149,10 +1149,14 @@ vect_set_loop_condition_partial_vectors_avx512 (class 
loop *loop,
  /* ???  But when the shift amount isn't constant this requires
 a round-trip to GRPs.  We could apply the bias to either
 side of the compare instead.  */
- tree shift = gimple_build (&preheader_seq, MULT_EXPR,
+ tree shift = gimple_build (&preheader_seq, MINUS_EXPR,
 TREE_TYPE (niters_skip), niters_skip,
 build_int_cst (TREE_TYPE (niters_skip),
-   rgc.max_nscalars_per_iter));
+   bias));
+ shift = gimple_build (&preheader_seq, MULT_EXPR,
+   TREE_TYPE (niters_skip), shift,
+   build_int_cst (TREE_TYPE (niters_skip),
+  rgc.max_nscalars_per_iter));
  init_ctrl = gimple_build (&preheader_seq, LSHIFT_EXPR,
TREE_TYPE (init_ctrl),
init_ctrl, shift);


[gcc r15-2054] Fixup unaligned load/store cost for znver4

2024-07-16 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:1e3aa9c9278db69d4bdb661a750a7268789188d6

commit r15-2054-g1e3aa9c9278db69d4bdb661a750a7268789188d6
Author: Richard Biener 
Date:   Mon Jul 15 13:01:24 2024 +0200

Fixup unaligned load/store cost for znver4

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4.  The following makes
the unaligned costs equal to the aligned costs.

This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there.  But it makes it qualify
as a regression fix.

PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.

Diff:
---
 gcc/config/i386/x86-tune-costs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index a933794ed505..2ac75c35aee6 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1924,8 +1924,8 @@ struct processor_costs znver4_cost = {
   in 32bit, 64bit, 128bit, 256bit and 
512bit */
   {8, 8, 8, 12, 12},   /* cost of storing SSE register
   in 32bit, 64bit, 128bit, 256bit and 
512bit */
-  {6, 6, 6, 6, 6}, /* cost of unaligned loads.  */
-  {8, 8, 8, 8, 8}, /* cost of unaligned stores.  */
+  {6, 6, 10, 10, 12},  /* cost of unaligned loads.  */
+  {8, 8, 8, 12, 12},   /* cost of unaligned stores.  */
   2, 2, 2, /* cost of moving XMM,YMM,ZMM
   register.  */
   6,   /* cost of moving SSE register to 
integer.  */


Re: GCC 11.5 Release Candidate available from gcc.gnu.org

2024-07-15 Thread Richard Biener via Gcc



> Am 15.07.2024 um 20:07 schrieb William Seurer via Gcc :
> 
> On 7/12/24 7:47 AM, Richard Biener via Gcc wrote:
>> The first release candidate for GCC 11.5 is available from
>> 
>> https://gcc.gnu.org/pub/gcc/snapshots/11.5.0-RC-20240712/
>> 
>> and shortly its mirrors.  It has been generated from git commit
>> r11-11573-g30ffca55041518.
>> 
>> I have so far bootstrapped and tested the release candidate on
>> x86_64-linux.
>> Please test it and report any issues to bugzilla.
>> 
>> If all goes well, we'd like to release 11.5 on Friday, July 19th.
>> 
>> The GCC 11 branch will be closed after this release.
> 
> 
> I tried all the usual powerpc64 variations both BE and LE and the only 
> possible issue I noticed was the ICE from 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109573 is still present

As noted in the audit trail that’s expected with checking enabled.

Richard 

[gcc r15-2014] tree-optimization/115868 - ICE with .MASK_CALL in simdclone

2024-07-13 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:abf3964711f05b6858d9775c3595ec2b45483e14

commit r15-2014-gabf3964711f05b6858d9775c3595ec2b45483e14
Author: Richard Biener 
Date:   Thu Jul 11 09:56:56 2024 +0200

tree-optimization/115868 - ICE with .MASK_CALL in simdclone

The following adjusts mask recording which didn't take into account
that we can merge call arguments from two vectors like

  _50 = {vect_d_1.253_41, vect_d_1.254_43};
  _51 = VIEW_CONVERT_EXPR(mask__19.257_49);
  _52 = (unsigned int) _51;
  _53 = _Z3bazd.simdclone.7 (_50, _52);
  _54 = BIT_FIELD_REF <_53, 256, 0>;
  _55 = BIT_FIELD_REF <_53, 256, 256>;

The testcase g++.dg/vect/pr68762-2.cc exercises this on x86_64 with
partial vector usage enabled and AVX512 support.
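
A worked example of the corrected count (the concrete numbers are my
illustration, not from the PR): with ncopies = 1, a clone simdlen of 16
and a mask vectype of 8 lanes we need

  nmasks = ncopies * simdlen / TYPE_VECTOR_SUBPARTS (vectype)
         = 1 * 16 / 8 = 2

mask copies, matching the two merged vectors above, where using plain
ncopies would have under-counted.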

PR tree-optimization/115868
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Correctly
compute the number of mask copies required for 
vect_record_loop_mask.

Diff:
---
 gcc/tree-vect-stmts.cc | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 2e4d500d1f26..8530a98e6d69 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4349,9 +4349,14 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
case SIMD_CLONE_ARG_TYPE_MASK:
  if (loop_vinfo
  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
-   vect_record_loop_mask (loop_vinfo,
-  &LOOP_VINFO_MASKS (loop_vinfo),
-  ncopies, vectype, op);
+   {
+ unsigned nmasks
+   = exact_div (ncopies * bestn->simdclone->simdlen,
+TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
+ vect_record_loop_mask (loop_vinfo,
+&LOOP_VINFO_MASKS (loop_vinfo),
+nmasks, vectype, op);
+   }
 
  break;
}


GCC 11.5 Release Candidate available from gcc.gnu.org

2024-07-12 Thread Richard Biener via Gcc
The first release candidate for GCC 11.5 is available from

https://gcc.gnu.org/pub/gcc/snapshots/11.5.0-RC-20240712/

and shortly its mirrors.  It has been generated from git commit
r11-11573-g30ffca55041518.

I have so far bootstrapped and tested the release candidate on
x86_64-linux.
Please test it and report any issues to bugzilla.

If all goes well, we'd like to release 11.5 on Friday, July 19th.

The GCC 11 branch will be closed after this release.


gcc-wwwdocs branch master updated. d7ecaf734b4a980ce5d20bc1db92221630e52bf8

2024-07-12 Thread Richard Biener via Gcc-cvs-wwwdocs
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  d7ecaf734b4a980ce5d20bc1db92221630e52bf8 (commit)
  from  79c00fdae63b50b8b8e807529b317d6c42fdd75b (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit d7ecaf734b4a980ce5d20bc1db92221630e52bf8
Author: Richard Biener 
Date:   Fri Jul 12 14:08:56 2024 +0200

Update GCC 11 status.

diff --git a/htdocs/index.html b/htdocs/index.html
index fa941933..a75656b1 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -213,7 +213,7 @@ More news? Let ger...@pfeifer.com know!
   
   https://gcc.gnu.org/pipermail/gcc/2024-June/244175.html;>2024-06-20
   
-  (regression fixes & docs only).
+  (frozen for release).
   
   https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedshort_desc_type=regexpshort_desc=%5C%5B(%5B%200-9.%2F%5D*%5B%20%2F%5D)*11%5B%20%2F%5D%5B%200-9.%2F%5D*%5BRr%5Degression%20*%5C%5Dtarget_milestone=11.5known_to_fail_type=allwordssubstrknown_to_work_type=allwordssubstrlong_desc_type=allwordssubstrlong_desc=bug_file_loc_type=allwordssubstrbug_file_loc=gcchost_type=allwordssubstrgcchost=gcctarget_type=allwordssubstrgcctarget=gccbuild_type=allwordssubstrgccbuild=keywords_type=allwordskeywords=bug_status=UNCONFIRMEDbug_status=NEWbug_status=ASSIGNEDbug_status=SUSPENDEDbug_status=WAITINGbug_status=REOPENEDpriority=P1priority=P2priority=P3emailtype1=substringemail1=emailtype2=substringemail2=bugidtype=includebug_id=votes=chfieldfrom=chfieldto=Nowchfieldvalue=cmdtype=doitorder=Reuse+same
 
+sort+as+last+timefield0-0-0=nooptype0-0-0=noopvalue0-0-0=">Serious

---

Summary of changes:
 htdocs/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


hooks/post-receive
-- 
gcc-wwwdocs


Re: Help with vector cost model

2024-07-12 Thread Richard Biener via Gcc
On Fri, Jul 12, 2024 at 4:42 AM Andrew Pinski  wrote:
>
> On Thu, Jul 11, 2024 at 2:14 AM Richard Biener
>  wrote:
> >
> > On Thu, Jul 11, 2024 at 10:58 AM Richard Sandiford
> >  wrote:
> > >
> > > Andrew Pinski  writes:
> > > > I need some help with the vector cost model for aarch64.
> > > > I am adding V2HI and V4QI mode support by emulating it using the
> > > > native V4HI/V8QI instructions (similarly to mmx as SSE is done). The
> > > > problem is I am running into a cost model issue with
> > > > gcc.target/aarch64/pr98772.c (wminus is similar to
> > > > gcc.dg/vect/slp-gap-1.c, just slightly different offsets for the
> > > > address).
> > > > It seems like the cost mode is overestimating the number of loads for
> > > > V8QI case .
> > > > With the new cost model usage (-march=armv9-a+nosve), I get:
> > > > ```
> > > > t.c:7:21: note:  * Analysis succeeded with vector mode V4QI
> > > > t.c:7:21: note:  Comparing two main loops (V4QI at VF 1 vs V8QI at VF 2)
> > > > t.c:7:21: note:  Issue info for V4QI loop:
> > > > t.c:7:21: note:load operations = 2
> > > > t.c:7:21: note:store operations = 1
> > > > t.c:7:21: note:general operations = 4
> > > > t.c:7:21: note:reduction latency = 0
> > > > t.c:7:21: note:estimated min cycles per iteration = 2.00
> > > > t.c:7:21: note:  Issue info for V8QI loop:
> > > > t.c:7:21: note:load operations = 12
> > > > t.c:7:21: note:store operations = 1
> > > > t.c:7:21: note:general operations = 6
> > > > t.c:7:21: note:reduction latency = 0
> > > > t.c:7:21: note:estimated min cycles per iteration = 4.33
> > > > t.c:7:21: note:  Weighted cycles per iteration of V4QI loop ~= 4.00
> > > > t.c:7:21: note:  Weighted cycles per iteration of V8QI loop ~= 4.33
> > > > t.c:7:21: note:  Preferring loop with lower cycles per iteration
> > > > t.c:7:21: note:  * Preferring vector mode V4QI to vector mode V8QI
> > > > ```
> > > >
> > > > That is totally wrong and instead of vectorizing using V8QI we
> > > > vectorize using V4QI and the resulting code is worse.
> > > >
> > > > Attached is my current patch for adding V4QI/V2HI to the aarch64
> > > > backend (Note I have not finished up the changelog nor the testcases;
> > > > I have secondary patches that add the testcases already).
> > > > Is there something I am missing here or are we just over estimating
> > > > V8QI cost and is something easy to fix?
> > >
> > > Trying it locally, I get:
> > >
> > > foo.c:15:23: note:  * Analysis succeeded with vector mode V4QI
> > > foo.c:15:23: note:  Comparing two main loops (V4QI at VF 1 vs V8QI at VF 
> > > 2)
> > > foo.c:15:23: note:  Issue info for V4QI loop:
> > > foo.c:15:23: note:load operations = 2
> > > foo.c:15:23: note:store operations = 1
> > > foo.c:15:23: note:general operations = 4
> > > foo.c:15:23: note:reduction latency = 0
> > > foo.c:15:23: note:estimated min cycles per iteration = 2.00
> > > foo.c:15:23: note:  Issue info for V8QI loop:
> > > foo.c:15:23: note:load operations = 8
> > > foo.c:15:23: note:store operations = 1
> > > foo.c:15:23: note:general operations = 6
> > > foo.c:15:23: note:reduction latency = 0
> > > foo.c:15:23: note:estimated min cycles per iteration = 3.00
> > > foo.c:15:23: note:  Weighted cycles per iteration of V4QI loop ~= 4.00
> > > foo.c:15:23: note:  Weighted cycles per iteration of V8QI loop ~= 3.00
> > > foo.c:15:23: note:  Preferring loop with lower cycles per iteration
> > >
> > > The function is:
> > >
> > > extern void
> > > wplus (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
> > > {
> > > for (int y = 0; y < 4; y++ )
> > > {
> > > for (int x = 0; x < 4; x++ )
> > > d[x + y*4] = pix1[x] + pix2[x];
> > > pix1 += 16;
> > > pix2 += 16;
> > > }
> > > }
> > >
> > > For V8QI we need a VF of 2, so that there are 8 elements to store to d.
> > > Conceptually, we handle those two iterations by loading 4 V8QIs from
> > > pix1 and pix2 (32 bytes each), with mitigations against overrun,
> > > and then permute the result to single V8QIs.
> > >
> > > vectorize_load doesn't seem to be smart enough to realise that only 2
> > > of those 4 loads are actually used in the permuation, and so only 2
> > > loads should be costed for each of pix1 and pix2.
> >
> > Though it has code to do that.
>
> So looking into this a little further. Yes there is code that does it
> but it still adds the extra loads and then removes them. And the
> costing part is done before the removal of the extra loads.
>
> From (a non modifed trunk):
> ```
> /app/example.cpp:2:21: note:   add new stmt: vect__34.7_15 = MEM <vector(8) unsigned char> [(unsigned char *)vectp_pix1.5_17];
> /app/example.cpp:2:21: note:   add new stmt: vectp_pix1.5_14 =
> vectp_pix1.5_17 + 8;
> /app/example.cpp:2:21: note:   add new stmt: vect__34.8_13 = MEM <vector(8) unsigned char> [(unsigned char *)vectp_pix1.5_14];
> /app/example.cpp:2:21: note:   add new stmt: 

[gcc r15-1990] tree-optimization/115867 - ICE with simdcall vectorization in masked loop

2024-07-11 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:4f4478f0f31263997bfdc4159f90e58dd79b38f9

commit r15-1990-g4f4478f0f31263997bfdc4159f90e58dd79b38f9
Author: Richard Biener 
Date:   Thu Jul 11 10:18:55 2024 +0200

tree-optimization/115867 - ICE with simdcall vectorization in masked loop

When only a loop mask is to be supplied for the inbranch arg to a
simd function we fail to handle integer mode masks correctly.  We
need to guess the number of elements represented by it.  This assumes
that excess arguments are all for masks, I wasn't able to create
a simdclone with more than one integer mode mask argument.

The gcc.dg/vect/vect-simd-clone-20.c exercises this with -mavx512vl
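
For illustration (the numbers are mine, not from the patch): for a
clone with simdlen = 8 whose signature carries one trailing
integer-mode mask argument beyond the scalar arguments, the guess is

  callee_nelements = simdlen / (bestn->simdclone->nargs - nargs)
                   = 8 / 1 = 8

i.e. the single integer mask is assumed to cover all 8 lanes.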

PR tree-optimization/115867
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Properly
guess the number of mask elements for integer mode masks.

Diff:
---
 gcc/tree-vect-stmts.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fdcda0d2abae..2e4d500d1f26 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4748,7 +4748,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
  SIMD_CLONE_ARG_TYPE_MASK);
 
  tree masktype = bestn->simdclone->args[mask_i].vector_type;
- callee_nelements = TYPE_VECTOR_SUBPARTS (masktype);
+ if (SCALAR_INT_MODE_P (bestn->simdclone->mask_mode))
+   /* Guess the number of lanes represented by masktype.  */
+   callee_nelements = exact_div (bestn->simdclone->simdlen,
+ bestn->simdclone->nargs - nargs);
+ else
+   callee_nelements = TYPE_VECTOR_SUBPARTS (masktype);
  o = vector_unroll_factor (nunits, callee_nelements);
  for (m = j * o; m < (j + 1) * o; m++)
{


Re: Help with vector cost model

2024-07-11 Thread Richard Biener via Gcc
On Thu, Jul 11, 2024 at 10:58 AM Richard Sandiford
 wrote:
>
> Andrew Pinski  writes:
> > I need some help with the vector cost model for aarch64.
> > I am adding V2HI and V4QI mode support by emulating it using the
> > native V4HI/V8QI instructions (similarly to mmx as SSE is done). The
> > problem is I am running into a cost model issue with
> > gcc.target/aarch64/pr98772.c (wminus is similar to
> > gcc.dg/vect/slp-gap-1.c, just slightly different offsets for the
> > address).
> > It seems like the cost mode is overestimating the number of loads for
> > V8QI case .
> > With the new cost model usage (-march=armv9-a+nosve), I get:
> > ```
> > t.c:7:21: note:  * Analysis succeeded with vector mode V4QI
> > t.c:7:21: note:  Comparing two main loops (V4QI at VF 1 vs V8QI at VF 2)
> > t.c:7:21: note:  Issue info for V4QI loop:
> > t.c:7:21: note:load operations = 2
> > t.c:7:21: note:store operations = 1
> > t.c:7:21: note:general operations = 4
> > t.c:7:21: note:reduction latency = 0
> > t.c:7:21: note:estimated min cycles per iteration = 2.00
> > t.c:7:21: note:  Issue info for V8QI loop:
> > t.c:7:21: note:load operations = 12
> > t.c:7:21: note:store operations = 1
> > t.c:7:21: note:general operations = 6
> > t.c:7:21: note:reduction latency = 0
> > t.c:7:21: note:estimated min cycles per iteration = 4.33
> > t.c:7:21: note:  Weighted cycles per iteration of V4QI loop ~= 4.00
> > t.c:7:21: note:  Weighted cycles per iteration of V8QI loop ~= 4.33
> > t.c:7:21: note:  Preferring loop with lower cycles per iteration
> > t.c:7:21: note:  * Preferring vector mode V4QI to vector mode V8QI
> > ```
> >
> > That is totally wrong and instead of vectorizing using V8QI we
> > vectorize using V4QI and the resulting code is worse.
> >
> > Attached is my current patch for adding V4QI/V2HI to the aarch64
> > backend (Note I have not finished up the changelog nor the testcases;
> > I have secondary patches that add the testcases already).
> > Is there something I am missing here or are we just over estimating
> > V8QI cost and is something easy to fix?
>
> Trying it locally, I get:
>
> foo.c:15:23: note:  * Analysis succeeded with vector mode V4QI
> foo.c:15:23: note:  Comparing two main loops (V4QI at VF 1 vs V8QI at VF 2)
> foo.c:15:23: note:  Issue info for V4QI loop:
> foo.c:15:23: note:load operations = 2
> foo.c:15:23: note:store operations = 1
> foo.c:15:23: note:general operations = 4
> foo.c:15:23: note:reduction latency = 0
> foo.c:15:23: note:estimated min cycles per iteration = 2.00
> foo.c:15:23: note:  Issue info for V8QI loop:
> foo.c:15:23: note:load operations = 8
> foo.c:15:23: note:store operations = 1
> foo.c:15:23: note:general operations = 6
> foo.c:15:23: note:reduction latency = 0
> foo.c:15:23: note:estimated min cycles per iteration = 3.00
> foo.c:15:23: note:  Weighted cycles per iteration of V4QI loop ~= 4.00
> foo.c:15:23: note:  Weighted cycles per iteration of V8QI loop ~= 3.00
> foo.c:15:23: note:  Preferring loop with lower cycles per iteration
>
> The function is:
>
> extern void
> wplus (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
> {
> for (int y = 0; y < 4; y++ )
> {
> for (int x = 0; x < 4; x++ )
> d[x + y*4] = pix1[x] + pix2[x];
> pix1 += 16;
> pix2 += 16;
> }
> }
>
> For V8QI we need a VF of 2, so that there are 8 elements to store to d.
> Conceptually, we handle those two iterations by loading 4 V8QIs from
> pix1 and pix2 (32 bytes each), with mitigations against overrun,
> and then permute the result to single V8QIs.
>
> vectorize_load doesn't seem to be smart enough to realise that only 2
> of those 4 loads are actually used in the permuation, and so only 2
> loads should be costed for each of pix1 and pix2.

Though it has code to do that.

Richard.

> Thanks,
> Richard


[gcc r11-11562] c++: Add testcase for this PR [PR97990]

2024-07-08 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:c2c216d0f85f861cc10529a455edfaf645aa393f

commit r11-11562-gc2c216d0f85f861cc10529a455edfaf645aa393f
Author: Andrew Pinski 
Date:   Fri Feb 16 10:55:43 2024 -0800

c++: Add testcase for this PR [PR97990]

This testcase was fixed by r14-5934-gf26d68d5d128c8 but we should add
one to make sure it does not regress again.

Committed as obvious after a quick test on the testcase.

PR c++/97990

gcc/testsuite/ChangeLog:

* g++.dg/torture/vector-struct-1.C: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 5f1438db419c9eb8901d1d1d7f98fb69082aec8e)

Diff:
---
 gcc/testsuite/g++.dg/torture/vector-struct-1.C | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/testsuite/g++.dg/torture/vector-struct-1.C 
b/gcc/testsuite/g++.dg/torture/vector-struct-1.C
new file mode 100644
index ..e2747417e2d5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/vector-struct-1.C
@@ -0,0 +1,18 @@
+/* PR c++/97990 */
+/* This used to crash with lto and strict aliasing enabled as the
+   vector type variant still had TYPE_ALIAS_SET set on it. */
+
+typedef __attribute__((__vector_size__(sizeof(short)))) short TSimd;
+TSimd hh(int);
+struct y6
+{
+  TSimd VALUE;
+  ~y6();
+};
+template <class T1, class T2>
+auto f2(T1 p1, T2){
+  return hh(p1) <= 0;
+}
+void f1(){
+  f2(0, y6{});
+}


[gcc r11-11561] middle-end/112732 - stray TYPE_ALIAS_SET in type variant

2024-07-08 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:e7879391bb2b86606d0ce35ed97eccc108970e36

commit r11-11561-ge7879391bb2b86606d0ce35ed97eccc108970e36
Author: Richard Biener 
Date:   Tue Nov 28 12:36:21 2023 +0100

middle-end/112732 - stray TYPE_ALIAS_SET in type variant

The following fixes a stray TYPE_ALIAS_SET in a type variant built
by build_opaque_vector_type which is diagnosed by type checking
enabled with -flto.

PR middle-end/112732
* tree.c (build_opaque_vector_type): Reset TYPE_ALIAS_SET
of the newly built type.

(cherry picked from commit f26d68d5d128c86faaceeb81b1e8f22254ad53df)

Diff:
---
 gcc/tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree.c b/gcc/tree.c
index 8b5b0b7508cc..2cbdc7b65ba9 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -11098,6 +11098,8 @@ build_opaque_vector_type (tree innertype, poly_int64 
nunits)
   TYPE_NEXT_VARIANT (cand) = TYPE_NEXT_VARIANT (t);
   TYPE_NEXT_VARIANT (t) = cand;
   TYPE_MAIN_VARIANT (cand) = TYPE_MAIN_VARIANT (t);
+  /* Type variants have no alias set defined.  */
+  TYPE_ALIAS_SET (cand) = -1;
   return cand;
 }


[gcc r14-10394] tree-optimization/115723 - ICE with .COND_ADD reduction

2024-07-08 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:64a6c0d594c05f275de91df35047cffb3ccecf2f

commit r14-10394-g64a6c0d594c05f275de91df35047cffb3ccecf2f
Author: Richard Biener 
Date:   Mon Jul 1 10:06:55 2024 +0200

tree-optimization/115723 - ICE with .COND_ADD reduction

The following fixes an ICE with a .COND_ADD discovered as reduction
even though its else value isn't the reduction chain link but a
constant.  This would be wrong-code with --disable-checking I think.
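
For illustration (gimple sketched by me, not taken from the PR): a
conditional reduction is expected to keep the chain in the else value,

  res_2 = .COND_ADD (mask_1, res_1, x_1, res_1);   /* else links the chain */

whereas the testcase below effectively produces

  res_2 = .COND_ADD (mask_1, res_1, x_1, 6.4e1);   /* else is a constant */

which check_reduction_path now explicitly rejects.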

PR tree-optimization/115723
* tree-vect-loop.cc (check_reduction_path): For a .COND_ADD
verify the else value also refers to the reduction chain op.

* gcc.dg/vect/pr115723.c: New testcase.

(cherry picked from commit 286cda3461d6f5ce7d911d3f26bd4975ea7ea11d)

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115723.c | 25 +
 gcc/tree-vect-loop.cc| 12 
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115723.c 
b/gcc/testsuite/gcc.dg/vect/pr115723.c
new file mode 100644
index ..b98b29d48702
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115723.c
@@ -0,0 +1,25 @@
+/* { dg-additional-options "-ffast-math -fno-unsafe-math-optimizations" } */
+
+#include "tree-vect.h"
+
+double __attribute__((noipa))
+foo (double *x, double *y, int n)
+{
+  double res = 0.;
+  for (int i = 0; i < n; ++i)
+if (y[i] > 0.)
+  res += x[i];
+else
+  res = 64.;
+  return res;
+}
+
+double y[16] = { 1., 1., 1., 1., 0., 1., 1., 1.,
+ 1., 1., 1., 1., 1., 1., 1., 1. };
+int main ()
+{
+  check_vect ();
+  if (foo (y, y, 16) != 64. + 11.)
+abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 29c03c246d45..832399f7e9d7 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4161,15 +4161,19 @@ pop:
 
   FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
{
-   /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
-  op1 twice (once as definition, once as else) in the same operation.
-  Allow this.  */
+ /* In case of a COND_OP (mask, op1, op2, op1) reduction we should
+have op1 twice (once as definition, once as else) in the same
+operation.  Enforce this.  */
  if (cond_fn_p && op_use_stmt == use_stmt)
{
  gcall *call = as_a <gcall *> (use_stmt);
  unsigned else_pos
= internal_fn_else_index (internal_fn (op.code));
-
+ if (gimple_call_arg (call, else_pos) != op.ops[opi])
+   {
+ fail = true;
+ break;
+   }
  for (unsigned int j = 0; j < gimple_call_num_args (call); ++j)
{
  if (j == else_pos)


[gcc r14-10392] tree-optimization/115669 - fix SLP reduction association

2024-07-08 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:03844a2a15a85015506c0f187d0e9d526900cc2c

commit r14-10392-g03844a2a15a85015506c0f187d0e9d526900cc2c
Author: Richard Biener 
Date:   Thu Jun 27 11:26:08 2024 +0200

tree-optimization/115669 - fix SLP reduction association

The following avoids associating a reduction path as that might
get STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order.
This is a latent issue with SLP reductions but now easily exposed
as we're doing single-lane SLP reductions.

When we achieved SLP only we can move and update this meta-data.

PR tree-optimization/115669
* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate
chains that participate in a reduction.

* gcc.dg/vect/pr115669.c: New testcase.

(cherry picked from commit 7886830bb45c4f5dca0496d4deae9a45204d78f5)

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115669.c | 22 ++
 gcc/tree-vect-slp.cc |  3 +++
 2 files changed, 25 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115669.c 
b/gcc/testsuite/gcc.dg/vect/pr115669.c
new file mode 100644
index ..361a17a64e68
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115669.c
@@ -0,0 +1,22 @@
+/* { dg-additional-options "-fwrapv" } */
+
+#include "tree-vect.h"
+
+int a = 10;
+unsigned b;
+long long c[100];
+int foo()
+{
+  long long *d = c;
+  for (short e = 0; e < a; e++)
+b += ~(d ? d[e] : 0);
+  return b;
+}
+
+int main()
+{
+  check_vect ();
+  if (foo () != -10)
+abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 5e7e9b5bf085..0795605ec527 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2050,6 +2050,9 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
   else if (is_a <bb_vec_info> (vinfo)
   /* ???  We don't handle !vect_internal_def defs below.  */
   && STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def
+  /* ???  Do not associate a reduction, this will wreck REDUC_IDX
+ mapping as long as that exists on the stmt_info level.  */
+  && STMT_VINFO_REDUC_IDX (stmt_info) == -1
   && is_gimple_assign (stmt_info->stmt)
   && (associative_tree_code (gimple_assign_rhs_code (stmt_info->stmt))
   || gimple_assign_rhs_code (stmt_info->stmt) == MINUS_EXPR)


[gcc r14-10393] tree-optimization/115694 - ICE with complex store rewrite

2024-07-08 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:cde411950e91e0174a0134360d2eb138ca6821c6

commit r14-10393-gcde411950e91e0174a0134360d2eb138ca6821c6
Author: Richard Biener 
Date:   Sun Jun 30 13:07:14 2024 +0200

tree-optimization/115694 - ICE with complex store rewrite

The following adds a missed check when forwprop attempts to rewrite
a complex store.

PR tree-optimization/115694
* tree-ssa-forwprop.cc (pass_forwprop::execute): Check the
store is complex before rewriting it.

* g++.dg/torture/pr115694.C: New testcase.

(cherry picked from commit 543a5b9da964f821b9e723ed9c93d6cdca464d47)

Diff:
---
 gcc/testsuite/g++.dg/torture/pr115694.C | 13 +
 gcc/tree-ssa-forwprop.cc|  2 ++
 2 files changed, 15 insertions(+)

diff --git a/gcc/testsuite/g++.dg/torture/pr115694.C 
b/gcc/testsuite/g++.dg/torture/pr115694.C
new file mode 100644
index ..bbce47decf83
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr115694.C
@@ -0,0 +1,13 @@
+// { dg-do compile }
+
+_Complex a;
+typedef struct {
+  double a[2];
+} b;
+void c(b);
+void d()
+{
+  _Complex b1 = a;
+  b t = __builtin_bit_cast (b, b1);
+  c(t);
+}
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 05d42ccd3c61..abf71f0d3a03 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -3762,6 +3762,8 @@ pass_forwprop::execute (function *fun)
  && gimple_store_p (use_stmt)
  && !gimple_has_volatile_ops (use_stmt)
  && is_gimple_assign (use_stmt)
+ && (TREE_CODE (TREE_TYPE (gimple_assign_lhs (use_stmt)))
+ == COMPLEX_TYPE)
  && (TREE_CODE (gimple_assign_lhs (use_stmt))
  != TARGET_MEM_REF))
{


[gcc r14-10391] tree-optimization/115646 - ICE with pow shrink-wrapping from bitfield

2024-07-08 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:078cdccc849831b8f1ff74b9ad16ce3f5aa172be

commit r14-10391-g078cdccc849831b8f1ff74b9ad16ce3f5aa172be
Author: Richard Biener 
Date:   Tue Jun 25 16:13:02 2024 +0200

tree-optimization/115646 - ICE with pow shrink-wrapping from bitfield

The following makes analysis and transform agree on constraints.

PR tree-optimization/115646
* tree-call-cdce.cc (check_pow): Check for bit_sz values
as allowed by transform.

* gcc.dg/pr115646.c: New testcase.

(cherry picked from commit 453b1d291d1a0f89087ad91cf6b1bed1ec68eff3)

Diff:
---
 gcc/testsuite/gcc.dg/pr115646.c | 13 +
 gcc/tree-call-cdce.cc   |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr115646.c b/gcc/testsuite/gcc.dg/pr115646.c
new file mode 100644
index ..24bc1e45
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115646.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern double pow(double x, double y);
+
+struct S {
+unsigned int a : 3, b : 8, c : 21;
+};
+
+void foo (struct S *p)
+{
+  pow (p->c, 42);
+}
diff --git a/gcc/tree-call-cdce.cc b/gcc/tree-call-cdce.cc
index 7f67a0b2dc6f..befe6acf178a 100644
--- a/gcc/tree-call-cdce.cc
+++ b/gcc/tree-call-cdce.cc
@@ -260,7 +260,7 @@ check_pow (gcall *pow_call)
   /* If the type of the base is too wide,
  the resulting shrink wrapping condition
 will be too conservative.  */
-  if (bit_sz > MAX_BASE_INT_BIT_SIZE)
+  if (bit_sz != 8 && bit_sz != 16 && bit_sz != MAX_BASE_INT_BIT_SIZE)
 return false;
 
   return true;


[gcc r15-1848] Support group size of three in SLP store permute lowering

2024-07-05 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:7eb8b65780d9dc3e266056383279b00d5e152bea

commit r15-1848-g7eb8b65780d9dc3e266056383279b00d5e152bea
Author: Richard Biener 
Date:   Wed Jul 3 13:50:59 2024 +0200

Support group size of three in SLP store permute lowering

The following implements the group-size three scheme from
vect_permute_store_chain in SLP grouped store permute lowering
and extends it to power-of-two multiples of group size three.

The scheme goes from vectors A, B and C to
{ A[0], B[0], C[0], A[1], B[1], C[1], ... } by first producing
{ A[0], B[0], X, A[1], B[1], X, ... } (with X a don't-care value,
chosen from A[n]) and then permuting in C[n] in the appropriate places.

The extension goes as to replace vector elements with a
power-of-two number of lanes and you'd get pairwise interleaving
until the final three input permutes happen.

The last permute step could be seen as extending C to { C[0], C[0],
C[0], ... } and then performing a blend.

VLA archs will want to use store-lanes here I guess, I'm not sure
if the three vector interleave operation is also available with
a register source and destination and thus available for a shuffle.
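
A toy model of the scheme (a plain C sketch of my own, not GCC code;
the vectorizer performs the equivalent steps as VEC_PERM_EXPRs on
vector registers):

#include <stdio.h>

int main (void)
{
  int a[4] = {10, 11, 12, 13};
  int b[4] = {20, 21, 22, 23};
  int c[4] = {30, 31, 32, 33};
  int out[12];

  /* Step 1: merge A and B to the final width, filling the
     do-not-care lanes X from A.  */
  for (int i = 0; i < 4; i++)
    {
      out[3*i + 0] = a[i];
      out[3*i + 1] = b[i];
      out[3*i + 2] = a[i];  /* X, chosen from the first group */
    }

  /* Step 2: a second permute blends C into the X lanes.  */
  for (int i = 0; i < 4; i++)
    out[3*i + 2] = c[i];

  for (int i = 0; i < 12; i++)
    printf ("%d ", out[i]);  /* 10 20 30 11 21 31 12 22 32 13 23 33 */
  return 0;
}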

* tree-vect-slp.cc (vect_build_slp_instance): Special case
three input permute with the same number of lanes in store
permute lowering.

* gcc.dg/vect/slp-53.c: New testcase.
* gcc.dg/vect/slp-54.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/vect/slp-53.c | 15 +
 gcc/testsuite/gcc.dg/vect/slp-54.c | 18 +++
 gcc/tree-vect-slp.cc   | 65 +-
 3 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-53.c 
b/gcc/testsuite/gcc.dg/vect/slp-53.c
new file mode 100644
index 000..d8cd5f85b3c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-53.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void foo (int * __restrict x, int *y)
+{
+  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
+  y = __builtin_assume_aligned (y, __BIGGEST_ALIGNMENT__);
+  for (int i = 0; i < 1024; ++i)
+{
+  x[3*i+0] = y[2*i+0] * 7 + 5;
+  x[3*i+1] = y[2*i+1] * 2;
+  x[3*i+2] = y[2*i+0] + 3;
+}
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_int && vect_int_mult } xfail vect_load_lanes } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-54.c 
b/gcc/testsuite/gcc.dg/vect/slp-54.c
new file mode 100644
index 000..ab66b349d1f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-54.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+
+void foo (int * __restrict x, int *y)
+{
+  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
+  y = __builtin_assume_aligned (y, __BIGGEST_ALIGNMENT__);
+  for (int i = 0; i < 1024; ++i)
+{
+  x[6*i+0] = y[4*i+0] * 7 + 5;
+  x[6*i+1] = y[4*i+1] * 2;
+  x[6*i+2] = y[4*i+2] + 3;
+  x[6*i+3] = y[4*i+3] * 7 + 5;
+  x[6*i+4] = y[4*i+0] * 2;
+  x[6*i+5] = y[4*i+3] + 3;
+}
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_int && vect_int_mult } xfail riscv*-*-* } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index a8bb08ea7be..d0a8531fd3b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3697,6 +3697,69 @@ vect_build_slp_instance (vec_info *vinfo,
 most two vector inputs to produce a single vector output.  */
  while (SLP_TREE_CHILDREN (perm).length () > 2)
{
+ /* When we have three equal sized groups left the pairwise
+reduction does not result in a scheme that avoids using
+three vectors.  Instead merge the first two groups
+to the final size with do-not-care elements (chosen
+from the first group) and then merge with the third.
+  { A0, B0,  x, A1, B1,  x, ... }
+   -> { A0, B0, C0, A1, B1, C1, ... }
+This handles group size of three (and at least
+power-of-two multiples of that).  */
+ if (SLP_TREE_CHILDREN (perm).length () == 3
+ && (SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[0])
+ == SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[1]))
+ && (SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[0])
+ == SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[2])))
+   {
+ int ai = 0;
+ int bi = 1;
+ slp_tree a = SLP_TREE_CHILDREN (perm)[ai];
+ slp_tree b = SLP_TREE_CHILDREN (perm)[bi];
+ unsigned n = SLP_TREE_LANES (perm);
+
+ slp_tree permab
+   = vect_create_new_slp_node (2, VEC_PERM_EXPR);
+ SLP_TREE_LANES 

[gcc r15-1837] middle-end/115426 - wrong gimplification of "rm" asm output operand

2024-07-04 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:a4bbdec2be1c9f8fb49276b8a54ee86024ceac17

commit r15-1837-ga4bbdec2be1c9f8fb49276b8a54ee86024ceac17
Author: Richard Biener 
Date:   Tue Jun 11 13:11:08 2024 +0200

middle-end/115426 - wrong gimplification of "rm" asm output operand

When the operand is gimplified to an extract of a register or a
register we have to disallow memory as we otherwise fail to
gimplify it properly.  Instead of

  __asm__("" : "=rm" (__imag__ r));

we want

  __asm__("" : "=rm" D.2772);
  _1 = REALPART_EXPR <r>;
  r = COMPLEX_EXPR <_1, D.2772>;

otherwise SSA rewrite will fail and generate wrong code with 'r'
left bare in the asm output.

PR middle-end/115426
* gimplify.cc (gimplify_asm_expr): Handle "rm" output
constraint gimplified to a register (operation).

* gcc.dg/pr115426.c: New testcase.

Diff:
---
 gcc/gimplify.cc |  8 
 gcc/testsuite/gcc.dg/pr115426.c | 14 ++
 2 files changed, 22 insertions(+)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 622c51d5c3f..5a9627c4acf 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -7040,6 +7040,14 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p)
  ret = tret;
}
 
+  /* If the gimplified operand is a register we do not allow memory.  */
+  if (allows_reg
+ && allows_mem
+ && (is_gimple_reg (TREE_VALUE (link))
+ || (handled_component_p (TREE_VALUE (link))
+ && is_gimple_reg (TREE_OPERAND (TREE_VALUE (link), 0)))))
+   allows_mem = 0;
+
   /* If the constraint does not allow memory make sure we gimplify
  it to a register if it is not already but its base is.  This
 happens for complex and vector components.  */
diff --git a/gcc/testsuite/gcc.dg/pr115426.c b/gcc/testsuite/gcc.dg/pr115426.c
new file mode 100644
index 000..02bfc3f21fa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115426.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu11" } */
+
+_Complex int fcs (_Complex int r)
+{
+  __asm__("" : "=rm" (__imag__ r));
+  return r;
+}
+
+_Complex int fcs2 (_Complex int r)
+{
+  __asm__("" : "=m" (__imag__ r));
+  return r;
+}


[gcc r15-1818] Remove redundant vector permute dump

2024-07-03 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:1dc2096537818bd80191e0d6015412e2906658bc

commit r15-1818-g1dc2096537818bd80191e0d6015412e2906658bc
Author: Richard Biener 
Date:   Wed Jul 3 13:49:58 2024 +0200

Remove redundant vector permute dump

The following removes redundant dumping in vect permute vectorization.

* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
redundant dump.

Diff:
---
 gcc/tree-vect-slp.cc | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 22ed59a817d..a8bb08ea7be 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9350,16 +9350,6 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
 }
 
   gcc_assert (perm.length () == SLP_TREE_LANES (node));
-  if (dump_p)
-{
-  dump_printf_loc (MSG_NOTE, vect_location,
-  "vectorizing permutation");
-  for (unsigned i = 0; i < perm.length (); ++i)
-   dump_printf (MSG_NOTE, " op%u[%u]", perm[i].first, perm[i].second);
-  if (repeating_p)
-   dump_printf (MSG_NOTE, " (repeat %d)\n", SLP_TREE_LANES (node));
-  dump_printf (MSG_NOTE, "\n");
-}
 
   /* REPEATING_P is true if every output vector is guaranteed to use the
  same permute vector.  We can handle that case for both variable-length


[gcc r15-1811] Handle NULL stmt in SLP_TREE_SCALAR_STMTS

2024-07-03 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:03a810da10d8dfb5aec9261372cad7bf090e6986

commit r15-1811-g03a810da10d8dfb5aec9261372cad7bf090e6986
Author: Richard Biener 
Date:   Fri Jun 28 16:04:13 2024 +0200

Handle NULL stmt in SLP_TREE_SCALAR_STMTS

The following starts to handle NULL elements in SLP_TREE_SCALAR_STMTS
with the first candidate being the two-operator nodes where some
lanes are do-not-care and also do not have a scalar stmt computing
the result.  I originally added SLP_TREE_SCALAR_STMTS to two-operator
nodes but this exposes PR115764, so I've split that out.

I have a patch that uses NULL elements for loads from groups with gaps,
where we currently get around not doing that by having a load permutation.

* tree-vect-slp.cc (bst_traits::hash): Handle NULL elements
in SLP_TREE_SCALAR_STMTS.
(vect_print_slp_tree): Likewise.
(vect_mark_slp_stmts): Likewise.
(vect_mark_slp_stmts_relevant): Likewise.
(vect_find_last_scalar_stmt_in_slp): Likewise.
(vect_bb_slp_mark_live_stmts): Likewise.
(vect_slp_prune_covered_roots): Likewise.
(vect_bb_partition_graph_r): Likewise.
(vect_remove_slp_scalar_calls): Likewise.
(vect_slp_gather_vectorized_scalar_stmts): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_contains_pattern_stmt_p): Likewise.
(vect_slp_convert_to_external): Likewise.
(vect_find_first_scalar_stmt_in_slp): Likewise.
(vect_optimize_slp_pass::remove_redundant_permutations): Likewise.
(vect_slp_analyze_node_operations_1): Likewise.
(vect_schedule_slp_node): Likewise.
* tree-vect-stmts.cc (can_vectorize_live_stmts): Likewise.
(vectorizable_shift): Likewise.
* tree-vect-data-refs.cc (vect_slp_analyze_load_dependences):
Handle NULL elements in SLP_TREE_SCALAR_STMTS.

Diff:
---
 gcc/tree-vect-data-refs.cc |  2 ++
 gcc/tree-vect-slp.cc   | 76 --
 gcc/tree-vect-stmts.cc | 22 --
 3 files changed, 61 insertions(+), 39 deletions(-)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 959e127c385..39fd887a96b 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -1041,6 +1041,8 @@ vect_slp_analyze_load_dependences (vec_info *vinfo, 
slp_tree node,
 
   for (unsigned k = 0; k < SLP_TREE_SCALAR_STMTS (node).length (); ++k)
 {
+  if (! SLP_TREE_SCALAR_STMTS (node)[k])
+   continue;
   stmt_vec_info access_info
= vect_orig_stmt (SLP_TREE_SCALAR_STMTS (node)[k]);
   if (access_info == first_access_info)
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 48e0f9d2705..22ed59a817d 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -355,7 +355,7 @@ vect_contains_pattern_stmt_p (vec<stmt_vec_info> stmts)
   stmt_vec_info stmt_info;
   unsigned int i;
   FOR_EACH_VEC_ELT (stmts, i, stmt_info)
-if (is_pattern_stmt_p (stmt_info))
+if (stmt_info && is_pattern_stmt_p (stmt_info))
   return true;
   return false;
 }
@@ -1591,7 +1591,7 @@ bst_traits::hash (value_type x)
 {
   inchash::hash h;
   for (unsigned i = 0; i < x.length (); ++i)
-h.add_int (gimple_uid (x[i]->stmt));
+h.add_int (x[i] ? gimple_uid (x[i]->stmt) : -1);
   return h.end ();
 }
 inline bool
@@ -2800,9 +2800,12 @@ vect_print_slp_tree (dump_flags_t dump_kind, dump_location_t loc,
 }
   if (SLP_TREE_SCALAR_STMTS (node).exists ())
 FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
-  dump_printf_loc (metadata, user_loc, "\t%sstmt %u %G",
-  STMT_VINFO_LIVE_P (stmt_info) ? "[l] " : "",
-  i, stmt_info->stmt);
+  if (stmt_info)
+   dump_printf_loc (metadata, user_loc, "\t%sstmt %u %G",
+STMT_VINFO_LIVE_P (stmt_info) ? "[l] " : "",
+i, stmt_info->stmt);
+  else
+   dump_printf_loc (metadata, user_loc, "\tstmt %u ---\n", i);
   else
 {
   dump_printf_loc (metadata, user_loc, "\t{ ");
@@ -2943,7 +2946,8 @@ vect_mark_slp_stmts (slp_tree node, hash_set<slp_tree> &visited)
 return;
 
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
-STMT_SLP_TYPE (stmt_info) = pure_slp;
+if (stmt_info)
+  STMT_SLP_TYPE (stmt_info) = pure_slp;
 
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
 if (child)
@@ -2973,11 +2977,12 @@ vect_mark_slp_stmts_relevant (slp_tree node, hash_set<slp_tree> &visited)
 return;
 
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
-{
-  gcc_assert (!STMT_VINFO_RELEVANT (stmt_info)
-  || STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_scope);
-  STMT_VINFO_RELEVANT (stmt_info) = vect_used_in_scope;
-}
+if (stmt_info)
+  {
+   gcc_assert (!STMT_VINFO_RELEVANT (stmt_info)
+   || STMT_VINFO_RELEVANT 

[gcc r15-1804] tree-optimization/115764 - testcase for BB SLP issue

2024-07-03 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:2be2145f4f14a79e4bb8e845168d7f0d25dc1b5b

commit r15-1804-g2be2145f4f14a79e4bb8e845168d7f0d25dc1b5b
Author: Richard Biener 
Date:   Wed Jul 3 09:05:06 2024 +0200

tree-optimization/115764 - testcase for BB SLP issue

The following adds a testcase for a CSE issue with BB SLP two operator
handling when we make those CSE aware by providing SLP_TREE_SCALAR_STMTS
for them.  This was reduced from 526.blender_r.

PR tree-optimization/115764
* gcc.dg/vect/bb-slp-76.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/vect/bb-slp-76.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-76.c b/gcc/testsuite/gcc.dg/vect/bb-slp-76.c
new file mode 100644
index 000..b3b6a58e7c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-76.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ffast-math" } */
+
+typedef struct {
+  float xmin, xmax;
+} rctf;
+int U_0;
+float BLI_rctf_size_x_rct_1, view_zoomdrag_apply_dx;
+void *view_zoomdrag_apply_op_0;
+float RNA_float_get();
+typedef struct {
+  rctf cur;
+} View2D;
+typedef struct {
+  View2D v2d;
+} v2dViewZoomData;
+void view_zoomdrag_apply() {
+  v2dViewZoomData *vzd = view_zoomdrag_apply_op_0;
+  View2D *v2d = &vzd->v2d;
+  view_zoomdrag_apply_dx = RNA_float_get();
+  if (U_0) {
+float mval_fac = BLI_rctf_size_x_rct_1, mval_faci = mval_fac,
+  ofs = mval_faci * view_zoomdrag_apply_dx;
+v2d->cur.xmin += ofs + view_zoomdrag_apply_dx;
+v2d->cur.xmax += ofs - view_zoomdrag_apply_dx;
+  } else {
+v2d->cur.xmin += view_zoomdrag_apply_dx;
+v2d->cur.xmax -= view_zoomdrag_apply_dx;
+  }
+}


Re: I have questions regarding the 4.3 codebase...

2024-07-03 Thread Richard Biener via Gcc
On Tue, Jul 2, 2024 at 9:26 PM Sid Maxwell via Gcc  wrote:
>
> I have another gcc 4.3 question.  I'm trying to find where in the code base
> the instrumentation for basic block coverage is done.  I've tracked down
> where/how mcount() calls are generated, but I haven't even been able to
> determine what function(s) are called to increment a basic block's count.
> I'd also like to find more detailed information regarding profiling,
> coverage, and function instrumentation.

Look into gcc/tree-profile.c
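
(A rough sketch of what the -fprofile-arcs instrumentation amounts to;
the counter array and index names are illustrative, not the real
symbols emitted by gcc/tree-profile.c:)

    /* Before instrumentation.  */
    if (cond)
      foo ();

    /* After instrumentation, conceptually: each counted edge gets an
       inline counter increment rather than a function call.  */
    if (cond)
      {
        __counters[EDGE_ID] += 1;
        foo ();
      }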

> On Wed, Mar 22, 2023 at 6:27 PM Sid Maxwell  wrote:
>
> > Is there anyone on the list with experience with the gcc 4.3 codebase?
> > I'm currently maintaining a fork of it, with a PDP10 code generator.
> >
> > I've run into an issue involving the transformation of a movmemhi to a
> > single PDP10 instruction (an xblt, if you're curious).
> > The transformation appears to 'lose' its knowledge of being a store,
> > resulting in certain following stores being declared dead, and code
> > motion that shouldn't happen (e.g. a load moved before the xblt that
> > depends on the result of the xblt).
> >
> > I'm hoping to find someone who can help me diagnose the problem.  We want
> > to use this instruction rather than the copy-word-loop currently generated
> > for struct assignments.
> >
> > Thanks, in advance, for any assistance.
> >
> > -+- Sid Maxwell
> >


[gcc r15-1783] tree-optimization/115741 - ICE with VMAT_CONTIGUOUS_REVERSE and gap

2024-07-02 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:9bd51351c175d345b8a9b3c19ba49ba358940272

commit r15-1783-g9bd51351c175d345b8a9b3c19ba49ba358940272
Author: Richard Biener 
Date:   Tue Jul 2 09:33:29 2024 +0200

tree-optimization/115741 - ICE with VMAT_CONTIGUOUS_REVERSE and gap

When we determine overrun we have to consider VMAT_CONTIGUOUS_REVERSE
the same as VMAT_CONTIGUOUS.

PR tree-optimization/115741
* tree-vect-stmts.cc (get_group_load_store_type): Also
handle VMAT_CONTIGUOUS_REVERSE when determining overrun.

Diff:
---
 gcc/tree-vect-stmts.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index aab3aa59962..20b84515446 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2099,7 +2099,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 If there is a combination of the access not covering the full
 vector and a gap recorded then we may need to peel twice.  */
  if (loop_vinfo
- && *memory_access_type == VMAT_CONTIGUOUS
+ && (*memory_access_type == VMAT_CONTIGUOUS
+ || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
  && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
  && !multiple_p (group_size * LOOP_VINFO_VECT_FACTOR (loop_vinfo),
  nunits))


[gcc r15-1757] Preserve SSA info for more propagated copy

2024-07-01 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:4d24159a1fcb15e1e28f46aa418de5e1ae384ff5

commit r15-1757-g4d24159a1fcb15e1e28f46aa418de5e1ae384ff5
Author: Richard Biener 
Date:   Sun Jun 30 11:37:12 2024 +0200

Preserve SSA info for more propagated copy

Besides VN and copy-prop also CCP and VRP as well as forwprop
propagate out copies and thus it's worthwhile to try to preserve
range and points-to info there when possible.

Note that this also fixes the testcase from PR115701 but that's
because we do not actually intersect info but only copy info when
there was no info present.

* tree-ssa-forwprop.cc (fwprop_set_lattice_val): Preserve
SSA info.
* tree-ssa-propagate.cc
(substitute_and_fold_dom_walker::before_dom_children): Likewise.

Diff:
---
 gcc/tree-ssa-forwprop.cc  | 4 
 gcc/tree-ssa-propagate.cc | 8 
 2 files changed, 12 insertions(+)

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index abf71f0d3a0..44a6b5d39aa 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -207,6 +207,10 @@ fwprop_set_lattice_val (tree name, tree val)
  lattice.quick_grow_cleared (num_ssa_names);
}
   lattice[SSA_NAME_VERSION (name)] = val;
+  /* As this now constitutes a copy duplicate points-to
+and range info appropriately.  */
+  if (TREE_CODE (val) == SSA_NAME)
+   maybe_duplicate_ssa_info_at_copy (name, val);
 }
 }
 
diff --git a/gcc/tree-ssa-propagate.cc b/gcc/tree-ssa-propagate.cc
index a34c7618b55..d96d0a9fe19 100644
--- a/gcc/tree-ssa-propagate.cc
+++ b/gcc/tree-ssa-propagate.cc
@@ -789,6 +789,10 @@ substitute_and_fold_dom_walker::before_dom_children (basic_block bb)
  fprintf (dump_file, "\n");
}
  bitmap_set_bit (dceworklist, SSA_NAME_VERSION (res));
+ /* As this now constitutes a copy duplicate points-to
+and range info appropriately.  */
+ if (TREE_CODE (sprime) == SSA_NAME)
+   maybe_duplicate_ssa_info_at_copy (res, sprime);
  continue;
}
}
@@ -831,6 +835,10 @@ substitute_and_fold_dom_walker::before_dom_children (basic_block bb)
  fprintf (dump_file, "\n");
}
  bitmap_set_bit (dceworklist, SSA_NAME_VERSION (lhs));
+ /* As this now constitutes a copy duplicate points-to
+and range info appropriately.  */
+ if (TREE_CODE (sprime) == SSA_NAME)
+   maybe_duplicate_ssa_info_at_copy (lhs, sprime);
  continue;
}
}


[gcc r15-1745] tree-optimization/115723 - ICE with .COND_ADD reduction

2024-07-01 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:286cda3461d6f5ce7d911d3f26bd4975ea7ea11d

commit r15-1745-g286cda3461d6f5ce7d911d3f26bd4975ea7ea11d
Author: Richard Biener 
Date:   Mon Jul 1 10:06:55 2024 +0200

tree-optimization/115723 - ICE with .COND_ADD reduction

The following fixes an ICE with a .COND_ADD discovered as reduction
even though its else value isn't the reduction chain link but a
constant.  This would be wrong-code with --disable-checking I think.

PR tree-optimization/115723
* tree-vect-loop.cc (check_reduction_path): For a .COND_ADD
verify the else value also refers to the reduction chain op.

* gcc.dg/vect/pr115723.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115723.c | 25 +
 gcc/tree-vect-loop.cc| 12 
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115723.c b/gcc/testsuite/gcc.dg/vect/pr115723.c
new file mode 100644
index 000..b98b29d4870
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115723.c
@@ -0,0 +1,25 @@
+/* { dg-additional-options "-ffast-math -fno-unsafe-math-optimizations" } */
+
+#include "tree-vect.h"
+
+double __attribute__((noipa))
+foo (double *x, double *y, int n)
+{
+  double res = 0.;
+  for (int i = 0; i < n; ++i)
+if (y[i] > 0.)
+  res += x[i];
+else
+  res = 64.;
+  return res;
+}
+
+double y[16] = { 1., 1., 1., 1., 0., 1., 1., 1.,
+ 1., 1., 1., 1., 1., 1., 1., 1. };
+int main ()
+{
+  check_vect ();
+  if (foo (y, y, 16) != 64. + 11.)
+abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3095ff5ab6b..a64b5082bd1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4163,15 +4163,19 @@ pop:
 
   FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
{
-   /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
-  op1 twice (once as definition, once as else) in the same operation.
-  Allow this.  */
+ /* In case of a COND_OP (mask, op1, op2, op1) reduction we should
+have op1 twice (once as definition, once as else) in the same
+operation.  Enforce this.  */
  if (cond_fn_p && op_use_stmt == use_stmt)
{
  gcall *call = as_a <gcall *> (use_stmt);
  unsigned else_pos
= internal_fn_else_index (internal_fn (op.code));
-
+ if (gimple_call_arg (call, else_pos) != op.ops[opi])
+   {
+ fail = true;
+ break;
+   }
  for (unsigned int j = 0; j < gimple_call_num_args (call); ++j)
{
  if (j == else_pos)


[gcc r15-1743] tree-optimization/115694 - ICE with complex store rewrite

2024-06-30 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:543a5b9da964f821b9e723ed9c93d6cdca464d47

commit r15-1743-g543a5b9da964f821b9e723ed9c93d6cdca464d47
Author: Richard Biener 
Date:   Sun Jun 30 13:07:14 2024 +0200

tree-optimization/115694 - ICE with complex store rewrite

The following adds a missed check when forwprop attempts to rewrite
a complex store.

PR tree-optimization/115694
* tree-ssa-forwprop.cc (pass_forwprop::execute): Check the
store is complex before rewriting it.

* g++.dg/torture/pr115694.C: New testcase.

Diff:
---
 gcc/testsuite/g++.dg/torture/pr115694.C | 13 +
 gcc/tree-ssa-forwprop.cc|  2 ++
 2 files changed, 15 insertions(+)

diff --git a/gcc/testsuite/g++.dg/torture/pr115694.C b/gcc/testsuite/g++.dg/torture/pr115694.C
new file mode 100644
index 000..bbce47decf8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr115694.C
@@ -0,0 +1,13 @@
+// { dg-do compile }
+
+_Complex a;
+typedef struct {
+  double a[2];
+} b;
+void c(b);
+void d()
+{
+  _Complex b1 = a;
+  b t = __builtin_bit_cast (b, b1);
+  c(t);
+}
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 05d42ccd3c6..abf71f0d3a0 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -3762,6 +3762,8 @@ pass_forwprop::execute (function *fun)
  && gimple_store_p (use_stmt)
  && !gimple_has_volatile_ops (use_stmt)
  && is_gimple_assign (use_stmt)
+ && (TREE_CODE (TREE_TYPE (gimple_assign_lhs (use_stmt)))
+ == COMPLEX_TYPE)
  && (TREE_CODE (gimple_assign_lhs (use_stmt))
  != TARGET_MEM_REF))
{


[gcc r15-1730] tree-optimization/115701 - fix maybe_duplicate_ssa_info_at_copy

2024-06-30 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:b77f17c5feec9614568bf2dee7f7d811465ee4a5

commit r15-1730-gb77f17c5feec9614568bf2dee7f7d811465ee4a5
Author: Richard Biener 
Date:   Sun Jun 30 11:34:43 2024 +0200

tree-optimization/115701 - fix maybe_duplicate_ssa_info_at_copy

The following restricts copying of points-to info from defs that
might be in regions invoking UB and are never executed.

PR tree-optimization/115701
* tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy):
Only copy info from within the same BB.

* gcc.dg/torture/pr115701.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/torture/pr115701.c | 22 ++
 gcc/tree-ssanames.cc| 22 --
 2 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr115701.c b/gcc/testsuite/gcc.dg/torture/pr115701.c
new file mode 100644
index 000..9b7c34b23d7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr115701.c
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+/* IPA PTA disables local PTA recompute after IPA.  */
+/* { dg-additional-options "-fipa-pta" } */
+
+int a, c, d;
+static int b;
+int main()
+{
+  int *e = &d, **f = &e;
+  while (1) {
+int **g, ***h = &f;
+if (c)
+  *g = e;
+else if (!b)
+  break;
+*e = **g;
+e = &a;
+  }
+  if (e != &d)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index bb9ed373f36..4f83fcbb517 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -775,25 +775,19 @@ duplicate_ssa_name_range_info (tree name, tree src)
 void
 maybe_duplicate_ssa_info_at_copy (tree dest, tree src)
 {
+  /* While points-to info is flow-insensitive we have to avoid copying
+ info from not executed regions invoking UB to dominating defs.  */
+  if (gimple_bb (SSA_NAME_DEF_STMT (src))
+  != gimple_bb (SSA_NAME_DEF_STMT (dest)))
+return;
+
   if (POINTER_TYPE_P (TREE_TYPE (dest))
   && SSA_NAME_PTR_INFO (dest)
   && ! SSA_NAME_PTR_INFO (src))
-{
-  duplicate_ssa_name_ptr_info (src, SSA_NAME_PTR_INFO (dest));
-  /* Points-to information is cfg insensitive,
-but VRP might record context sensitive alignment
-info, non-nullness, etc.  So reset context sensitive
-info if the two SSA_NAMEs aren't defined in the same
-basic block.  */
-  if (gimple_bb (SSA_NAME_DEF_STMT (src))
- != gimple_bb (SSA_NAME_DEF_STMT (dest)))
-   reset_flow_sensitive_info (src);
-}
+duplicate_ssa_name_ptr_info (src, SSA_NAME_PTR_INFO (dest));
   else if (INTEGRAL_TYPE_P (TREE_TYPE (dest))
   && SSA_NAME_RANGE_INFO (dest)
-  && ! SSA_NAME_RANGE_INFO (src)
-  && (gimple_bb (SSA_NAME_DEF_STMT (src))
-  == gimple_bb (SSA_NAME_DEF_STMT (dest))))
+  && ! SSA_NAME_RANGE_INFO (src))
 duplicate_ssa_name_range_info (src, dest);
 }


[gcc r15-1729] tree-optimization/115701 - factor out maybe_duplicate_ssa_info_at_copy

2024-06-30 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:b5c64b413fd5bc03a1a8ef86d005892071e42cbe

commit r15-1729-gb5c64b413fd5bc03a1a8ef86d005892071e42cbe
Author: Richard Biener 
Date:   Sun Jun 30 11:28:11 2024 +0200

tree-optimization/115701 - factor out maybe_duplicate_ssa_info_at_copy

The following factors out the code that preserves SSA info of the LHS
of a SSA copy LHS = RHS when LHS is about to be eliminated to RHS.

PR tree-optimization/115701
* tree-ssanames.h (maybe_duplicate_ssa_info_at_copy): Declare.
* tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy): New
function, split out from ...
* tree-ssa-copy.cc (fini_copy_prop): ... here.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt): ...
and here.

Diff:
---
 gcc/tree-ssa-copy.cc  | 32 ++--
 gcc/tree-ssa-sccvn.cc | 21 ++---
 gcc/tree-ssanames.cc  | 28 
 gcc/tree-ssanames.h   |  3 ++-
 4 files changed, 34 insertions(+), 50 deletions(-)

diff --git a/gcc/tree-ssa-copy.cc b/gcc/tree-ssa-copy.cc
index bb88472304c..9c9ec47adca 100644
--- a/gcc/tree-ssa-copy.cc
+++ b/gcc/tree-ssa-copy.cc
@@ -527,38 +527,10 @@ fini_copy_prop (void)
  || copy_of[i].value == var)
continue;
 
-  /* In theory the points-to solution of all members of the
- copy chain is their intersection.  For now we do not bother
-to compute this but only make sure we do not lose points-to
-information completely by setting the points-to solution
-of the representative to the first solution we find if
-it doesn't have one already.  */
+  /* Duplicate points-to and range info appropriately.  */
   if (copy_of[i].value != var
  && TREE_CODE (copy_of[i].value) == SSA_NAME)
-   {
- basic_block copy_of_bb
-   = gimple_bb (SSA_NAME_DEF_STMT (copy_of[i].value));
- basic_block var_bb = gimple_bb (SSA_NAME_DEF_STMT (var));
- if (POINTER_TYPE_P (TREE_TYPE (var))
- && SSA_NAME_PTR_INFO (var)
- && !SSA_NAME_PTR_INFO (copy_of[i].value))
-   {
- duplicate_ssa_name_ptr_info (copy_of[i].value,
-  SSA_NAME_PTR_INFO (var));
- /* Points-to information is cfg insensitive,
-but [E]VRP might record context sensitive alignment
-info, non-nullness, etc.  So reset context sensitive
-info if the two SSA_NAMEs aren't defined in the same
-basic block.  */
- if (var_bb != copy_of_bb)
-   reset_flow_sensitive_info (copy_of[i].value);
-   }
- else if (!POINTER_TYPE_P (TREE_TYPE (var))
-  && SSA_NAME_RANGE_INFO (var)
-  && !SSA_NAME_RANGE_INFO (copy_of[i].value)
-  && var_bb == copy_of_bb)
-   duplicate_ssa_name_range_info (copy_of[i].value, var);
-   }
+   maybe_duplicate_ssa_info_at_copy (var, copy_of[i].value);
 }
 
   class copy_folder copy_folder;
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index fbbfa557833..dc377fa16ce 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -6886,27 +6886,10 @@ eliminate_dom_walker::eliminate_stmt (basic_block b, gimple_stmt_iterator *gsi)
 
   /* If this now constitutes a copy duplicate points-to
 and range info appropriately.  This is especially
-important for inserted code.  See tree-ssa-copy.cc
-for similar code.  */
+important for inserted code.  */
   if (sprime
  && TREE_CODE (sprime) == SSA_NAME)
-   {
- basic_block sprime_b = gimple_bb (SSA_NAME_DEF_STMT (sprime));
- if (POINTER_TYPE_P (TREE_TYPE (lhs))
- && SSA_NAME_PTR_INFO (lhs)
- && ! SSA_NAME_PTR_INFO (sprime))
-   {
- duplicate_ssa_name_ptr_info (sprime,
-  SSA_NAME_PTR_INFO (lhs));
- if (b != sprime_b)
-   reset_flow_sensitive_info (sprime);
-   }
- else if (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
-  && SSA_NAME_RANGE_INFO (lhs)
-  && ! SSA_NAME_RANGE_INFO (sprime)
-  && b == sprime_b)
-   duplicate_ssa_name_range_info (sprime, lhs);
-   }
+   maybe_duplicate_ssa_info_at_copy (lhs, sprime);
 
   /* Inhibit the use of an inserted PHI on a loop header when
 the address of the memory reference is a simple induction
diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index 411ea848c49..bb9ed373f36 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -769,6 +769,34 @@ duplicate_ssa_name_range_info (tree name, tree src)
 }
 }
 
+/* For a SSA copy DEST = SRC duplicate SSA info present on DEST to SRC
+   to preserve it in case DEST is eliminated to SRC.  */
+
+void

[gcc r15-1728] Harden SLP reduction support wrt STMT_VINFO_REDUC_IDX

2024-06-30 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:b443d7122ee8013c5af127d3d183a03962967f57

commit r15-1728-gb443d7122ee8013c5af127d3d183a03962967f57
Author: Richard Biener 
Date:   Thu Jun 27 11:36:07 2024 +0200

Harden SLP reduction support wrt STMT_VINFO_REDUC_IDX

The following makes sure that for SLP reductions all lanes have
the same STMT_VINFO_REDUC_IDX.  Once we move that info and can adjust
it we can implement swapping.  It also makes the existing protection
against operand swapping trigger for all stmts participating in a
reduction, not just the final one marked as reduction-def.

* tree-vect-slp.cc (vect_build_slp_tree_1): Compare
STMT_VINFO_REDUC_IDX.
(vect_build_slp_tree_2): Prevent operand swapping for
all stmts participating in a reduction.

Diff:
---
 gcc/tree-vect-slp.cc | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index dd9017e5b3a..48e0f9d2705 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1072,6 +1072,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   stmt_vec_info first_load = NULL, prev_first_load = NULL;
   bool first_stmt_ldst_p = false, ldst_p = false;
   bool first_stmt_phi_p = false, phi_p = false;
+  int first_reduc_idx = -1;
   bool maybe_soft_fail = false;
   tree soft_fail_nunits_vectype = NULL_TREE;
 
@@ -1204,6 +1205,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
  first_stmt_code = rhs_code;
  first_stmt_ldst_p = ldst_p;
  first_stmt_phi_p = phi_p;
+ first_reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
 
  /* Shift arguments should be equal in all the packed stmts for a
 vector shift with scalar shift operand.  */
@@ -1267,6 +1269,24 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
}
   else
{
+ if (first_reduc_idx != STMT_VINFO_REDUC_IDX (stmt_info)
+ /* For SLP reduction groups the index isn't necessarily
+uniform but only that of the first stmt matters.  */
+ && !(first_reduc_idx != -1
+  && STMT_VINFO_REDUC_IDX (stmt_info) != -1
+  && REDUC_GROUP_FIRST_ELEMENT (stmt_info)))
+   {
+ if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "Build SLP failed: different reduc_idx "
+  "%d instead of %d in %G",
+  STMT_VINFO_REDUC_IDX (stmt_info),
+  first_reduc_idx, stmt);
+   }
+ /* Mismatch.  */
+ continue;
+   }
  if (first_stmt_code != rhs_code
  && alt_stmt_code == ERROR_MARK)
alt_stmt_code = rhs_code;
@@ -2530,8 +2550,7 @@ out:
  && oprnds_info[1]->first_dt == vect_internal_def
  && is_gimple_assign (stmt_info->stmt)
  /* Swapping operands for reductions breaks assumptions later on.  */
- && STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def
- && STMT_VINFO_DEF_TYPE (stmt_info) != vect_double_reduction_def)
+ && STMT_VINFO_REDUC_IDX (stmt_info) == -1)
{
  /* See whether we can swap the matching or the non-matching
 stmt operands.  */


[gcc r15-1709] tree-optimization/115652 - more fixing of the fix

2024-06-28 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:ff6e8b7f09712bd7ddfcd2830b286421f23abef9

commit r15-1709-gff6e8b7f09712bd7ddfcd2830b286421f23abef9
Author: Richard Biener 
Date:   Fri Jun 28 13:29:21 2024 +0200

tree-optimization/115652 - more fixing of the fix

The following addresses the corner case of an outer loop with an empty
header where we end up asking for the BB of a NULL stmt by
special-casing this case.

PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Handle the case
where the outer loop header block is empty.

Diff:
---
 gcc/tree-vect-slp.cc | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 174b4800fa9..dd9017e5b3a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9750,8 +9750,15 @@ vect_schedule_slp_node (vec_info *vinfo,
  {
gimple_stmt_iterator si2
  = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
-   if (last_stmt != *si2
-   && vect_stmt_dominates_stmt_p (last_stmt, *si2))
+   if ((gsi_end_p (si2)
+&& (LOOP_VINFO_LOOP (loop_vinfo)->header
+!= gimple_bb (last_stmt))
+&& dominated_by_p (CDI_DOMINATORS,
+   LOOP_VINFO_LOOP (loop_vinfo)->header,
+   gimple_bb (last_stmt)))
+   || (!gsi_end_p (si2)
+   && last_stmt != *si2
+   && vect_stmt_dominates_stmt_p (last_stmt, *si2)))
  si = si2;
  }
}


[gcc r15-1706] tree-optimization/115640 - outer loop vect with inner SLP permute

2024-06-28 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:0192341a07b8ea30f631cf4afdc6fcf3fa7ce838

commit r15-1706-g0192341a07b8ea30f631cf4afdc6fcf3fa7ce838
Author: Richard Biener 
Date:   Wed Jun 26 14:07:51 2024 +0200

tree-optimization/115640 - outer loop vect with inner SLP permute

The following fixes wrong-code when using outer loop vectorization
and an inner loop SLP access with permutation.  A wrong adjustment
to the IV increment is then applied on GCN.

PR tree-optimization/115640
* tree-vect-stmts.cc (vectorizable_load): With an inner
loop SLP access to not apply a gap adjustment.

Diff:
---
 gcc/tree-vect-stmts.cc | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 0b0761bf799..7b889f31645 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -10512,9 +10512,14 @@ vectorizable_load (vec_info *vinfo,
 whole group, not only the number of vector stmts the
 permutation result fits in.  */
  unsigned scalar_lanes = SLP_TREE_LANES (slp_node);
- if (slp_perm
- && (group_size != scalar_lanes 
- || !multiple_p (nunits, group_size)))
+ if (nested_in_vect_loop)
+   /* We do not support grouped accesses in a nested loop,
+  instead the access is contiguous but it might be
+  permuted.  No gap adjustment is needed though.  */
+   vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+ else if (slp_perm
+  && (group_size != scalar_lanes
+  || !multiple_p (nunits, group_size)))
{
  /* We don't yet generate such SLP_TREE_LOAD_PERMUTATIONs for
 variable VF; see vect_transform_slp_perm_load.  */


Re: About the effect of "O0" on inlining into a function.

2024-06-27 Thread Richard Biener via Gcc



> Am 27.06.2024 um 19:43 schrieb Iain Sandoe :
> 
> 
>> On 27 Jun 2024, at 14:51, Iain Sandoe  wrote:
>> 
>> If I declare a function __attribute__((noipa, optimize (“-O0”))), I was 
>> kinda expecting that it would not be optimized at all ..
>> 
>> however it does not seem to prevent functions called by it from being 
>> inlined into its body ..
>> 
>> am I missing some additional constraint that should be added?
>> 
>> (I explicitly want to avoid called functions being inlined into the body, 
>> but cannot mark _those_ functions as noinline)
> 
> Additional:  If I compile the entire code “O0” then all behaves as expected.
> 
> The issue seems to be when compiing (say) O2 and a function has a local 
> optimisation set lower (O0) ..
> perhaps this is a target problem ..
> although looking at say tree-ssa-ccp.cc I do not see any gating on the 
> optimisation level - which I guess suggests once it’s selected in the stack 
> .. it’s going to run…
> 
> any insights would be welcome.

It might be that we do not honor -O0 in the caller this way during IPA 
inlining.  I would guess it correctly disables early inlining into it though.  
It sounds like a bug to me.
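
(A minimal reproducer of the described scenario, with hypothetical
names; compile the whole TU at -O2:)

    static int callee (int x) { return x + 1; }

    __attribute__((noipa, optimize ("-O0")))
    int caller (int x)
    {
      /* callee may still get inlined here although caller is -O0.  */
      return callee (x);
    }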

Richard 

> Iain
> 
> 


Re: consistent unspecified pointer comparison

2024-06-27 Thread Richard Biener via Gcc



> Am 27.06.2024 um 20:55 schrieb Jason Merrill :
> 
> On Thu, Jun 27, 2024 at 2:38 PM Richard Biener
>  wrote:
 Am 27.06.2024 um 19:04 schrieb Jason Merrill via Gcc :
>>> 
>>> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2434r1.html
>>> proposes to require that repeated unspecified comparisons be
>>> self-consistent, which does not match current behavior in either GCC
>>> or Clang.  The argument is that the current allowance to be
>>> inconsistent is user-unfriendly and does not enable significant
>>> optimizations.  Any feedback about this?
>> 
>> Can you give an example of an unspecified comparison?  I think the only way 
>> to do what the paper wants is for the implementation to make the comparison 
>> specified (without the need to document it).  Is the self-consistency 
>> required only within some specified scope (a single expression?) or even 
>> across TUs (which might be compiled by different compilers or compiler 
>> versions)?
>> 
>> So my feedback would be to make the comparison well-defined.
>> 
>> I’m still curious about which ones are unspecified now.
> 
> https://eel.is/c++draft/expr#eq-3.1
> "If one pointer represents the address of a complete object, and
> another pointer represents the address one past the last element of a
> different complete object, the result of the comparison is
> unspecified."
> 
> This is historically unspecified primarily because we don't want to
> force a particular layout of multiple variables.
> 
> See the example under "consequences for implementations" in the paper.
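
(A minimal illustration, not from the original mail: &x + 1 may or may
not compare equal to &y depending on layout, and today the two
functions need not even agree:)

    int x, y;
    bool f () { return &x + 1 == &y; }
    bool g () { return &x + 1 == &y; }  // P2434 would require f () == g ()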

And how do we currently not have consistent behavior?  I don’t think we
constant-fold such comparisons in any way, but we could take advantage of the
unspecifiedness in more complex situations (though I can’t come up with one off
the top of my head).

Richard 

> Jason
> 


Re: consistent unspecified pointer comparison

2024-06-27 Thread Richard Biener via Gcc



> Am 27.06.2024 um 19:04 schrieb Jason Merrill via Gcc :
> 
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2434r1.html
> proposes to require that repeated unspecified comparisons be
> self-consistent, which does not match current behavior in either GCC
> or Clang.  The argument is that the current allowance to be
> inconsistent is user-unfriendly and does not enable significant
> optimizations.  Any feedback about this?

Can you give an example of an unspecified comparison?  I think the only way to 
do what the paper wants is for the implementation to make the comparison 
specified (without the need to document it).  Is the self-consistency required 
only within some specified scope (a single expression?) or even across TUs 
(which might be compiled by different compilers or compiler versions)?

So my feedback would be to make the comparison well-defined.

I’m still curious about which ones are unspecified now.

Richard 

> 
> Jason
> 


[gcc r15-1694] tree-optimization/115669 - fix SLP reduction association

2024-06-27 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:7886830bb45c4f5dca0496d4deae9a45204d78f5

commit r15-1694-g7886830bb45c4f5dca0496d4deae9a45204d78f5
Author: Richard Biener 
Date:   Thu Jun 27 11:26:08 2024 +0200

tree-optimization/115669 - fix SLP reduction association

The following avoids associating a reduction path as that might
get STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order.
This is a latent issue with SLP reductions but now easily exposed
as we're doing single-lane SLP reductions.

Once we have achieved SLP-only we can move and update this meta-data.

PR tree-optimization/115669
* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate
chains that participate in a reduction.

* gcc.dg/vect/pr115669.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115669.c | 22 ++
 gcc/tree-vect-slp.cc |  3 +++
 2 files changed, 25 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115669.c b/gcc/testsuite/gcc.dg/vect/pr115669.c
new file mode 100644
index 000..361a17a64e6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115669.c
@@ -0,0 +1,22 @@
+/* { dg-additional-options "-fwrapv" } */
+
+#include "tree-vect.h"
+
+int a = 10;
+unsigned b;
+long long c[100];
+int foo()
+{
+  long long *d = c;
+  for (short e = 0; e < a; e++)
+b += ~(d ? d[e] : 0);
+  return b;
+}
+
+int main()
+{
+  check_vect ();
+  if (foo () != -10)
+abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 1252b613125..174b4800fa9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2069,6 +2069,9 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
   else if (is_a <bb_vec_info> (vinfo)
   /* ???  We don't handle !vect_internal_def defs below.  */
   && STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def
+  /* ???  Do not associate a reduction, this will wreck REDUC_IDX
+ mapping as long as that exists on the stmt_info level.  */
+  && STMT_VINFO_REDUC_IDX (stmt_info) == -1
   && is_gimple_assign (stmt_info->stmt)
   && (associative_tree_code (gimple_assign_rhs_code (stmt_info->stmt))
   || gimple_assign_rhs_code (stmt_info->stmt) == MINUS_EXPR)


[gcc r15-1670] tree-optimization/115652 - amend last fix

2024-06-26 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:c7cb0dd94589ab501bca27f93641b4074e5a2e99

commit r15-1670-gc7cb0dd94589ab501bca27f93641b4074e5a2e99
Author: Richard Biener 
Date:   Wed Jun 26 19:23:26 2024 +0200

tree-optimization/115652 - amend last fix

The previous fix breaks in the degenerate case when the discovered
last_stmt is equal to the first stmt in the block since then we
undo a required stmt advancement.

PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Only insert
at the start of the block if that strictly dominates
the discovered dependent stmt.

Diff:
---
 gcc/tree-vect-slp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 1f5b3fccf41..1252b613125 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9747,7 +9747,8 @@ vect_schedule_slp_node (vec_info *vinfo,
  {
gimple_stmt_iterator si2
  = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
-   if (vect_stmt_dominates_stmt_p (last_stmt, *si2))
+   if (last_stmt != *si2
+   && vect_stmt_dominates_stmt_p (last_stmt, *si2))
  si = si2;
  }
}


[gcc r15-1669] tree-optimization/115493 - complete previous fix

2024-06-26 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:b7ba0670a768e76e87e04cfd6a72c28c35333b54

commit r15-1669-gb7ba0670a768e76e87e04cfd6a72c28c35333b54
Author: Richard Biener 
Date:   Wed Jun 26 19:11:04 2024 +0200

tree-optimization/115493 - complete previous fix

The following fixes the 2nd occurrence of new_temp missed with the
previous fix.

PR tree-optimization/115493
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Use
first scalar result.

Diff:
---
 gcc/tree-vect-loop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 347dac97e49..6f32867f85a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6849,7 +6849,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
  tree initial_def = reduc_info->reduc_initial_values[0];
  tree tmp = make_ssa_name (new_scalar_dest);
  epilog_stmt = gimple_build_assign (tmp, COND_EXPR, zcompare,
-initial_def, new_temp);
+initial_def, scalar_results[0]);
  gsi_insert_before (_gsi, epilog_stmt, GSI_SAME_STMT);
  scalar_results[0] = tmp;
}


[gcc r15-1662] tree-optimization/115629 - missed tail merging

2024-06-26 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:629257bcb81434117f1e9c68479032563176dc0c

commit r15-1662-g629257bcb81434117f1e9c68479032563176dc0c
Author: Richard Biener 
Date:   Tue Jun 25 14:04:31 2024 +0200

tree-optimization/115629 - missed tail merging

The following fixes a missed tail-merging observed for the testcase
in PR115629.  The issue is that when deps_ok_for_redirect doesn't
determine that both blocks would be valid prevailing blocks, it
rejects the merge.  The following instead makes sure to record the
working block as prevailing.  Also stmt comparison fails for indirect
references and does not handle memory references thoroughly, failing
to unify array indices and indirected pointers.  The following
attempts to fix this.

PR tree-optimization/115629
* tree-ssa-tail-merge.cc (gimple_equal_p): Handle
memory references better.
(deps_ok_for_redirect): Handle the case not both blocks
are considered a valid prevailing block.

* gcc.dg/tree-ssa/tail-merge-1.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/tail-merge-1.c | 14 ++
 gcc/tree-ssa-tail-merge.cc   | 69 
 2 files changed, 75 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tail-merge-1.c b/gcc/testsuite/gcc.dg/tree-ssa/tail-merge-1.c
new file mode 100644
index 000..e5670c33ba3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tail-merge-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dce4" } */
+
+void foo1 (int *restrict a, int *restrict b, int *restrict c,
+  int *restrict d, int *restrict res, int n)
+{
+  for (int i = 0; i < n; i++)
+res[i] = a[i] ? b[i] : (c[i] ? b[i] : d[i]);
+}
+
+/* After tail-merging (run during PRE) we should end up merging the two
+   blocks dereferencing 'b', ending up with two iftmp assigns and the
+   iftmp PHI def.  */
+/* { dg-final { scan-tree-dump-times "iftmp\[^\r\n\]* = " 3 "dce4" } } */
diff --git a/gcc/tree-ssa-tail-merge.cc b/gcc/tree-ssa-tail-merge.cc
index c8b4a79294d..27e7c6a37b2 100644
--- a/gcc/tree-ssa-tail-merge.cc
+++ b/gcc/tree-ssa-tail-merge.cc
@@ -1188,7 +1188,52 @@ gimple_equal_p (same_succ *same_succ, gimple *s1, gimple *s2)
{
  t1 = gimple_arg (s1, i);
  t2 = gimple_arg (s2, i);
- if (!gimple_operand_equal_value_p (t1, t2))
+ while (handled_component_p (t1) && handled_component_p (t2))
+   {
+ if (TREE_CODE (t1) != TREE_CODE (t2)
+ || TREE_THIS_VOLATILE (t1) != TREE_THIS_VOLATILE (t2))
+   return false;
+ switch (TREE_CODE (t1))
+   {
+   case COMPONENT_REF:
+ if (TREE_OPERAND (t1, 1) != TREE_OPERAND (t2, 1)
+ || !gimple_operand_equal_value_p (TREE_OPERAND (t1, 2),
+   TREE_OPERAND (t2, 2)))
+   return false;
+ break;
+   case ARRAY_REF:
+   case ARRAY_RANGE_REF:
+ if (!gimple_operand_equal_value_p (TREE_OPERAND (t1, 3),
+TREE_OPERAND (t2, 3)))
+   return false;
+ /* Fallthru.  */
+   case BIT_FIELD_REF:
+ if (!gimple_operand_equal_value_p (TREE_OPERAND (t1, 1),
+TREE_OPERAND (t2, 1))
+ || !gimple_operand_equal_value_p (TREE_OPERAND (t1, 2),
+   TREE_OPERAND (t2, 2)))
+   return false;
+ break;
+   case REALPART_EXPR:
+   case IMAGPART_EXPR:
+   case VIEW_CONVERT_EXPR:
+ break;
+   default:
+   gcc_unreachable ();
+   }
+ t1 = TREE_OPERAND (t1, 0);
+ t2 = TREE_OPERAND (t2, 0);
+   }
+ if (TREE_CODE (t1) == MEM_REF && TREE_CODE (t2) == MEM_REF)
+   {
+ if (TREE_THIS_VOLATILE (t1) != TREE_THIS_VOLATILE (t2)
+ || TYPE_ALIGN (TREE_TYPE (t1)) != TYPE_ALIGN (TREE_TYPE (t2))
+ || !gimple_operand_equal_value_p (TREE_OPERAND (t1, 0),
+   TREE_OPERAND (t2, 0))
+ || TREE_OPERAND (t1, 1) != TREE_OPERAND (t2, 1))
+   return false;
+   }
+ else if (!gimple_operand_equal_value_p (t1, t2))
return false;
}
   return true;
@@ -1462,16 +1507,24 @@ deps_ok_for_redirect_from_bb_to_bb (basic_block from, basic_block to)
replacement are dominates by their defs.  */
 
 static bool
-deps_ok_for_redirect (basic_block bb1, basic_block bb2)
+deps_ok_for_redirect (basic_block &bb1, basic_block &bb2)
 {
-  if (BB_CLUSTER (bb1) != NULL)
-bb1 = BB_CLUSTER (bb1)->rep_bb;
+ 

[gcc r15-1653] tree-optimization/115652 - adjust insertion gsi for SLP

2024-06-26 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:f80db5495d5f8455b3003951727eb6c8dc67d81d

commit r15-1653-gf80db5495d5f8455b3003951727eb6c8dc67d81d
Author: Richard Biener 
Date:   Wed Jun 26 09:25:27 2024 +0200

tree-optimization/115652 - adjust insertion gsi for SLP

The following adjusts how SLP computes the insertion location.  In
particular it now advances the insert iterator past the found last_stmt.
The vectorizer will later insert stmts _before_ it.  But we also
have the constraint that possibly masked ops may not be scheduled
outside of the loop, and as we do not model the loop mask in the
SLP graph we have to adjust for that.  The following moves that
adjustment to after the advance since it isn't compatible with it,
as the current GIMPLE_COND exception shows.  The PR is about in-order
reduction vectorization which also isn't happy when the insertion
point is the very first stmt.

PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Advance the
iterator based on last_stmt only for vector defs.

Diff:
---
 gcc/tree-vect-slp.cc | 29 +
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b47b7e8c979..1f5b3fccf41 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9629,16 +9629,6 @@ vect_schedule_slp_node (vec_info *vinfo,
   /* Emit other stmts after the children vectorized defs which is
 earliest possible.  */
   gimple *last_stmt = NULL;
-  if (auto loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
-   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
-   || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
- {
-   /* But avoid scheduling internal defs outside of the loop when
-  we might have only implicitly tracked loop mask/len defs.  */
-   gimple_stmt_iterator si
- = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
-   last_stmt = *si;
- }
   bool seen_vector_def = false;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
@@ -9747,12 +9737,19 @@ vect_schedule_slp_node (vec_info *vinfo,
   else
{
  si = gsi_for_stmt (last_stmt);
- /* When we're getting gsi_after_labels from the starting
-condition of a fully masked/len loop avoid insertion
-after a GIMPLE_COND that can appear as the only header
-stmt with early break vectorization.  */
- if (gimple_code (last_stmt) != GIMPLE_COND)
-   gsi_next ();
+ gsi_next ();
+
+ /* Avoid scheduling internal defs outside of the loop when
+we might have only implicitly tracked loop mask/len defs.  */
+ if (auto loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+   || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+ {
+   gimple_stmt_iterator si2
+ = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
+   if (vect_stmt_dominates_stmt_p (last_stmt, *si2))
+ si = si2;
+ }
}
 }


[gcc r15-1643] tree-optimization/115646 - ICE with pow shrink-wrapping from bitfield

2024-06-26 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:453b1d291d1a0f89087ad91cf6b1bed1ec68eff3

commit r15-1643-g453b1d291d1a0f89087ad91cf6b1bed1ec68eff3
Author: Richard Biener 
Date:   Tue Jun 25 16:13:02 2024 +0200

tree-optimization/115646 - ICE with pow shrink-wrapping from bitfield

The following makes analysis and transform agree on constraints.

PR tree-optimization/115646
* tree-call-cdce.cc (check_pow): Check for bit_sz values
as allowed by transform.

* gcc.dg/pr115646.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/pr115646.c | 13 +
 gcc/tree-call-cdce.cc   |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr115646.c b/gcc/testsuite/gcc.dg/pr115646.c
new file mode 100644
index 000..24bc1e4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115646.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern double pow(double x, double y);
+
+struct S {
+unsigned int a : 3, b : 8, c : 21;
+};
+
+void foo (struct S *p)
+{
+  pow (p->c, 42);
+}
diff --git a/gcc/tree-call-cdce.cc b/gcc/tree-call-cdce.cc
index 7f67a0b2dc6..befe6acf178 100644
--- a/gcc/tree-call-cdce.cc
+++ b/gcc/tree-call-cdce.cc
@@ -260,7 +260,7 @@ check_pow (gcall *pow_call)
   /* If the type of the base is too wide,
  the resulting shrink wrapping condition
 will be too conservative.  */
-  if (bit_sz > MAX_BASE_INT_BIT_SIZE)
+  if (bit_sz != 8 && bit_sz != 16 && bit_sz != MAX_BASE_INT_BIT_SIZE)
 return false;
 
   return true;


Re: Straw poll on shifts with out of range operands

2024-06-26 Thread Richard Biener via Gcc
On Wed, Jun 26, 2024 at 4:59 AM Jeff Law via Gcc  wrote:
>
>
>
> On 6/25/24 8:44 PM, Andrew Pinski via Gcc wrote:
> > I am in the middle of improving the isolation path pass for shifts
> > with out of range operands.
> > There are 3 options we could do really:
> > 1) isolate the path to __builtin_unreachable
> > 2) isolate the path to __builtin_trap
> >  This is what is currently done for null pointer and divide by zero
> > 3) isolate the path and turn the shift into zero constant
> > This happens currently for explicit use in both match (in many
> > cases) and VRP for others.

How is isolation any use if we do 3)?  IIRC the path isolation pass
would only look for literal out-of-range shifts?  Or do you plan to use
range info?  If we do 3) why not let range folding deal with this then?
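
(A minimal instance of such a shift, assuming 32-bit int; under option
3) the shift result is replaced by zero:)

    int f (int x)
    {
      return x << 40;  /* operand out of range, undefined behavior */
    }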

> >
> > All 3 are not hard to implement.
> > This comes up in the context of https://gcc.gnu.org/PR115636 where the
> > original requestor thinks it should be #3 but I suspect they didn't
> > realize it was undefined behavior then.
> > 2 seems the best for user experience.
> > 1 seems like the best for optimizations.
> > 3 is more in line with how other parts of the compiler handle it.
> >
> > So the question I have is which one should we go for? (is there
> > another option I missed besides not do anything)
> There was a time when we were thinking about having a knob that would
> allow one to select which of the 3 approaches makes the most sense given
> the expected execution environment.

Since we do 3) already elsewhere I'd say we should do that by default and
have the other options behind a command-line switch - though we already
have UBSAN for that, and that is going to be much more reliable than the
late path isolation done after folding has caught most cases via 3)?

> While I prefer #2, some have (reasonably) argued that it's not
> appropriate behavior for code in a library.
>
> I'm not a fan of #1 because it allows unpredictable code execution.
> Essentially you just fall into whatever code was after the bogus shift
> in the executable and hope for the best.  One day this is going to bite
> us hard from a security standpoint.
>
> #3 isn't great, but it's not terrible either.
>
> Jeff


[gcc r15-1612] GORI cleanups

2024-06-25 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:3587bfae391616d155de0c9cbe98206634f3ed6b

commit r15-1612-g3587bfae391616d155de0c9cbe98206634f3ed6b
Author: Richard Biener 
Date:   Tue Jun 25 15:41:57 2024 +0200

GORI cleanups

The following replaces conditional is_export_p calls as is_export_p
handles a NULL bb itself.

* gimple-range-gori.cc (gori_compute::may_recompute_p):
Call is_export_p with NULL bb.

Diff:
---
 gcc/gimple-range-gori.cc | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 275283a424f..a31e3be65f7 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -1332,18 +1332,14 @@ gori_compute::may_recompute_p (tree name, basic_block bb, int depth)
  gcc_checking_assert (depth >= 1);
}
 
-  bool res = (bb ? m_map.is_export_p (dep1, bb)
-: m_map.is_export_p (dep1));
+  bool res = m_map.is_export_p (dep1, bb);
   if (res || depth <= 1)
return res;
   // Check another level of recomputation.
   return may_recompute_p (dep1, bb, --depth);
 }
   // Two dependencies terminate the depth of the search.
-  if (bb)
-return m_map.is_export_p (dep1, bb) || m_map.is_export_p (dep2, bb);
-  else
-return m_map.is_export_p (dep1) || m_map.is_export_p (dep2);
+  return m_map.is_export_p (dep1, bb) || m_map.is_export_p (dep2, bb);
 }
 
 // Return TRUE if NAME can be recomputed on edge E.  If any direct dependent


Re: [RFC] MAINTAINERS: require a BZ account field

2024-06-25 Thread Richard Biener via Gcc
On Tue, Jun 25, 2024 at 12:36 AM Arsen Arsenović via Gcc
 wrote:
>
> Hi,
>
> Sam James via Gcc  writes:
>
> > Hi!
> >
> > This comes up in #gcc on IRC every so often, so finally
> > writing an RFC.
> >
> > What?
> > ---
> >
> > I propose that MAINTAINERS be modified to be of the form,
> > adding an extra field for their GCC/sourceware account:
> >   <name>  <personal email>  <sourceware account>
> >   Joe Bloggs  joeblo...@example.com  jblo...@gcc.gnu.org
> >
> > Further, that the field must not be blank (-> must have a BZ account;
> > there were/are some without at all)!
> >
> > Why?
> > ---
> >
> > 1) This is tied to whether or not people should use their committer email
> > on Bugzilla or a personal email. A lot of people don't seem to use their
> > committer email (-> no permissions) and end up not closing bugs, so
> > pinskia (and often myself these days) end up doing it for them.
> >
> > 2) It's standard practice to wish to CC the committer of a bisect result
> > - or to CC someone who you know wrote patches on a subject area. Doing
> > this on Bugzilla is challenging when there's no map between committer
> > <-> BZ account.
> >
> > Specifically, there are folks who have git committer+author as
> > joeblo...@example.com (or maybe even coold...@example.com) where the
> > local part of the address has *no relation* to their GCC/sw account,
> > so finding who to CC is difficult without e.g. trawling through gcc-cvs
> > mails or asking overseers for help.
>
> I was also proposing (and would like to re-air that here) enforcing that
> the committer field of each commit is a (valid) @gcc.gnu.org email.
> This can be configured repo-locally via:
>
>   $ git config committer.email <account>@gcc.gnu.org
>
> Git has supported this since 39ab4d0951ba64edcfae7809740715991b44fa6d
> (v2.22.0).
>
> This makes a permanent association of each commit to its authors
> Sourceware account.

I'd welcome this - care to create a patch for contrib/gcc-git-customization.sh?
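
(A minimal sketch of what such a customization might add, assuming
$account holds the sourceware user name the script already asks for;
this is not an actual patch:)

    git config committer.email "$account@gcc.gnu.org"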

> This should not inhibit pushes, as the committer should be a reflection
> of who /applied/ a patch, and anyone applying a patch that can also push
> has a Sourceware account.  It also should not inhibit any workflow, as
> it should be automatic.
>
> > Summary
> > ---
> >
> > TL;DR: The proposal is:
> >
> > 1) MAINTAINERS should list a field containing either the gcc.gnu.org
> > email in full, or their gcc username (bikeshedding semi-welcome);
> >
> > 2) It should become a requirement that to be in MAINTAINERS, one must
> > possess a Bugzilla account (ideally using their gcc.gnu.org email).
>
> --
> Arsen Arsenović


[gcc r15-1583] tree-optimization/115602 - SLP CSE results in cycles

2024-06-24 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:c43c74f6ec795a586388de7abfdd20a0040f6f16

commit r15-1583-gc43c74f6ec795a586388de7abfdd20a0040f6f16
Author: Richard Biener 
Date:   Mon Jun 24 09:52:39 2024 +0200

tree-optimization/115602 - SLP CSE results in cycles

The following prevents SLP CSE from creating new cycles, which happened
because of a 1:1 permute node being present whose child was then
CSEd to the permute node.  Fixed by making a node available for
CSE only after recursing.

PR tree-optimization/115602
* tree-vect-slp.cc (vect_cse_slp_nodes): Delay populating the
bst-map to avoid cycles.

* gcc.dg/vect/pr115602.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115602.c | 27 +++
 gcc/tree-vect-slp.cc | 33 +
 2 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115602.c b/gcc/testsuite/gcc.dg/vect/pr115602.c
new file mode 100644
index 000..9a208d1d950
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115602.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+
+typedef struct {
+  double x, y;
+} pointf;
+struct {
+  pointf focus;
+  double zoom;
+  pointf devscale;
+  char button;
+  pointf oldpointer;
+} gvevent_motion_job;
+char gvevent_motion_job_4;
+double gvevent_motion_pointer_1, gvevent_motion_pointer_0;
+void gvevent_motion() {
+  double dx = (gvevent_motion_pointer_0 - gvevent_motion_job.oldpointer.x) /
+  gvevent_motion_job.devscale.x,
+ dy = (gvevent_motion_pointer_1 - gvevent_motion_job.oldpointer.y) /
+  gvevent_motion_job.devscale.y;
+  if (dx && dy < .0001)
+return;
+  switch (gvevent_motion_job_4)
+  case 2: {
+gvevent_motion_job.focus.x -= dy / gvevent_motion_job.zoom;
+gvevent_motion_job.focus.y += dx / gvevent_motion_job.zoom;
+  }
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e84aeabef94..b47b7e8c979 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6079,35 +6079,44 @@ vect_optimize_slp_pass::run ()
 static void
 vect_cse_slp_nodes (scalar_stmts_to_slp_tree_map_t *bst_map, slp_tree& node)
 {
+  bool put_p = false;
   if (SLP_TREE_DEF_TYPE (node) == vect_internal_def
   /* Besides some VEC_PERM_EXPR, two-operator nodes also
 lack scalar stmts and thus CSE doesn't work via bst_map.  Ideally
 we'd have sth that works for all internal and external nodes.  */
   && !SLP_TREE_SCALAR_STMTS (node).is_empty ())
 {
-  if (slp_tree *leader = bst_map->get (SLP_TREE_SCALAR_STMTS (node)))
+  slp_tree *leader = bst_map->get (SLP_TREE_SCALAR_STMTS (node));
+  if (leader)
{
- if (*leader != node)
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"re-using SLP tree %p for %p\n",
-(void *)*leader, (void *)node);
- vect_free_slp_tree (node);
- (*leader)->refcnt += 1;
- node = *leader;
-   }
+ /* We've visited this node already.  */
+ if (!*leader || *leader == node)
+   return;
+
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"re-using SLP tree %p for %p\n",
+(void *)*leader, (void *)node);
+ vect_free_slp_tree (node);
+ (*leader)->refcnt += 1;
+ node = *leader;
  return;
}
 
-  bst_map->put (SLP_TREE_SCALAR_STMTS (node).copy (), node);
+  /* Avoid creating a cycle by populating the map only after recursion.  */
+  bst_map->put (SLP_TREE_SCALAR_STMTS (node).copy (), nullptr);
   node->refcnt += 1;
+  put_p = true;
   /* And recurse.  */
 }
 
   for (slp_tree &child : SLP_TREE_CHILDREN (node))
 if (child)
   vect_cse_slp_nodes (bst_map, child);
+
+  /* Now record the node for CSE in other siblings.  */
+  if (put_p)
+bst_map->put (SLP_TREE_SCALAR_STMTS (node).copy (), node);
 }
 
 /* Optimize the SLP graph of VINFO.  */


[gcc r15-1582] tree-optimization/115528 - fix vect alignment analysis for outer loop vect

2024-06-24 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:2f83ea87ee328d337f87d4430861221be9babe1e

commit r15-1582-g2f83ea87ee328d337f87d4430861221be9babe1e
Author: Richard Biener 
Date:   Fri Jun 21 13:19:26 2024 +0200

tree-optimization/115528 - fix vect alignment analysis for outer loop vect

For outer loop vectorization of a data reference in the inner loop
we have to look at both steps to see if they preserve alignment.

What is special about this testcase is that the outer loop step is
one element but the inner loop step is four elements, and that we now
use SLP and the vectorization factor is one.

PR tree-optimization/115528
* tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
Make sure to look at both the inner and outer loop step
behavior.

* gfortran.dg/vect/pr115528.f: New testcase.

Diff:
---
 gcc/testsuite/gfortran.dg/vect/pr115528.f | 27 +++
 gcc/tree-vect-data-refs.cc| 57 ---
 2 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/vect/pr115528.f b/gcc/testsuite/gfortran.dg/vect/pr115528.f
new file mode 100644
index 000..764a4b92b3e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/vect/pr115528.f
@@ -0,0 +1,27 @@
+! { dg-additional-options "-fno-inline" }
+
+  subroutine init(COEF1,FORM1,AA)
+  double precision COEF1,X
+  double complex FORM1
+  double precision AA(4,4)
+  COEF1=0
+  FORM1=0
+  AA=0
+  end
+  subroutine curr(HADCUR)
+  double precision COEF1
+  double complex HADCUR(4),FORM1
+  double precision AA(4,4)
+  call init(COEF1,FORM1,AA)
+  do i = 1,4
+ do j = 1,4
+HADCUR(I)=
+ $ HADCUR(I)+CMPLX(COEF1)*FORM1*AA(I,J)
+ end do
+  end do
+  end
+  program test
+double complex HADCUR(4)
+hadcur=0
+call curr(hadcur)
+  end
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ae237407672..959e127c385 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -1356,42 +1356,43 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
   step_preserves_misalignment_p = true;
 }
 
-  /* In case the dataref is in an inner-loop of the loop that is being
- vectorized (LOOP), we use the base and misalignment information
- relative to the outer-loop (LOOP).  This is ok only if the misalignment
- stays the same throughout the execution of the inner-loop, which is why
- we have to check that the stride of the dataref in the inner-loop evenly
- divides by the vector alignment.  */
-  else if (nested_in_vect_loop_p (loop, stmt_info))
-{
-  step_preserves_misalignment_p
-   = (DR_STEP_ALIGNMENT (dr_info->dr) % vect_align_c) == 0;
-
-  if (dump_enabled_p ())
-   {
- if (step_preserves_misalignment_p)
-   dump_printf_loc (MSG_NOTE, vect_location,
-"inner step divides the vector alignment.\n");
- else
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"inner step doesn't divide the vector"
-" alignment.\n");
-   }
-}
-
-  /* Similarly we can only use base and misalignment information relative to
- an innermost loop if the misalignment stays the same throughout the
- execution of the loop.  As above, this is the case if the stride of
- the dataref evenly divides by the alignment.  */
   else
 {
+  /* We can only use base and misalignment information relative to
+an innermost loop if the misalignment stays the same throughout the
+execution of the loop.  As above, this is the case if the stride of
+the dataref evenly divides by the alignment.  */
   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   step_preserves_misalignment_p
-   = multiple_p (DR_STEP_ALIGNMENT (dr_info->dr) * vf, vect_align_c);
+   = multiple_p (drb->step_alignment * vf, vect_align_c);
 
   if (!step_preserves_misalignment_p && dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "step doesn't divide the vector alignment.\n");
+
+  /* In case the dataref is in an inner-loop of the loop that is being
+vectorized (LOOP), we use the base and misalignment information
+relative to the outer-loop (LOOP).  This is ok only if the
+misalignment stays the same throughout the execution of the
+inner-loop, which is why we have to check that the stride of the
+dataref in the inner-loop evenly divides by the vector alignment.  */
+  if (step_preserves_misalignment_p
+ && nested_in_vect_loop_p (loop, stmt_info))
+   {
+ step_preserves_misalignment_p
+   = (DR_STEP_ALIGNMENT (dr_info->dr) % vect_align_c) == 0;
+
+ if 

[gcc r15-1577] tree-optimization/115599 - reassoc qsort comparator issue

2024-06-24 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:ae13af26060eb686418ea9c9d455cd665049402d

commit r15-1577-gae13af26060eb686418ea9c9d455cd665049402d
Author: Richard Biener 
Date:   Sun Jun 23 14:37:53 2024 +0200

tree-optimization/115599 - reassoc qsort comparator issue

The compare_repeat_factors comparator eventually fails qsort checking
because it uses rf2->rank - rf1->rank to compare unsigned numbers, which
goes wrong when the difference wraps around and is interpreted as a
negative signed value.

Fixed by re-writing the obvious way.  I've also fixed the count
comparison which suffers from truncation as count is 64bit signed
while the comparator result is 32bit int (that's a lot less likely
to hit in practice though).

The testcase from the PR is too large to include.

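As a minimal illustration of the failure mode (not the PR's testcase; the
values are chosen only to make the wrap-around visible):

#include <cstdio>

/* Broken comparator style: subtracting unsigned ranks and truncating
   to int.  Once the difference wraps past INT_MAX, antisymmetry is
   violated.  */
static int cmp_bad (unsigned a, unsigned b) { return (int) (a - b); }

int main ()
{
  unsigned a = 0, b = 0x80000000u;
  /* On the usual two's-complement targets both calls print a negative
     value, i.e. both directions claim "less than", which breaks the
     qsort comparator contract and trips qsort checking.  */
  printf ("%d %d\n", cmp_bad (a, b), cmp_bad (b, a));
  return 0;
}
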
PR tree-optimization/115599
* tree-ssa-reassoc.cc (compare_repeat_factors): Use explicit
compares to avoid truncations.

Diff:
---
 gcc/tree-ssa-reassoc.cc | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-ssa-reassoc.cc b/gcc/tree-ssa-reassoc.cc
index 4d9f5216d4c..d74352268b5 100644
--- a/gcc/tree-ssa-reassoc.cc
+++ b/gcc/tree-ssa-reassoc.cc
@@ -6414,10 +6414,17 @@ compare_repeat_factors (const void *x1, const void *x2)
   const repeat_factor *rf1 = (const repeat_factor *) x1;
   const repeat_factor *rf2 = (const repeat_factor *) x2;
 
-  if (rf1->count != rf2->count)
-return rf1->count - rf2->count;
+  if (rf1->count < rf2->count)
+return -1;
+  else if (rf1->count > rf2->count)
+return 1;
+
+  if (rf1->rank < rf2->rank)
+return 1;
+  else if (rf1->rank > rf2->rank)
+return -1;
 
-  return rf2->rank - rf1->rank;
+  return 0;
 }
 
 /* Look for repeated operands in OPS in the multiply tree rooted at


[gcc r15-1565] tree-optimization/115597 - allow CSE of two-operator VEC_PERM nodes

2024-06-23 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:2a345214fc332b6f0821edf394ff8802b768db1d

commit r15-1565-g2a345214fc332b6f0821edf394ff8802b768db1d
Author: Richard Biener 
Date:   Sun Jun 23 11:26:39 2024 +0200

tree-optimization/115597 - allow CSE of two-operator VEC_PERM nodes

The following makes sure to always CSE when there's SLP_TREE_SCALAR_STMTS,
as otherwise a chain of two-operator node operations can result in
exponential behavior of the CSE process, as likely seen when building
510.parest on aarch64.

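For intuition, a hedged sketch of the bst_map idea (the types and names
below are invented for illustration; GCC's real map is keyed on
SLP_TREE_SCALAR_STMTS): memoizing nodes on their scalar statements turns
the second visit of a shared node into a lookup, so a chain of N
two-operator nodes costs O(N) builds instead of O(2^N).

#include <map>
#include <string>
#include <vector>

struct slp_node
{
  std::string op;                    /* operation of this node */
  std::vector<slp_node *> children;  /* operand nodes */
};

typedef std::map<std::vector<std::string>, slp_node *> bst_map_t;

slp_node *
get_or_build (bst_map_t &bst_map, const std::vector<std::string> &stmts)
{
  bst_map_t::iterator it = bst_map.find (stmts);
  if (it != bst_map.end ())
    return it->second;          /* CSE hit: reuse the existing node */
  slp_node *node = new slp_node;
  node->op = stmts.empty () ? "" : stmts[0];
  bst_map[stmts] = node;        /* record so later walks hit the cache */
  return node;
}
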
PR tree-optimization/115597
* tree-vect-slp.cc (vect_cse_slp_nodes): Allow to CSE
VEC_PERM nodes.

Diff:
---
 gcc/tree-vect-slp.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4935cf9e521..e84aeabef94 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6080,7 +6080,6 @@ static void
 vect_cse_slp_nodes (scalar_stmts_to_slp_tree_map_t *bst_map, slp_tree& node)
 {
   if (SLP_TREE_DEF_TYPE (node) == vect_internal_def
-  && SLP_TREE_CODE (node) != VEC_PERM_EXPR
   /* Besides some VEC_PERM_EXPR, two-operator nodes also
 lack scalar stmts and thus CSE doesn't work via bst_map.  Ideally
 we'd have sth that works for all internal and external nodes.  */


Re: gcc 12.4 release archive?

2024-06-23 Thread Richard Biener via Gcc
On Sun, Jun 23, 2024 at 12:02 AM Jonathan Wakely via Gcc wrote:
>
> On Sat, 22 Jun 2024, 20:41 Liviu Ionescu wrote:
>
> >
> >
> > > On 22 Jun 2024, at 22:02, Andrew Pinski wrote:
> > >
> > >> GCC 12.4 was released two days ago, but I could not yet find the
> > >> release archive at https://ftp.gnu.org/gnu/gcc/.
> > >>
> > >> Could you upload it?
> > >
> > > It is located at https://gcc.gnu.org/ftp/gcc/releases/gcc-12.4.0/ .
> >
> > Ok, just that this url is not advertised at
> > https://www.gnu.org/prep/ftp.html.
> >
>
> That is not a page controlled by the GCC project.
>
>
> > > Looks like it was not updated to the ftp.gnu.org site yet.
> > Is the hidden url preferable? Should I update the build scripts to get the
> > archive from gcc.gnu.org/ftp, or stick to ftp.gnu.org/gnu?
> >
>
> The GCC project makes GCC releases, not the GNU project. The mirror sites
> given on the GCC website all have the release:
>
> https://gcc.gnu.org/mirrors.html

It looks like Jakub didn't upload to ftp.gnu.org or for some reason that
process didn't succeed (though I got no upload failure message).  We split
the work on the release, so that likely resulted in this omission.  I
expect this will be fixed/investigated on Monday.

Richard.


[gcc r15-1564] tree-optimization/115579 - fix wrong code with store-motion

2024-06-23 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:8a1795bddcd34284936af4706f762d89c60fc69c

commit r15-1564-g8a1795bddcd34284936af4706f762d89c60fc69c
Author: Richard Biener 
Date:   Sat Jun 22 14:59:09 2024 +0200

tree-optimization/115579 - fix wrong code with store-motion

The recent change to relax store motion for variables that cannot have
store data races broke the optimization to share flag vars for stores
that all happen in the same single BB.  The following fixes this.

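For context, a simplified sketch of what store motion with a flag
variable does (hand-written illustration, not GCC's actual output); the
fix makes stores that sit in the same single BB share one such flag
instead of each creating its own:

int x, y;

/* Original loop: two conditional stores in the same basic block.  */
void before (int n, int cond)
{
  for (int i = 0; i < n; i++)
    if (cond)
      {
        x = i;
        y = i;
      }
}

/* After store motion: stores go to temporaries, one shared flag records
   whether any iteration stored, and the real stores are materialized on
   the loop exit only if the flag is set.  */
void after (int n, int cond)
{
  int x_tmp = x, y_tmp = y;
  int flag = 0;
  for (int i = 0; i < n; i++)
    if (cond)
      {
        x_tmp = i;
        y_tmp = i;
        flag = 1;
      }
  if (flag)
    {
      x = x_tmp;
      y = y_tmp;
    }
}
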
PR tree-optimization/115579
* tree-ssa-loop-im.cc (execute_sm): Return the auxiliary data
created.
(hoist_memory_references): Record the flag var that's eventually
created and re-use it when all stores are in the same BB.

* gcc.dg/pr115579.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/pr115579.c | 18 ++++++++++++++++++
 gcc/tree-ssa-loop-im.cc         | 27 ++++++++++++++-------------
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr115579.c b/gcc/testsuite/gcc.dg/pr115579.c
new file mode 100644
index 000..04781056723
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115579.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-options "-Os -fno-tree-sra" } */
+
+int printf(const char *, ...);
+int a, b = 1, c;
+int main() {
+  int d[2], *e = &d[1];
+  while (a) {
+int *f = &b;
+d[1] = 0;
+*f = 0;
+  }
+  if (c)
+printf("%d\n", *e);
+  if (b != 1)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index 3acbd886a0d..61c6339bc35 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -2269,7 +2269,7 @@ struct sm_aux
temporary variable is put to the preheader of the loop, and assignments
to the reference from the temporary variable are emitted to exits.  */
 
-static void
+static sm_aux *
 execute_sm (class loop *loop, im_mem_ref *ref,
hash_map<im_mem_ref *, sm_aux *> &aux_map, bool maybe_mt,
bool use_other_flag_var)
@@ -2345,6 +2345,8 @@ execute_sm (class loop *loop, im_mem_ref *ref,
   lim_data->tgt_loop = loop;
   gsi_insert_before (&gsi, load, GSI_SAME_STMT);
 }
+
+  return aux;
 }
 
 /* sm_ord is used for ordinary stores we can retain order with respect
@@ -2802,20 +2804,18 @@ hoist_memory_references (class loop *loop, bitmap mem_refs,
   hash_map<im_mem_ref *, sm_aux *> aux_map;
 
   /* Execute SM but delay the store materialization for ordered
-sequences on exit.  */
-  bool first_p = true;
+sequences on exit.  Remember a created flag var and make
+sure to re-use it.  */
+  sm_aux *flag_var_aux = nullptr;
   EXECUTE_IF_SET_IN_BITMAP (mem_refs, 0, i, bi)
{
  ref = memory_accesses.refs_list[i];
- execute_sm (loop, ref, aux_map, true, !first_p);
- first_p = false;
+ sm_aux *aux = execute_sm (loop, ref, aux_map, true,
+   flag_var_aux != nullptr);
+ if (aux->store_flag)
+   flag_var_aux = aux;
}
 
-  /* Get at the single flag variable we eventually produced.  */
-  im_mem_ref *ref
-   = memory_accesses.refs_list[bitmap_first_set_bit (mem_refs)];
-  sm_aux *aux = *aux_map.get (ref);
-
   /* Materialize ordered store sequences on exits.  */
   edge e;
   FOR_EACH_VEC_ELT (exits, i, e)
@@ -2826,13 +2826,14 @@ hoist_memory_references (class loop *loop, bitmap 
mem_refs,
  /* Construct the single flag variable control flow and insert
 the ordered seq of stores in the then block.  With
 -fstore-data-races we can do the stores unconditionally.  */
- if (aux->store_flag)
+ if (flag_var_aux)
insert_e
  = single_pred_edge
  (execute_sm_if_changed (e, NULL_TREE, NULL_TREE,
- aux->store_flag,
+ flag_var_aux->store_flag,
  loop_preheader_edge (loop),
- &aux->flag_bbs, append_cond_position,
+ &flag_var_aux->flag_bbs,
+ append_cond_position,
  last_cond_fallthru));
  execute_sm_exit (loop, insert_e, seq, aux_map, sm_ord,
   append_cond_position, last_cond_fallthru);


[gcc r14-10335] tree-optimization/115278 - fix DSE in if-conversion wrt volatiles

2024-06-21 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:272e8c90af527fc1d0055ad0f17f1d97bb0bd6cb

commit r14-10335-g272e8c90af527fc1d0055ad0f17f1d97bb0bd6cb
Author: Richard Biener 
Date:   Fri May 31 10:14:25 2024 +0200

tree-optimization/115278 - fix DSE in if-conversion wrt volatiles

The following adds the missing guard for volatile stores to the
embedded DSE in the loop if-conversion pass.

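To see what the guard protects, a minimal illustration (not the PR's
testcase): the first store below is "dead" by ordinary DSE reasoning
since the second one overwrites it, but both are volatile accesses and
must survive, which is exactly what the scan-tree-dump in the new
testcase checks for.

volatile unsigned *dev = (volatile unsigned *) 0x42;

void two_writes (unsigned lo, unsigned hi)
{
  /* Looks immediately overwritten, but it is a volatile access:
     deleting it would drop an observable device write.  */
  *dev = lo;
  *dev = hi;
}
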
PR tree-optimization/115278
* tree-if-conv.cc (ifcvt_local_dce): Do not DSE volatile stores.

* g++.dg/vect/pr115278.cc: New testcase.

(cherry picked from commit 65dbe0ab7cdaf2aa84b09a74e594f0faacf1945c)

Diff:
---
 gcc/testsuite/g++.dg/vect/pr115278.cc | 38 ++++++++++++++++++++++++++++++++++++++
 gcc/tree-if-conv.cc                   |  4 +++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/vect/pr115278.cc b/gcc/testsuite/g++.dg/vect/pr115278.cc
new file mode 100644
index 000..331075fb278
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr115278.cc
@@ -0,0 +1,38 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-additional-options "-fdump-tree-optimized" }
+
+#include <cstdint>
+
+const int runs = 92;
+
+union BitfieldStructUnion {
+struct {
+uint64_t a : 17;
+uint64_t padding: 39;
+uint64_t b : 8;
+} __attribute__((packed));
+
+struct {
+uint32_t value_low;
+uint32_t value_high;
+} __attribute__((packed));
+
+BitfieldStructUnion(uint32_t value_low, uint32_t value_high) : value_low(value_low), value_high(value_high) {}
+};
+
+volatile uint32_t *WRITE = (volatile unsigned*)0x42;
+
+void buggy() {
+for (int i = 0; i < runs; i++) {
+BitfieldStructUnion rt{*WRITE, *WRITE};
+
+rt.a = 99;
+rt.b = 1;
+
+*WRITE = rt.value_low;
+*WRITE = rt.value_high;
+}
+}
+
+// { dg-final { scan-tree-dump-times "\\\*WRITE\[^\r\n\]* ={v} " 2 "optimized" } }
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 09d99fb9dda..c4c3ed41a44 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -3381,7 +3381,9 @@ ifcvt_local_dce (class loop *loop)
   gimple_stmt_iterator gsiprev = gsi;
   gsi_prev ();
   stmt = gsi_stmt (gsi);
-  if (gimple_store_p (stmt) && gimple_vdef (stmt))
+  if (!gimple_has_volatile_ops (stmt)
+ && gimple_store_p (stmt)
+ && gimple_vdef (stmt))
{
  tree lhs = gimple_get_lhs (stmt);
  ao_ref write;


[gcc r14-10334] tree-optimization/115508 - fix ICE with SLP scheduling and extern vector

2024-06-21 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:65e25860f49ee7a2cfd4872db06d94ed7675e12e

commit r14-10334-g65e25860f49ee7a2cfd4872db06d94ed7675e12e
Author: Richard Biener 
Date:   Mon Jun 17 14:36:56 2024 +0200

tree-optimization/115508 - fix ICE with SLP scheduling and extern vector

When there's a permute after an extern vector we can run into code
that didn't consider that the scheduled node may be a permute, which
lacks a representative.

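The shape of the fix, reduced to a hedged sketch (the types below are
invented, not GCC's): VEC_PERM nodes have no representative statement,
so the node-kind test must come first and short-circuit any check that
would dereference it.

struct stmt { bool can_trap; };
struct node
{
  bool is_permute;        /* VEC_PERM-like nodes ... */
  stmt *representative;   /* ... may have no representative */
};

bool
needs_guarded_insertion (const node &n)
{
  /* Test the node kind first; only then is it safe to look at the
     representative statement.  */
  return !n.is_permute
         && n.representative != nullptr
         && n.representative->can_trap;
}
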
PR tree-optimization/115508
* tree-vect-slp.cc (vect_schedule_slp_node): Guard check on
representative.

* gcc.target/i386/pr115508.c: New testcase.

(cherry picked from commit 65e72b95c63a5501cf1482f3814ae8c8e672bf06)

Diff:
---
 gcc/testsuite/gcc.target/i386/pr115508.c | 15 +++++++++++++++
 gcc/tree-vect-slp.cc                     |  1 +
 2 files changed, 16 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/pr115508.c b/gcc/testsuite/gcc.target/i386/pr115508.c
new file mode 100644
index 000..a97b2007f7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115508.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=znver1" } */
+
+typedef long long v4di __attribute__((vector_size(4 * sizeof (long long))));
+
+v4di vec_var;
+extern long long array1[];
+long long g(void)
+{
+  int total_error_4 = 0;
+  total_error_4 += array1 [0] + array1 [1] + array1 [2] + array1 [3];
+  v4di t = vec_var;
+  long long iorvar = t [1] | t [0] | t [2] | t [3];
+  return iorvar + total_error_4;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e55191c83a6..5e7e9b5bf08 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9395,6 +9395,7 @@ vect_schedule_slp_node (vec_info *vinfo,
  si = gsi_after_labels (as_a <bb_vec_info> (vinfo)->bbs[0]);
}
   else if (is_a <bb_vec_info> (vinfo)
+  && SLP_TREE_CODE (node) != VEC_PERM_EXPR
   && gimple_bb (last_stmt) != gimple_bb (stmt_info->stmt)
   && gimple_could_trap_p (stmt_info->stmt))
{


[gcc r14-10333] Avoid SLP_REPRESENTATIVE access for VEC_PERM in SLP scheduling

2024-06-21 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:85d32e6f75e7395c12b9979a47b70fe9479ca1ff

commit r14-10333-g85d32e6f75e7395c12b9979a47b70fe9479ca1ff
Author: Richard Biener 
Date:   Fri May 17 15:23:38 2024 +0200

Avoid SLP_REPRESENTATIVE access for VEC_PERM in SLP scheduling

SLP permute nodes can now end up without an SLP_REPRESENTATIVE; the
following avoids touching it in this case in vect_schedule_slp_node.

* tree-vect-slp.cc (vect_schedule_slp_node): Avoid looking
at SLP_REPRESENTATIVE for VEC_PERM nodes.

(cherry picked from commit 31e9bae0ea5e5413abfa3ca9050e66cc6760553e)

Diff:
---
 gcc/tree-vect-slp.cc | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 133606fa6f3..e55191c83a6 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9271,13 +9271,8 @@ vect_schedule_slp_node (vec_info *vinfo,
   gcc_assert (SLP_TREE_NUMBER_OF_VEC_STMTS (node) != 0);
   SLP_TREE_VEC_DEFS (node).create (SLP_TREE_NUMBER_OF_VEC_STMTS (node));
 
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
-"-->vectorizing SLP node starting from: %G",
-stmt_info->stmt);
-
-  if (STMT_VINFO_DATA_REF (stmt_info)
-  && SLP_TREE_CODE (node) != VEC_PERM_EXPR)
+  if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
+  && STMT_VINFO_DATA_REF (stmt_info))
 {
   /* Vectorized loads go before the first scalar load to make it
 ready early, vectorized stores go before the last scalar
@@ -9289,10 +9284,10 @@ vect_schedule_slp_node (vec_info *vinfo,
last_stmt_info = vect_find_last_scalar_stmt_in_slp (node);
   si = gsi_for_stmt (last_stmt_info->stmt);
 }
-  else if ((STMT_VINFO_TYPE (stmt_info) == cycle_phi_info_type
-   || STMT_VINFO_TYPE (stmt_info) == induc_vec_info_type
-   || STMT_VINFO_TYPE (stmt_info) == phi_info_type)
-  && SLP_TREE_CODE (node) != VEC_PERM_EXPR)
+  else if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
+  && (STMT_VINFO_TYPE (stmt_info) == cycle_phi_info_type
+  || STMT_VINFO_TYPE (stmt_info) == induc_vec_info_type
+  || STMT_VINFO_TYPE (stmt_info) == phi_info_type))
 {
   /* For PHI node vectorization we do not use the insertion iterator.  */
   si = gsi_none ();
@@ -9426,6 +9421,9 @@ vect_schedule_slp_node (vec_info *vinfo,
   /* Handle purely internal nodes.  */
   if (SLP_TREE_CODE (node) == VEC_PERM_EXPR)
 {
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"-->vectorizing SLP permutation node\n");
   /* ???  the transform kind is stored to STMT_VINFO_TYPE which might
 be shared with different SLP nodes (but usually it's the same
 operation apart from the case the stmt is only there for denoting
@@ -9444,7 +9442,13 @@ vect_schedule_slp_node (vec_info *vinfo,
  }
 }
   else
-vect_transform_stmt (vinfo, stmt_info, &si, node, instance);
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"-->vectorizing SLP node starting from: %G",
+stmt_info->stmt);
+vect_transform_stmt (vinfo, stmt_info, &si, node, instance);
+}
 }
 
 /* Replace scalar calls from SLP node NODE with setting of their lhs to zero.

