date:20230629

[PATCH 3/3] rs6000: Teach legitimate_address_p about LEN_{LOAD, STORE} [PR110248]

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi,

This patch is to teach rs6000_legitimate_address_p to
handle the queried rtx constructed for LEN_{LOAD,STORE},
since lxvl and stxvl doesn't support x-form or ds-form,
so consider it as not legitimate when outer code is PLUS.

This can help to fix SPEC2017 503.bwaves_r degradation
as reported in PR110248 (note that with explicit option
--param=vect-partial-vector-usage=2), also +1.69% for
507.cactuBSSN_r, +2.35% for 510.parest_r (likely noise),
the others are neutral.

I didn't associate one test case for this as I think
checking some dumping of ivopts or final assembly insns
seems quite fragile.

Bootstrapped and regtested on powerpc64-linux-gnu
P7/P8/P9 and powerpc64le-linux-gnu P9/P10.

Is it ok for trunk?

BR,
Kewen
--
PR tree-optimization/110248

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_legitimate_address_p): Check if
the given code is for ifn LEN_{LOAD,STORE}, if yes then make it not
legitimate when outer code is PLUS.
---
 gcc/config/rs6000/rs6000.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index efc54528b23..f5c9289fda0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -9856,7 +9856,7 @@ use_toc_relative_ref (rtx sym, machine_mode mode)
during assembly output.  */
 static bool
 rs6000_legitimate_address_p (machine_mode mode, rtx x, bool reg_ok_strict,
-code_helper = ERROR_MARK)
+code_helper ch = ERROR_MARK)
 {
   bool reg_offset_p = reg_offset_addressing_ok_p (mode);
   bool quad_offset_p = mode_supports_dq_form (mode);
@@ -9864,6 +9864,12 @@ rs6000_legitimate_address_p (machine_mode mode, rtx x, 
bool reg_ok_strict,
   if (TARGET_ELF && RS6000_SYMBOL_REF_TLS_P (x))
 return 0;

+  /* lxvl and stxvl doesn't support any addressing modes with PLUS.  */
+  if (ch.is_internal_fn ()
+  && (ch == IFN_LEN_LOAD || ch == IFN_LEN_STORE)
+  && GET_CODE (x) == PLUS)
+return 0;
+
   /* Handle unaligned altivec lvx/stvx type addresses.  */
   if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
   && GET_CODE (x) == AND
--
2.39.3

[PATCH 2/3] ivopts: Call valid_mem_ref_p with code_helper [PR110248]

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi,

As PR110248 shows, to get the expected query results for
that case internal functions LEN_{LOAD,STORE} is unable to
adopt some addressing modes, we need to pass down the
related IFN code as well.  This patch is to make IVOPTs
pass down ifn code for USE_PTR_ADDRESS type uses, it
adjusts the related {strict_,}memory_address_addr_space_p
and valid_mem_ref_p functions as well.

Bootstrapped and regtested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
PR tree-optimization/110248

gcc/ChangeLog:

* recog.cc (memory_address_addr_space_p): Add one more argument ch of
type code_helper and pass it to targetm.addr_space.legitimate_address_p
instead of ERROR_MARK.
(offsettable_address_addr_space_p): Update one function pointer with
one more argument of type code_helper as its assignees
memory_address_addr_space_p and strict_memory_address_addr_space_p
have been adjusted, and adjust some call sites with ERROR_MARK.
* recog.h (tree.h): New include header file for tree_code ERROR_MARK.
(memory_address_addr_space_p): Adjust with one more unnamed argument
of type code_helper with default ERROR_MARK.
(strict_memory_address_addr_space_p): Likewise.
* reload.cc (strict_memory_address_addr_space_p): Add one unnamed
argument of type code_helper.
* tree-ssa-address.cc (valid_mem_ref_p): Add one more argument ch of
type code_helper and pass it to memory_address_addr_space_p.
* tree-ssa-address.h (valid_mem_ref_p): Adjust the declaration with
one more unnamed argument of type code_helper with default value
ERROR_MARK.
* tree-ssa-loop-ivopts.cc (get_address_cost): Use ERROR_MARK as code
by default, change it with ifn code for USE_PTR_ADDRESS type use, and
pass it to all valid_mem_ref_p calls.
---
 gcc/recog.cc| 13 ++---
 gcc/recog.h | 10 +++---
 gcc/reload.cc   |  2 +-
 gcc/tree-ssa-address.cc |  4 ++--
 gcc/tree-ssa-address.h  |  3 ++-
 gcc/tree-ssa-loop-ivopts.cc | 18 +-
 6 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/gcc/recog.cc b/gcc/recog.cc
index 692c258def6..2bff6c03e4d 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -1802,8 +1802,8 @@ pop_operand (rtx op, machine_mode mode)
for mode MODE in address space AS.  */

 bool
-memory_address_addr_space_p (machine_mode mode ATTRIBUTE_UNUSED,
-rtx addr, addr_space_t as)
+memory_address_addr_space_p (machine_mode mode ATTRIBUTE_UNUSED, rtx addr,
+addr_space_t as, code_helper ch)
 {
 #ifdef GO_IF_LEGITIMATE_ADDRESS
   gcc_assert (ADDR_SPACE_GENERIC_P (as));
@@ -1813,8 +1813,7 @@ memory_address_addr_space_p (machine_mode mode 
ATTRIBUTE_UNUSED,
  win:
   return true;
 #else
-  return targetm.addr_space.legitimate_address_p (mode, addr, 0, as,
- ERROR_MARK);
+  return targetm.addr_space.legitimate_address_p (mode, addr, 0, as, ch);
 #endif
 }

@@ -2430,7 +2429,7 @@ offsettable_address_addr_space_p (int strictp, 
machine_mode mode, rtx y,
   rtx z;
   rtx y1 = y;
   rtx *y2;
-  bool (*addressp) (machine_mode, rtx, addr_space_t) =
+  bool (*addressp) (machine_mode, rtx, addr_space_t, code_helper) =
 (strictp ? strict_memory_address_addr_space_p
 : memory_address_addr_space_p);
   poly_int64 mode_sz = GET_MODE_SIZE (mode);
@@ -2469,7 +2468,7 @@ offsettable_address_addr_space_p (int strictp, 
machine_mode mode, rtx y,
   *y2 = plus_constant (address_mode, *y2, mode_sz - 1);
   /* Use QImode because an odd displacement may be automatically invalid
 for any wider mode.  But it should be valid for a single byte.  */
-  good = (*addressp) (QImode, y, as);
+  good = (*addressp) (QImode, y, as, ERROR_MARK);

   /* In any case, restore old contents of memory.  */
   *y2 = y1;
@@ -2504,7 +2503,7 @@ offsettable_address_addr_space_p (int strictp, 
machine_mode mode, rtx y,

   /* Use QImode because an odd displacement may be automatically invalid
  for any wider mode.  But it should be valid for a single byte.  */
-  return (*addressp) (QImode, z, as);
+  return (*addressp) (QImode, z, as, ERROR_MARK);
 }

 /* Return true if ADDR is an address-expression whose effect depends
diff --git a/gcc/recog.h b/gcc/recog.h
index badf8e3dc1c..c6ef619c5dd 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -20,6 +20,9 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_RECOG_H
 #define GCC_RECOG_H

+/* For enum tree_code ERROR_MARK.  */
+#include "tree.h"
+
 /* Random number that should be large enough for all purposes.  Also define
a type that has at least MAX_RECOG_ALTERNATIVES + 1 bits, with the extra
bit giving an invalid value that can be used to mean "uninitialized".  */
@@ -200,11 +203,12 @@ extern void

[PATCH 1/3] targhooks: Extend legitimate_address_p with code_helper [PR110248]

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi,

As PR110248 shows, some middle-end passes like IVOPTs can
query the target hook legitimate_address_p with some
artificially constructed rtx to determine whether some
addressing modes are supported by target for some gimple
statement.  But for now the existing legitimate_address_p
only checks the given mode, it's unable to distinguish
some special cases unfortunately, for example, for LEN_LOAD
ifn on Power port, we would expand it with lxvl hardware
insn, which only supports one register to hold the address
(the other register is holding the length), that is we
don't support base (reg) + index (reg) addressing mode for
sure.  But hook legitimate_address_p only considers the
given mode which would be some vector mode for LEN_LOAD
ifn, and we do support base + index addressing mode for
normal vector load and store insns, so the hook will return
true for the query unexpectedly.

This patch is to introduce one extra argument of type
code_helper for hook legitimate_address_p, it makes targets
able to handle some special case like what's described
above.  The subsequent patches will show how to leverage
the new code_helper argument.

I didn't separate those target specific adjustments to
their own patches, since those changes are no function
changes.  One typical change is to add one unnamed argument
with default ERROR_MARK, some ports need to include tree.h
in their {port}-protos.h since the hook is used in some
machine description files.  I've cross-built a corresponding
cc1 successfully for at least one triple of each affected
target so I believe they are safe.  But feel free to correct
me if separating is needed for the review of this patch.

Besides, it's bootstrapped and regtested on
x86_64-redhat-linux and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
--
PR tree-optimization/110248

gcc/ChangeLog:

* coretypes.h (class code_helper): Add forward declaration.
* doc/tm.texi: Regenerate.
* lra-constraints.cc (valid_address_p): Call target hook
targetm.addr_space.legitimate_address_p with an extra parameter
ERROR_MARK as its prototype changes.
* recog.cc (memory_address_addr_space_p): Likewise.
* reload.cc (strict_memory_address_addr_space_p): Likewise.
* target.def (legitimate_address_p, addr_space.legitimate_address_p):
Extend with one more argument of type code_helper, update the
documentation accordingly.
* targhooks.cc (default_legitimate_address_p): Adjust for the
new code_helper argument.
(default_addr_space_legitimate_address_p): Likewise.
* targhooks.h (default_legitimate_address_p): Likewise.
(default_addr_space_legitimate_address_p): Likewise.
* config/aarch64/aarch64.cc (aarch64_legitimate_address_hook_p): Adjust
with extra unnamed code_helper argument with default ERROR_MARK.
* config/alpha/alpha.cc (alpha_legitimate_address_p): Likewise.
* config/arc/arc.cc (arc_legitimate_address_p): Likewise.
* config/arm/arm-protos.h (arm_legitimate_address_p): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/arm/arm.cc (arm_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/avr/avr.cc (avr_addr_space_legitimate_address_p): Likewise.
* config/bfin/bfin.cc (bfin_legitimate_address_p): Likewise.
* config/bpf/bpf.cc (bpf_legitimate_address_p): Likewise.
* config/c6x/c6x.cc (c6x_legitimate_address_p): Likewise.
* config/cris/cris-protos.h (cris_legitimate_address_p): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/cris/cris.cc (cris_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/csky/csky.cc (csky_legitimate_address_p): Likewise.
* config/epiphany/epiphany.cc (epiphany_legitimate_address_p):
Likewise.
* config/frv/frv.cc (frv_legitimate_address_p): Likewise.
* config/ft32/ft32.cc (ft32_addr_space_legitimate_address_p): Likewise.
* config/gcn/gcn.cc (gcn_addr_space_legitimate_address_p): Likewise.
* config/h8300/h8300.cc (h8300_legitimate_address_p): Likewise.
* config/i386/i386.cc (ix86_legitimate_address_p): Likewise.
* config/ia64/ia64.cc (ia64_legitimate_address_p): Likewise.
* config/iq2000/iq2000.cc (iq2000_legitimate_address_p): Likewise.
* config/lm32/lm32.cc (lm32_legitimate_address_p): Likewise.
* config/loongarch/loongarch.cc (loongarch_legitimate_address_p):
Likewise.
* config/m32c/m32c.cc (m32c_legitimate_address_p): Likewise.
(m32c_addr_space_legitimate_address_p): Likewise.
* config/m32r/m32r.cc (m32r_legitimate_address_p): Likewise.
* config/m68k/m68k.cc (m68k_legitimate_address_p): Likewise.
* config/mcore/mcore.cc

[PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

2023-06-29 Thread Eugene Rozenfeld via Gcc-patches

When we collect just user events for autofdo with lbr we get some events where 
branch
sources are kernel addresses and branch targets are user addresses. Without 
kernel MMAP
events create_gcov can't make sense of kernel addresses. Currently create_gcov 
fails if
it can't map at least 95% of events. We sometimes get below this threshold with 
just
user events. The change is to collect both user events and kernel events.

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Collect both kernel and user events for autofdo
* Makefile.tpl: Collect both kernel and user events for autofdo

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Collect both kernel and user events for 
autofdo
---
 Makefile.in   | 2 +-
 Makefile.tpl  | 2 +-
 gcc/testsuite/lib/target-supports.exp | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index f19a9db621e..04307ca561b 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -404,7 +404,7 @@ MAKEINFO = @MAKEINFO@
 EXPECT = @EXPECT@
 RUNTEST = @RUNTEST@
 
-AUTO_PROFILE = gcc-auto-profile -c 1000
+AUTO_PROFILE = gcc-auto-profile --all -c 1000
 
 # This just becomes part of the MAKEINFO definition passed down to
 # sub-makes.  It lets flags be given on the command line while still
diff --git a/Makefile.tpl b/Makefile.tpl
index 3a5b7ed3c92..d0fe7e2fb77 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -407,7 +407,7 @@ MAKEINFO = @MAKEINFO@
 EXPECT = @EXPECT@
 RUNTEST = @RUNTEST@
 
-AUTO_PROFILE = gcc-auto-profile -c 1000
+AUTO_PROFILE = gcc-auto-profile --all -c 1000
 
 # This just becomes part of the MAKEINFO definition passed down to
 # sub-makes.  It lets flags be given on the command line while still
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 4d04df2a709..b16853d76df 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -704,7 +704,7 @@ proc check_effective_target_keeps_null_pointer_checks { } {
 # this allows parallelism of 16 and higher of parallel gcc-auto-profile
 proc profopt-perf-wrapper { } {
 global srcdir
-return "$srcdir/../config/i386/gcc-auto-profile -m8 "
+return "$srcdir/../config/i386/gcc-auto-profile --all -m8 "
 }
 
 # Return true if profiling is supported on the target.
-- 
2.25.1

[PATCH] tree.h: Hide wi::from_mpz from GENERATOR_FILE

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi,

Similar to r0-85707-g34917a102a4e0c for PR35051, the uses
of mpz_t should be guarded with "#ifndef GENERATOR_FILE".
This patch is to fix it and avoid some possible build
errors.

Bootstrapped and regress-tested on x86_64-redhat-linux,
and powerpc64{,le}-linux-gnu.  And cross-build well on
power for 40+ different ports.

Is it ok for trunk?

gcc/ChangeLog:

* tree.h (wi::from_mpz): Hide from GENERATOR_FILE.
---
 gcc/tree.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree.h b/gcc/tree.h
index 1854fe4a7d4..7e92a12f9cb 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -6460,7 +6460,9 @@ namespace wi

   wide_int min_value (const_tree);
   wide_int max_value (const_tree);
+#ifndef GENERATOR_FILE
   wide_int from_mpz (const_tree, mpz_t, bool);
+#endif
 }

 template 
--
2.39.3

[Bug tree-optimization/110494] [14 Regression] Wrong code at -O3 on x86_64-pc-linux-gnu

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110494

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
  iftmp.3_55 = d[_57];
  iftmp.6_54 = (short int) iftmp.3_55;
  _6 = iftmp.3_55 > 1;
  _46 = (short int) _6;
  _143 = _46 | iftmp.6_54;


  # RANGE [irange] short int [0, 1] NONZERO 0x1
  iftmp.6_54 = (short intD.17) iftmp.3_55;

Another dup of bug 110252 (just a different testcase).

Tomorrow I am going to finish up the patch.

*** This bug has been marked as a duplicate of bug 110252 ***

[Bug tree-optimization/110252] [14 Regression] Wrong code at -O2/3/s on x86_64-linux-gnu

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110252

--- Comment #12 from Andrew Pinski  ---
*** Bug 110494 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/110494] New: [14 Regression] Wrong code at -O3 on x86_64-pc-linux-gnu

2023-06-29 Thread jwzeng at nuaa dot edu.cn via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110494

Bug ID: 110494
   Summary: [14 Regression] Wrong code at -O3 on
x86_64-pc-linux-gnu
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jwzeng at nuaa dot edu.cn
  Target Milestone: ---

Link to the Compiler Explorer: https://godbolt.org/z/dq38c9zea

The following code snippet:

#include 
unsigned long long int seed = 0;
unsigned int var_1 = 631366671U;
#define min(a, b) ((a) < (b) ? (a) : (b))
int main() {
short a[18];
int b = 1;
unsigned long long int c[18];
unsigned long long int d[18];
for (int i = 0; i < 18; ++i)
a[i] = 4, c[i] = 3, d[i] = 2;
for (unsigned short i = 1; i < 16; i += var_1 - 57737)
a[i] = (short)min(b % (~var_1), c[i] ? d[i] : (long long unsigned
int)b);
for (int i = 0; i < 18; ++i)
seed ^= a[i];
printf("%llu\n", seed);
}

> $ /usr/gcc-trunk/bin/gcc -O0 bug.c; ./a.out
> 5
> $ /usr/gcc-trunk/bin/gcc -O1 bug.c; ./a.out
> 5
> $ /usr/gcc-trunk/bin/gcc -O2 bug.c; ./a.out
> 5
> $ /usr/gcc-trunk/bin/gcc -O3 bug.c; ./a.out
> 7
> $ /usr/gcc-trunk/bin/gcc -Os bug.c; ./a.out
> 5
When compiled with -O3, it prints the wrong result 7 instead of 5. Earlier GCCs
do not have this bug.

> $ /usr/gcc-trunk/bin/gcc --version
Using built-in specs.
COLLECT_GCC=/home/compilers/gcc/gcc-trunk/bin/gcc
COLLECT_LTO_WRAPPER=/home/compilers/gcc/gcc-trunk/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk-source/gcc/configure
--enable-languages=c,c++,fortran --enable-checking=release
--enable-valgrind-annotations --disable-werror --disable-libstdcxx-pch
--enable-libgomp --enable-lto --enable-gold --with-plugin-ld=gold
--prefix=/usr/local/gcc-trunk
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230619 (experimental) [master r14-1917-gf8e0270272] (GCC)

Re: PR82943 - Suggested patch to fix

2023-06-29 Thread Steve Kargl via Gcc-patches

On Thu, Jun 29, 2023 at 10:38:42PM -0500, Alexander Westbrooks via Fortran 
wrote:
> I have finished my testing, and updated my patch and relevant Changelogs. I
> added 4 new tests and all the existing tests in the current testsuite
> for gfortran passed or failed as expected. Do I need to attach the test
> results here?

Yes.  It helps others also do testing to have one self-contained
patch (which I don't know to generate with git and new files :-( ).
It may also be a good idea to attach the patch and test cases to
the PR in bugzilla so that they don't accidentally get lost.

> The platform I tested on was a Docker container running in Docker Desktop,
> running the "mcr.microsoft.com/devcontainers/universal:2-linux" image.
> 
> I also made sure that my code changes followed the coding standards. Please
> let me know if there is anything else that I need to do. I don't have
> write-access to the repository.

See the legal link that Harald provided.  At one time, one needed to
assign copyright to the FSF with a wet-ink signature on some form.
Now, I think you just need to attest that you have the right to
provide the code to the gcc project.

PS: Welcome to the gfortran development world.  Don't be put off
if there is a delay in getting feedback/review.  There are too 
few contributors and too little time.   If a week passes simply
ping the mailing list.  I'll try to carve out some time to look
over your patch this weekend.

-- 
steve


> 
> Thanks,
> 
> Alexander
> 
> On Wed, Jun 28, 2023 at 4:14 PM Harald Anlauf  wrote:
> 
> > Hi Alex,
> >
> > welcome to the gfortran community.  It is great that you are trying
> > to get actively involved.
> >
> > You already did quite a few things right: patches shall be sent to
> > the gcc-patches ML, but Fortran reviewers usually notice them only
> > where they are copied to the fortran ML.
> >
> > There are some general recommendations on the formatting of C code,
> > like indentation, of the patches, and of the commit log entries.
> >
> > Regarding coding standards, see https://www.gnu.org/prep/standards/ .
> >
> > Regarding testcases, a recommendation is to have a look at
> > existing testcases, e.g. in gcc/testsuite/gfortran.dg/, and then
> > decide if the testcase shall test the compile-time or run-time
> > behaviour, and add the necessary dejagnu directives.
> >
> > You should also verify if your patch passes regression testing.
> > For changes to gfortran, it is usually sufficient to run
> >
> > make check-fortran -j 
> >
> > where  is the number of parallel tests.
> > You would need to report also the platform where you tested on.
> >
> > There is also a legal issue to consider before non-trivial patches can
> > be accepted for incorporation: https://gcc.gnu.org/contribute.html#legal
> >
> > If your patch is accepted and if you do not have write-access to the
> > repository, one of the maintainers will likely take care of it.
> > If you become a regular contributor, you will probably want to consider
> > getting write access.
> >
> > Cheers,
> > Harald
> >
> >
> >
> > On 6/24/23 19:17, Alexander Westbrooks via Gcc-patches wrote:
> > > Hello,
> > >
> > > I am new to the GFortran community. Over the past two weeks I created a
> > > patch that should fix PR82943 for GFortran. I have attached it to this
> > > email. The patch allows the code below to compile successfully. I am
> > > working on creating test cases next, but I am new to the process so it
> > may
> > > take me some time. After I make test cases, do I email them to you as
> > well?
> > > Do I need to make a pull-request on github in order to get the patch
> > > reviewed?
> > >
> > > Thank you,
> > >
> > > Alexander Westbrooks
> > >
> > > module testmod
> > >
> > >  public :: foo
> > >
> > >  type, public :: tough_lvl_0(a, b)
> > >  integer, kind :: a = 1
> > >  integer, len :: b
> > >  contains
> > >  procedure :: foo
> > >  end type
> > >
> > >  type, public, EXTENDS(tough_lvl_0) :: tough_lvl_1 (c)
> > >  integer, len :: c
> > >  contains
> > >  procedure :: bar
> > >  end type
> > >
> > >  type, public, EXTENDS(tough_lvl_1) :: tough_lvl_2 (d)
> > >  integer, len :: d
> > >  contains
> > >  procedure :: foobar
> > >  end type
> > >
> > > contains
> > >  subroutine foo(this)
> > >  class(tough_lvl_0(1,*)), intent(inout) :: this
> > >  end subroutine
> > >
> > >  subroutine bar(this)
> > >  class(tough_lvl_1(1,*,*)), intent(inout) :: this
> > >  end subroutine
> > >
> > >  subroutine foobar(this)
> > >  class(tough_lvl_2(1,*,*,*)), intent(inout) :: this
> > >  end subroutine
> > >
> > > end module
> > >
> > > PROGRAM testprogram
> > >  USE testmod
> > >
> > >  TYPE(tough_lvl_0(1,5)) :: test_pdt_0
> > >  TYPE(tough_lvl_1(1,5,6))   :: test_pdt_1
> > >  TYPE(tough_lvl_2(1,5,6,7)) :: test_pdt_2
> > >
> > >  CALL test_pdt_0%foo()
> > >
> > >

[Bug c++/110493] 'is not a constant expression' for function-local static std::initializer_list with fsanitize=undefined

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110493

Andrew Pinski  changed:

   What|Removed |Added

  Component|libstdc++   |c++

--- Comment #3 from Andrew Pinski  ---
Note it was not libstdc++ that changed but the front-end (PR 107079  ).

[Bug libstdc++/110493] 'is not a constant expression' for function-local static std::initializer_list with fsanitize=undefined

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110493

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|DUPLICATE   |---
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=107079
 Status|RESOLVED|UNCONFIRMED

--- Comment #2 from Andrew Pinski  ---
Actually let's reopen this ...

[Bug sanitizer/71962] error: ‘((& x) != 0u)’ is not a constant expression

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71962

--- Comment #11 from Andrew Pinski  ---
*** Bug 110493 has been marked as a duplicate of this bug. ***

[Bug libstdc++/110493] 'is not a constant expression' for function-local static std::initializer_list with fsanitize=undefined

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110493

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
std::string::string is a constexpr in C++20 ...

*** This bug has been marked as a duplicate of bug 71962 ***

[Bug libstdc++/110493] New: 'is not a constant expression' for function-local static std::initializer_list with fsanitize=undefined

2023-06-29 Thread ed at catmur dot uk via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110493

Bug ID: 110493
   Summary: 'is not a constant expression' for function-local
static std::initializer_list with
fsanitize=undefined
   Product: gcc
   Version: 12.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ed at catmur dot uk
  Target Milestone: ---

Since 12.3.0, under -fsanitize=undefined -std=c++20:

#include 
#include 
void f() {
static auto a = {std::string()};
}

error: '(((const std::__cxx11::basic_string*)(& )) != 0)' is
not a constant expression

Clearly this is an instance of bug 71962, but this is in fairly unexceptional
code and could make ubsan unusable; it's not obvious that this code should have
anything to do with constant evaluation.

Since it only appears in a point release I would hope it might be possible to
fix in library for this specific case, within ?

[Bug c++/92752] [10 Regression] Bogus "ignored qualifiers" warning on const-qualified pointer-to-member-function objects

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92752

--- Comment #7 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Patrick Palka
:

https://gcc.gnu.org/g:8c12c47d0c5c40df6e5eeb8625d4708c8a42dbe0

commit r10-11483-g8c12c47d0c5c40df6e5eeb8625d4708c8a42dbe0
Author: Patrick Palka 
Date:   Fri Jan 28 15:41:15 2022 -0500

c++: bogus warning with value init of const pmf [PR92752]

Here we're emitting a -Wignored-qualifiers warning for an intermediate
compiler-generated cast of nullptr to 'method-type* const' as part of
value initialization of a const pmf.  This patch suppresses the warning
by instead casting to the corresponding unqualified type.

PR c++/92752

gcc/cp/ChangeLog:

* typeck.c (build_ptrmemfunc): Cast a nullptr constant to the
unqualified pointer type not the qualified one.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wignored-qualifiers2.C: New test.

Co-authored-by: Jason Merrill 
(cherry picked from commit e971990cbda091b4caf5e1a5bded5121068934e4)

[Bug c++/97420] [10 Regression] NTTP function reference cannot bind to noexcept function

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97420

--- Comment #12 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Patrick Palka
:

https://gcc.gnu.org/g:ad42219766d0e67bede2c9bd94c98082cde42518

commit r10-11482-gad42219766d0e67bede2c9bd94c98082cde42518
Author: Patrick Palka 
Date:   Wed May 26 08:35:31 2021 -0400

c++: Fix reference NTTP binding to noexcept fn [PR97420]

Here, in C++17 mode, convert_nontype_argument_function is rejecting
binding a non-noexcept function reference template parameter to a
noexcept function (encoded as the template argument '*(int (&) (int)) ').

The first roadblock to making this work is that the argument is wrapped
an an implicit INDIRECT_REF, so we need to unwrap it before calling
strip_fnptr_conv.

The second roadblock is that the NOP_EXPR cast converts from a function
pointer type to a reference type while simultaneously removing the
noexcept qualification, and fnptr_conv_p doesn't consider this cast to
be a function pointer conversion.  This patch fixes this by making
fnptr_conv_p treat REFERENCE_TYPEs and POINTER_TYPEs interchangeably.

Finally, in passing, this patch also simplifies noexcept_conv_p by
removing a bunch of redundant checks already performed by its only
caller fnptr_conv_p.

PR c++/97420

gcc/cp/ChangeLog:

* cvt.c (noexcept_conv_p): Remove redundant checks and simplify.
(fnptr_conv_p): Don't call non_reference.  Use INDIRECT_TYPE_P
instead of TYPE_PTR_P.
* pt.c (convert_nontype_argument_function): Look through
implicit INDIRECT_REFs before calling strip_fnptr_conv.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept68.C: New test.

(cherry picked from commit b4329e3dd6fb7c78948fcf9d2f5b9d873deec284)

[Bug libstdc++/108672] [13 Regression] g++.dg/modules/xtreme-header-2_a.H, _b.C, _c.C

2023-06-29 Thread hp at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108672

Hans-Peter Nilsson  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #5 from Hans-Peter Nilsson  ---
Re-fixed and re-closed.

Inquiry about Getting Started with GCC Projects as a Beginner

2023-06-29 Thread Vatsalya Dubey via Gcc

Dear GCC Company,

I hope this email finds you well. I am writing to express my interest in
working on GCC projects as a beginner and seeking guidance on how to get
started. I am excited about the opportunity to contribute to your projects
and gain valuable experience in the process.

As a beginner, I understand that I might need to acquire certain skills,
languages, frameworks, and tools to effectively participate in GCC
projects. I would greatly appreciate it if you could provide me with some
information on the following aspects:

   1.

   Programming Languages: Which programming languages are commonly used in
   GCC projects? Are there any specific languages that you recommend beginners
   to focus on initially?
   2.

   Frameworks: Are there any frameworks that are commonly utilized in GCC
   projects? If so, could you suggest any specific frameworks that beginners
   should become familiar with?
   3.

   Tools: What are the essential tools utilized in GCC projects? It would
   be helpful to know which tools are commonly used for tasks such as version
   control, project management, documentation, and testing.
   4.

   Learning Resources: Could you please recommend any online resources,
   tutorials, or documentation that can assist beginners in understanding GCC
   projects and the associated technologies?

I understand that getting started with any new project can be challenging,
but I am enthusiastic and committed to expanding my knowledge and skills to
contribute effectively. Any guidance or suggestions you can provide would
be greatly appreciated.

Thank you for your time and consideration. I look forward to hearing from
you and beginning my journey with GCC projects.

Yours sincerely,

Vatsalya Dubey

PR108672 re-fixed after [PATCH] libstdc++: Synchronize PSTL with upstream

2023-06-29 Thread Hans-Peter Nilsson via Gcc-patches

> Date: Mon, 26 Jun 2023 11:57:49 -0700
> From: Thomas Rodgers via Gcc-patches 

> On Wed, May 17, 2023 at 12:32 PM Jonathan Wakely  wrote:
> > All the actual code changes look good.

Unfortunately, this overwrote the fix for PR108672.  I take
it there's a step missing from the synchronization process;
a check that no local commits are overwritten?  Sounds like
something that can be fully scripted (not volunteering) or
already available (like, "list all commits affecting
contents touched by/between two named commits").

I did *not* check whether any other local commits were also
overwritten.  Also, not sure about whether better try to get
this upstreamed: __INT32_TYPE__ seems gcc-specific.

Anyway, r13-5702-g72058eea9d407e was "re-committed" per
below as obvious after regtesting cris-elf.

brgds, H-P

-- >8 --
Subject: libstdc++: Re-apply PR108672 fix (avoid use of naked int32_t in 
unseq_backend_simd.h)

The fix was overwritten by r14-2109-g3162ca09dbdc2e "libstdc++:
Synchronize PSTL with upstream".

libstdc++-v3:

PR libstdc++/108672
* include/pstl/unseq_backend_simd.h (__simd_or): Re-apply using
__INT32_TYPE__ instead of int32_t.
---
 libstdc++-v3/include/pstl/unseq_backend_simd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/pstl/unseq_backend_simd.h 
b/libstdc++-v3/include/pstl/unseq_backend_simd.h
index 69784bcdbe66..f3c38fbbbc2a 100644
--- a/libstdc++-v3/include/pstl/unseq_backend_simd.h
+++ b/libstdc++-v3/include/pstl/unseq_backend_simd.h
@@ -74,7 +74,7 @@ __simd_or(_Index __first, _DifferenceType __n, _Pred __pred) 
noexcept
 const _Index __last = __first + __n;
 while (__last != __first)
 {
-int32_t __flag = 1;
+__INT32_TYPE__ __flag = 1;
 _PSTL_PRAGMA_SIMD_REDUCTION(& : __flag)
 for (_DifferenceType __i = 0; __i < __block_size; ++__i)
 if (__pred(*(__first + __i)))
-- 
2.30.2

Re: PR82943 - Suggested patch to fix

2023-06-29 Thread Alexander Westbrooks via Gcc-patches

Hello,

I have finished my testing, and updated my patch and relevant Changelogs. I
added 4 new tests and all the existing tests in the current testsuite
for gfortran passed or failed as expected. Do I need to attach the test
results here?

The platform I tested on was a Docker container running in Docker Desktop,
running the "mcr.microsoft.com/devcontainers/universal:2-linux" image.

I also made sure that my code changes followed the coding standards. Please
let me know if there is anything else that I need to do. I don't have
write-access to the repository.

Thanks,

Alexander

On Wed, Jun 28, 2023 at 4:14 PM Harald Anlauf  wrote:

> Hi Alex,
>
> welcome to the gfortran community.  It is great that you are trying
> to get actively involved.
>
> You already did quite a few things right: patches shall be sent to
> the gcc-patches ML, but Fortran reviewers usually notice them only
> where they are copied to the fortran ML.
>
> There are some general recommendations on the formatting of C code,
> like indentation, of the patches, and of the commit log entries.
>
> Regarding coding standards, see https://www.gnu.org/prep/standards/ .
>
> Regarding testcases, a recommendation is to have a look at
> existing testcases, e.g. in gcc/testsuite/gfortran.dg/, and then
> decide if the testcase shall test the compile-time or run-time
> behaviour, and add the necessary dejagnu directives.
>
> You should also verify if your patch passes regression testing.
> For changes to gfortran, it is usually sufficient to run
>
> make check-fortran -j 
>
> where  is the number of parallel tests.
> You would need to report also the platform where you tested on.
>
> There is also a legal issue to consider before non-trivial patches can
> be accepted for incorporation: https://gcc.gnu.org/contribute.html#legal
>
> If your patch is accepted and if you do not have write-access to the
> repository, one of the maintainers will likely take care of it.
> If you become a regular contributor, you will probably want to consider
> getting write access.
>
> Cheers,
> Harald
>
>
>
> On 6/24/23 19:17, Alexander Westbrooks via Gcc-patches wrote:
> > Hello,
> >
> > I am new to the GFortran community. Over the past two weeks I created a
> > patch that should fix PR82943 for GFortran. I have attached it to this
> > email. The patch allows the code below to compile successfully. I am
> > working on creating test cases next, but I am new to the process so it
> may
> > take me some time. After I make test cases, do I email them to you as
> well?
> > Do I need to make a pull-request on github in order to get the patch
> > reviewed?
> >
> > Thank you,
> >
> > Alexander Westbrooks
> >
> > module testmod
> >
> >  public :: foo
> >
> >  type, public :: tough_lvl_0(a, b)
> >  integer, kind :: a = 1
> >  integer, len :: b
> >  contains
> >  procedure :: foo
> >  end type
> >
> >  type, public, EXTENDS(tough_lvl_0) :: tough_lvl_1 (c)
> >  integer, len :: c
> >  contains
> >  procedure :: bar
> >  end type
> >
> >  type, public, EXTENDS(tough_lvl_1) :: tough_lvl_2 (d)
> >  integer, len :: d
> >  contains
> >  procedure :: foobar
> >  end type
> >
> > contains
> >  subroutine foo(this)
> >  class(tough_lvl_0(1,*)), intent(inout) :: this
> >  end subroutine
> >
> >  subroutine bar(this)
> >  class(tough_lvl_1(1,*,*)), intent(inout) :: this
> >  end subroutine
> >
> >  subroutine foobar(this)
> >  class(tough_lvl_2(1,*,*,*)), intent(inout) :: this
> >  end subroutine
> >
> > end module
> >
> > PROGRAM testprogram
> >  USE testmod
> >
> >  TYPE(tough_lvl_0(1,5)) :: test_pdt_0
> >  TYPE(tough_lvl_1(1,5,6))   :: test_pdt_1
> >  TYPE(tough_lvl_2(1,5,6,7)) :: test_pdt_2
> >
> >  CALL test_pdt_0%foo()
> >
> >  CALL test_pdt_1%foo()
> >  CALL test_pdt_1%bar()
> >
> >  CALL test_pdt_2%foo()
> >  CALL test_pdt_2%bar()
> >  CALL test_pdt_2%foobar()
> >
> >
> > END PROGRAM testprogram
>
>


0001-Fix-fortran-PR82943-PR86148-and-PR86268.patch
Description: Binary data

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-29 Thread Kewen.Lin via Gcc-patches

Hi Carl,

on 2023/6/30 05:36, Carl Love wrote:
> Kewen:
> 
> On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
>>> Yea, I was going with a runnable test and didn't include the
>>> instruction counts.  Added back in.  Rather then doing by processor
>>> version (P8, P9, P10) I was able to do it by BE/LE.  The
>>> instruction
>>> counts were the same for LE accross processor versions but there
>>> are a
>>> few instruction counts that vary with BE and LE.
>>
>> But the original test case only checks for cpu-types (processor
>> version)
>> but not for endianness, it means for the bif usages, there should not
>> be
>> different for endianness.  Why does this changes with your new test
>> case?
>> Could you have a further look and make it consistent with some
>> adjustment
>> if possible?  As we know, checking insn counts sometimes are fragile,
>> so
>> I think we should try our best to make it as robust as possible in
>> the
>> first place.
>>
>> Besides, the original case also have some differences between p7/p8
>> and
>> p9.
>>   
> 
> There are differences on P8 LE versus BE.  I did a diff between the P8
> and P9 tests:
> 
>  diff vsx-vector-6.p8.c vsx-vector-6.p9.c
> 3,4c3,4
> < /* { dg-require-effective-target powerpc_p8vector_ok } */
> < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> ---
>> /* { dg-require-effective-target powerpc_p9vector_ok } */
>> /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> 12c12
> < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
> ---
>> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
> 23d22
> < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> 37c36
> < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
> ---
>> /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> 
> So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
> xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are different
> between the two architectures.  I then wrote a script to compile the
> CPU specific test on Power 8, Power 9 and Power 10 architectures and
> then grep for the above list of instructions.  If I run the scrip on P8
> BE  and LE I get> 
> 
> Power 8 BEPower 8 LE   Power 9 LE   Power 9 BEPower 10 LE*
>(makalu-lp1)(genoa) (marlin)  (nilram)   (ltcd97-lp3)
> instruction   count countcount countcount
> vperm  1  10 00
> vpermr 0  00 00
> xxpermr0  01 01
> xvmsubadp  1  01 11
> xvmsubmdp  0  10 00
> xvsubdp1  11 11
> 

Thanks for looking into this and making this statistics.

Is there a typo for column nilram?   Otherwise, the below insn check

/* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */

would fail there.

> 
> From the diff we see 
> 
>   { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } }
> 
> This test picks up the correct subtraction instruction for LE versus BE
> so this "masks" the LE/BE difference.  I changed the check in vsx-
> vector-6-func-3op.c to match.  This eliminates the LE and BE checks and
> reduces the number of specific checks.

OK, nice.

> 
> In vsx-vector-6-func-3op.c  The new test checks the counts for
> xxpermdi, which the original test does not check.  The check for
> xxpermdi are not needed.  They are not directly related to the builtin
> tests.  I removed them.

OK.

> 
> Looking at the LE/BE checks in the other test file vsx-vector-6-func-
> 2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were not checked in
> the original test.  The functions where these instructions are used get
> inlined.  On LE, the binary instructions show up in the inlined code as
> well as what appears to be the binary for the original, non-inlined
> function.  Best I can see, the binary for the original function is dead
> code.  I don't see any calls to it.  Seems like it shouldn't be there
> as it would make the binary smaller. On BE, I don't see the binary for
> the original non-inlined function.  
> 
> I had played with putting -Wno-inline on the command line but that
> didn't seem to make any difference.  However, you suggestion of
> __attribute__ ((noipa)) does prevent the inlining and we don't get the
> second copy of the instructions showing up. The inlining eliminated the
> LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.

-Winline is a option for warning: "Warn if a function that is declared
as inline cannot be inlined.", I think what you wanted is -fno-inline,
and it's good to know noipa helps here.

> 
> The instruction count test for xxlor in vsx-vector-6-func-2lop.c
> differs on LE and BE vsx-vector-6-func-2op.c.  I believe the
> instruction is used with loads to

[Bug c++/110492] Attempted optimization of switch statement pessimizes it instead

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110492

--- Comment #1 from Andrew Pinski  ---
Created attachment 55431
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55431=edit
testcase

Next time please attach or paste inline the code instead of posting just a link
to godbolt.org .

[Bug libstdc++/108672] [13 Regression] g++.dg/modules/xtreme-header-2_a.H, _b.C, _c.C

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108672

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Hans-Peter Nilsson :

https://gcc.gnu.org/g:b22cf5f0321f6425a357c06647a4366d99ddac61

commit r14-2206-gb22cf5f0321f6425a357c06647a4366d99ddac61
Author: Hans-Peter Nilsson 
Date:   Sat Feb 4 18:38:45 2023 +0100

libstdc++: Re-apply PR108672 fix (avoid use of naked int32_t in
unseq_backend_simd.h)

The fix was overwritten by r14-2109-g3162ca09dbdc2e "libstdc++:
Synchronize PSTL with upstream".

libstdc++-v3:

PR libstdc++/108672
* include/pstl/unseq_backend_simd.h (__simd_or): Re-apply using
__INT32_TYPE__ instead of int32_t.

[Bug target/109435] overaligned structs are not passed correctly for mips64

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109435

--- Comment #4 from CVS Commits  ---
The master branch has been updated by YunQiang Su :

https://gcc.gnu.org/g:e20abdb749d0c0c8552da998ff8ec139b830f5eb

commit r14-2205-ge20abdb749d0c0c8552da998ff8ec139b830f5eb
Author: Jovan Dmitrovic 
Date:   Mon Jun 26 17:00:20 2023 +0200

mips: Fix overaligned function arguments [PR109435]

This patch changes alignment for typedef types when passed as
arguments, making the alignment equal to the alignment of
original (aliased) types.

This change makes it impossible for a typedef type to have
alignment that is less than its size.

2023-06-27  Jovan DmitroviÄ  

gcc/ChangeLog:

PR target/109435
* config/mips/mips.cc (mips_function_arg_alignment): Returns
the alignment of function argument. In case of typedef type,
it returns the aligment of the aliased type.
(mips_function_arg_boundary): Relocated calculation of the
aligment of function arguments.

gcc/testsuite/ChangeLog:

* gcc.target/mips/align-1-n64.c: New test.
* gcc.target/mips/align-1-o32.c: New test.

Re: [PATCH] rs6000, __builtin_set_fpscr_rn add retrun value

2023-06-29 Thread Peter Bergner via Gcc-patches

On 6/29/23 4:13 AM, Kewen.Lin wrote:
> on 2023/6/19 23:57, Carl Love wrote:
>> The following patch changes the return type of the
>>  __builtin_set_fpscr_rn builtin from void to double.  The return value
>> is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
>> XE, NI, RN bit positions when the builtin is called.  The builtin then
>> updated the RN field with the new value provided as an argument to the
>> builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
>> check that the builtin returns the current value of the FPSCR fields
>> and then updates the RN field.
> 
> But this patch also introduces one more overloading instance with argument
> type double, I had a check on glibc usage of mffscrn/mffscrni, I don't
> think it's necessary to add this new instance, as it takes the given
> rounding mode as integral type.

I agree with Ke Wen, I don't know why there is an extra overloaded
instance.  I assumed we'd have just the one we had before, except now
its return type is double and not void.

That said, we need to announce to users that the return type has
changed, so new code compiled with a new GCC can get the new return
value, but if that new code is compiled with an old GCC (still has void
return type), it knows it needs to use some other method to get the
FPSCR value, if it needs it.  We normally do that with a predefine
macro (#define ...) that the user can test for in their code, ala:

#ifdef __SET_FPSCR_RN_RETURNS_FPSCR__  /* Or whatever name.  */
  ret = __builtin_set_fpscr_rn (..);
#else
  __builtin_set_fpscr_rn (..);
  ret = ;
#endif

We add those predefined macros in rs6000-c.cc:rs6000_target_modify_macros()
and it should be gated via the same conditions that the built-in itself
is enabled.  Look at my addition of the __TM_FENCE__ macro that let the
user know our HTM insn patterns were fixed to now act as memory barriers.

Peter

[Bug libstdc++/108672] [13 Regression] g++.dg/modules/xtreme-header-2_a.H, _b.C, _c.C

2023-06-29 Thread hp at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108672

Hans-Peter Nilsson  changed:

   What|Removed |Added

 Resolution|FIXED   |---
 Ever confirmed|0   |1
   Last reconfirmed||2023-06-30
 Status|RESOLVED|REOPENED

--- Comment #3 from Hans-Peter Nilsson  ---
I'm briefly reopening this PR so my next commit makes sense, as
r14-2109-g3162ca09dbdc2e overwrote the fix from February, making the fails
re-appear.  I'll re-commit the fix shortly.

[Bug c++/110492] New: Attempted optimization of switch statement pessimizes it instead

2023-06-29 Thread qufanat at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110492

Bug ID: 110492
   Summary: Attempted optimization of switch statement pessimizes
it instead
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: qufanat at gmail dot com
  Target Milestone: ---

This happens on my local GCC 11.3, but you can also see it on 13.1 at this
godbolt link https://godbolt.org/z/G3qecWxPr

I'm creating a switch statement of hashed strings, which compiles to a binary
search on the hashes, all well and good.  However, with -O3 specified, GCC
peels back the last multiplication of the hash for some of the comparison
branches, but is unable to do it for others, resulting in longer assembly with
twice as many comparisons as is necessary.  Here is lines 15..19 in
get_choice_1()

(end of the hash loop)
imulesi, eax, 16777619
testdl, dl
jne .L3
(start of the switch)
cmp eax, 1954414351
je  .L8
cmp esi, 1901626525
ja  .L4

eax is the hash without the last imul, and esi is the final hash.  If we
prevent inlining of the hash function, the compiler can't make this
"optimization" and gives the assembly I expect.  Here is lines 90..93 in
get_choice_2()

callhash32_noinline(char const*)
cmp eax, 1901626525
je  .L33
jbe .L46

Now it only does one comparison per entry and uses it for both the == and <=
branches.

This isn't that important to my program but I thought you'd like to know.

[PATCH V2] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-06-29 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Hi, Richi and Richard.

This patch is adding LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets
handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider this following case:

#include 
void
f (uint8_t *restrict a, 
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
{
  if (cond[i])
a[i * step + base] = b[i * step + base];
}
}

We hope RVV can vectorize such case into following IR:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask)
LEN_SCATTER_STORE (... v, ..., loop_len, control_mask)

This patch doesn't apply such patterns into vectorizer, just add patterns
and update the documents.

Will send patch which apply such patterns into vectorizer soon after this
patch is approved.

Thanks.

gcc/ChangeLog:

* doc/md.texi: Add LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.
* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
(expand_gather_load_optab_fn): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi | 17 +
 gcc/internal-fn.cc  | 32 ++--
 gcc/internal-fn.def |  8 ++--
 gcc/optabs.def  |  2 ++
 4 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9648fdc846a..b84aaab7075 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element 
@var{i}
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand 
(operand 5)
+as well as a mask operand (operand 6). Similar to len_maskload, the 
instruction loads
+at most (operand 5) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be 
undefined.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an 
extra mask operand as
 operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be stored to memory.
 
+@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_scatter_store@var{m}@var{n}}
+Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand 
(operand 5)
+as well as a mask operand (operand 6). The instruction stores at most (operand 
5) elements
+of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9017176dc7a..e4b558e33d8 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   int i = 0;
   create_address_operand ([i++], base_rtx);
   create_input_operand ([i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3546,6 +3546,14 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   create_input_operand ([i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
   if (mask_index >= 0)
 {
+  if (optab == len_mask_scatter_store_optab)
+   {
+ tree len = gimple_call_arg (stmt, mask_index - 1);
+ rtx len_rtx = expand_normal (len);
+ create_convert_operand_from ([i++], len_rtx,
+  TYPE_MODE (TREE_TYPE (len)),
+  TYPE_UNSIGNED (TREE_TYPE (len)));
+   }
   tree mask = gimple_call_arg (stmt, mask_index);
   rtx mask_rtx = expand_normal (mask);
   create_input_operand ([i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
@@ -3572,7 +3580,7 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);

[Bug c/110491] Wrong code at -O2 on x86_64-linux-gnu (a recent regression)

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110491

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
Dup of bug 110228.



  # RANGE [irange] int [0, 1] NONZERO 0x1
  # c_lsm.10_3 = PHI <_17(D)(2), _11(5)>
...
  if (c_lsm_flag.11_16 != 0)
goto ; [66.67%]
  else
goto ; [33.33%]
;;succ:   7 [33.3% (adjusted)]  count:357913944 (estimated locally)
(FALSE_VALUE,EXECUTABLE)
;;8 [66.7% (adjusted)]  count:715827880 (estimated locally)
(TRUE_VALUE,EXECUTABLE)

;;   basic block 7, loop depth 0, count 357913944 (estimated locally), maybe
hot
;;prev block 6, next block 8, flags: (NEW)
;;pred:   6 [33.3% (adjusted)]  count:357913944 (estimated locally)
(FALSE_VALUE,EXECUTABLE)
;;succ:   8 [always]  count:357913944 (estimated locally) (FALLTHRU)

;;   basic block 8, loop depth 0, count 1073741824 (estimated locally), maybe
hot
;;prev block 7, next block 1, flags: (NEW, REACHABLE, VISITED)
;;pred:   7 [always]  count:357913944 (estimated locally) (FALLTHRU)
;;6 [66.7% (adjusted)]  count:715827880 (estimated locally)
(TRUE_VALUE,EXECUTABLE)
  # RANGE [irange] int [0, 1] NONZERO 0x1
  # prephitmp_27 = PHI <1(7), c_lsm.10_3(6)>

*** This bug has been marked as a duplicate of bug 110228 ***

[Bug middle-end/110228] [13/14 Regression] llvm-16 miscompiled due to an maybe uninitialized variable

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110228

--- Comment #19 from Andrew Pinski  ---
*** Bug 110491 has been marked as a duplicate of this bug. ***

[PATCH v1 4/6] LoongArch: Added Loongson ASX vector directive compilation framework.

2023-06-29 Thread Chenghui Pan

From: Lulu Cheng 

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Added compilation 
framework.
* config/loongarch/genopts/loongarch.opt.in: Ditto.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto.
* config/loongarch/loongarch-def.c: Ditto.
* config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto.
(ISA_EXT_SIMD_LASX): Ditto.
(N_SWITCH_TYPES): Ditto.
(SW_LASX): Ditto.
* config/loongarch/loongarch-driver.cc (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-opts.cc (isa_str): Ditto.
* config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto.
(ISA_HAS_LASX): Ditto.
* config/loongarch/loongarch-str.h (OPTSTR_LASX): Ditto.
* config/loongarch/loongarch.opt: Ditto.
---
 gcc/config/loongarch/genopts/loongarch-strings |  1 +
 gcc/config/loongarch/genopts/loongarch.opt.in  |  4 
 gcc/config/loongarch/loongarch-c.cc| 11 +++
 gcc/config/loongarch/loongarch-def.c   |  4 +++-
 gcc/config/loongarch/loongarch-def.h   |  6 --
 gcc/config/loongarch/loongarch-driver.cc   |  2 +-
 gcc/config/loongarch/loongarch-driver.h|  1 +
 gcc/config/loongarch/loongarch-opts.cc |  9 -
 gcc/config/loongarch/loongarch-opts.h  |  4 +++-
 gcc/config/loongarch/loongarch-str.h   |  1 +
 gcc/config/loongarch/loongarch.opt |  4 
 11 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index 24a5025061f..35d08f5967d 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -42,6 +42,7 @@ OPTSTR_DOUBLE_FLOAT   double-float
 
 # SIMD extensions
 OPTSTR_LSX lsx
+OPTSTR_LASXlasx
 
 # -mabi=
 OPTSTR_ABI_BASE  abi
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 7ea3f0dea5d..d1c2d2fef34 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -80,6 +80,10 @@ m@@OPTSTR_LSX@@
 Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(m@@OPTSTR_LSX@@)
 Enable LoongArch SIMD Extension (LSX).
 
+m@@OPTSTR_LASX@@
+Target RejectNegative Var(la_opt_switches) Mask(LASX) 
Negative(m@@OPTSTR_LASX@@)
+Enable LoongArch Advanced SIMD Extension (LASX).
+
 ;; Base target models (implies ISA & tune parameters)
 Enum
 Name(cpu_type) Type(int)
diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index b065921adc3..2747fb9e472 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -104,8 +104,19 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define ("__loongarch_simd");
   builtin_define ("__loongarch_sx");
   builtin_define ("__loongarch_sx_width=128");
+
+  if (!ISA_HAS_LASX)
+   builtin_define ("__loongarch_simd_width=128");
 }
 
+  if (ISA_HAS_LASX)
+{
+  builtin_define ("__loongarch_asx");
+  builtin_define ("__loongarch_asx_width=256");
+  builtin_define ("__loongarch_simd_width=256");
+}
+
+
   /* Native Data Sizes.  */
   builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
   builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE);
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index 28e24c62249..bff92c86532 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -54,7 +54,7 @@ loongarch_cpu_default_isa[N_ARCH_TYPES] = {
   [CPU_LA464] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
-  .simd = ISA_EXT_SIMD_LSX,
+  .simd = ISA_EXT_SIMD_LASX,
   },
 };
 
@@ -150,6 +150,7 @@ loongarch_isa_ext_strings[N_ISA_EXT_TYPES] = {
   [ISA_EXT_FPU32] = STR_ISA_EXT_FPU32,
   [ISA_EXT_NOFPU] = STR_ISA_EXT_NOFPU,
   [ISA_EXT_SIMD_LSX] = OPTSTR_LSX,
+  [ISA_EXT_SIMD_LASX] = OPTSTR_LASX,
 };
 
 const char*
@@ -180,6 +181,7 @@ loongarch_switch_strings[] = {
   [SW_SINGLE_FLOAT]  = OPTSTR_SINGLE_FLOAT,
   [SW_DOUBLE_FLOAT]  = OPTSTR_DOUBLE_FLOAT,
   [SW_LSX]   = OPTSTR_LSX,
+  [SW_LASX]  = OPTSTR_LASX,
 };
 
 
diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index f34cffcfb9b..0bbcdb03d22 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -64,7 +64,8 @@ extern const char* loongarch_isa_ext_strings[];
 #define ISA_EXT_FPU642
 #define N_ISA_EXT_FPU_TYPES   3
 #define ISA_EXT_SIMD_LSX  3
-#define N_ISA_EXT_TYPES  4
+#define ISA_EXT_SIMD_LASX 4
+#define N_ISA_EXT_TYPES  5
 
 /* enum abi_base */
 extern const char*

[PATCH v1 1/6] LoongArch: Added Loongson SX vector directive compilation framework.

2023-06-29 Thread Chenghui Pan

From: Lulu Cheng 

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Added compilation 
framework.
* config/loongarch/genopts/loongarch.opt.in: Ditto.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto.
* config/loongarch/loongarch-def.c: Ditto.
* config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto.
(ISA_EXT_SIMD_LSX): Ditto.
(N_SWITCH_TYPES): Ditto.
(SW_LSX): Ditto.
(struct loongarch_isa): Ditto.
* config/loongarch/loongarch-driver.cc (APPEND_SWITCH): Ditto.
(driver_get_normalized_m_opts): Ditto.
* config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): 
Ditto.
* config/loongarch/loongarch-opts.cc (loongarch_config_target): Ditto.
(isa_str): Ditto.
* config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto.
* config/loongarch/loongarch-str.h (OPTSTR_LSX): Ditto.
* config/loongarch/loongarch.opt: Ditto.
---
 .../loongarch/genopts/loongarch-strings   |  3 +
 gcc/config/loongarch/genopts/loongarch.opt.in | 12 ++-
 gcc/config/loongarch/loongarch-c.cc   |  7 ++
 gcc/config/loongarch/loongarch-def.c  |  4 +
 gcc/config/loongarch/loongarch-def.h  |  7 +-
 gcc/config/loongarch/loongarch-driver.cc  | 10 +++
 gcc/config/loongarch/loongarch-driver.h   |  1 +
 gcc/config/loongarch/loongarch-opts.cc| 82 ++-
 gcc/config/loongarch/loongarch-opts.h |  1 +
 gcc/config/loongarch/loongarch-str.h  |  2 +
 gcc/config/loongarch/loongarch.opt| 12 ++-
 11 files changed, 136 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index a40998ead97..24a5025061f 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -40,6 +40,9 @@ OPTSTR_SOFT_FLOAT soft-float
 OPTSTR_SINGLE_FLOAT   single-float
 OPTSTR_DOUBLE_FLOAT   double-float
 
+# SIMD extensions
+OPTSTR_LSX lsx
+
 # -mabi=
 OPTSTR_ABI_BASE  abi
 STR_ABI_BASE_LP64Dlp64d
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 4b9b4ac273e..7ea3f0dea5d 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -76,6 +76,9 @@ m@@OPTSTR_DOUBLE_FLOAT@@
 Target Driver RejectNegative Var(la_opt_switches) Mask(FORCE_F64) 
Negative(m@@OPTSTR_SOFT_FLOAT@@)
 Allow hardware floating-point instructions to cover both 32-bit and 64-bit 
operations.
 
+m@@OPTSTR_LSX@@
+Target RejectNegative Var(la_opt_switches) Mask(LSX) Negative(m@@OPTSTR_LSX@@)
+Enable LoongArch SIMD Extension (LSX).
 
 ;; Base target models (implies ISA & tune parameters)
 Enum
@@ -125,11 +128,18 @@ Target RejectNegative Joined ToLower Enum(abi_base) 
Var(la_opt_abi_base) Init(M_
 Variable
 int la_opt_abi_ext = M_OPTION_NOT_SEEN
 
-
 mbranch-cost=
 Target RejectNegative Joined UInteger Var(loongarch_branch_cost)
 -mbranch-cost=COST Set the cost of branches to roughly COST instructions.
 
+mvecarg
+Target Var(TARGET_VECARG) Init(1)
+Target pass vect arg uses vector register.
+
+mmemvec-cost=
+Target RejectNegative Joined UInteger Var(loongarch_vector_access_cost) 
IntegerRange(1, 5)
+mmemvec-cost=COST  Set the cost of vector memory access instructions.
+
 mcheck-zero-division
 Target Mask(CHECK_ZERO_DIV)
 Trap on integer divide by zero.
diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index 67911b78f28..b065921adc3 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -99,6 +99,13 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   else
 builtin_define ("__loongarch_frlen=0");
 
+  if (ISA_HAS_LSX)
+{
+  builtin_define ("__loongarch_simd");
+  builtin_define ("__loongarch_sx");
+  builtin_define ("__loongarch_sx_width=128");
+}
+
   /* Native Data Sizes.  */
   builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
   builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE);
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index 6729c857f7c..28e24c62249 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -49,10 +49,12 @@ loongarch_cpu_default_isa[N_ARCH_TYPES] = {
   [CPU_LOONGARCH64] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
+  .simd = 0,
   },
   [CPU_LA464] = {
   .base = ISA_BASE_LA64V100,
   .fpu = ISA_EXT_FPU64,
+  .simd = ISA_EXT_SIMD_LSX,
   },
 };
 
@@ -147,6 +149,7 @@ loongarch_isa_ext_strings[N_ISA_EXT_TYPES] = {
   [ISA_EXT_FPU64] = STR_ISA_EXT_FPU64,
   [ISA_EXT_FPU32] = STR_ISA_EXT_FPU32,
   [ISA_EXT_NOFPU] = STR_ISA_EXT_NOFPU,
+  [ISA_EXT_SIMD_LSX] = OPTSTR_LSX,
 };
 
 const char*
@@ -176,6 +179,7 @@

[PATCH v1 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-06-29 Thread Chenghui Pan

These patches add the Loongson SX/ASX instruction support to the LoongArch
target, and can be utilized by using the new "-mlsx" and
"-mlasx" option.

Patches are bootstrapped and tested on loongarch64-linux-gnu target.

Lulu Cheng (6):
  LoongArch: Added Loongson SX vector directive compilation framework.
  LoongArch: Added Loongson SX base instruction support.
  LoongArch: Added Loongson SX directive builtin function support.
  LoongArch: Added Loongson ASX vector directive compilation framework.
  LoongArch: Added Loongson ASX base instruction support.
  LoongArch: Added Loongson ASX directive builtin function support.

 gcc/config.gcc|2 +-
 gcc/config/loongarch/constraints.md   |  128 +-
 .../loongarch/genopts/loongarch-strings   |4 +
 gcc/config/loongarch/genopts/loongarch.opt.in |   16 +-
 gcc/config/loongarch/lasx.md  | 5147 
 gcc/config/loongarch/lasxintrin.h | 5342 +
 gcc/config/loongarch/loongarch-builtins.cc| 2686 -
 gcc/config/loongarch/loongarch-c.cc   |   18 +
 gcc/config/loongarch/loongarch-def.c  |6 +
 gcc/config/loongarch/loongarch-def.h  |9 +-
 gcc/config/loongarch/loongarch-driver.cc  |   10 +
 gcc/config/loongarch/loongarch-driver.h   |2 +
 gcc/config/loongarch/loongarch-ftypes.def |  666 +-
 gcc/config/loongarch/loongarch-modes.def  |   39 +
 gcc/config/loongarch/loongarch-opts.cc|   89 +-
 gcc/config/loongarch/loongarch-opts.h |3 +
 gcc/config/loongarch/loongarch-protos.h   |   35 +
 gcc/config/loongarch/loongarch-str.h  |3 +
 gcc/config/loongarch/loongarch.cc | 4615 +-
 gcc/config/loongarch/loongarch.h  |  117 +-
 gcc/config/loongarch/loongarch.md |   56 +-
 gcc/config/loongarch/loongarch.opt|   16 +-
 gcc/config/loongarch/lsx.md   | 4490 ++
 gcc/config/loongarch/lsxintrin.h  | 5181 
 gcc/config/loongarch/predicates.md|  333 +-
 25 files changed, 28723 insertions(+), 290 deletions(-)
 create mode 100644 gcc/config/loongarch/lasx.md
 create mode 100644 gcc/config/loongarch/lasxintrin.h
 create mode 100644 gcc/config/loongarch/lsx.md
 create mode 100644 gcc/config/loongarch/lsxintrin.h

-- 
2.36.0

Re: Re: [PATCH] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-06-29 Thread juzhe.zh...@rivai.ai

Thanks for the comments.

I will fix doc's description as you suggested.

I personally prefer **NOT** to include BIAS in the gather/scatter since I don't 
known how it will be used.

Let's wait for Richi or Richard more comments.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-29 23:04
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; richard.sandiford; rguenther
Subject: Re: [PATCH] Machine Description: Add LEN_MASK_{GATHER_LOAD, 
SCATTER_STORE} pattern
Hi Juzhe,
 
just looking at the documentation changes.
 
> +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_gather_load@var{m}@var{n}}
> +Like @samp{gather_load@var{m}@var{n}}, but takes an extra len operand
> +as operand 5 and an extra mask operand as operand 6.  Bit @var{i} of
> +the mask is set and i < len if element @var{i} of the result should be
> +loaded from memory.  Element @var{i} of the result should be undefined
> +value when either Bit @var{i} of the mask is clear or i >= len.
> +
 
I would suggest to rephrase this slightly as:
 
"Like ... but takes an extra length operand (operand 5) as well as a
mask operand (operand 6).  Similar to len_maskload, the instruction
loads at most (operand 5) elements from memory.  
Bit @var{i} of the mask is set if element @var{i} of the result should
be loaded from memory and clear if element @var{i} of the result
should be undefined.  Mask elements @var{i} with i > (operand 5) are
ignored."
 
to make it more similar to mask_gather_load.  Further improvements
welcome though - not sure if we should refer to len_maskload at all
because it has a bias operand and mask_gather_load doesn't.
 
> +@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_scatter_store@var{m}@var{n}}
> +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra len operand as
> +operand 5 and an extra mask operand as operand 6.  Bit @var{i} of the mask
> +is set and i < len if element @var{i} of the result should be stored to 
> memory.
> +
"an extra length operand (operand 5)... The instruction stores
at most (operand 5) elements of (operand 4) to memory.
Bit @var{i} of the mask is set if element @var{i} of (operand 4)
should be stored.  Mask elements @var{i} with i > (operand 5) are
ignored."
 
Regards
Robin

RE: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-29 Thread Li, Pan2 via Gcc-patches

That’s very cool, thanks Thomas for help! Let’s wait the AMD test running 
result for the final version of the patch.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, June 30, 2023 9:27 AM
To: Thomas Schwinge ; Li, Pan2 ; 
gcc-patches ; rguenther ; jakub 

Cc: Robin Dapp ; jeffreyalaw ; 
Wang, Yanzhang ; kito.cheng ; 
Tobias Burnus 
Subject: Re: Re: [PATCH v3] Streamer: Fix out of range memory access of machine 
mode

Thanks a lot!

Really appreciate your help ! That's really helpful for RVV (RISC-V vector).
Could you merge your patch after you tested?

Thanks.

juzhe.zh...@rivai.ai

From: Thomas Schwinge
Date: 2023-06-30 04:14
To: Pan Li; 
gcc-patches@gcc.gnu.org; Richard 
Biener; Jakub Jelinek
CC: juzhe.zh...@rivai.ai; 
rdapp@gmail.com; 
jeffreya...@gmail.com; 
yanzhang.w...@intel.com; 
kito.ch...@gmail.com; Tobias 
Burnus
Subject: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode
Hi!

On 2023-06-29T11:29:57+0200, I wrote:
> On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches 
> mailto:gcc-patches@gcc.gnu.org>> wrote:
>> We extend the machine mode from 8 to 16 bits already. But there still
>> one placing missing from the streamer. It has one hard coded array
>> for the machine code like size 256.
>>
>> In the lto pass, we memset the array by MAX_MACHINE_MODE count but the
>> value of the MAX_MACHINE_MODE will grow as more and more modes are
>> added. While the machine mode array in tree-streamer still leave 256 as is.
>>
>> Then, when the MAX_MACHINE_MODE is greater than 256, the memset of
>> lto_output_init_mode_table will touch the memory out of range unexpected.
>
> Uh.  :-O
>
>> This patch would like to take the MAX_MACHINE_MODE as the size of the
>> array in streamer, to make sure there is no potential unexpected
>> memory access in future. Meanwhile, this patch also adjust some place
>> which has MAX_MACHINE_MODE <= 256 assumption.
>
> Thanks to Jakub and Richard for guidance re the offloading compilation
> case, where we've got different 'MAX_MACHINE_MODE's between stream-out
> and stream-in, and a modes mapping table.
>
> However, with this patch, there are ICEs all over the place...  I'm
> having a look.

Your patch has all the right ideas, there are just a few additional
changes necessary.  Please merge in the attached
"f into Streamer: Fix out of range memory access of machine mode", with
'Co-authored-by: Thomas Schwinge 
mailto:tho...@codesourcery.com>>'.  This has
already survived compiler-side 'lto.exp' testing and
'check-target-libgomp' with Nvidia GPU offloading; AMD GPU testing is now
running (not expecting any bad surprises).  Will let you know by (my)
tomorrow morning in case there are any more problems.

Explanation:

>> --- a/gcc/lto-streamer-in.cc
>> +++ b/gcc/lto-streamer-in.cc
>> @@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>  internal_error ("cannot read LTO mode table from %s",
>>   file_data->file_name);
>>
>> -  unsigned char *table = ggc_cleared_vec_alloc (1 << 8);
>> -  file_data->mode_table = table;
>>const struct lto_simple_header_with_strings *header
>>  = (const struct lto_simple_header_with_strings *) data;
>>int string_offset;
>> @@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>   header->string_size, vNULL);
>>bitpack_d bp = streamer_read_bitpack ();
>>
>> +  unsigned mode_bits = bp_unpack_value (, 5);
>> +  unsigned char *table = ggc_cleared_vec_alloc (1 << 
>> mode_bits);
>> +
>> +  file_data->mode_table = table;
>> +  file_data->mode_bits = mode_bits;

Here, we set 'file_data->mode_bits' for the offloading case (where
'lto_input_mode_table' is called) -- but it's not set for the
non-offloading case (where 'lto_input_mode_table' isn't called).  (See my
'gcc/lto/lto-common.cc:lto_read_decls' change.)  That's "not currently a
problem", as 'file_data->mode_bits' isn't used anywhere...

>> --- a/gcc/lto-streamer.h
>> +++ b/gcc/lto-streamer.h
>> @@ -604,6 +604,8 @@ struct GTY(()) lto_file_decl_data
>>int order_base;
>>
>>int unit_base;
>> +
>> +  unsigned mode_bits;
>>  };

>>  inline machine_mode
>>  bp_unpack_machine_mode (struct bitpack_d *bp)
>>  {
>> -  return (machine_mode)
>> -((class lto_input_block *)
>> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
>> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
>> +  lto_input_block *input_block = (class lto_input_block *) bp->stream;
>> +  int index = bp_unpack_enum (bp, machine_mode, last);
>> +
>> +  return (machine_mode)

[Bug c/110491] New: Wrong code at -O2 on x86_64-linux-gnu (a recent regression)

2023-06-29 Thread shaohua.li at inf dot ethz.ch via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110491

Bug ID: 110491
   Summary: Wrong code at -O2 on x86_64-linux-gnu (a recent
regression)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: shaohua.li at inf dot ethz.ch
  Target Milestone: ---

gcc at -O2 produced wrong code.

Compiler explorer: https://godbolt.org/z/cbM1q13G7

$ cat a.c
int printf(const char *, ...);
int a, c, d, e;
short b;
void f(int *g) { c &= *g; }
void h(void);
void i() {
  a = 1;
  h();
  f();
}
void h() {
  int *j = 
  *j = 5;
k:
  for (; 4 + b <= 0;)
;
  for (; d;) {
c = e == 0;
goto k;
  }
}
int main() {
  i();
  printf("%d\n", c);
}
$
$ gcc-tk -O0 a.c &&./a.out
1
$ gcc-tk -O2 a.c && ./a.out
-2003318871
$
$ gcc-tk -v
Using built-in specs.
COLLECT_GCC=gcc-tk
COLLECT_LTO_WRAPPER=/zdata/shaoli/compilers/ccbuilder-compilers/gcc-5f590ee3174cf6058ac882c3a84a96ae639349c8/libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --disable-multilib --disable-bootstrap
--enable-languages=c,c++
--prefix=/zdata/shaoli/compilers/ccbuilder-compilers/gcc-5f590ee3174cf6058ac882c3a84a96ae639349c8
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230629 (experimental) (GCC)
$

Re: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread juzhe.zh...@rivai.ai

>> I've triple checked this already.
You mean you still didn't see vfwmul.vv ?

That's odd. Let's wait for kito or Robin test this patch.
Then, I believe they will know what I am saying.

>> I would strongly suggest looking at a dependency height reduction
>> pattern if you want to optimize that code further.
I did it long time ago. Turns out it's better to do that on Combine PASS in 
both GCC and LLVM.

Never mind, I always have this implementation in my downstream and won't affect 
my downstream GCC maintainment.
It's ok that this patch is not approved since I can get the perfect codegen in 
my downstream. 

Thanks.

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2023-06-30 09:26
To: juzhe.zh...@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; palmer; palmer; Robin Dapp
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

On 6/29/23 19:14, juzhe.zh...@rivai.ai wrote:
> No, reduction patterns won't help.
> As I said in vfwmul patch. You should make sure your environment is 
> working then try again.
I've triple checked this already.

I checked it again and your patch does not impact behavior, nor should 
it.   I checked it on top of these trunk commits:

14bfda6084eaca07c842566a34316974907958e2
e714af12e3bee0032d8d226f87d92c9bc46f0269

I checked it with the code from the godbolt links you suggested with the 
options shown in those links.

More importantly, your explanation of what the pattern is supposed to do 
shows a misunderstanding of what combine's capabilities actually are.  A 
bridge or intermediate pattern is not needed here, combine can 
substitute multiple sources in combination attempts as can be clearly 
seen from the dump fragments I posted.

The only reason I didn't reject the patch at the outset was the 
possibility that maybe we were trying to combine more than 4 
instructions or that possibility something about the number of operands, 
unspecs, whatever were getting in the way.

This patch is not needed and does not affect code generation.

I would strongly suggest looking at a dependency height reduction 
pattern if you want to optimize that code further.

Jeff

Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/29/23 19:14, juzhe.zh...@rivai.ai wrote:

No, reduction patterns won't help.
As I said in vfwmul patch. You should make sure your environment is 
working then try again.

I've triple checked this already.

I checked it again and your patch does not impact behavior, nor should 
it.   I checked it on top of these trunk commits:


14bfda6084eaca07c842566a34316974907958e2
e714af12e3bee0032d8d226f87d92c9bc46f0269

I checked it with the code from the godbolt links you suggested with the 
options shown in those links.


More importantly, your explanation of what the pattern is supposed to do 
shows a misunderstanding of what combine's capabilities actually are.  A 
bridge or intermediate pattern is not needed here, combine can 
substitute multiple sources in combination attempts as can be clearly 
seen from the dump fragments I posted.


The only reason I didn't reject the patch at the outset was the 
possibility that maybe we were trying to combine more than 4 
instructions or that possibility something about the number of operands, 
unspecs, whatever were getting in the way.


This patch is not needed and does not affect code generation.

I would strongly suggest looking at a dependency height reduction 
pattern if you want to optimize that code further.


Jeff

Re: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-29 Thread juzhe.zh...@rivai.ai

Thanks a lot!

Really appreciate your help ! That's really helpful for RVV (RISC-V vector).
Could you merge your patch after you tested?

Thanks.

juzhe.zh...@rivai.ai

From: Thomas Schwinge
Date: 2023-06-30 04:14
To: Pan Li; gcc-patches@gcc.gnu.org; Richard Biener; Jakub Jelinek
CC: juzhe.zh...@rivai.ai; rdapp@gmail.com; jeffreya...@gmail.com; 
yanzhang.w...@intel.com; kito.ch...@gmail.com; Tobias Burnus
Subject: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode
Hi!

On 2023-06-29T11:29:57+0200, I wrote:
> On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches  
> wrote:
>> We extend the machine mode from 8 to 16 bits already. But there still
>> one placing missing from the streamer. It has one hard coded array
>> for the machine code like size 256.
>>
>> In the lto pass, we memset the array by MAX_MACHINE_MODE count but the
>> value of the MAX_MACHINE_MODE will grow as more and more modes are
>> added. While the machine mode array in tree-streamer still leave 256 as is.
>>
>> Then, when the MAX_MACHINE_MODE is greater than 256, the memset of
>> lto_output_init_mode_table will touch the memory out of range unexpected.
>
> Uh.  :-O
>
>> This patch would like to take the MAX_MACHINE_MODE as the size of the
>> array in streamer, to make sure there is no potential unexpected
>> memory access in future. Meanwhile, this patch also adjust some place
>> which has MAX_MACHINE_MODE <= 256 assumption.
>
> Thanks to Jakub and Richard for guidance re the offloading compilation
> case, where we've got different 'MAX_MACHINE_MODE's between stream-out
> and stream-in, and a modes mapping table.
>
> However, with this patch, there are ICEs all over the place...  I'm
> having a look.

Your patch has all the right ideas, there are just a few additional
changes necessary.  Please merge in the attached
"f into Streamer: Fix out of range memory access of machine mode", with
'Co-authored-by: Thomas Schwinge '.  This has
already survived compiler-side 'lto.exp' testing and
'check-target-libgomp' with Nvidia GPU offloading; AMD GPU testing is now
running (not expecting any bad surprises).  Will let you know by (my)
tomorrow morning in case there are any more problems.

Explanation:

>> --- a/gcc/lto-streamer-in.cc
>> +++ b/gcc/lto-streamer-in.cc
>> @@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>  internal_error ("cannot read LTO mode table from %s",
>>   file_data->file_name);
>>
>> -  unsigned char *table = ggc_cleared_vec_alloc (1 << 8);
>> -  file_data->mode_table = table;
>>const struct lto_simple_header_with_strings *header
>>  = (const struct lto_simple_header_with_strings *) data;
>>int string_offset;
>> @@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>   header->string_size, vNULL);
>>bitpack_d bp = streamer_read_bitpack ();
>>
>> +  unsigned mode_bits = bp_unpack_value (, 5);
>> +  unsigned char *table = ggc_cleared_vec_alloc (1 << 
>> mode_bits);
>> +
>> +  file_data->mode_table = table;
>> +  file_data->mode_bits = mode_bits;

Here, we set 'file_data->mode_bits' for the offloading case (where
'lto_input_mode_table' is called) -- but it's not set for the
non-offloading case (where 'lto_input_mode_table' isn't called).  (See my
'gcc/lto/lto-common.cc:lto_read_decls' change.)  That's "not currently a
problem", as 'file_data->mode_bits' isn't used anywhere...

>> --- a/gcc/lto-streamer.h
>> +++ b/gcc/lto-streamer.h
>> @@ -604,6 +604,8 @@ struct GTY(()) lto_file_decl_data
>>int order_base;
>>
>>int unit_base;
>> +
>> +  unsigned mode_bits;
>>  };

>>  inline machine_mode
>>  bp_unpack_machine_mode (struct bitpack_d *bp)
>>  {
>> -  return (machine_mode)
>> -((class lto_input_block *)
>> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
>> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
>> +  lto_input_block *input_block = (class lto_input_block *) bp->stream;
>> +  int index = bp_unpack_enum (bp, machine_mode, last);
>> +
>> +  return (machine_mode) input_block->mode_table[index];
>>  }

..., but 'file_data->mode_bits' needs to be considered here, in the
stream-in for offloading, where 'file_data->mode_bits' -- that is, the
host 'MAX_MACHINE_MODE' -- very likely is different from the offload
device 'MAX_MACHINE_MODE'.

Easiest is in 'gcc/lto-streamer.h:class lto_input_block' to capture
'lto_file_decl_data *file_data' instead of just
'unsigned char *mode_table', and adjust all users.

That's it.  :-)

>> --- a/gcc/tree-streamer.h
>> +++ b/gcc/tree-streamer.h

>> @@ -108,15 +108,19 @@ inline void
>>  bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
>>  {
>>streamer_mode_table[mode] = 1;
>> -  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
>> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
>> +
>> +  bp_pack_enum (bp, machine_mode, last, mode);
>>  }

That use of 'MAX_MACHINE_MODE'

Re: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread juzhe.zh...@rivai.ai

No, reduction patterns won't help. 
As I said in vfwmul patch. You should make sure your environment is working 
then try again.

Thanks.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 07:43
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
 
 
On 6/28/23 16:56, 钟居哲 wrote:
> 
> 
> 
> juzhe.zh...@rivai.ai
> 
> *From:* Jeff Law 
> *Date:* 2023-06-29 06:43
> *To:* 钟居哲 ; gcc-patches
> 
> *CC:* kito.cheng ; kito.cheng
> ; palmer ;
> palmer ; rdapp.gcc
> 
> *Subject:* Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac
> combine lowering
> On 6/28/23 16:10, 钟居哲 wrot
>  > Sure.
>  >
>  > https://godbolt.org/z/8857KzTno 
>  >
>  > Failed to match this instruction:
>  > (set (reg:VNx2DF 134 [ vect__31.47 ])
>  >  (fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [
>  > vect__28.44 ])))
>  >  (reg:VNx2DF 150 [ vect__8.12 ])
>  >  (reg:VNx2DF 171 [ vect__29.45 ])))
> Please attach the full dump.  I would expect to see additional attempts
> with more operands replaced.
THanks for the dump.  I think this fundamentally the same issue as the 
widening problem.
 
Drop those intermediate patterns.  They're not needed/helpful.  You may 
need a dependency height reduction pattern to get the code you want, but 
I see no evidence those extra patterns will solve anything.
 
jeff

Re: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread juzhe.zh...@rivai.ai

Hi, Jeff.

That's odd. I think maybe you should first clean up your environment ?
Or you didn't build up the toolchain correctly with this patch?

Compile option: --param=riscv-autovec-preference=scalable -O3 -ffast-math
Before this patch:
https://godbolt.org/z/Y5d44WMqs 

fail.s:

lw t5,0(sp)
ble t5,zero,.L5
.L3:
vsetvli t1,t5,e32,mf2,ta,ma
vle32.v v2,0(a4)
vle32.v v1,0(a5)
vsetvli t0,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v3,v2
vfwcvt.f.f.v v2,v1
vsetvli zero,t1,e32,mf2,ta,ma
vle32.v v5,0(a6)
vle32.v v4,0(a7)
vsetvli t0,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v1,v5
vsetvli zero,zero,e64,m1,ta,ma
vfmul.vv v5,v2,v3
vfmul.vv v2,v1,v2
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v2,0(a1)
vse64.v v5,0(a0)
vsetvli t6,zero,e64,m1,ta,ma
vfmul.vv v1,v1,v3
vsetvli zero,zero,e32,mf2,ta,ma
vfwcvt.f.f.v v2,v4
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v1,0(a2)
vsetvli t6,zero,e64,m1,ta,ma
slli t4,t1,2
slli t3,t1,3
vfmul.vv v1,v2,v3
sub t5,t5,t1
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v1,0(a3)
add a4,a4,t4
add a5,a5,t4
add a0,a0,t3
add a6,a6,t4
add a1,a1,t3
add a2,a2,t3
add a7,a7,t4
add a3,a3,t3
bne t5,zero,.L3
.L5:
ret

After this patch:
pass.s:

lw t5,0(sp)
ble t5,zero,.L5
.L3:
vsetvli t1,t5,e32,mf2,ta,ma
vle32.v v1,0(a4)
vle32.v v3,0(a5)
vle32.v v2,0(a6)
vle32.v v4,0(a7)
vsetvli t6,zero,e32,mf2,ta,ma
vfwmul.vv v5,v3,v2
vfwmul.vv v6,v1,v3
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v6,0(a0)
vse64.v v5,0(a1)
vsetvli t6,zero,e32,mf2,ta,ma
slli t4,t1,2
slli t3,t1,3
vfwmul.vv v3,v2,v1
sub t5,t5,t1
vfwmul.vv v2,v1,v4
vsetvli zero,t1,e64,m1,ta,ma
vse64.v v3,0(a2)
vse64.v v2,0(a3)
add a4,a4,t4
add a5,a5,t4
add a0,a0,t3
add a6,a6,t4
add a1,a1,t3
add a2,a2,t3
add a7,a7,t4
add a3,a3,t3
bne t5,zero,.L3
.L5:
ret

It's very obvious the codegen with this patch is perfect.

I have attached the .S in this patch.

I am not claiming that this patch solution is the only solution.

I am welcome you can provide another solution as long as you can make this 
codegen become the perfect codegen that this patch achieved.

I think maybe you should make sure you are using the correct toolchain that 
built with patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 07:48
To: juzhe.zhong
CC: gcc-patches; kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/29/23 17:46, juzhe.zhong wrote:
> You should try the example check the codegen before and after the patch. 
> You will understand it.
I've already done that.  It makes _no_ difference on the godbold example.
 
Jeff
 


fail.s
Description: Binary data


pass.s
Description: Binary data

Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/29/23 17:46, juzhe.zhong wrote:
You should try the example check the codegen before and after the patch. 
You will understand it.

I've already done that.  It makes _no_ difference on the godbold example.

Jeff

Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/28/23 16:56, 钟居哲 wrote:




juzhe.zh...@rivai.ai

*From:* Jeff Law 
*Date:* 2023-06-29 06:43
*To:* 钟居哲 ; gcc-patches

*CC:* kito.cheng ; kito.cheng
; palmer ;
palmer ; rdapp.gcc

*Subject:* Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac
combine lowering
On 6/28/23 16:10, 钟居哲 wrot
 > Sure.
 >
 > https://godbolt.org/z/8857KzTno 
 >
 > Failed to match this instruction:
 > (set (reg:VNx2DF 134 [ vect__31.47 ])
 >      (fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [
 > vect__28.44 ])))
 >          (reg:VNx2DF 150 [ vect__8.12 ])
 >          (reg:VNx2DF 171 [ vect__29.45 ])))
Please attach the full dump.  I would expect to see additional attempts
with more operands replaced.
THanks for the dump.  I think this fundamentally the same issue as the 
widening problem.


Drop those intermediate patterns.  They're not needed/helpful.  You may 
need a dependency height reduction pattern to get the code you want, but 
I see no evidence those extra patterns will solve anything.


jeff

Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/28/23 16:00, 钟居哲 wrote:

You can see here:

https://godbolt.org/z/d78646hWb 
You patch doesn't help that code and your patch is a result of 
fundamentally misunderstanding combine's capabilities AFAICT.


Jeff

Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/28/23 16:00, 钟居哲 wrote:

You can see here:

https://godbolt.org/z/d78646hWb 
So just to be explicit, I see no difference with that test before/after 
your proposed change.  Nor would I expect one based on my understanding 
of the patch.


The explicit conversions I see are because we need the output of the 
conversion in multiple vfmul instructions.  That won't be helped by the 
patch you've proposed.


To be more concrete:


   vsetvli t1,t5,e32,mf2,ta,ma # 99[c=0 l=4]  vsetvldi
vle32.v v2,0(a4)# 23[c=4 l=4]  pred_movvnx2sf/1
vle32.v v1,0(a5)# 25[c=4 l=4]  pred_movvnx2sf/1
vsetvli t0,zero,e32,mf2,ta,ma   # 101   [c=0 l=4]  vsetvldi
vfwcvt.f.f.vv3,v2   # 77[c=4 l=4]  pred_extendvnx2df/0
vfwcvt.f.f.vv2,v1   # 79[c=4 l=4]  pred_extendvnx2df/0
vsetvli zero,t1,e32,mf2,ta,ma   # 102   [c=0 l=4]  
vsetvl_discard_resultdi
vle32.v v5,0(a6)# 31[c=4 l=4]  pred_movvnx2sf/1
vle32.v v4,0(a7)# 39[c=4 l=4]  pred_movvnx2sf/1
vsetvli t0,zero,e32,mf2,ta,ma   # 103   [c=0 l=4]  vsetvldi
vfwcvt.f.f.vv1,v5   # 81[c=4 l=4]  pred_extendvnx2df/0
vsetvli zero,zero,e64,m1,ta,ma  # 104   [c=16 l=4]  
vsetvl_vtype_change_only
vfmul.vvv5,v2,v3# 29[c=4 l=4]  pred_mulvnx2df/2
vfmul.vvv2,v1,v2# 34[c=4 l=4]  pred_mulvnx2df/2
vsetvli zero,t1,e64,m1,ta,ma# 105   [c=0 l=4]  
vsetvl_discard_resultdi
vse64.v v2,0(a1)# 35[c=4 l=4]  pred_storevnx2df
vse64.v v5,0(a0)# 30[c=4 l=4]  pred_storevnx2df
vsetvli t6,zero,e64,m1,ta,ma# 106   [c=0 l=4]  vsetvldi
vfmul.vvv1,v1,v3# 37[c=4 l=4]  pred_mulvnx2df/2
vsetvli zero,zero,e32,mf2,ta,ma # 107   [c=20 l=4]  
vsetvl_vtype_change_only
vfwcvt.f.f.vv2,v4   # 83[c=4 l=4]  pred_extendvnx2df/0
vsetvli zero,t1,e64,m1,ta,ma# 108   [c=0 l=4]  
vsetvl_discard_resultdi
vse64.v v1,0(a2)# 38[c=4 l=4]  pred_storevnx2df
vsetvli t6,zero,e64,m1,ta,ma# 109   [c=0 l=4]  vsetvldi
sllit4,t1,2 # 22[c=4 l=4]  ashldi3
sllit3,t1,3 # 27[c=4 l=4]  ashldi3
vfmul.vvv1,v2,v3# 42[c=4 l=4]  pred_mulvnx2df/2



Note how the output of the explicit conversion done in insn 77 is used 
by the vfmul in insns 29, 37 and 42.  Similarly for the other explcit 
conversions.


Your pattern isn't going to help that problem.

You could model this as a dependency height reduction.  I think that 
will get you were you want to go.


You'll need a pattern that matches this:

(parallel [ 
(set (reg:VNx2DF 160 [ vect__11.15 ])

(if_then_else:VNx2DF (unspec:VNx2BI [
(const_vector:VNx2BI repeat [
(const_int 1 [0x1])
])  
(reg:DI 169)

(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1]) 
(const_int 7 [0x7])

(reg:SI 66 vl)
(reg:SI 67 vtype)
(reg:SI 69 frm)
] UNSPEC_VPREDICATE)
(mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 144 [ vect__7.13 
]))
(float_extend:VNx2DF (reg:VNx2SF 146 [ vect__4.9 ])))
(unspec:VNx2DF [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
(set (reg:VNx2DF 143 [ vect__8.14 ])
(float_extend:VNx2DF (reg:VNx2SF 144 [ vect__7.13 ])))
(set (reg:VNx2DF 145 [ vect__5.10 ])
(float_extend:VNx2DF (reg:VNx2SF 146 [ vect__4.9 ])))
])


It'll need to be a define_insn_and_split as its a 3->3 splitter.  The 
split will emit the two extensions and the widening multiply as 3 
distinct insns.


This has two positive effects.  First the widening multiply is no longer 
data dependent on the float_extend and so it can issue when ever r144 
and r146 are ready rather than when r143 and r145 are ready.


The second effect is I think this pattern will end up matching all the 
multiplies in this sample code.  As a result all the float_extend insns 
you generated when splitting become dead and should be removed by DCE.



Jeff

[Bug analyzer/110198] [14 regression] g++.dg/analyzer/pr100244.C fails after r14-1632-g9589a46ddadc8b

2023-06-29 Thread vultkayn at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110198

Benjamin Priour  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #7 from Benjamin Priour  ---
Finally fixed as patch
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1eb90f46c16453f72dc119ba20b07053a15b452d

[Bug analyzer/110198] [14 regression] g++.dg/analyzer/pr100244.C fails after r14-1632-g9589a46ddadc8b

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110198

--- Comment #6 from CVS Commits  ---
The trunk branch has been updated by Benjamin Priour :

https://gcc.gnu.org/g:1eb90f46c16453f72dc119ba20b07053a15b452d

commit r14-2203-g1eb90f46c16453f72dc119ba20b07053a15b452d
Author: benjamin priour 
Date:   Thu Jun 22 21:39:05 2023 +0200

analyzer: Fix regression bug after r14-1632-g9589a46ddadc8b [PR110198]

g++.dg/analyzer/PR100244.C was failing after a patch of PR109439.
The reason was a spurious preemptive return of get_store_value upon
out-of-bounds read that was preventing further checks. Now instead,
a boolean value check_poisoned goes to false when a OOB is detected,
and is later on given to get_or_create_initial_value.

gcc/analyzer/ChangeLog:
PR analyzer/110198
* region-model-manager.cc
(region_model_manager::get_or_create_initial_value): Take an
optional boolean value to bypass poisoning checks
* region-model-manager.h: Update declaration of the above function.
* region-model.cc (region_model::get_store_value): No longer
returns
on OOB, but rather gives a boolean to get_or_create_initial_value.
(region_model::check_region_access): Update docstring.
(region_model::check_region_for_write): Update docstring.

Signed-off-by: benjamin priour

[Bug middle-end/110443] [14 Regression] ICE on a52dec-0.7.4: GIMPLE pass: vect SIGSEGV in vect_get_gather_scatter_ops()

2023-06-29 Thread seurer at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110443

seurer at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|FIXED   |---
 CC||seurer at gcc dot gnu.org
 Status|RESOLVED|REOPENED

--- Comment #9 from seurer at gcc dot gnu.org ---
I am still seeing this failure with r14-2201-g94c71750cdd742 

This is compiling one of the spec test cases on a powerpc BE machine.

/home/seurer/gcc/git/install/gcc-test/bin/gfortran -c -o
module_bc_time_utilities.fppized.o -I. -I./netcdf/include -m64 -O3 -mcpu=power7
-fpeel-loops -funroll-loops -ffast-math -mpopcntd -mrecip=rsqrt
-fvect-cost-model -std=legacy module_bc_time_utilities.fppized.f90
0x10cceb2b crash_signal
/home/seurer/gcc/git/gcc-test/gcc/toplev.cc:314
0x11c8596c vect_supportable_dr_alignment(vec_info*, dr_vec_info*, tree_node*,
int)
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-data-refs.cc:6817
0x11ca16c3 get_group_load_store_type
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-stmts.cc:2445
0x11ca16c3 get_load_store_type
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-stmts.cc:2564
0x11cb498f vectorizable_load
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-stmts.cc:9547
0x11cc7e5b vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*, _slp_tree*,
_slp_instance*, vec*)
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-stmts.cc:11910
0x110c1833 vect_slp_analyze_node_operations_1
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-slp.cc:5993
0x110c1833 vect_slp_analyze_node_operations
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-slp.cc:6191
0x110c1757 vect_slp_analyze_node_operations
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-slp.cc:6170
0x110c1757 vect_slp_analyze_node_operations
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-slp.cc:6170
0x110c1757 vect_slp_analyze_node_operations
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-slp.cc:6170
0x110c1757 vect_slp_analyze_node_operations
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-slp.cc:6170
0x110c3b63 vect_slp_analyze_operations(vec_info*)
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-slp.cc:6431
0x1108a10b vect_analyze_loop_2
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-loop.cc:2782
0x1108c73b vect_analyze_loop_1
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-loop.cc:3303
0x1108d093 vect_analyze_loop(loop*, vec_info_shared*)
/home/seurer/gcc/git/gcc-test/gcc/tree-vect-loop.cc:3457
0x110e2507 try_vectorize_loop_1
/home/seurer/gcc/git/gcc-test/gcc/tree-vectorizer.cc:1064
0x110e2507 try_vectorize_loop
/home/seurer/gcc/git/gcc-test/gcc/tree-vectorizer.cc:1180
0x110e2d43 execute
/home/seurer/gcc/git/gcc-test/gcc/tree-vectorizer.cc:1296

I also traced this back to r14-2117-gdd86a5a69cbda4

[Bug tree-optimization/110487] [12/13/14 Regression] invalid wide Boolean value

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110487

--- Comment #3 from Andrew Pinski  ---
/* Optimize
   # x_5 in range [cst1, cst2] where cst2 = cst1 + 1
   x_5 ? cstN ? cst4 : cst3
   # op is == or != and N is 1 or 2
   to r_6 = x_5 + (min (cst3, cst4) - cst1) or
   r_6 = (min (cst3, cst4) + cst1) - x_5 depending on op, N and which
   of cst3 and cst4 is smaller.
   This was originally done by two_value_replacement in phiopt (PR 88676).  */


is going to have the same issue I think.

[PATCH ver 3] rs6000: Update the vsx-vector-6.* tests.

2023-06-29 Thread Carl Love via Gcc-patches

GCC maintainers:

Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
of the scan-assembler-times checks to cover multiple similar
instructions.  Change the function check macro to a macro to generate a
function to do the test and check the results.  Retested on the various
processor types and BE/LE versions.

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compile doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  

The following patch takes the tests in vsx-vector-6-p7.h,  vsx-vector-
6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series of smaller
test files by functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regresions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl


-
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

The tests are broken up into a seriers of files for related tests.  The
new tests are runnable tests to verify the builtin argument types and the
functional correctness of each test rather then verifying the type and
number of instructions generated.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op.c   | 141 ++
 .../powerpc/vsx-vector-6-func-2lop.c  | 217 +++
 .../powerpc/vsx-vector-6-func-2op.c   | 133 +
 .../powerpc/vsx-vector-6-func-3op.c   | 257 ++
 .../powerpc/vsx-vector-6-func-cmp-all.c   | 211 ++
 .../powerpc/vsx-vector-6-func-cmp.c   | 121 +
 .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 ---
 10 files changed, 1080 insertions(+), 282 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2lop.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-3op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp-all.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
new file mode 100644
index 000..52c7ae3e983
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
@@ -0,0 +1,141 @@
+/* { dg-do run { target lp64 } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-options "-O2 -save-temps" } */
+
+/* Functional test of the one operand vector builtins.  */
+
+#include 
+#include 
+#include 
+
+#define DEBUG 0
+
+void abort (void);
+
+/* Macro to check the results for the various floating point argument tests.
+ */
+#define FLOAT_TEST(NAME)  \
+  void __attribute__ ((noipa))\
+  float_##NAME (vector float f_src, vector float f_##NAME##_expected) \
+  {  \
+vector float f_result = vec_##NAME(f_src);   \
+  \
+if

Re: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread 钟居哲

Or do you have better solution to make the case succeed to combine into vfwmul?
I am ok with any solution.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 06:59
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/28/23 16:00, 钟居哲 wrote:
> You can see here:
> 
> https://godbolt.org/z/d78646hWb 
> 
> The first case can't genreate vfwmul.vv but second case succeed.
> 
> Failed to match this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
>  (if_then_else:VNx2DF (unspec:VNx2BI [
>  (const_vector:VNx2BI repeat [
>  (const_int 1 [0x1])
>  ])
>  (reg:DI 153)
>  (const_int 2 [0x2]) repeated x2
>  (const_int 1 [0x1])
>  (const_int 7 [0x7])
>  (reg:SI 66 vl)
>  (reg:SI 67 vtype)
>  (reg:SI 69 N/A)
>  ] UNSPEC_VPREDICATE)
>  (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
>  (reg:VNx2DF 148 [ vect__8.49 ]))
>  (unspec:VNx2DF [
>  (reg:SI 0 zero)
>  ] UNSPEC_VUNDEF)))
Right.  We try combining:
   24 -> 27
   25 -> 27
   23, 24 -> 27
   22, 25 -> 27
 
All of which fail, as expected.  24 -> 27 and 25-> 27 only put an 
extension on one operand of the mult.  The other two try to substitute a 
float extend of an if-then-else which I fully expect to fail.  All as 
expected.
 
The next one that gets tried is:
 
> Trying 25, 24 -> 27:
>25: r149:VNx2DF=float_extend(r141:VNx2SF)
>   REG_DEAD r141:VNx2SF
>24: r148:VNx2DF=float_extend(r139:VNx2SF)
>   REG_DEAD r139:VNx2SF
>27: 
> r150:VNx2DF={(unspec[const_vector,r153:DI,0x2,0x2,0x1,0x7,vl:SI,vtype:SI,N/A:SI]
>  69)?r148:VNx2DF*r149:VNx2DF:unspec[zero:SI] 68}
>   REG_DEAD r149:VNx2DF
>   REG_DEAD r148:VNx2DF
>   REG_DEAD N/A:SI
>   REG_DEAD zero:SI
>   REG_EQUAL r148:VNx2DF*r149:VNx2DF
> Successfully matched this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
> (if_then_else:VNx2DF (unspec:VNx2BI [
> (const_vector:VNx2BI repeat [
> (const_int 1 [0x1])
> ])
> (reg:DI 153)
> (const_int 2 [0x2]) repeated x2
> (const_int 1 [0x1])
> (const_int 7 [0x7])
> (reg:SI 66 vl)
> (reg:SI 67 vtype)
> (reg:SI 69 N/A)
> ] UNSPEC_VPREDICATE)
> (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 141 [ vect__4.44 ]))
> (float_extend:VNx2DF (reg:VNx2SF 139 [ vect__7.48 ])))
> (unspec:VNx2DF [
> (reg:SI 0 zero)
> ] UNSPEC_VUNDEF)))
> allowing combination of insns 24, 25 and 27
> original costs 4 + 4 + 4 = 12
> replacement cost 4
 
Note how it replaced both operands of the mult with extended versions 
and the pattern matches, as expected.
 
The point being that I don't think those helper patterns are needed to 
handle the problem you suggested they were there to handle.  Combine 
knows how to handle multiple substitutions just fine.
 
Right now I don't see a need for this patch.
 
 
 
Jeff

Re: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread 钟居哲

>> Right now I don't see a need for this patch.
No, we need this patch.

With this patch,  this following case can be combine into vfwmul.vv:
#define TEST_TYPE(TYPE1, TYPE2)\
  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 ( \
TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3, \
TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,  \
TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n) \
  {\
for (int i = 0; i < n; i++)\
  {\
dst[i] = (TYPE1) a[i] * (TYPE1) b[i];  \
dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];\
dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];\
dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];\
  }\
  }
TEST_TYPE (double, float)
You should try this, then you will know I am saying.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 06:59
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
 
 
On 6/28/23 16:00, 钟居哲 wrote:
> You can see here:
> 
> https://godbolt.org/z/d78646hWb 
> 
> The first case can't genreate vfwmul.vv but second case succeed.
> 
> Failed to match this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
>  (if_then_else:VNx2DF (unspec:VNx2BI [
>  (const_vector:VNx2BI repeat [
>  (const_int 1 [0x1])
>  ])
>  (reg:DI 153)
>  (const_int 2 [0x2]) repeated x2
>  (const_int 1 [0x1])
>  (const_int 7 [0x7])
>  (reg:SI 66 vl)
>  (reg:SI 67 vtype)
>  (reg:SI 69 N/A)
>  ] UNSPEC_VPREDICATE)
>  (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
>  (reg:VNx2DF 148 [ vect__8.49 ]))
>  (unspec:VNx2DF [
>  (reg:SI 0 zero)
>  ] UNSPEC_VUNDEF)))
Right.  We try combining:
   24 -> 27
   25 -> 27
   23, 24 -> 27
   22, 25 -> 27
 
All of which fail, as expected.  24 -> 27 and 25-> 27 only put an 
extension on one operand of the mult.  The other two try to substitute a 
float extend of an if-then-else which I fully expect to fail.  All as 
expected.
 
The next one that gets tried is:
 
> Trying 25, 24 -> 27:
>25: r149:VNx2DF=float_extend(r141:VNx2SF)
>   REG_DEAD r141:VNx2SF
>24: r148:VNx2DF=float_extend(r139:VNx2SF)
>   REG_DEAD r139:VNx2SF
>27: 
> r150:VNx2DF={(unspec[const_vector,r153:DI,0x2,0x2,0x1,0x7,vl:SI,vtype:SI,N/A:SI]
>  69)?r148:VNx2DF*r149:VNx2DF:unspec[zero:SI] 68}
>   REG_DEAD r149:VNx2DF
>   REG_DEAD r148:VNx2DF
>   REG_DEAD N/A:SI
>   REG_DEAD zero:SI
>   REG_EQUAL r148:VNx2DF*r149:VNx2DF
> Successfully matched this instruction:
> (set (reg:VNx2DF 150 [ vect__11.50 ])
> (if_then_else:VNx2DF (unspec:VNx2BI [
> (const_vector:VNx2BI repeat [
> (const_int 1 [0x1])
> ])
> (reg:DI 153)
> (const_int 2 [0x2]) repeated x2
> (const_int 1 [0x1])
> (const_int 7 [0x7])
> (reg:SI 66 vl)
> (reg:SI 67 vtype)
> (reg:SI 69 N/A)
> ] UNSPEC_VPREDICATE)
> (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 141 [ vect__4.44 ]))
> (float_extend:VNx2DF (reg:VNx2SF 139 [ vect__7.48 ])))
> (unspec:VNx2DF [
> (reg:SI 0 zero)
> ] UNSPEC_VUNDEF)))
> allowing combination of insns 24, 25 and 27
> original costs 4 + 4 + 4 = 12
> replacement cost 4
 
Note how it replaced both operands of the mult with extended versions 
and the pattern matches, as expected.
 
The point being that I don't think those helper patterns are needed to 
handle the problem you suggested they were there to handle.  Combine 
knows how to handle multiple substitutions just fine.
 
Right now I don't see a need for this patch.
 
 
 
Jeff

[Bug libstdc++/110149] std::format for pointer arguments allows a '0' option

2023-06-29 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110149

Jonathan Wakely  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Jonathan Wakely  ---
Fixed for GCC 13.2

[Bug libstdc++/110239] [14 regression] std/format/functions/format.cc fails after r14-1648-g628ba410b9265d

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110239

--- Comment #5 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:2d40cd2f199e32f185d4b72db2043e91313ab7f2

commit r13-7511-g2d40cd2f199e32f185d4b72db2043e91313ab7f2
Author: Jonathan Wakely 
Date:   Mon Jun 26 14:46:46 2023 +0100

libstdc++: Fix std::format for pointers [PR110239]

The formatter for pointers was casting to uint64_t which sign extends a
32-bit pointer and produces a value that won't fit in the provided
buffer. Cast to uintptr_t instead.

There was also a bug in the __parse_integer helper when converting a
wide string to a narrow string in order to use std::from_chars on it.
The function would always try to read 32 characters, even if the format
string was shorter than that. Fix that bug, and remove the constexpr
implementation of __parse_integer by just using __from_chars_alnum
instead of from_chars, because that's usable in constexpr even in
C++20.

libstdc++-v3/ChangeLog:

PR libstdc++/110239
* include/std/format (__format::__parse_integer): Fix buffer
overflow for wide chars.
(formatter::format): Cast to uintptr_t instead
of uint64_t.
* testsuite/std/format/string.cc: Test too-large widths.

(cherry picked from commit 3bb9f9329c378934541ae4cff9977b7487e97cf0)

[Bug libstdc++/110149] std::format for pointer arguments allows a '0' option

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110149

--- Comment #7 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:ae7cdc8c0f5278e7941f1de7c72ffe9f1fed2775

commit r13-7510-gae7cdc8c0f5278e7941f1de7c72ffe9f1fed2775
Author: Jonathan Wakely 
Date:   Thu Jun 8 21:35:21 2023 +0100

libstdc++: Fix P2510R3 "Formatting pointers" [PR110149]

I had intended to support the P2510R3 proposal unconditionally in C++20
mode, but I left it half implemented. The parse function supported the
new extensions, but the format function didn't.

This adds the missing pieces, and makes it only enabled for C++26 and
non-strict modes.

libstdc++-v3/ChangeLog:

PR libstdc++/110149
* include/std/format (formatter::parse):
Only alow 0 and P for C++26 and non-strict modes.
(formatter::format): Use toupper for P
type, and insert zero-fill characters for 0 option.
* testsuite/std/format/functions/format.cc: Check pointer
formatting. Only check P2510R3 extensions conditionally.
* testsuite/std/format/parse_ctx.cc: Only check P2510R3
extensions conditionally.

(cherry picked from commit 628ba410b9265dbd4278c1f1b1fadf05348adef2)

[Bug libstdc++/109741] alignas(64) in libstdc++-v3/src/c++11/shared_ptr.cc

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109741

--- Comment #17 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:dbd4acd72274f3b3d542ebf68f9962eda8f8b769

commit r13-7509-gdbd4acd72274f3b3d542ebf68f9962eda8f8b769
Author: Jonathan Wakely 
Date:   Tue May 16 15:09:20 2023 +0100

libstdc++: Disable cacheline alignment for DJGPP [PR109741]

DJGPP (and maybe other targets) uses MAX_OFILE_ALIGNMENT=16 which means
that globals (and static objects) can't have alignment greater than 16.
This causes an error for the locks defined in src/c++11/shared_ptr.cc
because we try to align them to the cacheline size, to avoid false
sharing.

Add a configure check for the increased alignment, and live with false
sharing where we can't increase the alignment.

libstdc++-v3/ChangeLog:

PR libstdc++/109741
* acinclude.m4 (GLIBCXX_CHECK_ALIGNAS_CACHELINE): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_ALIGNAS_CACHELINE.
* src/c++11/shared_ptr.cc (__gnu_internal::get_mutex): Do not
align lock table if not supported. use __GCC_DESTRUCTIVE_SIZE
instead of hardcoded 64.

(cherry picked from commit 94a311abf783de754f0f1b2d4c1f00a9788e795b)

[Bug libstdc++/100285] experimental/net/socket/socket_base.cc fails on arm-eabi (r12-137)

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100285

--- Comment #16 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:132015b9c6c9f9156ff31f7d66ba92cf01d0bc90

commit r13-7508-g132015b9c6c9f9156ff31f7d66ba92cf01d0bc90
Author: Jonathan Wakely 
Date:   Fri Jun 9 12:15:21 2023 +0100

libstdc++: Add preprocessor checks to  [PR100285]

We can't define endpoints and resolvers without the relevant OS support.
If IPPROTO_TCP and IPPROTO_UDP are both undefined then we won't need
basic_endpoint and basic_resolver anyway, so make them depend on those
macros.

libstdc++-v3/ChangeLog:

PR libstdc++/100285
* include/experimental/internet [IPPROTO_TCP || IPPROTO_UDP]
(basic_endpoint, basic_resolver_entry, resolver_base)
(basic_resolver_results, basic_resolver): Only define if the tcp
or udp protocols will be defined.

(cherry picked from commit 793ed718b522b15e2d758eca953feeec1979fe2c)

[Bug libstdc++/52799] deque::emplace(iterator, ...) tries to call push_front(...), which doesn't exist

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52799

--- Comment #6 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:52997b14311726447850341ecaf286b68587ff32

commit r13-7505-g52997b14311726447850341ecaf286b68587ff32
Author: Jonathan Wakely 
Date:   Thu Jun 8 12:19:26 2023 +0100

libstdc++: Improve tests for emplace member of sequence containers

Our existing tests for std::deque::emplace, std::list::emplace and
std::vector::emplace are poor. We only have compile tests for PR 52799
and the equivalent for a const_iterator as the insertion point. This
fails to check that the value is actually inserted correctly and the
right iterator is returned.

Add new tests that cover the existing 52799.cc and const_iterator.cc
compile-only tests, as well as verifying the effects are correct.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/deque/modifiers/emplace/52799.cc:
Removed.
*
testsuite/23_containers/deque/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/list/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/list/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/vector/modifiers/emplace/52799.cc:
Removed.
*
testsuite/23_containers/vector/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/deque/modifiers/emplace/1.cc: New
test.
* testsuite/23_containers/list/modifiers/emplace/1.cc: New
test.
* testsuite/23_containers/vector/modifiers/emplace/1.cc: New
test.

(cherry picked from commit 3ec1d76a359542ed4c8370390efa9ee9e25e757f)

Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/28/23 16:00, 钟居哲 wrote:

You can see here:

https://godbolt.org/z/d78646hWb 

The first case can't genreate vfwmul.vv but second case succeed.

Failed to match this instruction:
(set (reg:VNx2DF 150 [ vect__11.50 ])
     (if_then_else:VNx2DF (unspec:VNx2BI [
                 (const_vector:VNx2BI repeat [
                         (const_int 1 [0x1])
                     ])
                 (reg:DI 153)
                 (const_int 2 [0x2]) repeated x2
                 (const_int 1 [0x1])
                 (const_int 7 [0x7])
                 (reg:SI 66 vl)
                 (reg:SI 67 vtype)
                 (reg:SI 69 N/A)
             ] UNSPEC_VPREDICATE)
         (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 149 [ vect__5.45 ]))
             (reg:VNx2DF 148 [ vect__8.49 ]))
         (unspec:VNx2DF [
                 (reg:SI 0 zero)
             ] UNSPEC_VUNDEF)))

Right.  We try combining:
  24 -> 27
  25 -> 27
  23, 24 -> 27
  22, 25 -> 27

All of which fail, as expected.  24 -> 27 and 25-> 27 only put an 
extension on one operand of the mult.  The other two try to substitute a 
float extend of an if-then-else which I fully expect to fail.  All as 
expected.


The next one that gets tried is:


Trying 25, 24 -> 27:
   25: r149:VNx2DF=float_extend(r141:VNx2SF)
  REG_DEAD r141:VNx2SF
   24: r148:VNx2DF=float_extend(r139:VNx2SF)
  REG_DEAD r139:VNx2SF
   27: 
r150:VNx2DF={(unspec[const_vector,r153:DI,0x2,0x2,0x1,0x7,vl:SI,vtype:SI,N/A:SI]
 69)?r148:VNx2DF*r149:VNx2DF:unspec[zero:SI] 68}
  REG_DEAD r149:VNx2DF
  REG_DEAD r148:VNx2DF
  REG_DEAD N/A:SI
  REG_DEAD zero:SI
  REG_EQUAL r148:VNx2DF*r149:VNx2DF
Successfully matched this instruction:
(set (reg:VNx2DF 150 [ vect__11.50 ])
(if_then_else:VNx2DF (unspec:VNx2BI [
(const_vector:VNx2BI repeat [
(const_int 1 [0x1])
])
(reg:DI 153)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(const_int 7 [0x7])
(reg:SI 66 vl)
(reg:SI 67 vtype)
(reg:SI 69 N/A)
] UNSPEC_VPREDICATE)
(mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 141 [ vect__4.44 ]))
(float_extend:VNx2DF (reg:VNx2SF 139 [ vect__7.48 ])))
(unspec:VNx2DF [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
allowing combination of insns 24, 25 and 27
original costs 4 + 4 + 4 = 12
replacement cost 4


Note how it replaced both operands of the mult with extended versions 
and the pattern matches, as expected.


The point being that I don't think those helper patterns are needed to 
handle the problem you suggested they were there to handle.  Combine 
knows how to handle multiple substitutions just fine.


Right now I don't see a need for this patch.



Jeff

[Bug target/110478] RISC-V multilib gcc zicsr in the -march causing incorrect libgcc to be used

2023-06-29 Thread palmer at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110478

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #5 from palmer at gcc dot gnu.org ---
(In reply to Bin Meng from comment #4)
> I can't get the build to pass with the same configure scripts on current GCC
> HEAD :(
> 
> --host=x86_64-linux-gnu --build=aarch64-linux --target=riscv64-linux
> --enable-targets=all
> --prefix=/home/arnd/cross/x86_64/gcc-13.1.0-nolibc/riscv64-linux
> --enable-languages=c --without-headers --disable-bootstrap --disable-nls
> --disable-threads --disable-shared --disable-libmudflap --disable-libssp
> --disable-libgomp --disable-decimal-float --disable-libquadmath
> --disable-libatomic --disable-libcc1 --disable-libmpx
> --enable-checking=release --with-static-standard-libraries
> 
> Error below:
> 
> Checking multilib configuration for libgcc...
> make[2]: Entering directory '/home/bmeng/git/gcc/riscv64-linux/libgcc'
> Makefile:183: ../.././gcc/libgcc.mvars: No such file or directory
> make[2]: *** No rule to make target '../.././gcc/libgcc.mvars'.  Stop.
> make[2]: Leaving directory '/home/bmeng/git/gcc/riscv64-linux/libgcc'
> make[1]: *** [Makefile:12855: all-target-libgcc] Error 2

It's building for me using riscv-gnu-toolchain and 070a6bf0bdc ("Update
documentation to clarify a GCC extension [PR c/77650]").  If the failure is
still reproducing on a HEAD can you give me a pointer to the exact commit? 
Also might be better to put that in a different bug, it's probably not the same
issue.

gcc-11-20230629 is now available

2023-06-29 Thread GCC Administrator via Gcc

Snapshot gcc-11-20230629 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/11-20230629/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 11 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-11 revision 22e5d295199714f9fcebe970e3e8c29c204d30a6

You'll find:

 gcc-11-20230629.tar.xz   Complete GCC

  SHA256=ddf79dab57bbba4a36e034dcd25cc5b26a1bd6009c51a7b2ad962d6472779f97
  SHA1=aba12782d01d44b7ee2dfe364077c1bb03326b5c

Diffs from 11-20230622 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-11
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

Re: [PATCH] c++: fix up caching of level lowered ttps

2023-06-29 Thread Jason Merrill via Gcc-patches


On 6/1/23 17:42, Patrick Palka wrote:

Due to level/depth mismatches between the template parameters of a level
lowered ttp and the original ttp, the ttp comparison check added by
r14-418-g0bc2a1dc327af9 never actually holds outside of erroneous cases.
Moreover, it'd be good to cache the overall TEMPLATE_TEMPLATE_PARM
instead of just the corresponding TEMPLATE_PARM_INDEX.

It's tricky to cache all level lowered ttps since the result of level
lowering may depend on more than just the depth of the arguments, e.g.
for TT in

   template
   struct A
   {
 template class TT>
 void f();
   }

the substitution T=int yields a different level-lowerd ttp than T=char.
But these kinds of ttps seem to be rare in practice, and "simple" ttps
that don't depend on outer template parameters are easy enough to
cache like so.  Unfortunately, this means we're back to expecting a
duplicate error in nontype12.C again since the ttp in question is
not "simple" so caching of the (erroneous) lowered ttp doesn't happen.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  This reduces memory usage of range-v3's zip.cpp by 1%.

gcc/cp/ChangeLog:

* cp-tree.h (TEMPLATE_PARM_DESCENDANTS): Harden.
(TEMPLATE_TYPE_DESCENDANTS): Define.
(TEMPLATE_TEMPLATE_PARM_SIMPLE_P): Define.
* pt.cc (reduce_template_parm_level): Revert
r14-418-g0bc2a1dc327af9 change.
(process_template_parm): Set TEMPLATE_TEMPLATE_PARM_SIMPLE_P
appropriately.
(uses_outer_template_parms): Determine the outer depth of
a template template parm without relying on DECL_CONTEXT.
(tsubst) : Cache lowering a
simple template template parm.  Consistently use 'code'.

gcc/testsuite/ChangeLog:

* g++.dg/template/nontype12.C: Expect a duplicate error again.
---
  gcc/cp/cp-tree.h  | 10 +-
  gcc/cp/pt.cc  | 37 +--
  gcc/testsuite/g++.dg/template/nontype12.C |  3 +-
  3 files changed, 31 insertions(+), 19 deletions(-)

+++ b/gcc/testsuite/g++.dg/template/nontype12.C
@@ -4,8 +4,7 @@
  template struct A
  {
template int foo();// { dg-error "double" "" { 
target c++17_down } }
-  template class> int bar();// { dg-bogus 
{double.*C:7:[^\n]*double} }


Let's xfail the duplicate error rather than remove the dg-bogus.  OK 
with that change.


Jason

Re: [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector

2023-06-29 Thread Jason Merrill via Gcc-patches


On 6/28/23 09:41, Tamar Christina wrote:

Hi All,

FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C and C++
as gfortan does for FORTRAN and what ICX/ICX does for C and C++.

I added only some basic tests here, but the next patch in the series uses this
in the testsuite in about ~800 tests.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/c-family/ChangeLog:

* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
* c-pragma.cc (init_pragma): Use it.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
c_parser_for_statement, c_parser_statement_after_labels,
c_parse_pragma_novector, c_parser_pragma): Wire through novector and
default to false.


I'll let the C maintainers review the C changes.


gcc/cp/ChangeLog:

* cp-tree.def (RANGE_FOR_STMT): Update comment.
* cp-tree.h (RANGE_FOR_NOVECTOR): New.
(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Add novector param.
* init.cc (build_vec_init): Default novector to false.
* method.cc (build_comparison_op): Likewise.
* parser.cc (cp_parser_statement): Likewise.
(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
cp_convert_range_for, cp_parser_iteration_statement,
cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
(cp_parser_pragma_novector): New.
* pt.cc (tsubst_expr): Likewise.
* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document it.
* tree-core.h (struct tree_base): Add lang_flag_7 and reduce spare0.
* tree.h (TREE_LANG_FLAG_7): New.


This doesn't seem necessary; I think only flags 1 and 6 are currently 
used in RANGE_FOR_STMT.



gcc/testsuite/ChangeLog:

* g++.dg/vect/vect-novector-pragma.cc: New test.
* gcc.dg/vect/vect-novector-pragma.c: New test.

--- inline copy of patch --
...
@@ -13594,7 +13595,8 @@ cp_parser_condition (cp_parser* parser)
 not included. */
  
  static tree

-cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
+cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll,
+  bool novector)


I wonder about combining the ivdep and novector parameters here and in 
other functions?  Up to you.



@@ -49613,17 +49633,33 @@ cp_parser_pragma (cp_parser *parser, enum 
pragma_context context, bool *if_p)
break;
  }
const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
-   unsigned short unroll;
+   unsigned short unroll = 0;
+   bool novector = false;
cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
-   if (tok->type == CPP_PRAGMA
-   && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
+
+   while (tok->type == CPP_PRAGMA)
  {
-   tok = cp_lexer_consume_token (parser->lexer);
-   unroll = cp_parser_pragma_unroll (parser, tok);
-   tok = cp_lexer_peek_token (the_parser->lexer);
+   switch (cp_parser_pragma_kind (tok))
+ {
+   case PRAGMA_UNROLL:
+ {
+   tok = cp_lexer_consume_token (parser->lexer);
+   unroll = cp_parser_pragma_unroll (parser, tok);
+   tok = cp_lexer_peek_token (the_parser->lexer);
+   break;
+ }
+   case PRAGMA_NOVECTOR:
+ {
+   tok = cp_lexer_consume_token (parser->lexer);
+   novector = cp_parser_pragma_novector (parser, tok);
+   tok = cp_lexer_peek_token (the_parser->lexer);
+   break;
+ }
+   default:
+ gcc_unreachable ();
+ }
  }


Repeating this pattern three times for the three related pragmas is too 
much; please combine the three cases into one.


Jason

[Bug tree-optimization/110490] tree-ssa/clz-* and tree-ssa/ctz-* fail on s390x with arch14

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110490

--- Comment #3 from Andrew Pinski  ---
I was wrong about where the problem was but not wrong about the missed
optimization.

Anyways the issue is in expression_expensive_p where the issue of clzdi2 and
not having clzsi2 ...

expression_expensive_p should check if you can do a widening of the mode ...

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-29 Thread Carl Love via Gcc-patches

Kewen:

On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
> > Yea, I was going with a runnable test and didn't include the
> > instruction counts.  Added back in.  Rather then doing by processor
> > version (P8, P9, P10) I was able to do it by BE/LE.  The
> > instruction
> > counts were the same for LE accross processor versions but there
> > are a
> > few instruction counts that vary with BE and LE.
> 
> But the original test case only checks for cpu-types (processor
> version)
> but not for endianness, it means for the bif usages, there should not
> be
> different for endianness.  Why does this changes with your new test
> case?
> Could you have a further look and make it consistent with some
> adjustment
> if possible?  As we know, checking insn counts sometimes are fragile,
> so
> I think we should try our best to make it as robust as possible in
> the
> first place.
> 
> Besides, the original case also have some differences between p7/p8
> and
> p9.
>   

There are differences on P8 LE versus BE.  I did a diff between the P8
and P9 tests:

 diff vsx-vector-6.p8.c vsx-vector-6.p9.c
3,4c3,4
< /* { dg-require-effective-target powerpc_p8vector_ok } */
< /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
---
> /* { dg-require-effective-target powerpc_p9vector_ok } */
> /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
12c12
< /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
---
> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
23d22
< /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
37c36
< /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
---
> /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */

So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are different
between the two architectures.  I then wrote a script to compile the
CPU specific test on Power 8, Power 9 and Power 10 architectures and
then grep for the above list of instructions.  If I run the scrip on P8
BE  and LE I get

Power 8 BEPower 8 LE   Power 9 LE   Power 9 BEPower 10 LE*
   (makalu-lp1)(genoa) (marlin)  (nilram)   (ltcd97-lp3)
instruction   count countcount countcount
vperm  1  10 00
vpermr 0  00 00
xxpermr0  01 01
xvmsubadp  1  01 11
xvmsubmdp  0  10 00
xvsubdp1  11 11

>From the diff we see 

  { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } }

This test picks up the correct subtraction instruction for LE versus BE
so this "masks" the LE/BE difference.  I changed the check in vsx-
vector-6-func-3op.c to match.  This eliminates the LE and BE checks and
reduces the number of specific checks.

In vsx-vector-6-func-3op.c  The new test checks the counts for
xxpermdi, which the original test does not check.  The check for
xxpermdi are not needed.  They are not directly related to the builtin
tests.  I removed them.

Looking at the LE/BE checks in the other test file vsx-vector-6-func-
2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were not checked in
the original test.  The functions where these instructions are used get
inlined.  On LE, the binary instructions show up in the inlined code as
well as what appears to be the binary for the original, non-inlined
function.  Best I can see, the binary for the original function is dead
code.  I don't see any calls to it.  Seems like it shouldn't be there
as it would make the binary smaller. On BE, I don't see the binary for
the original non-inlined function.  

I had played with putting -Wno-inline on the command line but that
didn't seem to make any difference.  However, you suggestion of
__attribute__ ((noipa)) does prevent the inlining and we don't get the
second copy of the instructions showing up. The inlining eliminated the
LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.

The instruction count test for xxlor in vsx-vector-6-func-2lop.c
differs on LE and BE vsx-vector-6-func-2op.c.  I believe the
instruction is used with loads to reorder the data.  I don't see anyway
to get around the extra xxlor instructions and verify the vec_or
builtin test generates the instruction.  

I was able to eliminate all of the LE/BE qualifiers in the instruction
counts with the exception of xxlor.  By using the same checks that look
for multiple versions of xvmsumb*, as was done in the original test, we
can also eliminate LE/BE specific tests and account for different
instructions across CPU versions.  We could go back to checking for
specific instructions being generated on Power 8, Power 9, Power 10 if
you prefer not using checks that cover multiple flavors of a given

[Bug tree-optimization/110490] tree-ssa/clz-* and tree-ssa/ctz-* fail on s390x with arch14

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110490

Andrew Pinski  changed:

   What|Removed |Added

  Component|testsuite   |tree-optimization
 Ever confirmed|0   |1
   Last reconfirmed||2023-06-29
   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW

--- Comment #2 from Andrew Pinski  ---
So the problem is a missing optimization.
In this case, arch14 (which enables TARGET_EXTIMM) only defines a clzdi2 and
not a clzsi2 so the code in build_cltz_expr does not realize that it could
still support clzsi2 ...

Re: [PATCH v1] RISC-V: Refactor vxrm_mode attr for type attr equal

2023-06-29 Thread Jeff Law via Gcc-patches





On 6/29/23 00:00, pan2...@intel.com wrote:

From: Pan Li 

This patch would like to refactor the vxrm_mode attr for duplicated
eq_attr condition. The common condition of attr is extraced to one
place instead of many places.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/vector.md: Refactor the common condition.

OK
jeff

Extend ipa-fnsummary to skip __builtin_expect

2023-06-29 Thread Jan Hubicka via Gcc-patches

Compute ipa-predicates for conditionals involving __builtin_expect_p

std::vector allocator looks as follows:

__attribute__((nodiscard))
struct pair * std::__new_allocator 
>::allocate (struct __new_allocator * const this, size_type __n, const void * 
D.27753)
{
  bool _1;
  long int _2;
  long int _3;
  long unsigned int _5;
  struct pair * _9;

   [local count: 1073741824]:
  _1 = __n_7(D) > 1152921504606846975;
  _2 = (long int) _1;
  _3 = __builtin_expect (_2, 0);
  if (_3 != 0)
goto ; [10.00%]
  else
goto ; [90.00%]

   [local count: 107374184]:
  if (__n_7(D) > 2305843009213693951)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 53687092]:
  std::__throw_bad_array_new_length ();

   [local count: 53687092]:
  std::__throw_bad_alloc ();

   [local count: 966367641]:
  _5 = __n_7(D) * 8;
  _9 = operator new (_5);
  return _9;
}


So there is check for allocated block size being greater than max_size which is
wrapper in __builtin_expect.  This makes ipa-fnsummary to give up analyzing
predicates and it will miss the fact that the two different calls to __throw
will be optimized out if __n is larady smaller than 1152921504606846975 which
it is after _M_check_len.

This patch extends ipa-fnsummary to understand functions that return their
parameter.

We still do not get the value range propagated sicne _M_check_len is not
inlined early and ipa-prop misses return functions, but we get closer :)

Bootstrapped/regtested x86_64-linux, comitted.


gcc/ChangeLog:

PR tree-optimization/109849
* ipa-fnsummary.cc (decompose_param_expr): Skip
functions returning its parameter.
(set_cond_stmt_execution_predicate): Return early
if predicate was constructed.

gcc/testsuite/ChangeLog:

PR tree-optimization/109849
* gcc.dg/ipa/pr109849.c: New test.

diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index 78cbb60d056..a09f6305c63 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -1516,6 +1516,19 @@ decompose_param_expr (struct ipa_func_body_info *fbi,
 
   if (TREE_CODE (expr) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (expr))
break;
+  stmt = SSA_NAME_DEF_STMT (expr);
+
+  if (gcall *call = dyn_cast  (stmt))
+   {
+ int flags = gimple_call_return_flags (call);
+ if (!(flags & ERF_RETURNS_ARG))
+   goto fail;
+ int arg = flags & ERF_RETURN_ARG_MASK;
+ if (arg >= (int)gimple_call_num_args (call))
+   goto fail;
+ expr = gimple_call_arg (stmt, arg);
+ continue;
+   }
 
   if (!is_gimple_assign (stmt = SSA_NAME_DEF_STMT (expr)))
break;
@@ -1664,6 +1677,7 @@ set_cond_stmt_execution_predicate (struct 
ipa_func_body_info *fbi,
}
}
   vec_free (param_ops);
+  return;
 }
 
   if (TREE_CODE (op) != SSA_NAME)
diff --git a/gcc/testsuite/gcc.dg/ipa/pr109849.c 
b/gcc/testsuite/gcc.dg/ipa/pr109849.c
new file mode 100644
index 000..09b62f90d70
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr109849.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-ipa-inline-details" } */
+void bad (void);
+void
+test(int a)
+{
+   if (__builtin_expect (a>3, 0))
+   {
+   bad ();
+   bad ();
+   bad ();
+   bad ();
+   bad ();
+   bad ();
+   bad ();
+   bad ();
+   }
+}
+void
+foo (int a)
+{
+   if (a>0)
+   __builtin_unreachable ();
+   test (a);
+}
+/* { dg-final { scan-ipa-dump "Inlined 2 calls" "inline"  } } */
+/* { dg-final { scan-ipa-dump "Inlining test" "inline"  } } */

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #19 from CVS Commits  ---
The master branch has been updated by Jan Hubicka :

https://gcc.gnu.org/g:9dc18fca431626404b0692c689a2e103666e7adb

commit r14-2202-g9dc18fca431626404b0692c689a2e103666e7adb
Author: Jan Hubicka 
Date:   Thu Jun 29 22:45:37 2023 +0200

Compute ipa-predicates for conditionals involving __builtin_expect_p

std::vector allocator looks as follows:

__attribute__((nodiscard))
struct pair * std::__new_allocator
>::allocate (struct __new_allocator * const this, size_type __n, const void *
D.27753)
{
  bool _1;
  long int _2;
  long int _3;
  long unsigned int _5;
  struct pair * _9;

   [local count: 1073741824]:
  _1 = __n_7(D) > 1152921504606846975;
  _2 = (long int) _1;
  _3 = __builtin_expect (_2, 0);
  if (_3 != 0)
goto ; [10.00%]
  else
goto ; [90.00%]

   [local count: 107374184]:
  if (__n_7(D) > 2305843009213693951)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 53687092]:
  std::__throw_bad_array_new_length ();

   [local count: 53687092]:
  std::__throw_bad_alloc ();

   [local count: 966367641]:
  _5 = __n_7(D) * 8;
  _9 = operator new (_5);
  return _9;
}

So there is check for allocated block size being greater than max_size
which is
wrapper in __builtin_expect.  This makes ipa-fnsummary to give up analyzing
predicates and it will miss the fact that the two different calls to
__throw
will be optimized out if __n is larady smaller than 1152921504606846975
which
it is after _M_check_len.

This patch extends ipa-fnsummary to understand functions that return their
parameter.

gcc/ChangeLog:

PR tree-optimization/109849
* ipa-fnsummary.cc (decompose_param_expr): Skip
functions returning its parameter.
(set_cond_stmt_execution_predicate): Return early
if predicate was constructed.

gcc/testsuite/ChangeLog:

PR tree-optimization/109849
* gcc.dg/ipa/pr109849.c: New test.

[Bug testsuite/110490] tree-ssa/clz-* and tree-ssa/ctz-* fail on s390x with arch14

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110490

Andrew Pinski  changed:

   What|Removed |Added

 CC||pinskia at gcc dot gnu.org

--- Comment #1 from Andrew Pinski  ---
I suspect this is a testsuite issue:
>/* { dg-require-effective-target clzl } */

basically the check_effective_target_clz* (ctz*) test is passing for some
reason when it should not be ...

Re: [PATCH] analyzer: Fix regression bug after r14-1632-g9589a46ddadc8b [PR110198]

2023-06-29 Thread David Malcolm via Gcc-patches

On Thu, 2023-06-29 at 20:45 +0200, priour...@gmail.com wrote:
> From: benjamin priour 
> 
> See below formatting updates on my patch.
> In mail
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623140.html,
> David Malcolm says regtesting failed for him.
> 
> So I did it once more this morning rebased on fresh trunk dc93a0f633b
> and target x86_64-linux-gnu, the output was similar to last time:
> 
>     # from gcc_sources_testing 
>     gcc/contrib/compare_tests ../gcc_sources_control/build build
> 
>     # Comparing directories
>     ## Dir1=../gcc_sources_control/build: 8 sum files
>     ## Dir2=build: 8 sum files
>     
>     # Comparing 8 common sum files
>     ## /bin/sh gcc/contrib/compare_tests  /tmp/gxx-sum1.750468
> /tmp/gxx-sum2.750468
>     Tests that now work, but didn't before (3 tests):
> 
>     g++.dg/analyzer/pr100244.C  -std=c++14  (test for warnings, line
> 17)
>     g++.dg/analyzer/pr100244.C  -std=c++17  (test for warnings, line
> 17)
>     g++.dg/analyzer/pr100244.C  -std=c++20  (test for warnings, line
> 17)
>     
>     # No differences found in 8 common sum files
> 
> Can you confirm formatting of the patch below, and perhaps regtest it
> ?

Looks good to me; you can go ahead and push this to trunk.

Thanks
Dave

Re: [PATCH] testsuite: Use -fno-report-bug in gcc.dg/plugin/

2023-06-29 Thread David Malcolm via Gcc-patches

On Thu, 2023-06-29 at 15:03 -0400, Marek Polacek wrote:
> Certain downstream compilers (for example, in Fedora) default to
> -freport-bug.  The extra output breaks the following tests.  We can
> use
> -fno-report-bug to fix that.  Patch verified with:
> 
> $ make check RUNTESTFLAGS='--target_board=unix\{,-freport-bug\}
> plugin.exp'
> 
> Tested x86_64-pc-linux-gnu, ok for trunk/13?

Looks good to me; thanks

Dave

[Bug testsuite/110490] New: tree-ssa/clz-* and tree-ssa/ctz-* fail on s390x with arch14

2023-06-29 Thread mpolacek at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110490

Bug ID: 110490
   Summary: tree-ssa/clz-* and tree-ssa/ctz-* fail on s390x with
arch14
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mpolacek at gcc dot gnu.org
  Target Milestone: ---

(Not sure what component is most appropriate for this.)

The following FAILs can be seen in

(s390x-ibm-linux-gnu arch14)
but not in

(s390x-ibm-linux-gnu default)

It would be great to fix them.

FAIL: gcc.dg/tree-ssa/clz-char.c scan-tree-dump-times optimized
"__builtin_clz|.CLZ" 1
FAIL: gcc.dg/tree-ssa/clz-complement-char.c scan-tree-dump-times optimized
"__builtin_clz|.CLZ" 1
FAIL: gcc.dg/tree-ssa/clz-complement-int.c scan-tree-dump-times optimized
"__builtin_clz|.CLZ" 1
FAIL: gcc.dg/tree-ssa/clz-complement-long.c scan-tree-dump-times optimized
"__builtin_clz|.CLZ" 1
FAIL: gcc.dg/tree-ssa/clz-int.c scan-tree-dump-times optimized
"__builtin_clz|.CLZ" 1
FAIL: gcc.dg/tree-ssa/clz-long.c scan-tree-dump-times optimized
"__builtin_clz|.CLZ" 1
FAIL: gcc.dg/tree-ssa/ctz-char.c scan-tree-dump-times optimized
"__builtin_ctz|.CTZ" 1
FAIL: gcc.dg/tree-ssa/ctz-complement-char.c scan-tree-dump-times optimized
"__builtin_ctz|.CTZ" 1
FAIL: gcc.dg/tree-ssa/ctz-complement-int.c scan-tree-dump-times optimized
"__builtin_ctz|.CTZ" 1
FAIL: gcc.dg/tree-ssa/ctz-complement-long-long.c scan-tree-dump-times optimized
"__builtin_ctz|.CTZ" 1
FAIL: gcc.dg/tree-ssa/ctz-complement-long.c scan-tree-dump-times optimized
"__builtin_ctz|.CTZ" 1
FAIL: gcc.dg/tree-ssa/ctz-int.c scan-tree-dump-times optimized
"__builtin_ctz|.CTZ" 1
FAIL: gcc.dg/tree-ssa/ctz-long-long.c scan-tree-dump-times optimized
"__builtin_ctz|.CTZ" 1
FAIL: gcc.dg/tree-ssa/ctz-long.c scan-tree-dump-times optimized
"__builtin_ctz|.CTZ" 1

[Bug analyzer/110483] Several gcc.dg/analyzer/out-of-bounds-diagram-*.c tests FAIL

2023-06-29 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110483

--- Comment #1 from David Malcolm  ---
Thanks for filing this; sorry about the failures.

What's the endianness of the hosts that this is happening on?

Is there a machine in the GCC compile farm that this happens on?

The row of indices is is created here in
string_region_spatial_item::make_table:
if (m_show_full_string)
  {
   for (byte_offset_t byte_idx = bytes.get_start_byte_offset ();
byte_idx < bytes.get_next_byte_offset ();
byte_idx = byte_idx + 1)
 add_column_for_byte (t, btm, sm, byte_idx,
  byte_idx_table_y, byte_val_table_y);
where class string_region_spatial_item has:
  void add_column_for_byte (table , const bit_to_table_map ,
style_manager ,
const byte_offset_t byte_idx,
const int byte_idx_table_y,
const int byte_val_table_y) const
  {
tree string_cst = get_string_cst ();
gcc_assert (byte_idx >= 0);
gcc_assert (byte_idx < TREE_STRING_LENGTH (string_cst));

const byte_range bytes (byte_idx, 1);
if (1) // show_byte_indices
  {
const table::rect_t idx_table_rect
  = btm.get_table_rect (_string_reg, bytes, byte_idx_table_y, 1);
t.set_cell_span (idx_table_rect,
 fmt_styled_string (sm, "[%li]",
byte_idx.ulow ()));
  }

so presumably an issue with:

 fmt_styled_string (sm, "[%li]",
byte_idx.ulow ()));
on those hosts.

Possibly an endianness-handling mistake by me?

Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-29 Thread Thomas Schwinge

Hi!

On 2023-06-29T11:29:57+0200, I wrote:
> On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches  
> wrote:
>> We extend the machine mode from 8 to 16 bits already. But there still
>> one placing missing from the streamer. It has one hard coded array
>> for the machine code like size 256.
>>
>> In the lto pass, we memset the array by MAX_MACHINE_MODE count but the
>> value of the MAX_MACHINE_MODE will grow as more and more modes are
>> added. While the machine mode array in tree-streamer still leave 256 as is.
>>
>> Then, when the MAX_MACHINE_MODE is greater than 256, the memset of
>> lto_output_init_mode_table will touch the memory out of range unexpected.
>
> Uh.  :-O
>
>> This patch would like to take the MAX_MACHINE_MODE as the size of the
>> array in streamer, to make sure there is no potential unexpected
>> memory access in future. Meanwhile, this patch also adjust some place
>> which has MAX_MACHINE_MODE <= 256 assumption.
>
> Thanks to Jakub and Richard for guidance re the offloading compilation
> case, where we've got different 'MAX_MACHINE_MODE's between stream-out
> and stream-in, and a modes mapping table.
>
> However, with this patch, there are ICEs all over the place...  I'm
> having a look.

Your patch has all the right ideas, there are just a few additional
changes necessary.  Please merge in the attached
"f into Streamer: Fix out of range memory access of machine mode", with
'Co-authored-by: Thomas Schwinge '.  This has
already survived compiler-side 'lto.exp' testing and
'check-target-libgomp' with Nvidia GPU offloading; AMD GPU testing is now
running (not expecting any bad surprises).  Will let you know by (my)
tomorrow morning in case there are any more problems.

Explanation:

>> --- a/gcc/lto-streamer-in.cc
>> +++ b/gcc/lto-streamer-in.cc
>> @@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>  internal_error ("cannot read LTO mode table from %s",
>>   file_data->file_name);
>>
>> -  unsigned char *table = ggc_cleared_vec_alloc (1 << 8);
>> -  file_data->mode_table = table;
>>const struct lto_simple_header_with_strings *header
>>  = (const struct lto_simple_header_with_strings *) data;
>>int string_offset;
>> @@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data 
>> *file_data)
>>   header->string_size, vNULL);
>>bitpack_d bp = streamer_read_bitpack ();
>>
>> +  unsigned mode_bits = bp_unpack_value (, 5);
>> +  unsigned char *table = ggc_cleared_vec_alloc (1 << 
>> mode_bits);
>> +
>> +  file_data->mode_table = table;
>> +  file_data->mode_bits = mode_bits;

Here, we set 'file_data->mode_bits' for the offloading case (where
'lto_input_mode_table' is called) -- but it's not set for the
non-offloading case (where 'lto_input_mode_table' isn't called).  (See my
'gcc/lto/lto-common.cc:lto_read_decls' change.)  That's "not currently a
problem", as 'file_data->mode_bits' isn't used anywhere...

>> --- a/gcc/lto-streamer.h
>> +++ b/gcc/lto-streamer.h
>> @@ -604,6 +604,8 @@ struct GTY(()) lto_file_decl_data
>>int order_base;
>>
>>int unit_base;
>> +
>> +  unsigned mode_bits;
>>  };

>>  inline machine_mode
>>  bp_unpack_machine_mode (struct bitpack_d *bp)
>>  {
>> -  return (machine_mode)
>> -((class lto_input_block *)
>> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
>> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
>> +  lto_input_block *input_block = (class lto_input_block *) bp->stream;
>> +  int index = bp_unpack_enum (bp, machine_mode, last);
>> +
>> +  return (machine_mode) input_block->mode_table[index];
>>  }

..., but 'file_data->mode_bits' needs to be considered here, in the
stream-in for offloading, where 'file_data->mode_bits' -- that is, the
host 'MAX_MACHINE_MODE' -- very likely is different from the offload
device 'MAX_MACHINE_MODE'.

Easiest is in 'gcc/lto-streamer.h:class lto_input_block' to capture
'lto_file_decl_data *file_data' instead of just
'unsigned char *mode_table', and adjust all users.

That's it.  :-)

>> --- a/gcc/tree-streamer.h
>> +++ b/gcc/tree-streamer.h

>> @@ -108,15 +108,19 @@ inline void
>>  bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
>>  {
>>streamer_mode_table[mode] = 1;
>> -  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
>> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
>> +
>> +  bp_pack_enum (bp, machine_mode, last, mode);
>>  }

That use of 'MAX_MACHINE_MODE' is safe, as that only concerns the
stream-out phase.

>> --- a/gcc/tree-streamer.cc
>> +++ b/gcc/tree-streamer.cc
>> @@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
>> During streaming in, we translate the on the disk mode using this
>> table.  For normal LTO it is set to identity, for ACCEL_COMPILER
>> depending on the mode_table content.  */
>> -unsigned char streamer_mode_table[1 << 8];
>> +unsigned char streamer_mode_table[MAX_MACHINE_MODE];

Likewise.


Grüße
 Thomas

Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Marek Polacek via Gcc-patches

On Fri, Jun 30, 2023 at 04:11:44AM +0800, Xi Ruoyao wrote:
> On Fri, 2023-06-30 at 04:08 +0800, Xi Ruoyao wrote:
> > On Thu, 2023-06-29 at 16:01 -0400, Marek Polacek via Gcc-patches wrote:
> > > These tests fail when the testsuite is executed with -fstack-
> > > protector-strong.
> > > To avoid this, this patch adds -fno-stack-protector to dg-options.
> > > 
> > > Tested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > LGTM, we've noticed these two failures in Linux From Scratch [1].  But
> > this is not an approval because I'm not a maintainer.

Thanks.

> And can we backport them to gcc-13 branch too?  These two tests were
> added in the cycle of GCC 13, so we could consider the failures
> "regression".

Yeah, it would be good for Fedora gcc as well.  I'll put the patch in 13
as well.
 
Marek

Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Jakub Jelinek via Gcc-patches

On Fri, Jun 30, 2023 at 04:11:44AM +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2023-06-30 at 04:08 +0800, Xi Ruoyao wrote:
> > On Thu, 2023-06-29 at 16:01 -0400, Marek Polacek via Gcc-patches wrote:
> > > These tests fail when the testsuite is executed with -fstack-
> > > protector-strong.
> > > To avoid this, this patch adds -fno-stack-protector to dg-options.
> > > 
> > > Tested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > LGTM, we've noticed these two failures in Linux From Scratch [1].  But
> > this is not an approval because I'm not a maintainer.
> 
> And can we backport them to gcc-13 branch too?  These two tests were
> added in the cycle of GCC 13, so we could consider the failures
> "regression".

It is ok even for 13 branch.

Jakub

Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Xi Ruoyao via Gcc-patches

On Fri, 2023-06-30 at 04:08 +0800, Xi Ruoyao wrote:
> On Thu, 2023-06-29 at 16:01 -0400, Marek Polacek via Gcc-patches wrote:
> > These tests fail when the testsuite is executed with -fstack-
> > protector-strong.
> > To avoid this, this patch adds -fno-stack-protector to dg-options.
> > 
> > Tested on x86_64-pc-linux-gnu, ok for trunk?
> 
> LGTM, we've noticed these two failures in Linux From Scratch [1].  But
> this is not an approval because I'm not a maintainer.

And can we backport them to gcc-13 branch too?  These two tests were
added in the cycle of GCC 13, so we could consider the failures
"regression".

> 
> [1]:https://www.linuxfromscratch.org/lfs/view/development/chapter08/gcc.html
> 
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/i386/pr104610.c: Use -fno-stack-protector.
> > * gcc.target/i386/pr69482-1.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr104610.c  | 2 +-
> >  gcc/testsuite/gcc.target/i386/pr69482-1.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/testsuite/gcc.target/i386/pr104610.c
> > b/gcc/testsuite/gcc.target/i386/pr104610.c
> > index fe39cbe5b8a..5173fc8898c 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr104610.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr104610.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256" } */
> > +/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256 -fno-stack-
> > protector" } */
> >  /* { dg-final { scan-assembler-times {(?n)vptest.*ymm} 1 } } */
> >  /* { dg-final { scan-assembler-times {sete} 1 } } */
> >  /* { dg-final { scan-assembler-not {(?n)je.*L[0-9]} } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > index f192261b104..99bb6ad5a37 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O3" } */
> > +/* { dg-options "-O3 -fno-stack-protector" } */
> >  
> >  static inline void memset_s(void* s, int n) {
> >    volatile unsigned char * p = s;
> > 
> > base-commit: 070a6bf0bdc6761ad77ac97404c98f00a7007d54
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Jakub Jelinek via Gcc-patches

On Thu, Jun 29, 2023 at 04:01:20PM -0400, Marek Polacek via Gcc-patches wrote:
> These tests fail when the testsuite is executed with -fstack-protector-strong.
> To avoid this, this patch adds -fno-stack-protector to dg-options.
> 
> Tested on x86_64-pc-linux-gnu, ok for trunk?
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr104610.c: Use -fno-stack-protector.
>   * gcc.target/i386/pr69482-1.c: Likewise.

Ok, thanks.

Jakub

[Bug c++/110468] [12/13/14 regression] Internal compiler error in nothrow_spec_p

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110468

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Patrick Palka :

https://gcc.gnu.org/g:9479da4515f7d019b4ef282d0e21536431c44f71

commit r14-2199-g9479da4515f7d019b4ef282d0e21536431c44f71
Author: Patrick Palka 
Date:   Thu Jun 29 16:10:18 2023 -0400

c++: NSDMI instantiation during overload resolution [PR110468]

Here we find ourselves instantiating the NSDMI for A<1>::m when
computing argument conversions during overload resolution, and
thus tf_conv is set.  The flag causes mark_used for the constructor
used in the NSDMI to exit early and not instantiate its noexcept-spec,
which eventually leads to an ICE from nothrow_spec_p.

This patch fixes this by clearing any special tsubst flags during
instantiation of an NSDMI, since the result should be independent of
the context that requires the instantiation.

PR c++/110468

gcc/cp/ChangeLog:

* init.cc (maybe_instantiate_nsdmi_init): Mask out all
tsubst flags except for tf_warning_or_error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept79.C: New test.

[Bug c++/110463] [13/14 Regression] Mutable subobject is usable in a constant expression

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110463

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Patrick Palka :

https://gcc.gnu.org/g:fd8a1be04d4cdbfefea457b99ed8404d77b35dd6

commit r14-2198-gfd8a1be04d4cdbfefea457b99ed8404d77b35dd6
Author: Patrick Palka 
Date:   Thu Jun 29 16:02:04 2023 -0400

c++: unpropagated CONSTRUCTOR_MUTABLE_POISON [PR110463]

Here we're incorrectly accepting the mutable member accesses because
cp_fold neglects to propagate CONSTRUCTOR_MUTABLE_POISON when folding a
CONSTRUCTOR.

PR c++/110463

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) : Propagate
CONSTRUCTOR_MUTABLE_POISON.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-mutable6.C: New test.

Re: [PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Xi Ruoyao via Gcc-patches

On Thu, 2023-06-29 at 16:01 -0400, Marek Polacek via Gcc-patches wrote:
> These tests fail when the testsuite is executed with -fstack-
> protector-strong.
> To avoid this, this patch adds -fno-stack-protector to dg-options.
> 
> Tested on x86_64-pc-linux-gnu, ok for trunk?

LGTM, we've noticed these two failures in Linux From Scratch [1].  But
this is not an approval because I'm not a maintainer.

[1]:https://www.linuxfromscratch.org/lfs/view/development/chapter08/gcc.html

> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/i386/pr104610.c: Use -fno-stack-protector.
> * gcc.target/i386/pr69482-1.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/pr104610.c  | 2 +-
>  gcc/testsuite/gcc.target/i386/pr69482-1.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr104610.c
> b/gcc/testsuite/gcc.target/i386/pr104610.c
> index fe39cbe5b8a..5173fc8898c 100644
> --- a/gcc/testsuite/gcc.target/i386/pr104610.c
> +++ b/gcc/testsuite/gcc.target/i386/pr104610.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256" } */
> +/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256 -fno-stack-
> protector" } */
>  /* { dg-final { scan-assembler-times {(?n)vptest.*ymm} 1 } } */
>  /* { dg-final { scan-assembler-times {sete} 1 } } */
>  /* { dg-final { scan-assembler-not {(?n)je.*L[0-9]} } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> index f192261b104..99bb6ad5a37 100644
> --- a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3" } */
> +/* { dg-options "-O3 -fno-stack-protector" } */
>  
>  static inline void memset_s(void* s, int n) {
>    volatile unsigned char * p = s;
> 
> base-commit: 070a6bf0bdc6761ad77ac97404c98f00a7007d54

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer

2023-06-29 Thread anlauf at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311

--- Comment #30 from anlauf at gcc dot gnu.org ---
BTW: you can get a traceback on FP exceptions by adding to the linker options:

 -ffpe-trap=zero,overflow,invalid

[Bug analyzer/94355] analyzer support for C++ new expression

2023-06-29 Thread vultkayn at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94355

--- Comment #15 from Benjamin Priour  ---
(In reply to Jonathan Wakely from comment #14)
> [...snip...]
> 
> See the -fcheck-new option:
> 
>   Check that the pointer returned by "operator new" is non-null before
> attempting to modify the storage allocated.  This check is normally
> unnecessary because the C++ standard specifies that "operator new" only
> returns 0 if it is declared "throw()", in which case the compiler always
> checks the return value even without this option.  In all other cases, when
> "operator new" has a non-empty exception specification, memory exhaustion is
> signalled by throwing "std::bad_alloc".  See also new (nothrow).


Should we use the above flag's value to also optionally disable the analyzer
warnings on operator new possibly returning NULL?

Or maybe there could be an additional flag -fanalyzer-new-returns-null, turned
'on' by default.

Having such capability would be useful to run the analyzer against itself, as
GCC is built without exceptions (thus every operator new possibly returns
NULL).

[PATCH] i386: add -fno-stack-protector to two tests

2023-06-29 Thread Marek Polacek via Gcc-patches

These tests fail when the testsuite is executed with -fstack-protector-strong.
To avoid this, this patch adds -fno-stack-protector to dg-options.

Tested on x86_64-pc-linux-gnu, ok for trunk?

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr104610.c: Use -fno-stack-protector.
* gcc.target/i386/pr69482-1.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr104610.c  | 2 +-
 gcc/testsuite/gcc.target/i386/pr69482-1.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr104610.c 
b/gcc/testsuite/gcc.target/i386/pr104610.c
index fe39cbe5b8a..5173fc8898c 100644
--- a/gcc/testsuite/gcc.target/i386/pr104610.c
+++ b/gcc/testsuite/gcc.target/i386/pr104610.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256" } */
+/* { dg-options "-O2 -mavx -mmove-max=256 -mstore-max=256 
-fno-stack-protector" } */
 /* { dg-final { scan-assembler-times {(?n)vptest.*ymm} 1 } } */
 /* { dg-final { scan-assembler-times {sete} 1 } } */
 /* { dg-final { scan-assembler-not {(?n)je.*L[0-9]} } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr69482-1.c 
b/gcc/testsuite/gcc.target/i386/pr69482-1.c
index f192261b104..99bb6ad5a37 100644
--- a/gcc/testsuite/gcc.target/i386/pr69482-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr69482-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -fno-stack-protector" } */
 
 static inline void memset_s(void* s, int n) {
   volatile unsigned char * p = s;

base-commit: 070a6bf0bdc6761ad77ac97404c98f00a7007d54
-- 
2.41.0

[Bug middle-end/110489] Slow building virtual.c.i from p11-kit

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #2 from Andrew Pinski  ---
So I took a look at the sources, there are very many small functions.
This might be the reason why dump files Timevar takes a long time, it is called
for each pass and for each function. Maybe that can be improved.

the register allocator and schedule costs I suspect is due to there being a
small setup cost which multiply by many functions add up.

[Bug fortran/86277] Presence of optional arguments not recognized for zero length arrays

2023-06-29 Thread anlauf at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86277

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |13.2
 Status|ASSIGNED|RESOLVED

--- Comment #36 from anlauf at gcc dot gnu.org ---
Fixed on mainline for gcc-14, and backported to 13-branch for 13.2.

The backport required a backport of r14-90 concerning some poor scan patterns.

Closing.

[Bug fortran/100297] FAIL: gfortran.dg/allocatable_function_1.f90 gfortran.dg/reshape_8.f90

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100297

--- Comment #7 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Harald Anlauf
:

https://gcc.gnu.org/g:1a023e688186ea4cd284f5d269f2ecde9f80438c

commit r13-7501-g1a023e688186ea4cd284f5d269f2ecde9f80438c
Author: Harald Anlauf 
Date:   Tue Apr 18 21:24:20 2023 +0200

testsuite: fix scan-tree-dump patterns [PR83904,PR100297]

Adjust scan-tree-dump patterns so that they do not accidentally match a
valid path.

gcc/testsuite/ChangeLog:

PR testsuite/83904
PR fortran/100297
* gfortran.dg/allocatable_function_1.f90: Use "__builtin_free "
instead of the naive "free".
* gfortran.dg/reshape_8.f90: Extend pattern from a simple "data".

(cherry picked from commit 6fc8e25cb6b5d720bedd85194b0ad740d75082f4)

[Bug fortran/86277] Presence of optional arguments not recognized for zero length arrays

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86277

--- Comment #35 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Harald Anlauf
:

https://gcc.gnu.org/g:74ef4221b5ebb1122349ad48422ddc40e98c267d

commit r13-7502-g74ef4221b5ebb1122349ad48422ddc40e98c267d
Author: Harald Anlauf 
Date:   Mon Jun 12 23:08:48 2023 +0200

Fortran: fix passing of zero-sized array arguments to procedures [PR86277]

gcc/fortran/ChangeLog:

PR fortran/86277
* trans-array.cc (gfc_trans_allocate_array_storage): When passing a
zero-sized array with fixed (= non-dynamic) size, allocate
temporary
by the caller, not by the callee.

gcc/testsuite/ChangeLog:

PR fortran/86277
* gfortran.dg/zero_sized_14.f90: New test.
* gfortran.dg/zero_sized_15.f90: New test.

Co-authored-by: Mikael Morin 
(cherry picked from commit c1691509e5a8875f36c068a5ea101bf13f140948)

[Bug testsuite/83904] gfortran.dg allocatable_function_1 failures

2023-06-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83904

--- Comment #4 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Harald Anlauf
:

https://gcc.gnu.org/g:1a023e688186ea4cd284f5d269f2ecde9f80438c

commit r13-7501-g1a023e688186ea4cd284f5d269f2ecde9f80438c
Author: Harald Anlauf 
Date:   Tue Apr 18 21:24:20 2023 +0200

testsuite: fix scan-tree-dump patterns [PR83904,PR100297]

Adjust scan-tree-dump patterns so that they do not accidentally match a
valid path.

gcc/testsuite/ChangeLog:

PR testsuite/83904
PR fortran/100297
* gfortran.dg/allocatable_function_1.f90: Use "__builtin_free "
instead of the naive "free".
* gfortran.dg/reshape_8.f90: Extend pattern from a simple "data".

(cherry picked from commit 6fc8e25cb6b5d720bedd85194b0ad740d75082f4)

[Bug middle-end/110489] Slow building virtual.c.i from p11-kit

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #1 from Andrew Pinski  ---
The only ones that stick out are:
 dump files :   1.07 (  4%)   0.24 (  5%)   1.58 (  5%)
0  (  0%)
 integrated RA  :   1.75 (  7%)   0.11 (  2%)   2.10 (  7%)
  147M ( 24%)
 scheduling 2   :   1.55 (  6%)   0.14 (  3%)   1.35 (  4%)
 2653k (  0%)


Nothing else sticks out really. (but they do add up).

[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer

2023-06-29 Thread juergen.reuter at desy dot de via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311

--- Comment #29 from Jürgen Reuter  ---
(In reply to anlauf from comment #28)
> Update: recompiling that file with 13-branch fails for me, too.
> Playing with the one-line patch for pr86277 makes no difference, fortunately.
> 
> Compiling the file with gfortran-12 seems to work ok.
> 
> So is this really a 14-only regression, or is 13-branch already suspicious?

We have gcc 13.1 in our CI, everything works fine there. I am still working on
a smaller test, but have very bad connection rn.

[Bug c/110489] New: Slow building virtual.c.i from p11-kit

2023-06-29 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

Bug ID: 110489
   Summary: Slow building virtual.c.i from p11-kit
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: compile-time-hog
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sjames at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55430
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55430=edit
virtual.c.i.xz

I fear this is a degenerate case as it's a largeish generated file (made during
the build process), but it's noticeable enough for me to raise it anyway.

When building p11-kit, I noticed a handful of files took considerably longer to
build. This is with release checking.

The standout seems to be `virtual.c.i`:

```
$ time gcc -c virtual.c.i -O2 -fPIC

real0m12.429s
user0m12.137s
sys 0m0.238s

$ gcc --version
gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -c virtual.c.i -O2 -pipe -fPIC -ftime-report
Time variable   usr   sys  wall
  GGC
 phase setup:   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)
 1326k (  0%)
 phase parsing  :   1.24 (  5%)   1.01 ( 20%)   2.26 (  7%)
   53M (  9%)
 phase lang. deferred   :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
   96  (  0%)
 phase opt and generate :  24.16 ( 95%)   4.08 ( 80%)  28.74 ( 93%)
  570M ( 91%)
 phase finalize :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)
0  (  0%)
 garbage collection :   0.08 (  0%)   0.00 (  0%)   0.08 (  0%)
0  (  0%)
 dump files :   1.07 (  4%)   0.24 (  5%)   1.58 (  5%)
0  (  0%)
 callgraph construction :   0.14 (  1%)   0.01 (  0%)   0.22 (  1%)
   18M (  3%)
 callgraph optimization :   0.36 (  1%)   0.12 (  2%)   0.50 (  2%)
   13k (  0%)
 callgraph functions expansion  :  20.14 ( 79%)   3.12 ( 61%)  23.71 ( 76%)
  390M ( 62%)
 callgraph ipa passes   :   3.61 ( 14%)   0.84 ( 17%)   4.48 ( 14%)
  132M ( 21%)
 ipa function summary   :   0.14 (  1%)   0.04 (  1%)   0.12 (  0%)
   13M (  2%)
 ipa dead code removal  :   0.09 (  0%)   0.00 (  0%)   0.07 (  0%)
0  (  0%)
 ipa devirtualization   :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
0  (  0%)
 ipa cp :   0.14 (  1%)   0.04 (  1%)   0.11 (  0%)
 6933k (  1%)
 ipa inlining heuristics:   0.14 (  1%)   0.02 (  0%)   0.10 (  0%)
  135k (  0%)
 ipa function splitting :   0.39 (  2%)   0.06 (  1%)   0.34 (  1%)
   40M (  7%)
 ipa comdats:   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
0  (  0%)
 ipa reference  :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)
0  (  0%)
 ipa profile:   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)
0  (  0%)
 ipa pure const :   0.07 (  0%)   0.02 (  0%)   0.16 (  1%)
 2416  (  0%)
 ipa icf:   0.14 (  1%)   0.00 (  0%)   0.14 (  0%)
 8112  (  0%)
 ipa SRA:   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)
 1116k (  0%)
 ipa free lang data :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
0  (  0%)
 ipa free inline summary:   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)
0  (  0%)
 ipa modref :   0.08 (  0%)   0.00 (  0%)   0.08 (  0%)
 2874k (  0%)
 cfg construction   :   0.02 (  0%)   0.02 (  0%)   0.06 (  0%)
 1323k (  0%)
 cfg cleanup:   0.23 (  1%)   0.06 (  1%)   0.34 (  1%)
 3108k (  0%)
 trivially dead code:   0.07 (  0%)   0.02 (  0%)   0.14 (  0%)
0  (  0%)
 df scan insns  :   0.20 (  1%)   0.03 (  1%)   0.25 (  1%)
  282k (  0%)
 df reaching defs   :   0.21 (  1%)   0.03 (  1%)   0.37 (  1%)
0  (  0%)
 df live regs   :   0.35 (  1%)   0.08 (  2%)   0.31 (  1%)
0  (  0%)
 df live regs   :   0.23 (  1%)   0.01 (  0%)   0.17 (  1%)
0  (  0%)
 df must-initialized regs   :   0.06 (  0%)   0.00 (  0%)   0.03 (  0%)
0  (  0%)
 df use-def / def-use chains:   0.10 (  0%)   0.02 (  0%)   0.17 (  1%)
0  (  0%)
 df reg dead/unused notes   :   0.40 (  2%)   0.02 (  0%)   0.32 (  1%)
 6334k (  1%)
 register information   :   0.14 (  1%)   0.01 (  0%)   0.06 (  0%)
0  (  0%)
 alias analysis :   0.46 (  2%)   0.03 (  1%)   0.31 (  1%)
   11M (  2%)
 alias stmt walking :   0.13 (  1%)   0.04 (

[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer

2023-06-29 Thread anlauf at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311

--- Comment #28 from anlauf at gcc dot gnu.org ---
Update: recompiling that file with 13-branch fails for me, too.
Playing with the one-line patch for pr86277 makes no difference, fortunately.

Compiling the file with gfortran-12 seems to work ok.

So is this really a 14-only regression, or is 13-branch already suspicious?

Re: types in GIMPLE IR

2023-06-29 Thread Andrew Pinski via Gcc

On Thu, Jun 29, 2023 at 12:10 PM Krister Walfridsson via Gcc
 wrote:
>
> On Thu, 29 Jun 2023, Richard Biener wrote:
>
> > IIRC we have some simplification rules that turn bit operations into
> > arithmetics.  Arithmetic is allowed if it keeps the values inside
> > [-1,0] for signed bools or [0, 1] for unsigned bools.
> >
> >> I have now verified that all cases seems to be just one operation of this
> >> form (where _127 has the value 0 or 1), so it cannot construct values
> >> such as 42. But the wide signed Boolean can have the three different
> >> values 1, 0, and -1, which I still think is at least one too many. :)
> >
> > Yeah, I'd be interested in a testcase that shows this behavior.
>
> I created PR 110487 with one example.
>
>
> >> I'll update my tool to complain if the value is outside the range [-1, 1].
>
> It is likely that all issues I have seen so far are due to PR 110487, so
> I'll keep the current behavior that complains if the value is outside the
> range [-1, 0].

Yes there are many similar to this all over GCC's folding.
In this case checking TYPE_PRECISION as described in the match.pd is
not even the right approach.
The whole TYPE_PRECISION on boolean types is definitely a big can of worms.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102622 is related but
that was signed boolean:1.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106053 is another related case.

For this weekend, I am going to audit some of the match patterns for
these issues.

>
> /Krister

[Bug ada/110488] New: Legal deferred constant rejected

2023-06-29 Thread lo50goa3 at duck dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110488

Bug ID: 110488
   Summary: Legal deferred constant rejected
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ada
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lo50goa3 at duck dot com
CC: dkm at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55429
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55429=edit
Minimal reproducer

Xubuntu 23.04 X86_64

GNAT 13 installed with

apt install gnat-13

Compiling the attached with

gnatmake -gnatc b_strings_prob.ads

gives

b_strings_prob.ads:11:04: error: subtype does not statically match deferred
declaration at line 4
gnatmake: "b_strings_prob.ads" compilation error

According to Randy Brukardt, ARG member and ARM editor, this is legal Ada and
should be accepted.

Output of

x86_64-linux-gnu-gcc-13 -v

Using built-in specs.
COLLECT_GCC=x86_64-linux-gnu-gcc-13
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
13-20230320-1ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr
--with-gcc-major-version-only --program-suffix=-13
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet
--with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-13-oz92yP/gcc-13-13-20230320/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-oz92yP/gcc-13-13-20230320/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.1 20230320 (experimental) [master r13-6759-g5194ad1958c]
(Ubuntu 13-20230320-1ubuntu1)

[Bug libstdc++/107852] [12 Regression] Spurious warnings stringop-overflow and array-bounds copying data as bytes into vector

2023-06-29 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107852

--- Comment #18 from Jonathan Wakely  ---
That's not a crash, it's a warning. And it looks like a separate problem, since
it comes from vector::reserve not vector::insert. Please file a new bug for it
instead.

[Bug tree-optimization/110487] [12/13/14 Regression] invalid wide Boolean value

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110487

Andrew Pinski  changed:

   What|Removed |Added

Summary|invalid wide Boolean value  |[12/13/14 Regression]
   ||invalid wide Boolean value
 Target||aarch64-linux-gnu
   Target Milestone|--- |12.4

[Bug tree-optimization/110487] invalid wide Boolean value

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110487

--- Comment #2 from Andrew Pinski  ---
Part of the issue is what does TYPE_PRECISION on BOOLEAN_TYPE really mean. 
there are many more of these issues all over GCC about boolean types assuming
being TYPE_PRECISION == 1 even and such.

[Bug tree-optimization/110487] invalid wide Boolean value

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110487

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Keywords||wrong-code
   Last reconfirmed||2023-06-29

--- Comment #1 from Andrew Pinski  ---
Applying pattern match.pd:4705, gimple-match-10.cc:15878
gimple_simplified to _38 = () _16;
_66 = -_38;
Global Exported: _66 = [irange]  [-1, 0]
Folded into: _66 = -_38;


/* a ? -1 : 0 -> -a.  No need to check the TYPE_PRECISION not being 1
   here as the powerof2cst case above will handle that case correctly.  */
(if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@1))
 (negate (convert (convert:boolean_type_node @0))


I guess is mine

1 2 3 >

1 - 100 of 292 matches

Mail list logo